Tag

Training

422 articles archived under #training

arXiv — NLP / Computation & Language research 1mo ago

Toward LLMs Beyond English-Centric Development

arXiv:2605.15613v1 Announce Type: new Abstract: Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English. While continual pre-training is commonly used to adapt LLMs to a target language,…

19
arXiv — NLP / Computation & Language research 1mo ago

Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective

arXiv:2605.15976v1 Announce Type: new Abstract: Production machine translation relies overwhelmingly on encoder-decoder Seq2Seq models, yet reinforcement learning approaches to MT fine-tuning have largely targeted decoder-only LLMs at $\geq$7B parameters, with limited systematic…

17
arXiv — NLP / Computation & Language research 1mo ago

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

arXiv:2605.15412v1 Announce Type: cross Abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into…

13
arXiv — NLP / Computation & Language research 1mo ago

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

arXiv:2506.01732v3 Announce Type: replace Abstract: Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. Such datasets often contain trillions of tokens, including large portions of copyrighted or proprietary content, which…

11
r/LocalLLaMA community 1mo ago

Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals!

Provided in both Safetensors and GGUFs. Safetensors: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic GGUFs: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic-GGUF:…

19
r/LocalLLaMA community 1mo ago

G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With Kld of 0.0100 and 15/100 Refusals!

Provided in both Safetensors and GGUFs. Safetensors: llmfan46/G4-MeroMero-31B-uncensored-heretic: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic GGUFs: llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF:…

29
r/LocalLLaMA community 1mo ago

gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it Writing Quality with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs!

Provided in both Safetensors and GGUFs. llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic…

38
r/LocalLLaMA community 1mo ago

LLM Phone Home: Reliable Apps that can deliver inference from local backend

Hello all, I’m wondering what suggestions there are for an iOS app that can serve an OpenAI-compatible endpoint. I am using 3sparks, which works GREAT for that specific use, BUT there is no MCP, no web search, etc. I want to show people that a local model with web search on your…

25
r/LocalLLaMA community 1mo ago

Best dataset for model pre-training

Well, alright, I want ~100M parameters on an NVIDIA L4 (24GB VRAM). Any good dataset (and quantity of tokens) to pretrain?

15
Hugging Face Daily Papers research 1mo ago

Long Context Pre-Training with Lighthouse Attention

Abstract Lighthouse Attention enables efficient training of causal transformers at long sequences by using hierarchical selection-based attention that reduces computational complexity while maintaining model performance. AI-generated summary Training causal transformers at…

33
Hugging Face Daily Papers research 1mo ago

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Abstract FEST is a few-shot demonstration-guided reinforcement learning algorithm that achieves strong performance with minimal supervised fine-tuning data by combining supervised signals, on-policy learning, and weighted training to prevent overfitting. AI-generated summary…

22
r/LocalLLaMA community 1mo ago

[FOUNDING] SupraLabs - real open-source AI models for you!

Hey r/LocalLLaMA! We founded SupraLabs, and it's huge! What we do? We train, finetune and explore small models with good results to revolutionize small AI…

30
Hugging Face Daily Papers research 1mo ago

Dynamic Latent Routing

Abstract Temporal composition of sub-policies in MDPs with time-varying rewards enables optimal policy recovery through generalized Dijkstra search, which inspires a dynamic latent routing method for language model fine-tuning that outperforms traditional supervised approaches.…

34
arXiv — Machine Learning research 1mo ago

Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning

arXiv:2605.13936v1 Announce Type: new Abstract: The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private,…

33
arXiv — Machine Learning research 1mo ago

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

arXiv:2605.14497v1 Announce Type: new Abstract: Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the…

34
arXiv — NLP / Computation & Language research 1mo ago

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

arXiv:2605.14055v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks. Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires…

4
arXiv — NLP / Computation & Language research 1mo ago

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

arXiv:2605.14291v1 Announce Type: cross Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing…

8
Hugging Face Daily Papers research 1mo ago

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Abstract Current multimodal models struggle to match human expert aesthetic judgment in comparative image selection tasks, as demonstrated by the Visual Aesthetic Benchmark which reveals significant performance gaps and shows that fine-tuning on expert examples can improve…

14
Vercel — AI dev-tools 1mo ago

Trace any Vercel request from the CLI

You can now generate Session Traces through the Vercel CLI. Use the new vercel curl --trace command to generate an OpenTelemetry trace to the specified endpoint from the terminal. Use the new vercel traces get command to fetch the generated trace by request ID. Available on all…

38
r/LocalLLaMA community 1mo ago

Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)

Hello, I have been working on creating an LLM from the ground up. It is based on the DeepSeek architecture, heavily optimized to reduce VRAM footprint (GUM+Muon). Currently this is the JSON schema I am using, which should suffice as to what is currently being pretrained. Training on a…

7
r/LocalLLaMA community 1mo ago

Dropping learning rate fixed my Qlora fine-tune more than anything else i tried

Been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data. Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha.…

35
Hugging Face Daily Papers research 1mo ago

Learning Agentic Policy from Action Guidance

Abstract Agentic reinforcement learning for large language models leverages action data from human interactions as reference guidance to improve exploration and reduce dependence on costly supervised fine-tuning. AI-generated summary Agentic reinforcement learning (RL) for Large…

21
Hugging Face Daily Papers research 1mo ago

Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

Abstract Research identifies studio-bias in multilingual ASR fine-tuning and proposes R-MFT method to improve spontaneous speech performance while maintaining efficiency. AI-generated summary Fine-tuning multilingual ASR models like Whisper for low-resource languages often…

20
arXiv — Machine Learning research 1mo ago

ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization

arXiv:2605.12667v1 Announce Type: new Abstract: The alignment of Large Language Models (LLMs) utilizes Reinforcement Learning from AI Feedback (RLAIF) for non-verifiable domains such as long-form question answering and open-ended instruction following. These domains often rely…

37
arXiv — Machine Learning research 1mo ago

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

arXiv:2605.12705v1 Announce Type: new Abstract: How can we train models whose post-trained capabilities survive subsequent fine-tuning? Rather than focusing on downstream interventions to mitigate forgetting of upstream capabilities, we study how upstream training choices - that…

30
arXiv — Machine Learning research 1mo ago

Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

arXiv:2605.12752v1 Announce Type: new Abstract: LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable…

20
arXiv — Machine Learning research 1mo ago

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

arXiv:2605.12798v1 Announce Type: new Abstract: Fine-tuning LLMs on narrow harmful datasets can induce Emergent Misalignment (EM), where models exhibit misaligned behavior far beyond the fine-tuning distribution. We argue that emergent misalignment can be better understood as a…

19
arXiv — Machine Learning research 1mo ago

Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning

arXiv:2605.12906v1 Announce Type: new Abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity,…

24
arXiv — Machine Learning research 1mo ago

From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning

arXiv:2605.12944v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) data selection is commonly formulated as instance ranking: score each example and retain a top-$k$ subset. However, effective SFT training subsets are often produced through ordered curation recipes,…

18
arXiv — Machine Learning research 1mo ago

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

arXiv:2605.12991v1 Announce Type: new Abstract: LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement at rates we term yield, a vulnerability widely attributed to RLHF-induced sycophancy. We test this attribution across four…

21
arXiv — Machine Learning research 1mo ago

Continual Fine-Tuning of Large Language Models via Program Memory

arXiv:2605.13162v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT), particularly Low-Rank Adaptation (LoRA), has become a standard approach for adapting Large Language Models (LLMs) under limited compute. However, in continual settings where models are…

33
Hugging Face Daily Papers research 1mo ago

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Abstract Long-context continued pre-training enhances vision-language models' ability to handle extended documents while maintaining performance across diverse contexts through strategic data mixture design. AI-generated summary Long-context modeling is becoming a core…

24
Hugging Face Daily Papers research 1mo ago

Revisiting DAgger in the Era of LLM-Agents

Abstract DAgger-style training for long-horizon language model agents combines supervised fine-tuning and reinforcement learning benefits by using teacher-student policy interpolation with on-policy interactions. AI-generated summary Long-horizon LM agents learn from multi-turn…

20
Hacker News — AI on Front Page community 1mo ago

Princeton mandates proctoring for in-person exams, upending 133 year precedent

Article URL: https://www.dailyprincetonian.com/article/2026/05/princeton-news-adpol-proctoring-in-person-examinations-passed-faculty-133-years-precedent Comments URL: https://news.ycombinator.com/item?id=48126848 Points: 226 # Comments: 310

9
r/LocalLLaMA community 1mo ago

I made a UI and server for using Anthropic's new Natural Language Autoencoders locally with llama.cpp

Anthropic's first open weight models, Natural Language Autoencoders , are just finetunes of popular open weight models. They do not modify architecture and modeling code so inference with llama.cpp is mostly trivial. I packaged every feature of NLAs (namely activation…

34
Hugging Face Daily Papers research 1mo ago

Efficient Pre-Training with Token Superposition

Abstract Token-Superposition Training (TST) improves pre-training efficiency by combining contiguous tokens into bags during a superposition phase with multi-hot cross-entropy objective, achieving faster training times without architectural changes. AI-generated summary…

30
Hugging Face Daily Papers research 1mo ago

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

Abstract ORBIT addresses catastrophic forgetting in large language model fine-tuning for generative retrieval by tracking parameter distances and employing weight averaging to maintain model performance. AI-generated summary Despite the rapid advancements in large language model…

7
TechCrunch — AI news-outlet 1mo ago

Adaption aims big with AutoScientist, an AI tool that helps models train themselves

Adaption's new AutoScientist tool is designed to let models adapt to specific capabilities quickly through an automated approach to conventional fine-tuning.

17
Hugging Face Daily Papers research 1mo ago

L2P: Unlocking Latent Potential for Pixel Generation

Abstract Latent-to-Pixel transfer paradigm efficiently leverages pre-trained latent diffusion models to create pixel-space models with minimal training overhead and high-resolution generation capabilities. AI-generated summary Pixel diffusion models have recently regained…

14
Hugging Face Daily Papers research 1mo ago

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

Abstract Training framework FocuSFT improves long-context language model performance by addressing attention allocation issues through bilevel optimization with parametric memory that focuses attention on semantically relevant content. AI-generated summary Large language models…

25
arXiv — Machine Learning research 1mo ago

Rotation-Preserving Supervised Fine-Tuning

arXiv:2605.10973v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) improves in-domain performance but can degrade out-of-domain (OOD) generalization. Prior work suggests that this degradation is related to changes in dominant singular subspaces of pretrained weight…

22
arXiv — Machine Learning research 1mo ago

$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

arXiv:2605.10981v1 Announce Type: new Abstract: Reference-free preference optimization has emerged as an efficient alternative to reinforcement learning from human feedback, with Simple Preference Optimization (SimPO) demonstrating strong performance by eliminating the explicit…

23
arXiv — Machine Learning research 1mo ago

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

arXiv:2605.11134v1 Announce Type: new Abstract: Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal…

13
arXiv — Machine Learning research 1mo ago

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

arXiv:2605.11235v1 Announce Type: new Abstract: In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the…

18
arXiv — Machine Learning research 1mo ago

The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives

arXiv:2605.11361v1 Announce Type: new Abstract: Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$. Since there is no canonical distributional distance for this…

27
arXiv — Machine Learning research 1mo ago

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

arXiv:2605.11387v1 Announce Type: new Abstract: We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. Existing methods for RL fine-tuning of generative policies…

17
arXiv — Machine Learning research 1mo ago

Efficient Adjoint Matching for Fine-tuning Diffusion Models

arXiv:2605.11480v1 Announce Type: new Abstract: Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled…

30
arXiv — NLP / Computation & Language research 1mo ago

Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training

arXiv:2605.11416v1 Announce Type: new Abstract: Selective layer-wise updates are essential for low-cost continued pre-training of Large Language Models (LLMs), yet determining which layers to freeze or train remains an empirical black-box problem due to the lack of interpretable…

28
arXiv — NLP / Computation & Language research 1mo ago

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

arXiv:2605.11513v1 Announce Type: new Abstract: Knowledge Distillation (KD) is a critical tool for training Large Language Models (LLMs), yet the majority of research focuses on approaches that rely solely on output logits, neglecting semantic information in the teacher's…

25
arXiv — NLP / Computation & Language research 1mo ago

When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models

arXiv:2605.11612v1 Announce Type: new Abstract: Backdoor vulnerabilities widely exist in the fine-tuning of large language models (LLMs). Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness. In…

25
