Tag

Training

422 articles archived under #training · RSS

arXiv — NLP / Computation & Language research 6d ago

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv:2606.24119v1 Announce Type: cross Abstract: Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning.…

12
arXiv — NLP / Computation & Language research 6d ago

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

arXiv:2606.24133v1 Announce Type: cross Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data…

13
arXiv — NLP / Computation & Language research 6d ago

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

arXiv:2606.24841v1 Announce Type: cross Abstract: Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across…

18
Hugging Face Daily Papers research 6d ago

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Abstract A novel online data mixing framework called Holistic Data Scheduler uses reinforcement learning with a multi-objective reward function to optimize large language model pre-training efficiency and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The composition…

38
r/LocalLLaMA community 6d ago

Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL?

To train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions: SFT → RL or RL-only? - Is it still recommended to first do supervised fine-tuning (tool-calling traces, reasoning…

15
r/LocalLLaMA community 7d ago

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

https://eqbench.com/creative_writing.html#:~:text=gemma%2D4%2D31B,Sample From what I've seen Gemma 4 has better everything (especially long-context adherence) EXCEPT for the raw prosing performance of Mistral... finetunes . Comparing bases only, Mistral Small 3.2 (the…

5
Hugging Face Daily Papers research 10d ago

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced…

27
Smol AI News news-outlet 10d ago

not much happened today

**GLM-5.2** emerges as a leading open-weight coding model rivaling **Opus 4.8** and **GPT-5.5** in software engineering tasks, emphasizing the strategic importance of open models for provider competition, on-prem deployment, and fine-tuning rights. Experts like **Patrick…

17
arXiv — Machine Learning research 11d ago

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

arXiv:2606.19411v1 Announce Type: new Abstract: Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning…

32
arXiv — Machine Learning research 11d ago

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

arXiv:2606.19528v1 Announce Type: new Abstract: Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory…

15
arXiv — Machine Learning research 11d ago

Tracking Representation Dynamics in Large Language Models with Persistent Homology

arXiv:2606.19542v1 Announce Type: new Abstract: Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking…

38
arXiv — Machine Learning research 11d ago

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This…

7
arXiv — Machine Learning research 11d ago

Uncertainty-Aware Reward Modeling for Stable RLHF

arXiv:2606.19818v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental…

4
arXiv — Machine Learning research 11d ago

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

arXiv:2606.20167v1 Announce Type: new Abstract: Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. To overcome this challenge, self-supervised pre-training is a possible solution, with contrastive learning dominant for…

6
arXiv — NLP / Computation & Language research 11d ago

Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer

arXiv:2606.19346v1 Announce Type: new Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B--671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and…

6
arXiv — NLP / Computation & Language research 11d ago

Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability

arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning…

25
arXiv — NLP / Computation & Language research 11d ago

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

arXiv:2606.20225v1 Announce Type: new Abstract: Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared…

31
arXiv — NLP / Computation & Language research 11d ago

MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation

arXiv:2510.18383v3 Announce Type: replace Abstract: Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor…

20
arXiv — NLP / Computation & Language research 11d ago

Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

arXiv:2512.03818v2 Announce Type: replace Abstract: Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a text - depends heavily on the wording…

33
llama.cpp releases dev-tools 11d ago

b9714

server: add "X-Accel-Buffering": "no" header to streaming endpoints ( #24774 ) server: add "X-Accel-Buffering": "no" header to streaming endpoints This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints) Without it, Nginx will…

11
arXiv — Machine Learning research 12d ago

CODEBLOCK: Learning to Supervise Code at the Right Granularity

arXiv:2606.18286v1 Announce Type: new Abstract: Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge…

34
arXiv — Machine Learning research 12d ago

DRIFT: Refining Instruction Data via On-Policy Data Attribution

arXiv:2606.18307v1 Announce Type: new Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they…

23
arXiv — Machine Learning research 12d ago

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent…

13
arXiv — Machine Learning research 12d ago

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

arXiv:2606.18691v1 Announce Type: new Abstract: Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific…

10
arXiv — Machine Learning research 12d ago

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

arXiv:2606.19025v1 Announce Type: new Abstract: Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance,…

9
Hugging Face official-blog 12d ago

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Back to Articles a]:hidden"> Beyond LoRA: Can you beat the most popular fine-tuning technique? Published June 18, 2026 Update on GitHub Upvote 6 Benjamin Bossan BenjaminB Sayak Paul sayakpaul Marian hubnemo Kashif Rasul kashif When you plan to fine-tune a model in a…

16
llama.cpp releases dev-tools 12d ago

b9688

server: (router) add model management API ( #23976 ) wip server: (router) add SSE realtime updates API nits wip add download API add download api update docs add delete endpoint fix std::terminate fix crash fix 2 add tests nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

17
arXiv — Machine Learning research 13d ago

A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

arXiv:2606.17649v1 Announce Type: new Abstract: The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance…

11
arXiv — Machine Learning research 13d ago

TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins

arXiv:2606.17660v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and na\"ive runs can even degrade model performance. This raises a…

21
arXiv — Machine Learning research 13d ago

Handling Feature Heterogeneity with Learnable Graph Patches

arXiv:2606.17667v1 Announce Type: new Abstract: In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). However, a…

34
arXiv — Machine Learning research 13d ago

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined…

17
arXiv — NLP / Computation & Language research 13d ago

RepSelect: Robust LLM Unlearning via Representation Selectivity

arXiv:2606.17168v1 Announce Type: new Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or…

29
arXiv — NLP / Computation & Language research 13d ago

Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation

arXiv:2606.17820v1 Announce Type: new Abstract: This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages. We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of…

28
arXiv — NLP / Computation & Language research 13d ago

Learning task-specific subspaces via interventional post-training of speech foundation models

arXiv:2606.17967v1 Announce Type: new Abstract: Speech foundation models, pre-trained on large corpora of unlabelled speech data, produce general-purpose representations which are useful across tasks. However, these representations encode information about salient speech…

5
arXiv — NLP / Computation & Language research 13d ago

Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue

arXiv:2606.17973v1 Announce Type: new Abstract: Depression is the leading cause of disability worldwide, and early detection of symptom change is essential for timely intervention. Validated instruments such as the Patient Health Questionnaire-9 (PHQ-9) support symptom…

17
arXiv — NLP / Computation & Language research 13d ago

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

arXiv:2606.18033v1 Announce Type: new Abstract: Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward…

13
arXiv — NLP / Computation & Language research 13d ago

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor, however, usually validate attacks on a…

11
r/LocalLLaMA community 13d ago

Be wary of Qwen/Claude distillations - they're often worse than the base model

Just to be clear; I am not attempting to call anybody out or be mean to those who take the time/money to make these models, I just want to inform people about these distills/finetunes since there's clearly some confusion going on. I'm going to assume those of us who often visit…

37
Hugging Face Daily Papers research 13d ago

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought…

18
Hugging Face Daily Papers research 13d ago

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving…

9
arXiv — Machine Learning research 14d ago

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning…

35
arXiv — Machine Learning research 14d ago

FastMix: Fast Data Mixture Optimization via Gradient Descent

arXiv:2606.14971v1 Announce Type: new Abstract: While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a…

23
arXiv — Machine Learning research 14d ago

Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance

arXiv:2606.15531v1 Announce Type: new Abstract: Fine-tuning aligned language models on benign tasks (e.g. math tutoring) systematically breaks safety guardrails, even when training data contains no harmful content. While mechanistic approaches have shed light on where alignment…

36
arXiv — Machine Learning research 14d ago

Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts

arXiv:2606.15625v1 Announce Type: new Abstract: The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL)…

11
arXiv — NLP / Computation & Language research 14d ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context…

19
Hugging Face Daily Papers research 14d ago

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

Abstract Retrieval-augmented vision-language-action policies eliminate per-task fine-tuning costs by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

26
r/LocalLLaMA community 14d ago

Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors | Alexander Hägele

This looks very promising in terms of simplifying and accelerating fine-tuning.   submitted by   /u/Thrumpwart [link]   [comments]

37
r/LocalLLaMA community 14d ago

We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace

We built OpenMythos for the Build Small Hackathon an open-source LLM trained specifically for cybersecurity tasks. Wanted to share our training approach since the RLVR setup was non-trivial and might be interesting to people doing similar domain-specific fine-tuning. The problem…

7
NVIDIA Developer Blog official-blog 14d ago

Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes

Foundation models are reshaping computational biology. Pretrained on massive corpora of protein or genomic sequences, models such as ESM2 (a protein language...

8
arXiv — Machine Learning research 15d ago

Beyond LoRA: Is Sparsity-Induced Adaptation Better?

arXiv:2606.13767v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how…

28

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL?

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

not much happened today

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

Tracking Representation Dynamics in Large Language Models with Persistent Homology

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

Uncertainty-Aware Reward Modeling for Stable RLHF

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer

Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation

Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

b9714

CODEBLOCK: Learning to Supervise Code at the Right Granularity

DRIFT: Refining Instruction Data via On-Policy Data Attribution

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs

Beyond LoRA: Can you beat the most popular fine-tuning technique?

b9688

A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins

Handling Feature Heterogeneity with Learnable Graph Patches

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

RepSelect: Robust LLM Unlearning via Representation Selectivity

Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation

Learning task-specific subspaces via interventional post-training of speech foundation models

Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

Be wary of Qwen/Claude distillations - they're often worse than the base model

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

FastMix: Fast Data Mixture Optimization via Gradient Descent

Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance

Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors | Alexander Hägele

We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace

Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes

Beyond LoRA: Is Sparsity-Induced Adaptation Better?