Training: 422 articles archived under #training arXiv — NLP / Computation & Language research 6d ago When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs arXiv:2606.24119v1 Announce Type: cross Abstract: Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning.… 12 arXiv — NLP / Computation & Language research 6d ago Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning arXiv:2606.24133v1 Announce Type: cross Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data… 13 arXiv — NLP / Computation & Language research 6d ago Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models arXiv:2606.24841v1 Announce Type: cross Abstract: Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across… 18 Hugging Face Daily Papers research 6d ago Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning Abstract A novel online data mixing framework called Holistic Data Scheduler uses reinforcement learning with a multi-objective reward function to optimize large language model pre-training efficiency and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The composition… 38 r/LocalLLaMA community 6d ago Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL? 
Looking to train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions: SFT → RL or RL-only? - Is it still recommended to first do supervised fine-tuning (tool-calling traces, reasoning… 15 r/LocalLLaMA community 7d ago Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes https://eqbench.com/creative_writing.html#:~:text=gemma%2D4%2D31B,Sample From what I've seen Gemma 4 has better everything (especially long-context adherence) EXCEPT for the raw prosing performance of Mistral... finetunes. Comparing bases only, Mistral Small 3.2 (the… 5 Hugging Face Daily Papers research 10d ago No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced… 27 Smol AI News news-outlet 10d ago not much happened today **GLM-5.2** emerges as a leading open-weight coding model rivaling **Opus 4.8** and **GPT-5.5** in software engineering tasks, emphasizing the strategic importance of open models for provider competition, on-prem deployment, and fine-tuning rights. 
Experts like **Patrick… 17 arXiv — Machine Learning research 11d ago Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection arXiv:2606.19411v1 Announce Type: new Abstract: Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning… 32 arXiv — Machine Learning research 11d ago Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices arXiv:2606.19528v1 Announce Type: new Abstract: Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory… 15 arXiv — Machine Learning research 11d ago Tracking Representation Dynamics in Large Language Models with Persistent Homology arXiv:2606.19542v1 Announce Type: new Abstract: Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking… 38 arXiv — Machine Learning research 11d ago Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This… 7 arXiv — Machine Learning research 11d ago Uncertainty-Aware Reward Modeling for Stable RLHF arXiv:2606.19818v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. 
However, this pipeline faces two fundamental… 4 arXiv — Machine Learning research 11d ago Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying arXiv:2606.20167v1 Announce Type: new Abstract: Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. To overcome this challenge, self-supervised pre-training is a possible solution, with contrastive learning dominant for… 6 arXiv — NLP / Computation & Language research 11d ago Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer arXiv:2606.19346v1 Announce Type: new Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B--671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and… 6 arXiv — NLP / Computation & Language research 11d ago Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning… 25 arXiv — NLP / Computation & Language research 11d ago Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families arXiv:2606.20225v1 Announce Type: new Abstract: Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. 
We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared… 31 arXiv — NLP / Computation & Language research 11d ago MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation arXiv:2510.18383v3 Announce Type: replace Abstract: Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor… 20 arXiv — NLP / Computation & Language research 11d ago Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology arXiv:2512.03818v2 Announce Type: replace Abstract: Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a text - depends heavily on the wording… 33 llama.cpp releases dev-tools 11d ago b9714 server: add "X-Accel-Buffering": "no" header to streaming endpoints ( #24774 ) server: add "X-Accel-Buffering": "no" header to streaming endpoints This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints) Without it, Nginx will… 11 arXiv — Machine Learning research 12d ago CODEBLOCK: Learning to Supervise Code at the Right Granularity arXiv:2606.18286v1 Announce Type: new Abstract: Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. 
Recent token-level selection methods challenge… 34 arXiv — Machine Learning research 12d ago DRIFT: Refining Instruction Data via On-Policy Data Attribution arXiv:2606.18307v1 Announce Type: new Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they… 23 arXiv — Machine Learning research 12d ago Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent… 13 arXiv — Machine Learning research 12d ago Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning arXiv:2606.18691v1 Announce Type: new Abstract: Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific… 10 arXiv — Machine Learning research 12d ago FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs arXiv:2606.19025v1 Announce Type: new Abstract: Pre-training Large Language Models (LLMs) typically demands large-scale infrastructure with tightly coupled hardware accelerators. While increasing model and dataset scale remains the dominant driver of performance,… 9 Hugging Face official-blog 12d ago Beyond LoRA: Can you beat the most popular fine-tuning technique? 
Published June 18, 2026. Authors: Benjamin Bossan (BenjaminB), Sayak Paul (sayakpaul), Marian (hubnemo), Kashif Rasul (kashif). When you plan to fine-tune a model in a… 16 llama.cpp releases dev-tools 12d ago b9688 server: (router) add model management API ( #23976 ) wip server: (router) add SSE realtime updates API nits wip add download API add download api update docs add delete endpoint fix std::terminate fix crash fix 2 add tests nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple… 17 arXiv — Machine Learning research 13d ago A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction arXiv:2606.17649v1 Announce Type: new Abstract: The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance… 11 arXiv — Machine Learning research 13d ago TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins arXiv:2606.17660v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a… 21 arXiv — Machine Learning research 13d ago Handling Feature Heterogeneity with Learnable Graph Patches arXiv:2606.17667v1 Announce Type: new Abstract: In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). 
However, a… 34 arXiv — Machine Learning research 13d ago From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined… 17 arXiv — NLP / Computation & Language research 13d ago RepSelect: Robust LLM Unlearning via Representation Selectivity arXiv:2606.17168v1 Announce Type: new Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or… 29 arXiv — NLP / Computation & Language research 13d ago Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation arXiv:2606.17820v1 Announce Type: new Abstract: This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages. We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of… 28 arXiv — NLP / Computation & Language research 13d ago Learning task-specific subspaces via interventional post-training of speech foundation models arXiv:2606.17967v1 Announce Type: new Abstract: Speech foundation models, pre-trained on large corpora of unlabelled speech data, produce general-purpose representations which are useful across tasks. 
However, these representations encode information about salient speech… 5 arXiv — NLP / Computation & Language research 13d ago Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue arXiv:2606.17973v1 Announce Type: new Abstract: Depression is the leading cause of disability worldwide, and early detection of symptom change is essential for timely intervention. Validated instruments such as the Patient Health Questionnaire-9 (PHQ-9) support symptom… 17 arXiv — NLP / Computation & Language research 13d ago When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning arXiv:2606.18033v1 Announce Type: new Abstract: Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward… 13 arXiv — NLP / Computation & Language research 13d ago Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors arXiv:2606.17815v1 Announce Type: cross Abstract: Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor, however, usually validate attacks on a… 11 r/LocalLLaMA community 13d ago Be wary of Qwen/Claude distillations - they're often worse than the base model Just to be clear; I am not attempting to call anybody out or be mean to those who take the time/money to make these models, I just want to inform people about these distills/finetunes since there's clearly some confusion going on. 
I'm going to assume those of us who often visit… 37 Hugging Face Daily Papers research 13d ago Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought… 18 Hugging Face Daily Papers research 13d ago Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving… 9 arXiv — Machine Learning research 14d ago Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning… 35 arXiv — Machine Learning research 14d ago FastMix: Fast Data Mixture Optimization via Gradient Descent arXiv:2606.14971v1 Announce Type: new Abstract: While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a… 23 arXiv — Machine Learning research 14d ago Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance arXiv:2606.15531v1 Announce Type: new Abstract: Fine-tuning aligned language models on benign tasks (e.g. 
math tutoring) systematically breaks safety guardrails, even when training data contains no harmful content. While mechanistic approaches have shed light on where alignment… 36 arXiv — Machine Learning research 14d ago Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts arXiv:2606.15625v1 Announce Type: new Abstract: The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL)… 11 arXiv — NLP / Computation & Language research 14d ago Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context… 19 Hugging Face Daily Papers research 14d ago Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time Abstract Retrieval-augmented vision-language-action policies eliminate per-task fine-tuning costs by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 26 r/LocalLLaMA community 14d ago Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors | Alexander Hägele This looks very promising in terms of simplifying and accelerating fine-tuning. Submitted by /u/Thrumpwart. 37 r/LocalLLaMA community 14d ago We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace We built OpenMythos for the Build Small Hackathon an open-source LLM trained specifically for cybersecurity tasks. 
Wanted to share our training approach since the RLVR setup was non-trivial and might be interesting to people doing similar domain-specific fine-tuning. The problem… 7 NVIDIA Developer Blog official-blog 14d ago Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes Foundation models are reshaping computational biology. Pretrained on massive corpora of protein or genomic sequences, models such as ESM2 (a protein language... 8 arXiv — Machine Learning research 15d ago Beyond LoRA: Is Sparsity-Induced Adaptation Better? arXiv:2606.13767v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) and its variants provide a memory- and compute-efficient alternative to full fine-tuning of pre-trained models. However, questions remain about the comparative generalizability of these approaches and how… 28