Tag

Training

422 articles archived under #training

arXiv — Machine Learning research 1h ago

A Gravitational Interpretation of Fine-Tuning Reversion

arXiv:2606.28525v1 Announce Type: new Abstract: Fine-tuning on harmless data can partially undo behaviors acquired earlier in training. Safety can erode under benign post-alignment updates, unlearned capabilities can re-emerge, latent traits can transfer through apparently…

27
arXiv — Machine Learning research 1h ago

DLR: Zero-Inference-Cost Latent Residuals for Low-Rank Pre-Training

arXiv:2606.28932v1 Announce Type: new Abstract: Large language models have driven recent progress in language and multimodal AI, yet pre-training them at scale is prohibitively expensive. Low-rank pre-training, which factorizes each weight matrix into a rank-r product to reduce…

35
arXiv — Machine Learning research 1h ago

BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning

arXiv:2606.29184v1 Announce Type: new Abstract: While Low-rank adaptation (LoRA) enables highly efficient fine-tuning by constraining task-specific updates to fixed low-rank subspaces, this rigid design limits representational flexibility and often results in overconfident…

15
arXiv — Machine Learning research 1h ago

Optimizer Memory Makes Shuffle Order a First-Order Source of Fine-Tuning Noise

arXiv:2606.29554v1 Announce Type: new Abstract: Shuffle order can be a larger source of fine-tuning noise than a memoryless analysis predicts: fixed-clock optimizer memory makes local equal-multiset contrasts first order in the learning rate rather than second order, and the…

8
arXiv — NLP / Computation & Language research 1h ago

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

arXiv:2606.28843v1 Announce Type: new Abstract: Fine-tuning a large language model is a ubiquitous method for enhancing its capability on a specific downstream task. However, prior work has shown that this increase in capability comes with a cost: it can increase a model's…

18
arXiv — NLP / Computation & Language research 1h ago

PASTA: A Paraphrasing And Self-Training Approach for Knowledge Updating in LLMs

arXiv:2606.28898v1 Announce Type: new Abstract: Knowledge updating in pre-trained Large Language Models (LLMs) remains an important challenge. While continual training provides a potential avenue for knowledge updating, it continues to present substantial technical difficulties.…

20
arXiv — NLP / Computation & Language research 1h ago

Fine-Tuning General-Purpose Large Language Models for Agricultural Applications: A Reproducible Framework and Evaluation Protocol Based on Qwen3-8B

arXiv:2606.28992v1 Announce Type: new Abstract: General-purpose large language models (LLMs) have demonstrated strong abilities in open-domain question answering, information extraction, and text generation. Agricultural applications, however, are domain-specific,…

20
arXiv — NLP / Computation & Language research 1h ago

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

arXiv:2606.29082v1 Announce Type: new Abstract: Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on…

4
arXiv — NLP / Computation & Language research 1h ago

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model

arXiv:2606.29614v1 Announce Type: new Abstract: This study examines whether supervised fine-tuning remains necessary for Turkish sentiment analysis in the era of large language models. We compare classical machine learning methods, fine-tuned pretrained language models, and…

35
arXiv — NLP / Computation & Language research 1h ago

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models

arXiv:2606.29815v1 Announce Type: new Abstract: Evaluating code large language models (Code LLMs) requires reliable detection of data leakage, where benchmark performance is artificially inflated by exposure to benchmark data during pre-training. Existing approaches either…

7
r/MachineLearning community 13h ago

I'm trying to implement CALM paper, and I have some questions. [P]

Hello, I'm trying to implement Pocket TTS by kyutai-labs, which is presented in this paper. Since they haven't released the training/fine-tuning code, I'm trying to implement it on my own to learn. I have read the paper and tried to implement it with much more…

34
r/LocalLLaMA community 1d ago

Update: First Manual Results from Testing Procedural Skill Transfer in Small Models

Yesterday I posted an idea for testing whether a large model can transfer some of its procedural skill to a smaller model without fine-tuning. The short version of the idea was this: Small models are often not completely lacking knowledge. They know the syntax. They know the…

18
arXiv — Machine Learning research 1d ago

PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

arXiv:2606.27578v1 Announce Type: new Abstract: Reward models for Reinforcement Learning from Human Feedback (RLHF) pool preferences across thousands of annotators and fit one global affine calibrator, collapsing raters with systematically different rating-scale offsets and…

36
arXiv — Machine Learning research 1d ago

Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF

arXiv:2606.27580v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) in production does not always have a synchronous reward signal. Code-execution verifiers, slow judge ensembles, and queued human review can return several gradient steps after the…

14
arXiv — Machine Learning research 1d ago

Two-Stage Fine-Tuning for Protein Sequence Generation with Targeted Amino-Acid Composition

arXiv:2606.27939v1 Announce Type: new Abstract: Protein language models are standard priors for biological sequence generation, but steering them toward explicit distributional design targets remains largely unexplored. We study a constrained protein generation problem in which…

24
arXiv — Machine Learning research 1d ago

When One Adapter Speaks for Many: Discovering Low-Rank Redundancy in Continual Fine-Tuning

arXiv:2606.28117v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the standard tool for parameter-efficient fine-tuning of large pretrained models. When applied sequentially across tasks in Continual Learning (CL), the standard assumption is that each new…

38
arXiv — Machine Learning research 1d ago

Qwen-Image-2.0-RL Technical Report

arXiv:2606.27608v1 Announce Type: cross Abstract: We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the…

34
arXiv — NLP / Computation & Language research 1d ago

Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026

arXiv:2606.27446v1 Announce Type: new Abstract: This paper describes team HSA_CORAL's submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling…

4
arXiv — NLP / Computation & Language research 1d ago

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

arXiv:2606.27709v1 Announce Type: new Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens…

22
arXiv — NLP / Computation & Language research 1d ago

HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

arXiv:2606.28249v1 Announce Type: cross Abstract: Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting…

20
arXiv — NLP / Computation & Language research 1d ago

Continual Memorization of Factoids in Language Models

arXiv:2411.07175v3 Announce Type: replace Abstract: As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown…

27
r/LocalLLaMA community 1d ago

MLX Fine-Tune Example Guide

A Local MLX Fine-Tuning Experiment Just finished a local LoRA fine-tune of a 7B instruction model on Apple Silicon, via MLX, teaching it a high-fantasy literary register (Gene Wolfe and Tolkien). This is a more rigorous version with more data of something I tried two years ago…

14
r/LocalLLaMA community 2d ago

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

TL;DR: Small models aren't dumb, they're shallow. I designed a cross-domain, blind, visual experiment to see if a large model can compress its "planning discipline" into a reusable scaffold that makes a small model deeper — with zero fine-tuning. Three.js is the testbed because…

28
r/LocalLLaMA community 2d ago

I built a tool to turn your Claude Code sessions into fine-tuning data for local models

If you use Claude Code, every session is already sitting on disk as a .jsonl file under ~/.claude/projects/. It has real coding conversations: multi-turn edits, tool calls, reasoning traces. That's training data you already generated for free. The problem is the format is not…

36
r/LocalLLaMA community 2d ago

Anyone still doing fine-tunes on consumer grade hardware?

Felt like there used to be a thriving fine-tuning community a few years back - and then once we started getting models that were smart enough and generalist enough (i.e. post Llama-3-8b era) things kind of dropped off a little. Less need for fine-tunes when prompt-tweaking can…

22
r/LocalLLaMA community 2d ago

Are there any qwen finetunes that were genuinely stronger than the base?

It's pretty popular to finetune qwen models but I never hear anyone say anything positive about them. Submitted by /u/MrMrsPotts.

30
Hugging Face Daily Papers research 4d ago

How Post-Training Shapes Biological Reasoning Models

Abstract Post-training stages in biological reasoning models differently affect generalization, with continued pre-training aligning models with biological language, supervised fine-tuning improving in-domain performance but reducing out-of-domain generalization, and…

8
arXiv — Machine Learning research 4d ago

SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning

arXiv:2606.26290v1 Announce Type: new Abstract: While parameter-efficient fine-tuning (PEFT) typically targets attention projectors, its efficacy for tasks requiring sequential state accumulation remains under-explored. We examine if PEFT for such tasks can benefit from state…

18
arXiv — Machine Learning research 4d ago

At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization

arXiv:2606.26396v1 Announce Type: new Abstract: Pre-trained transformers have demonstrated remarkable generalization abilities, at times extending beyond the scope of their training data. Yet, real-world deployments often face unexpected or adversarial data that diverges from…

34
arXiv — Machine Learning research 4d ago

Localizing RL-Induced Tool Use to a Single Crosscoder Feature

arXiv:2606.26474v1 Announce Type: new Abstract: Fine-tuning through RL reshapes the internal representations of language models to enable agentic behaviors such as tool use, yet the mechanistic basis of these changes remains poorly understood. While RL substantially improves…

4
arXiv — Machine Learning research 4d ago

Reasoning Quality Emerges Early: Data Curation for Reasoning Models

arXiv:2606.26797v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) on a small, high-quality set of long reasoning traces is an effective approach for eliciting strong reasoning capabilities in Large Language Models (LLMs). However, existing methods for curating…

14
arXiv — Machine Learning research 4d ago

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

arXiv:2606.27291v1 Announce Type: new Abstract: Job-search platforms rely on low-bandwidth query interfaces that often fail to capture the high-dimensional complexity of candidate profiles. We present an end-to-end RLAIF (Reinforcement Learning from AI Feedback) framework to…

10
arXiv — NLP / Computation & Language research 4d ago

Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training

arXiv:2606.26102v1 Announce Type: new Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate…

22
arXiv — NLP / Computation & Language research 4d ago

Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

arXiv:2606.26618v1 Announce Type: new Abstract: Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data. We study this quality gap for Khmer and Korean using VoxCPM2, a…

26
arXiv — NLP / Computation & Language research 4d ago

Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

arXiv:2606.27025v1 Announce Type: new Abstract: Building general-purpose role-playing agents that faithfully portray any character from a natural-language profile remains challenging. The dominant paradigm -- supervised fine-tuning -- encourages behavioral mimicry without deep,…

16
r/LocalLLaMA community 4d ago

When you don't have a data center GPU

Please don't tell me someone is going to (yet again) reply with the longest finetune-merge name in eternity... Submitted by /u/Iwaku_Real.

4
Hugging Face official-blog 4d ago

Run a vLLM Server on HF Jobs in One Command

Published June 26, 2026, by Quentin Gallouédec. You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command, no servers…

18
r/LocalLLaMA community 4d ago

Qwen 3.6 27b GLM 5.2 fine-tune?

Hi everyone, Since both models are open weights and GLM seems to find that secret to frontier model reasoning, why don't we see any Qwen GLM finetune yet? Is it because GLM 5.2 is recent and finetune and datasets take time or the community is just not interested in the finetune?…

28
r/LocalLLaMA community 4d ago

DGX Spark OS lifetime?

I'm thinking of purchasing 2 DGX Sparks for my office (because a 700+W workstation would be intolerable) for LLM-centric work (inference only, no fine-tuning). I know the OS is based on Ubuntu 24.04. Has Nvidia ever disclosed the lifetime of the OS? Meaning, is there a chance…

17
r/MachineLearning community 4d ago

[R] Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

Token-based billing is causing my company to reevaluate small language models. I came across this paper that shows SLM supervised fine-tuning on traces from orchestration of frontier models can be nearly as performant and much cheaper. Has anyone tried this in the real world?…

34
arXiv — Machine Learning research 5d ago

Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection

arXiv:2606.24985v1 Announce Type: new Abstract: Personalization in wearable-based stress detection remains challenging due to substantial inter-individual variability in physiological and behavioral responses. While traditional approaches rely on user-specific fine-tuning or…

5
arXiv — Machine Learning research 5d ago

The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order

arXiv:2606.24993v1 Announce Type: new Abstract: Sequential learning is order-dependent: from Pile-style next-token domain adaptation to instruction-SFT and DPO, N candidate sources induce N! possible curricula. We show that the local order effect is governed by a computable…

7
arXiv — NLP / Computation & Language research 5d ago

Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

arXiv:2606.25008v1 Announce Type: cross Abstract: Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute. This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third…

13
arXiv — NLP / Computation & Language research 5d ago

Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

arXiv:2606.26036v1 Announce Type: new Abstract: Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model…

26
arXiv — NLP / Computation & Language research 5d ago

Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs?

arXiv:2606.25444v1 Announce Type: cross Abstract: Connecting a pre-trained speech encoder to a Large Language Model (LLM) is the standard architecture for building Speech LLMs. However, a structural misalignment exists between the encoder and the LLM. Unlike encoders based on…

23
arXiv — NLP / Computation & Language research 5d ago

Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation

arXiv:2509.22193v2 Announce Type: replace Abstract: Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20$\times$ longer than standard instruction fine-tuning (IFT) outputs,…

19
r/LocalLLaMA community 5d ago

Gemma4-26B-A4B & 31B-QAT Uncensored Balanced are out with MTP (35% & 53% speed boost)!

First of all, I'm stoked to announce we are almost at 20 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes/etc) and almost 5000 members on Discord! Two releases this time, as promised, the bigger Gemma 4 QATs, both Balanced, both with MTP :…

6
r/MachineLearning community 5d ago

I made a superhuman Generals.io agent with self-play RL [P]

Hi everyone, I trained a self-play RL agent for Generals.io that reached superhuman-level and ranked #1 on the human 1v1 leaderboard. It began as my master's thesis where the goal was to beat a prior algorithm based agent. We succeeded using behavior cloning, RL fine-tuning and…

6
Hugging Face official-blog 5d ago

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

Published June 24, 2026. Authors: Adil Asif, Alexandros Koumparoulis, Wenwen Gao, Sylendran Arunagiri (NVIDIA)…

29
arXiv — Machine Learning research 6d ago

Weight-Space Geometry of Offline Reasoning Training

arXiv:2606.23740v1 Announce Type: new Abstract: Offline reinforcement-learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) are widely used to distill reasoning from large teachers into smaller students, and are typically compared on downstream accuracy alone. We ask whether they…

6
