Tag

Training

422 articles archived under #training · RSS

NVIDIA Developer Blog official-blog 21d ago

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step...

34
r/LocalLLaMA community 21d ago

Nex N2 has a funny "few words do trick" reasoning

I've been playing with Nex N2 Pro (Qwen 3.5 397B finetune) locally today. I noticed straight away that it has a pattern of reasoning that is distinct and uses simple words like "need" and "maybe" a lot. Here's a sample of reasoning. We need answer user asks "what is the theory…

16
Hugging Face Daily Papers research 21d ago

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Abstract LayerRoute is a lightweight adapter that selectively skips transformer blocks during inference based on input type, achieving compute savings while maintaining or improving model quality through gated routing and LoRA adaptation. Generated by…

19
arXiv — Machine Learning research 22d ago

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

arXiv:2606.06920v1 Announce Type: new Abstract: Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B)…

17
arXiv — NLP / Computation & Language research 22d ago

RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning

arXiv:2606.07006v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) is a prevailing method for adapting large language models to reasoning tasks by imitating offline expert demonstrations, often treating a single expert trajectory as the target behavior. However,…

15
arXiv — NLP / Computation & Language research 22d ago

What Do People Actually Want From AI? Mapping Preference Plurality

arXiv:2606.06674v1 Announce Type: new Abstract: Large Language Models (LLMs) are often fine-tuned through Reinforcement Learning from Human Feedback (RLHF) to align with people's preferences and values. However, this method has known limitations: it aggregates conflicting…

29
arXiv — NLP / Computation & Language research 22d ago

Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning

arXiv:2606.06835v1 Announce Type: new Abstract: The performance gap across languages in LLMs is well documented, and closing it natively requires pretraining or fine-tuning on corpora that, for most languages, do not exist. Translation offers an alternative: converting an input…

16
arXiv — NLP / Computation & Language research 22d ago

LLM-Guided Evolution for Medical Decision Pipelines

arXiv:2606.07342v1 Announce Type: new Abstract: Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt and pipeline engineering. We study LLM-guided MAP-Elites evolution as an inference-time alternative for discovering…

9
r/LocalLLaMA community 24d ago

Github Copilot finally supporting custom endpoints

https://preview.redd.it/082gnmin1l5h1.png?width=1740&format=png&auto=webp&s=2c89f6310c8c654611188183de07857d77cb2417 https://preview.redd.it/169tjrzn1l5h1.png?width=710&format=png&auto=webp&s=9a1fa656ea95037622b0d7ea2e16a23d2122442c I just noticed…

19
Hugging Face Daily Papers research 24d ago

Trust Region Q Adjoint Matching

Abstract Trust Region Q-Adjoint Matching (TRQAM) addresses instability in off-policy reinforcement learning by adaptively controlling path-space KL divergence through projected dual descent, enabling stable fine-tuning of pretrained flow policies. Generated by…

19
Hugging Face Daily Papers research 24d ago

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Abstract Code2LoRA is a hypernetwork framework that generates repository-specific LoRA adapters for code language models, supporting both static and evolving codebases with efficient parameter-efficient fine-tuning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Code language…

16
arXiv — Machine Learning research 25d ago

Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs

arXiv:2606.05516v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization enables memory-efficient fine-tuning of large language models (LLMs) using only forward passes, but it remains unclear how useful adaptation is distributed across layers. In this work, we reveal a…

10
arXiv — Machine Learning research 25d ago

Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data

arXiv:2606.05781v1 Announce Type: new Abstract: Deploying frontier large language models (LLMs) for domain-specific structured evaluation tasks often incurs substantial latency, cost, and data privacy overhead. We present a hybrid framework that combines a fine-tuned small…

34
arXiv — Machine Learning research 25d ago

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

arXiv:2606.05899v1 Announce Type: new Abstract: We develop a high-dimensional statistical theory of low-rank adaptation (LoRA) in attention models, capturing the interplay between pre-training and fine-tuning. We introduce a solvable framework in which a single-head attention…

32
arXiv — Machine Learning research 25d ago

Steering Vectors are an Adversarial Attack Surface

arXiv:2606.05958v1 Announce Type: new Abstract: Activation steering has become a popular way to control Large Language Model (LLM) behavior without fine-tuning. Since the technique is plug-and-play, users share datasets and precomputed vectors to steer model activations.…

25
arXiv — NLP / Computation & Language research 25d ago

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

arXiv:2606.05173v1 Announce Type: new Abstract: Masked language modelling (MLM) has been the dominant pre-training objective for text encoders since BERT, yet it encourages representations that are strongly anchored to surface-form token identity rather than deeper semantic…

22
arXiv — NLP / Computation & Language research 25d ago

A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing

arXiv:2606.05330v1 Announce Type: new Abstract: Large language models can shift human beliefs across high-stakes domains, but most persuasion studies rely on pre/post belief change. These endpoint measures identify whether persuasion occurred, yet miss where and how beliefs…

24
arXiv — NLP / Computation & Language research 25d ago

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

arXiv:2606.05610v1 Announce Type: new Abstract: The efficacy of continued pre-training for Large Language Models (LLMs) hinges upon hyperparameter configurations, such as learning rate and batch size. However, current practices often rely on heuristics or grid searches, leading…

8
Hugging Face Daily Papers research 25d ago

Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

Abstract Stable-Layers uses reinforcement learning with vision-language model feedback to improve layer decomposition without paired data, employing Flow-GRPO and LoRA adaptation for optimized policy training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present…

38
arXiv — Machine Learning research 26d ago

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

arXiv:2606.04145v1 Announce Type: new Abstract: Cloud LLM fine-tuning platforms increasingly serve RLHF workloads, where a learned reward model is optimized as a proxy for human quality. As Gao et al. (2023) showed, this proxy diverges from world feedback (downstream eval…

24
arXiv — Machine Learning research 26d ago

ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models

arXiv:2606.04164v1 Announce Type: new Abstract: Data samples used for training often differ from those encountered during fine-tuning and deployment, and while ML models show promise, their performance remains limited when only small annotated datasets are available. Performance…

10
arXiv — Machine Learning research 26d ago

When Autoregressive Consistency Hurts Safety Alignment

arXiv:2606.04168v1 Announce Type: new Abstract: Safety alignment in large language models (LLMs) is fragile in part because it is often shallow: fine-tuning mainly reshapes the model's behavior near the first few output tokens. We argue that this phenomenon can be understood…

21
arXiv — Machine Learning research 26d ago

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

arXiv:2606.04272v1 Announce Type: new Abstract: The standard LLM training pipeline applies reinforcement learning (RL) only after pre-training and supervised fine-tuning (SFT). We question this status quo by training an LLM from scratch and applying RL, SFT, and SFT followed by…

28
arXiv — NLP / Computation & Language research 26d ago

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

arXiv:2606.04284v1 Announce Type: cross Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward…

28
arXiv — Machine Learning research 26d ago

OpenRFM: Dissecting Relational In-Context Learning

arXiv:2606.04320v1 Announce Type: new Abstract: Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open…

19
arXiv — Machine Learning research 26d ago

(Mis)generalization of Helpful-only Fine-tuning

arXiv:2606.04413v1 Announce Type: new Abstract: Helpful-only models, that is, models that are trained to always follow user intent, are valuable for dangerous capability evaluations and other areas of AI R&D where refusals would be an obstacle. Little is known about the…

34
arXiv — NLP / Computation & Language research 26d ago

Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit

arXiv:2606.04274v1 Announce Type: new Abstract: As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse.…

30
arXiv — NLP / Computation & Language research 26d ago

Parameter-Efficient Fine-Tuning with Learnable Rank

arXiv:2606.04325v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In…

16
arXiv — NLP / Computation & Language research 26d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel…

8
r/LocalLLaMA community 26d ago

The first Gemma 4 12B finetunes are ready

Now you can start building your Gemma 4 12B collection :) https://huggingface.co/igorls/gemma-4-12B-it-heretic-GGUF https://huggingface.co/ReadyArt/Melody1437-12B-v0.4-GGUF https://huggingface.co/DuoNeural/Gemma4-12B-IT-Abliterated-GGUF…

26
r/LocalLLaMA community 26d ago

gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint

I don't really understand the gemma hype. Qwen outperforms gemma GB for GB, and its KV cache is lighter. Sure, gemma-4-12b-it might be a slightly better coder than Qwen3.5-9b, but you could also just use omnicoder-9b (a Qwen3.5-9b finetune for coding). Note: Benchmark results come from…

19
r/LocalLLaMA community 26d ago

google/gemma-4-12B · Hugging Face

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned…

29
Hugging Face Daily Papers research 26d ago

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by…

29
r/LocalLLaMA community 27d ago

Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes)

from Hcompany (which seems to be a French company): Holo3.1: Fast & Local Computer Use Agents Model Description Holo3.1 is our latest family of Vision-Language Models (VLMs) for computer use agents. Building on Holo3, it expands support beyond browser and desktop automation to…

25
Hugging Face Daily Papers research 27d ago

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Abstract Answer-correct long chain-of-thought traces can lead to different fine-tuning outcomes, with post-conclusion continuations identified as harmful to training, characterized by uncertainty-geometry mismatches and addressed through a lightweight boundary proxy method.…

26
arXiv — Machine Learning research 27d ago

Pruning Deep Neural Networks via the Marchenko--Pastur Distribution

arXiv:2606.02608v1 Announce Type: new Abstract: We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and…

34
arXiv — Machine Learning research 27d ago

GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

arXiv:2606.02857v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but its deployment is limited by the high variance of gradient estimation. We propose GRZO, a Group-Relative…

22
arXiv — Machine Learning research 27d ago

BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks

arXiv:2606.02947v1 Announce Type: new Abstract: Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing…

17
arXiv — Machine Learning research 27d ago

CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning

arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a…

4
arXiv — Machine Learning research 27d ago

DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data

arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural…

15
arXiv — Machine Learning research 27d ago

When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming

arXiv:2606.03238v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) makes large-scale post-training possible by replacing an underspecified human objective with learned and scalable proxies. The same substitution creates a structured failure…

12
arXiv — Machine Learning research 27d ago

Message Tuning Outshines Graph Prompt Tuning: A Prismatic Space Perspective

arXiv:2606.03290v1 Announce Type: new Abstract: Graph Foundation Models (GFMs), built upon the Pre-training and Adaptation paradigm, have emerged as a research hotspot in graph learning. For GNN-based GFMs, graph prompt tuning has become the prevailing adaptation method for…

4
arXiv — NLP / Computation & Language research 27d ago

Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding

arXiv:2606.03080v1 Announce Type: new Abstract: Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training,…

31
arXiv — NLP / Computation & Language research 27d ago

The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP

arXiv:2606.03250v1 Announce Type: new Abstract: Digital healthcare generates vast amounts of clinical text that can support AI-assisted applications, yet German biomedical language models remain limited by older architectures or restricted training data. We present ChristBERT…

33
arXiv — NLP / Computation & Language research 27d ago

From Script to Semantics: Prompting Strategies for African NLI

arXiv:2606.03304v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly evaluated in multilingual settings, yet their inference behavior in low-resource African languages remains underexplored especially under pure prompting without fine-tuning. We present…

38
arXiv — NLP / Computation & Language research 27d ago

Large Language Models Are Overconfident in Their Own Responses

arXiv:2606.03437v1 Announce Type: new Abstract: Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the…

10
arXiv — NLP / Computation & Language research 27d ago

AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification

arXiv:2606.03576v1 Announce Type: new Abstract: Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose…

5
arXiv — NLP / Computation & Language research 27d ago

Safety Measurements for Fine-tuned LLMs Should be Grounded in Capability

arXiv:2606.03648v1 Announce Type: new Abstract: Adapting foundation large language models to a user's task or preferred style through fine-tuning can result in compromising the model's safety. Previous works examined the effects of fine-tuning on model safety in limited and…

32
r/LocalLLaMA community 27d ago

Remember around 2023-2024 when we did partys (wizardlm, nous capybara and dolphin) and finetunes?

Yes, I remember it. It was peak. Now those models get outperformed by 2026-era models. I want to revive this era; I miss it so bad 😞 submitted by /u/Ok-Type-7663

29
llama.cpp releases dev-tools 28d ago

b9468

server: real-time reasoning interruption via control endpoint ( #23971 ) server: real-time reasoning interruption via control endpoint Builds on the manual reasoning budget trigger from #23949 . Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls…

17
