Training: 422 articles archived under #training
NVIDIA Developer Blog official-blog 21d ago Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step... 34 r/LocalLLaMA community 21d ago Nex N2 has a funny "few words do trick" reasoning I've been playing with Nex N2 Pro (Qwen 3.5 397B finetune) locally today. I noticed straight away that it has a pattern of reasoning that is distinct and uses simple words like "need" and "maybe" a lot. Here's a sample of reasoning. We need answer user asks "what is the theory… 16 Hugging Face Daily Papers research 21d ago LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models Abstract LayerRoute is a lightweight adapter that selectively skips transformer blocks during inference based on input type, achieving compute savings while maintaining or improving model quality through gated routing and LoRA adaptation. Generated by… 19 arXiv — Machine Learning research 22d ago The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning arXiv:2606.06920v1 Announce Type: new Abstract: Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B)… 17 arXiv — NLP / Computation & Language research 22d ago RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning arXiv:2606.07006v1 Announce Type: cross Abstract: Supervised fine-tuning (SFT) is a prevailing method for adapting large language models to reasoning tasks by imitating offline expert demonstrations, often treating a single expert trajectory as the target behavior. 
However,… 15 arXiv — NLP / Computation & Language research 22d ago What Do People Actually Want From AI? Mapping Preference Plurality arXiv:2606.06674v1 Announce Type: new Abstract: Large Language Models (LLMs) are often fine-tuned through Reinforcement Learning from Human Feedback (RLHF) to align with people's preferences and values. However, this method has known limitations: it aggregates conflicting… 29 arXiv — NLP / Computation & Language research 22d ago Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning arXiv:2606.06835v1 Announce Type: new Abstract: The performance gap across languages in LLMs is well documented, and closing it natively requires pretraining or fine-tuning on corpora that, for most languages, do not exist. Translation offers an alternative: converting an input… 16 arXiv — NLP / Computation & Language research 22d ago LLM-Guided Evolution for Medical Decision Pipelines arXiv:2606.07342v1 Announce Type: new Abstract: Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt and pipeline engineering. We study LLM-guided MAP-Elites evolution as an inference-time alternative for discovering… 9 r/LocalLLaMA community 24d ago Github Copilot finally supporting custom endpoints https://preview.redd.it/082gnmin1l5h1.png?width=1740&format=png&auto=webp&s=2c89f6310c8c654611188183de07857d77cb2417 https://preview.redd.it/169tjrzn1l5h1.png?width=710&format=png&auto=webp&s=9a1fa656ea95037622b0d7ea2e16a23d2122442c I just noticed   submitted by  … 19 Hugging Face Daily Papers research 24d ago Trust Region Q Adjoint Matching Abstract Trust Region Q-Adjoint Matching (TRQAM) addresses instability in off-policy reinforcement learning by adaptively controlling path-space KL divergence through projected dual descent, enabling stable fine-tuning of pretrained flow policies. 
Generated by… 19 Hugging Face Daily Papers research 24d ago Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution Abstract Code2LoRA is a hypernetwork framework that generates repository-specific LoRA adapters for code language models, supporting both static and evolving codebases with efficient parameter-efficient fine-tuning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Code language… 16 arXiv — Machine Learning research 25d ago Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs arXiv:2606.05516v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization enables memory-efficient fine-tuning of large language models (LLMs) using only forward passes, but it remains unclear how useful adaptation is distributed across layers. In this work, we reveal a… 10 arXiv — Machine Learning research 25d ago Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data arXiv:2606.05781v1 Announce Type: new Abstract: Deploying frontier large language models (LLMs) for domain-specific structured evaluation tasks often incurs substantial latency, cost, and data privacy overhead. We present a hybrid framework that combines a fine-tuned small… 34 arXiv — Machine Learning research 25d ago High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model arXiv:2606.05899v1 Announce Type: new Abstract: We develop a high-dimensional statistical theory of low-rank adaptation (LoRA) in attention models, capturing the interplay between pre-training and fine-tuning. We introduce a solvable framework in which a single-head attention… 32 arXiv — Machine Learning research 25d ago Steering Vectors are an Adversarial Attack Surface arXiv:2606.05958v1 Announce Type: new Abstract: Activation steering has become a popular way to control Large Language Model (LLM) behavior without fine-tuning. 
Since the technique is plug-and-play, users share datasets and precomputed vectors to steer model activations.… 25 arXiv — NLP / Computation & Language research 25d ago Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning arXiv:2606.05173v1 Announce Type: new Abstract: Masked language modelling (MLM) has been the dominant pre-training objective for text encoders since BERT, yet it encourages representations that are strongly anchored to surface-form token identity rather than deeper semantic… 22 arXiv — NLP / Computation & Language research 25d ago A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing arXiv:2606.05330v1 Announce Type: new Abstract: Large language models can shift human beliefs across high-stakes domains, but most persuasion studies rely on pre/post belief change. These endpoint measures identify whether persuasion occurred, yet miss where and how beliefs… 24 arXiv — NLP / Computation & Language research 25d ago Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training arXiv:2606.05610v1 Announce Type: new Abstract: The efficacy of continued pre-training for Large Language Models (LLMs) hinges upon hyperparameter configurations, such as learning rate and batch size. However, current practices often rely on heuristics or grid searches, leading… 8 Hugging Face Daily Papers research 25d ago Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning Abstract Stable-Layers uses reinforcement learning with vision-language model feedback to improve layer decomposition without paired data, employing Flow-GRPO and LoRA adaptation for optimized policy training. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present… 38 arXiv — Machine Learning research 26d ago EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms arXiv:2606.04145v1 Announce Type: new Abstract: Cloud LLM fine-tuning platforms increasingly serve RLHF workloads, where a learned reward model is optimized as a proxy for human quality. As Gao et al. (2023) showed, this proxy diverges from world feedback (downstream eval… 24 arXiv — Machine Learning research 26d ago ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models arXiv:2606.04164v1 Announce Type: new Abstract: Data samples used for training often differ from those encountered during fine-tuning and deployment, and while ML models show promise, their performance remains limited when only small annotated datasets are available. Performance… 10 arXiv — Machine Learning research 26d ago When Autoregressive Consistency Hurts Safety Alignment arXiv:2606.04168v1 Announce Type: new Abstract: Safety alignment in large language models (LLMs) is fragile in part because it is often shallow: fine-tuning mainly reshapes the model's behavior near the first few output tokens. We argue that this phenomenon can be understood… 21 arXiv — Machine Learning research 26d ago RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training arXiv:2606.04272v1 Announce Type: new Abstract: The standard LLM training pipeline applies reinforcement learning (RL) only after pre-training and supervised fine-tuning (SFT). 
We question this status quo by training a LLM from scratch and applying RL, SFT, and SFT followed by… 28 arXiv — NLP / Computation & Language research 26d ago Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling arXiv:2606.04284v1 Announce Type: cross Abstract: Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward… 28 arXiv — Machine Learning research 26d ago OpenRFM: Dissecting Relational In-Context Learning arXiv:2606.04320v1 Announce Type: new Abstract: Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open… 19 arXiv — Machine Learning research 26d ago (Mis)generalization of Helpful-only Fine-tuning arXiv:2606.04413v1 Announce Type: new Abstract: Helpful-only models, that is, models that are trained to always follow user intent, are valuable for dangerous capability evaluations and other areas of AI R&D where refusals would be an obstacle. 
Little is known about the… 34 arXiv — NLP / Computation & Language research 26d ago Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit arXiv:2606.04274v1 Announce Type: new Abstract: As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse.… 30 arXiv — NLP / Computation & Language research 26d ago Parameter-Efficient Fine-Tuning with Learnable Rank arXiv:2606.04325v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In… 16 arXiv — NLP / Computation & Language research 26d ago StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel… 8 r/LocalLLaMA community 26d ago The first Gemma 4 12B finetunes are ready Now you can start building your Gemma 4 12B collection :) https://huggingface.co/igorls/gemma-4-12B-it-heretic-GGUF https://huggingface.co/ReadyArt/Melody1437-12B-v0.4-GGUF https://huggingface.co/DuoNeural/Gemma4-12B-IT-Abliterated-GGUF… 26 r/LocalLLaMA community 26d ago gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint I don't really understand the gemma hype. Qwen outperforms gemma gb for gb, and kv cache is lighter. 
Sure gemma-4-12b-it might be a slight better coder than Qwen3.5-9b, but you could also just use omnicoder-9b (Qwen3.5-9b finetune for coding). Note: Benchmark results come from… 19 r/LocalLLaMA community 26d ago google/gemma-4-12B · Hugging Face Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned… 29 Hugging Face Daily Papers research 26d ago Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking Abstract Humanoid-GPT is a GPT-style Transformer with causal attention trained on a billion-scale motion corpus that achieves zero-shot generalization to unseen motions and control tasks through scalable pre-training on diverse motion data. Generated by… 29 r/LocalLLaMA community 27d ago Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes) from Hcompany (which seems to be a French company): Holo3.1: Fast & Local Computer Use Agents Model Description Holo3.1 is our latest family of Vision-Language Models (VLMs) for computer use agents. Building on Holo3, it expands support beyond browser and desktop automation to… 25 Hugging Face Daily Papers research 27d ago Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces Abstract Answer-correct long chain-of-thought traces can lead to different fine-tuning outcomes, with post-conclusion continuations identified as harmful to training, characterized by uncertainty-geometry mismatches and addressed through a lightweight boundary proxy method.… 26 arXiv — Machine Learning research 27d ago Pruning Deep Neural Networks via the Marchenko--Pastur Distribution arXiv:2606.02608v1 Announce Type: new Abstract: We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. 
The main practical contribution is accuracy retention under short calibration and… 34 arXiv — Machine Learning research 27d ago GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning arXiv:2606.02857v1 Announce Type: new Abstract: Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but its deployment is limited by the high variance of gradient estimation. We propose GRZO, a Group-Relative… 22 arXiv — Machine Learning research 27d ago BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks arXiv:2606.02947v1 Announce Type: new Abstract: Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing… 17 arXiv — Machine Learning research 27d ago CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a… 4 arXiv — Machine Learning research 27d ago DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data arXiv:2606.03209v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. 
Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural… 15 arXiv — Machine Learning research 27d ago When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming arXiv:2606.03238v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) makes large-scale post-training possible by replacing an underspecified human objective with learned and scalable proxies. The same substitution creates a structured failure… 12 arXiv — Machine Learning research 27d ago Message Tuning Outshines Graph Prompt Tuning: A Prismatic Space Perspective arXiv:2606.03290v1 Announce Type: new Abstract: Graph Foundation Models (GFMs), built upon the Pre-training and Adaptation paradigm, have emerged as a research hotspot in graph learning. For GNN-based GFMs, graph prompt tuning has become the prevailing adaptation method for… 4 arXiv — NLP / Computation & Language research 27d ago Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding arXiv:2606.03080v1 Announce Type: new Abstract: Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training,… 31 arXiv — NLP / Computation & Language research 27d ago The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP arXiv:2606.03250v1 Announce Type: new Abstract: Digital healthcare generates vast amounts of clinical text that can support AI-assisted applications, yet German biomedical language models remain limited by older architectures or restricted training data. 
We present ChristBERT… 33 arXiv — NLP / Computation & Language research 27d ago From Script to Semantics: Prompting Strategies for African NLI arXiv:2606.03304v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly evaluated in multilingual settings, yet their inference behavior in low-resource African languages remains underexplored especially under pure prompting without fine-tuning. We present… 38 arXiv — NLP / Computation & Language research 27d ago Large Language Models Are Overconfident in Their Own Responses arXiv:2606.03437v1 Announce Type: new Abstract: Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the… 10 arXiv — NLP / Computation & Language research 27d ago AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification arXiv:2606.03576v1 Announce Type: new Abstract: Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose… 5 arXiv — NLP / Computation & Language research 27d ago Safety Measurements for Fine-tuned LLMs Should be Grounded in Capability arXiv:2606.03648v1 Announce Type: new Abstract: Adapting foundation large language models to a user's task or preferred style through fine-tuning can result in compromising the model's safety. Previous works examined the effects of fine-tuning on model safety in limited and… 32 r/LocalLLaMA community 27d ago Remember around 2023-2024 when we did partys (wizardlm, nous capybara and dolphin) and finetunes? Yes, I remember it. It was peak. Now those models get outpeformed by 2026-era models. 
I want to revive this era I miss it so bad 😞 29 llama.cpp releases dev-tools 28d ago b9468 server: real-time reasoning interruption via control endpoint (#23971) Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls… 17