Training

422 articles archived under #training

arXiv — NLP / Computation & Language research 15d ago

Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

arXiv:2606.14368v1 Announce Type: cross Abstract: We study multi-domain LLM training in which two models, each stronger in a different domain, co-evolve by tutoring each other through on-policy feedback. Unlike one-way distillation or single-model fine-tuning, our goal is mutual…

20
arXiv — NLP / Computation & Language research 15d ago

CoRe: A Continuously Reward-Finetuned LLM Query Rewriter for Multi-Stage Context-Aware Relevance in Web-Scale Video Search

arXiv:2606.14127v1 Announce Type: cross Abstract: LLM-based query rewriters in production face a tension: the training reward must reflect how the rewrite is consumed by the production ranker, yet the training procedure must be cheap enough to support continuous redeployment as…

17
r/LocalLLaMA community 16d ago

Dual r9700 ai pro for training llms?

I am a developer and need a high-VRAM machine to finetune LLMs. How has your experience been with finetuning/training on multi-GPU with 2x R9700 AMD AI Pro GPUs?   submitted by   /u/AppropriatePush6262

13
r/LocalLLaMA community 17d ago

New model on huggingface

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B A qwen finetune. Looks pretty even with qwen 3.7 plus, except it's actually open source. Disclosure: I work as a researcher for the city government of Rio de Janeiro, which developed this model.   submitted by  …

15
Hugging Face Daily Papers research 17d ago

A Stationary (and Therefore Compatible) Representation is All You Need

Abstract Stationary representations learned through d-Simplex fixed classifiers ensure model compatibility during sequential fine-tuning and updates, enabling continuous retrieval services without reprocessing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning compatible…

25
Hugging Face Daily Papers research 17d ago

Revisiting Articulated Parts Perception in Robot Manipulation

Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by…

27
arXiv — NLP / Computation & Language research 18d ago

MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection

arXiv:2606.12649v1 Announce Type: new Abstract: Detecting mental health disorders from Arabic social media text remains challenging due to dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance. While English mental health…

25
arXiv — NLP / Computation & Language research 18d ago

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

arXiv:2606.12854v1 Announce Type: new Abstract: Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical claim verification, but cost and opacity limit scalable use. We fine-tune three small LLMs: Phi-3-mini (3.8B), Qwen2.5-3B, and…

33
arXiv — NLP / Computation & Language research 18d ago

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

arXiv:2606.12881v1 Announce Type: new Abstract: We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. Our experimental results demonstrate that DPO simplifies the training pipeline, improves…

24
arXiv — NLP / Computation & Language research 18d ago

PolyAlign: Conditional Human-Distribution Alignment

arXiv:2606.13227v1 Announce Type: new Abstract: Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress…

29
arXiv — NLP / Computation & Language research 18d ago

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

arXiv:2606.13680v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning…

11
arXiv — NLP / Computation & Language research 18d ago

Understanding helpfulness and harmless tension in reward models

arXiv:2606.13209v1 Announce Type: cross Abstract: Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their…

12
Hugging Face Daily Papers research 18d ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by…

20
Hugging Face Daily Papers research 18d ago

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by…

8
r/LocalLLaMA community 18d ago

Refiner: Robotics library from the ex-Hugging Face pre-training team

The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations…

26
r/LocalLLaMA community 18d ago

AMD R9700 vs GB10

I have a budget of 5K and want to buy some GPUs. My requirement is 48GB+ VRAM, because I finetune small language models and perform DPO; in general, tinkering/development is my use case. If you were in my shoes, which of these would you get? On one hand, AMD is better bang for the buck,…

4
arXiv — Machine Learning research 19d ago

Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction

arXiv:2606.11508v1 Announce Type: new Abstract: Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We…

13
arXiv — Machine Learning research 19d ago

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

arXiv:2606.11854v1 Announce Type: new Abstract: There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional…

18
arXiv — Machine Learning research 19d ago

Harness In-Context Operator Learning with Chain of Operators

arXiv:2606.12318v1 Announce Type: new Abstract: Neural operators approximate mappings between function spaces, but often generalize poorly to other operators and usually require fine-tuning or retraining. In-Context Operator Networks (ICON) addresses this issue by prompting the…

28
arXiv — NLP / Computation & Language research 19d ago

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

arXiv:2606.11206v1 Announce Type: new Abstract: Supervised Fine-Tuning (SFT) is the predominant paradigm for aligning large language models (LLMs), yet it suffers from optimization instability and limited generalization. Recent work attributes this issue to pathological gradient…

20
arXiv — NLP / Computation & Language research 19d ago

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

arXiv:2606.11375v1 Announce Type: new Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few…

17
arXiv — NLP / Computation & Language research 19d ago

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

arXiv:2606.11786v1 Announce Type: new Abstract: Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a…

37
arXiv — NLP / Computation & Language research 19d ago

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

arXiv:2606.12114v1 Announce Type: new Abstract: Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and…

34
arXiv — NLP / Computation & Language research 19d ago

ALIGNBEAM : Inference-Time Alignment Transfer via Cross-Vocabulary Logit Mixing

arXiv:2606.12342v1 Announce Type: new Abstract: Domain fine-tuning degrades the safety of large language models: fine-tuned specialists readily comply with harmful prompts framed in domain language. Existing inference-time defenses that mix logits from a safe anchor model…

18
Hugging Face Daily Papers research 19d ago

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

Abstract Continual Instruction Tuning enables effective fine-tuning of large language models for low-resource language translation, achieving superior performance compared to standard instruction tuning and multilingual models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…

4
r/MachineLearning community 19d ago

Pyrecall open source tool for detecting catastrophic forgetting during LLM fine-tuning [P]

Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. Snapshots skill scores before/after fine-tuning, flags regressions, rolls back LoRA adapters by name. Fully local, no external APIs. v0.1.0, MIT, pip…

17
r/LocalLLaMA community 19d ago

SenseNova U1 dropped an infographic-specific finetune

It's the same U1-8B-MoT base with an extended MT (multi-task) training phase focused on structured visual output. The benchmark jumps are significant: IGenBench I-ACC (infographic accuracy): 4.2 👉 17.0 (4x); Chart Understanding: 51.3 👉 69.5; Text Rendering: 39.8 👉 46.6; Overall…

32
Hugging Face Daily Papers research 20d ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key…

8
arXiv — Machine Learning research 20d ago

Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning

arXiv:2606.09866v1 Announce Type: new Abstract: Fine-tuning safety aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our…

28
arXiv — Machine Learning research 20d ago

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

arXiv:2606.09932v1 Announce Type: new Abstract: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become a standard pipeline for Large Language Model (LLM) post-training. SFT is expected to provide a useful behavioral prior for RL to further enhance model…

30
arXiv — Machine Learning research 20d ago

A Unified Adaptive Feature Composition Framework for Multi-Task Generalization in Wireless Foundation Models

arXiv:2606.10277v1 Announce Type: new Abstract: Though wireless foundation models (WFMs) have shown strong potential in learning universal channel representations, their adaptation to various downstream tasks remains constrained by existing paradigms. Fine-tuning strategies…

19
arXiv — Machine Learning research 20d ago

Revisiting Positive Samples in Graph Contrastive Learning: From the Perspective of Message Passing

arXiv:2606.10284v1 Announce Type: new Abstract: Graph Contrastive Learning (GCL), which trains graph encoders by maximizing similarity between positive samples and minimizing it between negative ones, has emerged as a mainstream graph pre-training paradigm. It is widely…

16
arXiv — NLP / Computation & Language research 20d ago

CodeAlchemy: Synthetic Code Rewriting at Scale

arXiv:2606.10087v1 Announce Type: new Abstract: Pre-training on raw code teaches syntax but provides sparse signal for diverse real-world task formats. While synthetic data has proven transformative for language models, code remains largely unexplored beyond limited quality…

29
arXiv — NLP / Computation & Language research 20d ago

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

arXiv:2606.10327v1 Announce Type: new Abstract: Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat these in isolation, harming coherence and generalization. We investigate…

19
arXiv — NLP / Computation & Language research 20d ago

Hidden Consensus:Preference-Validity Compression in Human Feedback

arXiv:2606.10569v1 Announce Type: new Abstract: Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect…

7
arXiv — NLP / Computation & Language research 20d ago

Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning

arXiv:2606.10610v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited…

25
arXiv — NLP / Computation & Language research 20d ago

Speaker Group Encoding in Self-supervised Speech Recognition Models

arXiv:2606.10654v1 Announce Type: new Abstract: We investigate what self-supervised speech recognition models (S3Ms) learn about speaker groups (SGs). We examine several states of S3Ms: pretrained, finetuned on speaker identification (SID), finetuned on automatic speech…

10
arXiv — NLP / Computation & Language research 20d ago

Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

arXiv:2606.10716v1 Announce Type: new Abstract: Pre-trained language models (PLMs) have achieved strong performance in keyphrase extraction (KPE), largely due to their ability to generate rich contextualized representations. However, long-document KPE remains challenging because…

30
arXiv — NLP / Computation & Language research 20d ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

arXiv:2606.11052v1 Announce Type: new Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including…

26
arXiv — NLP / Computation & Language research 20d ago

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

arXiv:2606.10279v1 Announce Type: cross Abstract: Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. We test this assumption on five-year…

28
arXiv — NLP / Computation & Language research 20d ago

Advancing the State-of-the-Art in Empirical Privacy Auditing

arXiv:2606.10481v1 Announce Type: cross Abstract: Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on…

23
arXiv — NLP / Computation & Language research 20d ago

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

arXiv:2606.10528v1 Announce Type: cross Abstract: Current reinforcement learning from human feedback (RLHF) methods primarily rely on scalar rewards from a trained reward model (RM). While effective, scalar rewards are often noisy and fail to capture fine-grained preference…

32
Hugging Face Daily Papers research 20d ago

Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating

Abstract Sycophancy fine-tuning contributes to emergent misalignment in language models, which can be reversed using Alignment Gating—a method that inserts learnable gates to identify and control unsafe responses while maintaining general capabilities. Generated by…

24
Hugging Face Daily Papers research 20d ago

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

Abstract QGF is an RL algorithm that improves policies at test time by using a value gradient to guide a pre-trained flow policy, avoiding training-time instability while maintaining competitive performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive continuous…

31
r/LocalLLaMA community 20d ago

Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers

Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. The method (DV-DPO): run a 3-voice council on each question and produce a synthesis; cross-examine: losing voices challenge the synthesis; if synthesis gets…

35
Hugging Face Daily Papers research 20d ago

Robotic Policy Adaptation via Weight-Space Meta-Learning

Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by…

31
arXiv — Machine Learning research 21d ago

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

arXiv:2606.07596v1 Announce Type: new Abstract: Fine-tuning often introduces spurious correlations alongside task knowledge, causing systematic failures on underrepresented groups. Existing mitigations require retraining, group labels, or curated counterfactual data. We show a…

21
arXiv — Machine Learning research 21d ago

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

arXiv:2606.07597v1 Announce Type: new Abstract: Pre-training data mixtures are commonly tuned by running small-scale experiments and extrapolating to the target training budget. When high-quality data is scarce and must be repeated, this extrapolation frequently fails, but the…

23
arXiv — Machine Learning research 21d ago

DOG-DPO:Dynamic Optimization in Geometry for Safety Alignment

arXiv:2606.07678v1 Announce Type: new Abstract: Safety alignment for large language models relies on preference data, but current pipelines often train on large, redundant datasets. Existing data selection methods typically score each preference pair independently, collapsing…

12
Hugging Face Daily Papers research 21d ago

On the Geometry of On-Policy Distillation

Abstract On-policy distillation exhibits distinct parameter space dynamics characterized by relaxed off-principal updates and subspace locking, forming a unique geometric pattern separate from supervised fine-tuning and reinforcement learning with verifiable rewards. Generated…

20
