Training: 422 articles archived under #training
arXiv — NLP / Computation & Language research 15d ago Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback arXiv:2606.14368v1 Announce Type: cross Abstract: We study multi-domain LLM training in which two models, each stronger in a different domain, co-evolve by tutoring each other through on-policy feedback. Unlike one-way distillation or single-model fine-tuning, our goal is mutual… 20 arXiv — NLP / Computation & Language research 15d ago CoRe: A Continuously Reward-Finetuned LLM Query Rewriter for Multi-Stage Context-Aware Relevance in Web-Scale Video Search arXiv:2606.14127v1 Announce Type: cross Abstract: LLM-based query rewriters in production face a tension: the training reward must reflect how the rewrite is consumed by the production ranker, yet the training procedure must be cheap enough to support continuous redeployment as… 17 r/LocalLLaMA community 16d ago Dual r9700 ai pro for training llms? I am a developer and need high vram machine to finetune llms, how has your experience been with finetuning/training on multi gpu on 2x r700 amd ai pro gpus? submitted by /u/AppropriatePush6262 13 r/LocalLLaMA community 17d ago New model on huggingface https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B A qwen finetune. Looks pretty even with qwen 3.7 plus, except it's actually open source. Disclosure: I work as a researcher for the city government of Rio de Janeiro, which developed this model. submitted by … 15 Hugging Face Daily Papers research 17d ago A Stationary (and Therefore Compatible) Representation is All You Need Abstract Stationary representations learned through d-Simplex fixed classifiers ensure model compatibility during sequential fine-tuning and updates, enabling continuous retrieval services without reprocessing. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning compatible… 25 Hugging Face Daily Papers research 17d ago Revisiting Articulated Parts Perception in Robot Manipulation Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by… 27 arXiv — NLP / Computation & Language research 18d ago MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection arXiv:2606.12649v1 Announce Type: new Abstract: Detecting mental health disorders from Arabic social media text remains challenging due to dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance. While English mental health… 25 arXiv — NLP / Computation & Language research 18d ago Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization arXiv:2606.12854v1 Announce Type: new Abstract: Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical claim verification, but cost and opacity limit scalable use. We fine-tune three small LLMs: Phi-3-mini (3.8B), Qwen2.5-3B, and… 33 arXiv — NLP / Computation & Language research 18d ago Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study arXiv:2606.12881v1 Announce Type: new Abstract: We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. 
Our experimental results demonstrate that DPO simplifies the training pipeline, improves… 24 arXiv — NLP / Computation & Language research 18d ago PolyAlign: Conditional Human-Distribution Alignment arXiv:2606.13227v1 Announce Type: new Abstract: Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress… 29 arXiv — NLP / Computation & Language research 18d ago Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning arXiv:2606.13680v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning… 11 arXiv — NLP / Computation & Language research 18d ago Understanding helpfulness and harmless tension in reward models arXiv:2606.13209v1 Announce Type: cross Abstract: Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their… 12 Hugging Face Daily Papers research 18d ago Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. 
Generated by… 20 Hugging Face Daily Papers research 18d ago Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by… 8 r/LocalLLaMA community 18d ago Refiner: Robotics library from the ex-Hugging Face pre-training team The ex-Hugging Face pre-training team just announced a new library created for robotics data refinement! It supports ingestion of all robotics formats (Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot), as well as the common processing flows like visual hand-tracking, subtask annotations… 26 r/LocalLLaMA community 18d ago AMD R9700 vs GB10 I have a budget of 5K and want to buy some GPUs. My requirement is 48GB+ VRAM, because I fine-tune small language models and perform DPO; in general, tinkering/development is my use case. If you were in my shoes, which of these would you get? On one hand, AMD is better bang for the buck,… 4 arXiv — Machine Learning research 19d ago Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction arXiv:2606.11508v1 Announce Type: new Abstract: Accurate prediction of absorption, distribution, metabolism, and excretion (ADME) properties is critical to drug discovery, but remains challenging because ADME endpoints are noisy, interdependent, and often data-limited. We… 13 arXiv — Machine Learning research 19d ago Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training arXiv:2606.11854v1 Announce Type: new Abstract: There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). 
While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional… 18 arXiv — Machine Learning research 19d ago Harness In-Context Operator Learning with Chain of Operators arXiv:2606.12318v1 Announce Type: new Abstract: Neural operators approximate mappings between function spaces, but often generalize poorly to other operators and usually require fine-tuning or retraining. In-Context Operator Networks (ICON) addresses this issue by prompting the… 28 arXiv — NLP / Computation & Language research 19d ago Compatibility-Aware Dynamic Fine-Tuning for Large Language Models arXiv:2606.11206v1 Announce Type: new Abstract: Supervised Fine-Tuning (SFT) is the predominant paradigm for aligning large language models (LLMs), yet it suffers from optimization instability and limited generalization. Recent work attributes this issue to pathological gradient… 20 arXiv — NLP / Computation & Language research 19d ago When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis arXiv:2606.11375v1 Announce Type: new Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few… 17 arXiv — NLP / Computation & Language research 19d ago Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay arXiv:2606.11786v1 Announce Type: new Abstract: Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. 
To address this limitation, we propose an approach for fine-tuning LLMs on a… 37 arXiv — NLP / Computation & Language research 19d ago Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models arXiv:2606.12114v1 Announce Type: new Abstract: Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and… 34 arXiv — NLP / Computation & Language research 19d ago ALIGNBEAM : Inference-Time Alignment Transfer via Cross-Vocabulary Logit Mixing arXiv:2606.12342v1 Announce Type: new Abstract: Domain fine-tuning degrades the safety of large language models: fine-tuned specialists readily comply with harmful prompts framed in domain language. Existing inference-time defenses that mix logits from a safe anchor model… 18 Hugging Face Daily Papers research 19d ago Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay Abstract Continual Instruction Tuning enables effective fine-tuning of large language models for low-resource language translation, achieving superior performance compared to standard instruction tuning and multilingual models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large… 4 r/MachineLearning community 19d ago Pyrecall open source tool for detecting catastrophic forgetting during LLM fine-tuning[P] Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. Snapshots skill scores before/after fine-tuning, flags regressions, rolls back LoRA adapters by name. Fully local, no external APIs. v0.1.0, MIT, pip… 17 r/LocalLLaMA community 19d ago SenseNova U1 dropped an infographic-specific finetune it's the same U1-8B-MoT base with an extended MT (multi-task) training phase focused on structured visual output. 
the benchmark jumps are significant: IGenBench I-ACC (infographic accuracy): 4.2 👉 17.0 (4x); Chart Understanding: 51.3 👉 69.5; Text Rendering: 39.8 👉 46.6; Overall… 32 Hugging Face Daily Papers research 20d ago Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key… 8 arXiv — Machine Learning research 20d ago Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning arXiv:2606.09866v1 Announce Type: new Abstract: Fine-tuning safety aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our… 28 arXiv — Machine Learning research 20d ago When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff arXiv:2606.09932v1 Announce Type: new Abstract: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become a standard pipeline for Large Language Model (LLM) post-training. SFT is expected to provide a useful behavioral prior for RL to further enhance model… 30 arXiv — Machine Learning research 20d ago A Unified Adaptive Feature Composition Framework for Multi-Task Generalization in Wireless Foundation Models arXiv:2606.10277v1 Announce Type: new Abstract: Though wireless foundation models (WFMs) have shown strong potential in learning universal channel representations, their adaptation to various downstream tasks remains constrained by existing paradigms. 
Fine-tuning strategies… 19 arXiv — Machine Learning research 20d ago Revisiting Positive Samples in Graph Contrastive Learning: From the Perspective of Message Passing arXiv:2606.10284v1 Announce Type: new Abstract: Graph Contrastive Learning (GCL), which trains graph encoders by maximizing similarity between positive samples and minimizing it between negative ones, has emerged as a mainstream graph pre-training paradigm. It is widely… 16 arXiv — NLP / Computation & Language research 20d ago CodeAlchemy: Synthetic Code Rewriting at Scale arXiv:2606.10087v1 Announce Type: new Abstract: Pre-training on raw code teaches syntax but provides sparse signal for diverse real-world task formats. While synthetic data has proven transformative for language models, code remains largely unexplored beyond limited quality… 29 arXiv — NLP / Computation & Language research 20d ago The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring arXiv:2606.10327v1 Announce Type: new Abstract: Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat these in isolation, harming coherence and generalization. We investigate… 19 arXiv — NLP / Computation & Language research 20d ago Hidden Consensus:Preference-Validity Compression in Human Feedback arXiv:2606.10569v1 Announce Type: new Abstract: Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect… 7 arXiv — NLP / Computation & Language research 20d ago Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning arXiv:2606.10610v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. 
However, current PEFT methods often struggle with robustness to noise and performance degradation on limited… 25 arXiv — NLP / Computation & Language research 20d ago Speaker Group Encoding in Self-supervised Speech Recognition Models arXiv:2606.10654v1 Announce Type: new Abstract: We investigate what self-supervised speech recognition models (S3Ms) learn about speaker groups (SGs). We examine several states of S3Ms: pretrained, finetuned on speaker identification (SID), finetuned on automatic speech… 10 arXiv — NLP / Computation & Language research 20d ago Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings arXiv:2606.10716v1 Announce Type: new Abstract: Pre-trained language models (PLMs) have achieved strong performance in keyphrase extraction (KPE), largely due to their ability to generate rich contextualized representations. However, long-document KPE remains challenging because… 30 arXiv — NLP / Computation & Language research 20d ago Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It arXiv:2606.11052v1 Announce Type: new Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including… 26 arXiv — NLP / Computation & Language research 20d ago Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction arXiv:2606.10279v1 Announce Type: cross Abstract: Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. 
We test this assumption on five-year… 28 arXiv — NLP / Computation & Language research 20d ago Advancing the State-of-the-Art in Empirical Privacy Auditing arXiv:2606.10481v1 Announce Type: cross Abstract: Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on… 23 arXiv — NLP / Computation & Language research 20d ago Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output arXiv:2606.10528v1 Announce Type: cross Abstract: Current reinforcement learning from human feedback (RLHF) methods primarily rely on scalar rewards from a trained reward model (RM). While effective, scalar rewards are often noisy and fail to capture fine-grained preference… 32 Hugging Face Daily Papers research 20d ago Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating Abstract Sycophancy fine-tuning contributes to emergent misalignment in language models, which can be reversed using Alignment Gating—a method that inserts learnable gates to identify and control unsafe responses while maintaining general capabilities. Generated by… 24 Hugging Face Daily Papers research 20d ago Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning Abstract QGF is an RL algorithm that improves policies at test time by using a value gradient to guide a pre-trained flow policy, avoiding training-time instability while maintaining competitive performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive continuous… 31 r/LocalLLaMA community 20d ago Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. 
The method (DV-DPO): Run a 3-voice council on each question, produce a synthesis Cross-examine: losing voices challenge the synthesis If synthesis gets… 35 Hugging Face Daily Papers research 20d ago Robotic Policy Adaptation via Weight-Space Meta-Learning Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by… 31 arXiv — Machine Learning research 21d ago Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates arXiv:2606.07596v1 Announce Type: new Abstract: Fine-tuning often introduces spurious correlations alongside task knowledge, causing systematic failures on underrepresented groups. Existing mitigations require retraining, group labels, or curated counterfactual data. We show a… 21 arXiv — Machine Learning research 21d ago Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them arXiv:2606.07597v1 Announce Type: new Abstract: Pre-training data mixtures are commonly tuned by running small-scale experiments and extrapolating to the target training budget. When high-quality data is scarce and must be repeated, this extrapolation frequently fails, but the… 23 arXiv — Machine Learning research 21d ago DOG-DPO:Dynamic Optimization in Geometry for Safety Alignment arXiv:2606.07678v1 Announce Type: new Abstract: Safety alignment for large language models relies on preference data, but current pipelines often train on large, redundant datasets. 
Existing data selection methods typically score each preference pair independently, collapsing… 12 Hugging Face Daily Papers research 21d ago On the Geometry of On-Policy Distillation Abstract On-policy distillation exhibits distinct parameter space dynamics characterized by relaxed off-principal updates and subspace locking, forming a unique geometric pattern separate from supervised fine-tuning and reinforcement learning with verifiable rewards. Generated… 20