Tag

Training

422 articles archived under #training

Hugging Face Daily Papers research 1mo ago

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

AI-generated summary: A large-scale GUI dataset was created by automatically extracting interaction trajectories from internet videos, enabling improved performance in GUI agents through pre-training on this diverse collection. Abstract: Recent advances in multimodal large…

35
arXiv — Machine Learning research 1mo ago

Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

arXiv:2605.20296v1 Announce Type: new Abstract: Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened. We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that…

17
arXiv — Machine Learning research 1mo ago

Spectral Souping: A Unified Framework for Online Preference Alignment

arXiv:2605.20408v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this…

26
arXiv — Machine Learning research 1mo ago

An exponential mechanism based on quadratic approximations for fine-tuning machine learning models with privacy guarantees

arXiv:2605.20521v1 Announce Type: new Abstract: Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive…

13
arXiv — Machine Learning research 1mo ago

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

arXiv:2605.20674v1 Announce Type: new Abstract: We introduce CoMET (Composing Modality Encoders with Tabular foundation models), a simple yet highly competitive method for multimodal classification: pass each modality through a frozen…

24
arXiv — NLP / Computation & Language research 1mo ago

FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation

arXiv:2605.20199v1 Announce Type: new Abstract: We present FlowLM, a flow matching language model transformed from pre-trained diffusion language models via efficient fine-tuning. By re-aligning the curved sampling trajectories of diffusion models into straight-line flows,…

4
arXiv — NLP / Computation & Language research 1mo ago

Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory

arXiv:2605.20948v1 Announce Type: new Abstract: Scaling conditional memory offers a promising way to increase language-model capacity, but existing methods such as Engram learn large memory tables from scratch during pre-training, making memory scaling expensive and sometimes…

12
arXiv — NLP / Computation & Language research 1mo ago

SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

arXiv:2605.21333v1 Announce Type: new Abstract: Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model…

7
arXiv — NLP / Computation & Language research 1mo ago

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

arXiv:2605.21147v1 Announce Type: cross Abstract: As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) uses a low-rank update method to simulate…

24
Hugging Face Daily Papers research 1mo ago

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Abstract Direct Preference Optimization (DPO) is theoretically equivalent to Reinforcement Learning from Human Feedback (RLHF) only under specific assumptions, otherwise optimizing different objectives; Constrained Preference Optimization (CPO) is proposed as a solution with…

17
Hugging Face Daily Papers research 1mo ago

Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching

AI-generated summary: Domain-Randomized Instance Set (DRIS) enables robust policy learning for dexterous manipulation tasks by simultaneously representing multiple randomized instances, achieving strong sim-to-real transfer without extensive real-world fine-tuning. Abstract: …

19
Hugging Face Daily Papers research 1mo ago

Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road

AI-generated summary: Reasoning models exhibit coverage shrinkage during supervised fine-tuning due to decision-point scenarios in training data, which can be mitigated through targeted data synthesis and diversity-encouraging decoding mechanisms. Abstract: Recent progress in…

30
r/LocalLLaMA community 1mo ago

A streamlined Hugging Face model search utility coded by Qwen 3.6-27B

Hi all. As some of you may be aware, Hugging Face's model search had issues recently (it seems to be resolved now, though). I also often find myself struggling with the standard search interface when trying to find new derivative quants or finetunes of some particular models,…

24
Hugging Face Daily Papers research 1mo ago

Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning

AI-generated summary: Reinforcement Fine-Tuning suffers from catastrophic forgetting in visual continual learning, which is addressed through Retention-aware Policy Optimization that uses trajectory-level reward shaping and cross-task advantage normalization. Abstract: Recent…

11
arXiv — Machine Learning research 1mo ago

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

arXiv:2605.18795v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and…

27
arXiv — Machine Learning research 1mo ago

DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training

arXiv:2605.18815v1 Announce Type: new Abstract: Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing…

22
arXiv — Machine Learning research 1mo ago

Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

arXiv:2605.18822v1 Announce Type: new Abstract: Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable…

28
arXiv — Machine Learning research 1mo ago

TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting

arXiv:2605.18843v1 Announce Type: new Abstract: Backtesting large language models on historical events requires reasoning exclusively from information available before a specified cutoff date. Yet models routinely leak post-cutoff knowledge from pre-training into their…

37
arXiv — Machine Learning research 1mo ago

HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

arXiv:2605.18932v1 Announce Type: new Abstract: In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to…

18
arXiv — Machine Learning research 1mo ago

Distilling Linearized Behavior for Effective Task Arithmetic

arXiv:2605.18993v1 Announce Type: new Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear…

20
arXiv — Machine Learning research 1mo ago

LoRA vs. Full Fine-Tuning: A Theoretical Perspective

arXiv:2605.19018v1 Announce Type: new Abstract: Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving…

25
arXiv — Machine Learning research 1mo ago

Learning When to Adapt

arXiv:2605.19028v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable…

38
arXiv — NLP / Computation & Language research 1mo ago

Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG

arXiv:2605.19224v1 Announce Type: new Abstract: Neuroscientists have recently turned to intracranial brain recording methods, like electrocorticography (ECoG), for human experiments because of the fine spatial and temporal resolution that they afford. Models trained on this…

11
arXiv — NLP / Computation & Language research 1mo ago

EmbGen: Teaching with Reassembled Corpora

arXiv:2605.19394v1 Announce Type: new Abstract: Adapting small instruction-tuned models to specialized domains often relies on supervised fine-tuning (SFT) on curated instruction-response examples, which is expensive to collect at scale. Synthetic training examples generated by…

29
Vercel — AI dev-tools 1mo ago

Chat SDK now supports callback URLs on buttons and modals

You can now pause a Workflow run on a Chat SDK card and resume it when someone clicks a button. The same flow works for form submissions. Buttons and modals accept a new callbackUrl prop, and the event payload is sent to that endpoint. To build a card like this, create a…

36
TechCrunch — AI news-outlet 1mo ago

OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team

Andrej Karpathy has joined Anthropic to work on pre-training. He previously co-founded and worked at OpenAI and led computer vision and AI at Tesla.

27
arXiv — Machine Learning research 1mo ago

Goal-Conditioned Supervised Learning for LLM Fine-Tuning

arXiv:2605.16345v1 Announce Type: new Abstract: Large language models often require fine-tuning to better align their behavior with user intent at deployment. Existing approaches are commonly divided into online and offline paradigms. Online methods, such as RL-based alignment,…

28
arXiv — Machine Learning research 1mo ago

Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field

arXiv:2605.16348v1 Announce Type: new Abstract: Training-free guidance enables pre-trained diffusion and flow models to optimize application-specific objectives using feedback from external black-box reward functions. However, existing methods are feedback-inefficient because…

24
arXiv — Machine Learning research 1mo ago

LEAF: A Living Benchmark for Event-Augmented Forecasting

arXiv:2605.16358v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability while mitigating pre-training data contamination, several living benchmarks have been proposed. However, existing benchmarks either…

33
arXiv — Machine Learning research 1mo ago

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

arXiv:2605.16470v1 Announce Type: new Abstract: Adapting large language models (LLMs) to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. Parameter-efficient fine-tuning (PEFT) approaches such as Low-Rank Adaptation…

30
arXiv — Machine Learning research 1mo ago

Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates

arXiv:2605.16686v1 Announce Type: new Abstract: Knowledge editing (KE) provides a lightweight alternative to repeated fine-tuning of LLMs. However, most existing KE methods target dense feed-forward layers, while modern LLMs increasingly adopt Mixture-of-Experts (MoE)…

14
arXiv — Machine Learning research 1mo ago

UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

arXiv:2605.16690v1 Announce Type: new Abstract: Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal…

32
arXiv — NLP / Computation & Language research 1mo ago

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

arXiv:2605.16865v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) is widely used to inject new knowledge into language models, but it often degrades pretrained capabilities such as reasoning and general-domain performance. We argue this forgetting arises because…

15
arXiv — NLP / Computation & Language research 1mo ago

Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

arXiv:2605.16996v1 Announce Type: new Abstract: Can large language models reliably express a human-like personality, or are they merely mimicking surface cues without a stable underlying profile? To investigate this, we induce personality in LLMs by fine-tuning them on the…

14
arXiv — NLP / Computation & Language research 1mo ago

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

arXiv:2605.17314v1 Announce Type: new Abstract: We consider whether off-policy experience from a smaller, weaker model can elicit capability in a stronger learner that on-policy RL fine-tuning (e.g., GRPO) does not reach. We find that injecting mathematically wrong drafts from a…

25
arXiv — NLP / Computation & Language research 1mo ago

Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment

arXiv:2605.17342v1 Announce Type: new Abstract: Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their…

11
arXiv — NLP / Computation & Language research 1mo ago

Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning

arXiv:2605.17774v1 Announce Type: new Abstract: Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting…

24
arXiv — NLP / Computation & Language research 1mo ago

A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE

arXiv:2605.18083v1 Announce Type: new Abstract: Expanding Large Language Models (LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training (CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by…

30
arXiv — NLP / Computation & Language research 1mo ago

Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models

arXiv:2605.18504v1 Announce Type: new Abstract: Machine Translation (MT) for Ancient Greek (AG) to Modern Greek (MG) is a low-resource task, constrained by the lack of large-scale, high-quality parallel data. We address this gap by introducing the AG-MG Parallel Corpus, a new…

19
Hugging Face official-blog 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Published May 18, 2026, by Ting-Yun Chang (NVIDIA), Miguel Martin (NVIDIA), Jonathan Allen (NVIDIA), Ke Ding…

11
Hugging Face Daily Papers research 1mo ago

Follow the Mean: Reference-Guided Flow Matching

AI-generated summary: Flow matching enables controllable generation through example-based adaptation via conditional endpoint mean adjustment, offering training-free and parametric guidance methods for style and content control. Abstract: Existing approaches to controllable…

23
Hugging Face Daily Papers research 1mo ago

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

AI-generated summary: SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts. Abstract: Large-scale pre-trained…

34
arXiv — Machine Learning research 1mo ago

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

arXiv:2605.15207v1 Announce Type: new Abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context…

29
arXiv — Machine Learning research 1mo ago

Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation

arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a tradeoff known as the safety tax. A common cause is distributional mismatch: supervised fine-tuning trains the target model on safety…

18
arXiv — Machine Learning research 1mo ago

Tadpole: Autoencoders as Foundation Models for 3D PDEs with Online Learning

arXiv:2605.15284v1 Announce Type: new Abstract: We introduce Tadpole, a novel foundation model for three-dimensional partial differential equations (PDEs) that addresses key challenges in transferability, scalability to high dimensionality, and multi-functionality. Tadpole is…

12
arXiv — Machine Learning research 1mo ago

Representation Without Reward: A JEPA Audit for LLM Fine-Tuning

arXiv:2605.15394v1 Announce Type: new Abstract: Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning…

30
arXiv — Machine Learning research 1mo ago

Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search

arXiv:2605.15649v1 Announce Type: new Abstract: Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that…

36
arXiv — Machine Learning research 1mo ago

AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training

arXiv:2605.15793v1 Announce Type: new Abstract: Pre-training neural operators on diverse partial differential equation (PDE) datasets has emerged as a promising direction for building general-purpose surrogate models in scientific machine learning. However, the inherent…

27
arXiv — Machine Learning research 1mo ago

CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts

arXiv:2605.15888v1 Announce Type: new Abstract: Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However,…

31
arXiv — Machine Learning research 1mo ago

LoCO: Low-rank Compositional Rotation Fine-tuning

arXiv:2605.15916v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a critical technique for adapting large-scale foundation models across natural language processing and computer vision. While existing methods such as low-rank adaptations…

4
