Tag: Training · 422 articles archived under #training

Hugging Face Daily Papers · research · 1mo ago
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
Abstract A large-scale GUI dataset was created by automatically extracting interaction trajectories from internet videos, enabling improved performance in GUI agents through pre-training on this diverse collection. AI-generated summary Recent advances in multimodal large… 35

arXiv — Machine Learning · research · 1mo ago
Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
arXiv:2605.20296v1 Announce Type: new Abstract: Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened. We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that… 17

arXiv — Machine Learning · research · 1mo ago
Spectral Souping: A Unified Framework for Online Preference Alignment
arXiv:2605.20408v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users.
To overcome this… 26

arXiv — Machine Learning · research · 1mo ago
An exponential mechanism based on quadratic approximations for fine-tuning machine learning models with privacy guarantees
arXiv:2605.20521v1 Announce Type: new Abstract: Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive… 13

arXiv — Machine Learning · research · 1mo ago
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
arXiv:2605.20674v1 Announce Type: new Abstract: We introduce CoMET (Composing Modality Encoders with Tabular foundation models), a simple yet highly competitive method for multimodal classification: pass each modality through a frozen… 24

arXiv — NLP / Computation & Language · research · 1mo ago
FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation
arXiv:2605.20199v1 Announce Type: new Abstract: We present FlowLM, a flow matching language model transformed from pre-trained diffusion language models via efficient fine-tuning.
By re-aligning the curved sampling trajectories of diffusion models into straight-line flows,… 4

arXiv — NLP / Computation & Language · research · 1mo ago
Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory
arXiv:2605.20948v1 Announce Type: new Abstract: Scaling conditional memory offers a promising way to increase language-model capacity, but existing methods such as Engram learn large memory tables from scratch during pre-training, making memory scaling expensive and sometimes… 12

arXiv — NLP / Computation & Language · research · 1mo ago
SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence
arXiv:2605.21333v1 Announce Type: new Abstract: Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model… 7

arXiv — NLP / Computation & Language · research · 1mo ago
SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
arXiv:2605.21147v1 Announce Type: cross Abstract: As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models.
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate… 24

Hugging Face Daily Papers · research · 1mo ago
Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
Abstract Direct Preference Optimization (DPO) is theoretically equivalent to Reinforcement Learning from Human Feedback (RLHF) only under specific assumptions, otherwise optimizing different objectives; Constrained Preference Optimization (CPO) is proposed as a solution with… 17

Hugging Face Daily Papers · research · 1mo ago
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching
Abstract Domain-Randomized Instance Set (DRIS) enables robust policy learning for dexterous manipulation tasks by simultaneously representing multiple randomized instances, achieving strong sim-to-real transfer without extensive real-world fine-tuning. AI-generated summary… 19

Hugging Face Daily Papers · research · 1mo ago
Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road
Abstract Reasoning models exhibit coverage shrinkage during supervised fine-tuning due to decision-point scenarios in training data, which can be mitigated through targeted data synthesis and diversity-encouraging decoding mechanisms. AI-generated summary Recent progress in… 30

r/LocalLLaMA · community · 1mo ago
A streamlined Hugging Face model search utility coded by Qwen 3.6-27B
Hi all. As some may have been aware, Hugging Face's model search had issues recently. (It seems to be resolved now though).
I also often find myself struggling with the standard search interface when trying to find new derivative quants or finetunes of some particular models,… 24

Hugging Face Daily Papers · research · 1mo ago
Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning
Abstract Reinforcement Fine-Tuning suffers from catastrophic forgetting in visual continual learning, which is addressed through Retention-aware Policy Optimization that uses trajectory-level reward shaping and cross-task advantage normalization. AI-generated summary Recent… 11

arXiv — Machine Learning · research · 1mo ago
HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models
arXiv:2605.18795v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and… 27

arXiv — Machine Learning · research · 1mo ago
DynaTrain: Fast Online Parallelism Switching for Elastic LLM Training
arXiv:2605.18815v1 Announce Type: new Abstract: Modern large language model (LLM) training is inherently dynamic: resource fluctuations, RLHF phase shifts, and cluster elasticity continually reshape the optimal parallelism layout, posing a significant challenge to existing… 22

arXiv — Machine Learning · research · 1mo ago
Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training
arXiv:2605.18822v1 Announce Type: new Abstract: Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning.
Reinforcement learning with verifiable… 28

arXiv — Machine Learning · research · 1mo ago
TEMPO: Temporal Enforcement via Mode-Separated Policy Optimization for Trustworthy LLM Backtesting
arXiv:2605.18843v1 Announce Type: new Abstract: Backtesting large language models on historical events requires reasoning exclusively from information available before a specified cutoff date. Yet models routinely leak post-cutoff knowledge from pre-training into their… 37

arXiv — Machine Learning · research · 1mo ago
HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation
arXiv:2605.18932v1 Announce Type: new Abstract: In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to… 18

arXiv — Machine Learning · research · 1mo ago
Distilling Linearized Behavior for Effective Task Arithmetic
arXiv:2605.18993v1 Announce Type: new Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear… 20

arXiv — Machine Learning · research · 1mo ago
LoRA vs. Full Fine-Tuning: A Theoretical Perspective
arXiv:2605.19018v1 Announce Type: new Abstract: Fine-tuning adapts a pre-trained model to downstream tasks using a small amount of labeled data. Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that reduces memory and computation costs while often achieving… 25

arXiv — Machine Learning · research · 1mo ago
Learning When to Adapt
arXiv:2605.19028v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input.
This input-agnostic approach creates an inevitable… 38

arXiv — NLP / Computation & Language · research · 1mo ago
Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG
arXiv:2605.19224v1 Announce Type: new Abstract: Neuroscientists have recently turned to intracranial brain recording methods, like electrocorticography (ECoG), for human experiments because of the fine spatial and temporal resolution that they afford. Models trained on this… 11

arXiv — NLP / Computation & Language · research · 1mo ago
EmbGen: Teaching with Reassembled Corpora
arXiv:2605.19394v1 Announce Type: new Abstract: Adapting small instruction-tuned models to specialized domains often relies on supervised fine-tuning (SFT) on curated instruction-response examples, which is expensive to collect at scale. Synthetic training examples generated by… 29

Vercel — AI · dev-tools · 1mo ago
Chat SDK now supports callback URLs on buttons and modals
You can now pause a Workflow run on a Chat SDK card and resume it when someone clicks a button. The same flow works for form submissions. Buttons and modals accept a new callbackUrl prop, and the event payload is sent to that endpoint. To build a card like this, create a… 36

TechCrunch — AI · news-outlet · 1mo ago
OpenAI co-founder Andrej Karpathy joins Anthropic’s pre-training team
Andrej Karpathy has joined Anthropic to work on pre-training. He previously co-founded and worked at OpenAI and led computer vision and AI at Tesla. 27

arXiv — Machine Learning · research · 1mo ago
Goal-Conditioned Supervised Learning for LLM Fine-Tuning
arXiv:2605.16345v1 Announce Type: new Abstract: Large language models often require fine-tuning to better align their behavior with user intent at deployment. Existing approaches are commonly divided into online and offline paradigms.
Online methods, such as RL-based alignment,… 28

arXiv — Machine Learning · research · 1mo ago
Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field
arXiv:2605.16348v1 Announce Type: new Abstract: Training-free guidance enables pre-trained diffusion and flow models to optimize application-specific objectives using feedback from external black-box reward functions. However, existing methods are feedback-inefficient because… 24

arXiv — Machine Learning · research · 1mo ago
LEAF: A Living Benchmark for Event-Augmented Forecasting
arXiv:2605.16358v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability while mitigating pre-training data contamination, several living benchmarks have been proposed. However, existing benchmarks either… 33

arXiv — Machine Learning · research · 1mo ago
Strategic Over-Parameterization for Generalizable Low-Rank Adaptation
arXiv:2605.16470v1 Announce Type: new Abstract: Adapting large language models (LLMs) to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. Parameter-efficient fine-tuning (PEFT) approaches such as Low-Rank Adaptation… 30

arXiv — Machine Learning · research · 1mo ago
Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates
arXiv:2605.16686v1 Announce Type: new Abstract: Knowledge editing (KE) provides a lightweight alternative to repeated fine-tuning of LLMs.
However, most existing KE methods target dense feed-forward layers, while modern LLMs increasingly adopt Mixture-of-Experts (MoE)… 14

arXiv — Machine Learning · research · 1mo ago
UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models
arXiv:2605.16690v1 Announce Type: new Abstract: Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal… 32

arXiv — NLP / Computation & Language · research · 1mo ago
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection
arXiv:2605.16865v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) is widely used to inject new knowledge into language models, but it often degrades pretrained capabilities such as reasoning and general-domain performance. We argue this forgetting arises because… 15

arXiv — NLP / Computation & Language · research · 1mo ago
Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?
arXiv:2605.16996v1 Announce Type: new Abstract: Can large language models reliably express a human-like personality, or are they merely mimicking surface cues without a stable underlying profile? To investigate this, we induce personality in LLMs by fine-tuning them on the… 14

arXiv — NLP / Computation & Language · research · 1mo ago
Weak-to-Strong Elicitation via Mismatched Wrong Drafts
arXiv:2605.17314v1 Announce Type: new Abstract: We consider whether off-policy experience from a smaller, weaker model can elicit capability in a stronger learner that on-policy RL fine-tuning (e.g., GRPO) does not reach.
We find that injecting mathematically wrong drafts from a… 25

arXiv — NLP / Computation & Language · research · 1mo ago
Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment
arXiv:2605.17342v1 Announce Type: new Abstract: Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their… 11

arXiv — NLP / Computation & Language · research · 1mo ago
Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning
arXiv:2605.17774v1 Announce Type: new Abstract: Large language models are increasingly used as planning components in agentic systems, but current tool-use pipelines often require full tool schemas to be included in every prompt, creating substantial token overhead and limiting… 24

arXiv — NLP / Computation & Language · research · 1mo ago
A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE
arXiv:2605.18083v1 Announce Type: new Abstract: Expanding Large Language Models (LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training (CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by… 30

arXiv — NLP / Computation & Language · research · 1mo ago
Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models
arXiv:2605.18504v1 Announce Type: new Abstract: Machine Translation (MT) for Ancient Greek (AG) to Modern Greek (MG) is a low-resource task, constrained by the lack of large-scale, high-quality parallel data.
We address this gap by introducing the AG-MG Parallel Corpus, a new… 19

Hugging Face · official-blog · 1mo ago
Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation
Published May 18, 2026 · Ting-Yun Chang (nvidia), Miguel Martin (nvidia), Jonathan Allen (nvidia), Ke Ding… 11

Hugging Face Daily Papers · research · 1mo ago
Follow the Mean: Reference-Guided Flow Matching
Abstract Flow matching enables controllable generation through example-based adaptation via conditional endpoint mean adjustment, offering training-free and parametric guidance methods for style and content control. AI-generated summary Existing approaches to controllable… 23

Hugging Face Daily Papers · research · 1mo ago
Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
Abstract SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts. AI-generated summary Large-scale pre-trained… 34

arXiv — Machine Learning · research · 1mo ago
TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination
arXiv:2605.15207v1 Announce Type: new Abstract: Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context… 29

arXiv — Machine Learning · research · 1mo ago
Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation
arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a tradeoff known as the safety tax.
A common cause is distributional mismatch: supervised fine-tuning trains the target model on safety… 18

arXiv — Machine Learning · research · 1mo ago
Tadpole: Autoencoders as Foundation Models for 3D PDEs with Online Learning
arXiv:2605.15284v1 Announce Type: new Abstract: We introduce Tadpole, a novel foundation model for three-dimensional partial differential equations (PDEs) that addresses key challenges in transferability, scalability to high dimensionality, and multi-functionality. Tadpole is… 12

arXiv — Machine Learning · research · 1mo ago
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
arXiv:2605.15394v1 Announce Type: new Abstract: Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning… 30

arXiv — Machine Learning · research · 1mo ago
Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search
arXiv:2605.15649v1 Announce Type: new Abstract: Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that… 36

arXiv — Machine Learning · research · 1mo ago
AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training
arXiv:2605.15793v1 Announce Type: new Abstract: Pre-training neural operators on diverse partial differential equation (PDE) datasets has emerged as a promising direction for building general-purpose surrogate models in scientific machine learning.
However, the inherent… 27

arXiv — Machine Learning · research · 1mo ago
CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts
arXiv:2605.15888v1 Announce Type: new Abstract: Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However,… 31

arXiv — Machine Learning · research · 1mo ago
LoCO: Low-rank Compositional Rotation Fine-tuning
arXiv:2605.15916v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a critical technique for adapting large-scale foundation models across natural language processing and computer vision. While existing methods such as low-rank adaptations… 4

Page 7 of 9