Tag

Training

422 articles archived under #training

arXiv — Machine Learning research 1mo ago

Density-aware Sample-specific Attack

arXiv:2605.27809v1 Announce Type: new Abstract: Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through fine-tuning or pruning. We revisit the core objectives of backdoor attacks and derive…

29
arXiv — Machine Learning research 1mo ago

CAREF: Calibration-Aware Regularization for Explanation Faithfulness Without Rationale Supervision

arXiv:2605.27835v1 Announce Type: new Abstract: We introduce CAREF, a parameter-efficient fine-tuning framework that jointly optimizes predictive accuracy and explanation faithfulness via calibration-aware regularization. At its core, CAREF couples entropy-based calibration with…

34
arXiv — Machine Learning research 1mo ago

Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models

arXiv:2605.27975v1 Announce Type: new Abstract: Generative models, including diffusion models, are increasingly used as foundation models and adapted through sequential fine-tuning, making continual learning an essential problem setting. However, continual learning in such…

33
arXiv — Machine Learning research 1mo ago

SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection

arXiv:2605.28030v1 Announce Type: new Abstract: Fine-tuning large language models often undermines their safety alignment, a problem further amplified by harmful fine-tuning attacks in which adversarial data removes safeguards and induces unsafe behaviors. We propose SPARD, a…

22
arXiv — NLP / Computation & Language research 1mo ago

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

arXiv:2605.27387v1 Announce Type: new Abstract: Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors,…

4
arXiv — NLP / Computation & Language research 1mo ago

The Missing Piece in Pre-trained Model Evaluation: Reward-Guided Decoding Unlocks Task-Oriented Behavior Without Parameter Updates

arXiv:2605.28020v1 Announce Type: new Abstract: With the rapid progress of large language models (LLMs), reliably evaluating the capabilities of pre-trained LLMs has become increasingly important. The challenge is that base pre-trained models are optimized for next-token…

29
arXiv — NLP / Computation & Language research 1mo ago

Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

arXiv:2605.28306v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models have emerged as a dominant paradigm for efficient LLM scaling, yet adapting them to non-English downstream tasks remains challenging. Existing fine-tuning approaches treat MoE models as monolithic…

37
r/LocalLLaMA community 1mo ago

Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals!

Provided in both Safetensors and GGUFs. Safetensors, llmfan46/Gemma-4-Harmonia-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic GGUFs, llmfan46/Gemma-4-Harmonia-31B-it-uncensored-heretic-GGUF:…

14
r/MachineLearning community 1mo ago

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Dataset for fine-tuning compliance assistants. Each pair includes: - A practical SME-facing question ("Can I use pre-ticked consent boxes?") - An answer with specific UK GDPR article references, ICO guidance by name, and actionable steps - Source metadata: which GDPR concepts…

23
r/LocalLLaMA community 1mo ago

I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful.

Posted this to r/MachineLearning a couple weeks ago (30K views, 100+ upvotes) and have been meaning to share it here where the fine-tuning angle is more directly relevant. I spent years building and processing a complete Usenet corpus from 1980 to 2013. Here’s why it might…

37
r/LocalLLaMA community 1mo ago

ReAligned-Qwen3.5 Release

New from Lazarus AI and Eric Hartford, creator of Dolphin and Samantha, announcing the release of the ReAligned-Qwen3.5 series of models. Apache 2.0 license, finetuned to reduce Chinese ideological bias and censorship, refusal behavior, and state-narrative framing. I use SFT +…

19
Hugging Face Daily Papers research 1mo ago

NSF-SciFy: Mining the NSF Awards Database for Scientific Claims

Abstract: NSF-SciFy is a large-scale dataset of scientific claims and investigation proposals extracted from NSF award abstracts, enabling improved language model fine-tuning for claim verification and scientific discovery tracking. AI-generated summary: We introduce NSF-SciFy, a…

22
Hugging Face Daily Papers research 1mo ago

Understanding Data Temporality Impact on Large Language Models Pre-training

Abstract: Pre-training large language models on temporally ordered data improves their factual freshness and temporal precision compared to standard shuffled pre-training while maintaining general language understanding capabilities. AI-generated summary: Large language models…

4
arXiv — Machine Learning research 1mo ago

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

arXiv:2605.26121v1 Announce Type: new Abstract: LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, optimal mixing is hindered by categorization flaws: human taxonomies suffer from ontological misalignment, and Euclidean clustering…

27
arXiv — Machine Learning research 1mo ago

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training

arXiv:2605.26184v1 Announce Type: new Abstract: Hybrid post-training usually combines supervised fine-tuning and reinforcement learning, but fixed mixing schedules cannot adapt when the relative noise of the two signals changes over time. We propose GAC, a noise-aware controller…

24
arXiv — Machine Learning research 1mo ago

Curriculum Learning for Safety Alignment

arXiv:2605.26315v1 Announce Type: new Abstract: Direct Preference Optimisation (DPO) is widely used for safety alignment in large language models. However, prior work shows it is brittle and exhibits poor out-of-distribution (OOD) generalisation. In this paper, we investigate…

20
arXiv — Machine Learning research 1mo ago

Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models

arXiv:2605.26434v1 Announce Type: new Abstract: EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable EEG representations. Despite showing positive results in data-rich regimes, they often fail…

23
arXiv — Machine Learning research 1mo ago

Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training

arXiv:2605.26484v1 Announce Type: new Abstract: Model merging has emerged as a lightweight paradigm for enhancing Large Language Models (LLMs), yet its underlying mechanisms remain poorly understood. In this work, we analyze late-stage pre-training trajectories and uncover a…

16
arXiv — Machine Learning research 1mo ago

The Stability of Singular Distribution: A Spectral Perspective on the Two-Phase Dynamics of Language Model Pre-training

arXiv:2605.26489v1 Announce Type: new Abstract: Large language model pre-training typically exhibits a two-phase trajectory: a fast initial loss drop followed by a prolonged slow improvement. We identify an underlying spectral phenomenon, Stability of Singular Distribution…

11
arXiv — Machine Learning research 1mo ago

Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models

arXiv:2605.26491v1 Announce Type: new Abstract: Preference optimization has emerged as an efficient alternative to online reinforcement learning from human feedback (RLHF) for aligning text-to-image diffusion models. However, existing methods largely reduce supervision to binary…

10
arXiv — Machine Learning research 1mo ago

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

arXiv:2605.26526v1 Announce Type: new Abstract: Recent defenses for safeguarding open-weight large language models (LLMs) are intended to prevent adversarial usage. Underlying these defenses is an assumption that new harmful behavior is learned through fine-tuning rather than…

33
arXiv — NLP / Computation & Language research 1mo ago

Learning to Adapt SFT Data for Better Reasoning Generalization

arXiv:2605.26924v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable progress, with post-training playing a crucial role in enhancing their reasoning capabilities. Among post-training paradigms, supervised fine-tuning (SFT) is widely used: it…

8
arXiv — Machine Learning research 1mo ago

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

arXiv:2605.24052v1 Announce Type: new Abstract: To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large language model (LLM)-generated content (e.g., AI-generated traffic condition predictions) with…

10
arXiv — Machine Learning research 1mo ago

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

arXiv:2605.24058v1 Announce Type: new Abstract: On-device adaptation of large language models commonly keeps a quantized base model frozen while training and deploying a small, task-specific LoRA adapter. In the unmerged adapter-mode setting, however, the adapter is more than a…

28
arXiv — Machine Learning research 1mo ago

Bilevel Optimization of Synthetic Trajectories for Multi-Turn LLM Fine-Tuning

arXiv:2605.24743v1 Announce Type: new Abstract: While LLMs excel at single-turn generation, they struggle with long-horizon, multi-turn interactions. Offline reinforcement learning (RL) offers a scalable approach, yet its performance hinges on the availability and quality of…

34
arXiv — NLP / Computation & Language research 1mo ago

Temporal Concept Drift in Legal Judgment Prediction: Neural Baselines Across Three Epochs of Ukrainian Court Decisions

arXiv:2605.24452v1 Announce Type: new Abstract: Legal NLP benchmarks evaluate models on randomly split data, implicitly assuming that legal language is stationary. We test this assumption by fine-tuning four transformer encoders -- XLM-RoBERTa (base and large) and their…

15
arXiv — NLP / Computation & Language research 1mo ago

Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs

arXiv:2605.24681v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown great promise in multilingual machine translation (MT), even with limited bilingual supervision. However, fine-tuning LLMs with parallel corpora presents major challenges, namely parameter…

28
arXiv — NLP / Computation & Language research 1mo ago

NITP: Next Implicit Token Prediction for LLM Pre-training

arXiv:2605.24956v1 Announce Type: new Abstract: Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained,…

23
arXiv — Machine Learning research 1mo ago

FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

arXiv:2605.22869v1 Announce Type: new Abstract: Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from…

36
arXiv — Machine Learning research 1mo ago

Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning

arXiv:2605.23171v1 Announce Type: new Abstract: Recent advancements in instructional fine-tuning have injected noise into embeddings, with NEFTune (Jain et al., 2024) setting benchmarks using uniform noise. Despite NEFTune's empirical findings that uniform noise outperforms…

37
arXiv — Machine Learning research 1mo ago

RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

arXiv:2605.23241v1 Announce Type: new Abstract: Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows…

16
arXiv — Machine Learning research 1mo ago

Convex Optimization for Alignment and Preference Learning on a Single GPU

arXiv:2605.23244v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) to align with human preferences has driven the success of systems such as Gemini and ChatGPT. However, approaches like Reinforcement Learning from Human Feedback (RLHF) remain…

20
arXiv — Machine Learning research 1mo ago

Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models

arXiv:2605.23275v1 Announce Type: new Abstract: In this paper, we propose Diffusion Domain Expansion (DDE), a method that efficiently extends pre-trained diffusion models to generate larger objects and handle more complex conditioning beyond their original capabilities. Our…

27
arXiv — NLP / Computation & Language research 1mo ago

Learnability-Informed Fine-Tuning of Diffusion Language Models

arXiv:2605.22939v1 Announce Type: new Abstract: We aim to improve the reasoning capabilities of diffusion language models (DLMs). While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the…

20
arXiv — NLP / Computation & Language research 1mo ago

Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts

arXiv:2605.23597v1 Announce Type: new Abstract: Matching person names across heterogeneous records is a core challenge in entity resolution, especially within linguistically and culturally complex environments. Variations in naming conventions, inconsistent transliteration…

24
arXiv — NLP / Computation & Language research 1mo ago

Is a Document Educational or Just Wikipedia-Style? -- Pitfalls of Classifier-Based Quality Filtering

arXiv:2605.23721v1 Announce Type: new Abstract: Classifier-based Quality Filtering has recently emerged as a fundamental technique in constructing pre-training corpora. The ability to deploy a single model that can replace or supplement a set of heuristics has proven effective…

35
arXiv — NLP / Computation & Language research 1mo ago

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

arXiv:2510.00526v3 Announce Type: replace Abstract: Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log…

13
arXiv — NLP / Computation & Language research 1mo ago

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

arXiv:2512.12677v2 Announce Type: replace Abstract: We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a…

4
r/LocalLLaMA community 1mo ago

llama.cpp has a clever trick for speeding up KV cache decode

So, I use llama-server as my endpoint to run local models and connect them to Open-WebUI, Hermes, and OpenCode. But since llama.cpp's webUI has been receiving a lot of updates, I took a look at its settings and noticed a particular one under developer options. This is the…

23
r/LocalLLaMA community 1mo ago

G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals!

When I previously posted the uncensored version of the 31B version of the MeroMero finetune, quite a few people asked for the 26B-A4B version, I wasn't so keen on it because I considered the 31B to be the better version, but I understand that people might want the 26B-A4B…

20
Hugging Face Daily Papers research 1mo ago

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

Abstract: Audio diffusion models are adapted for interactive music generation through efficient block-wise processing and novel training paradigms that enable real-time performance on consumer hardware. AI-generated summary: Interactive streaming music generation promises the use…

11
r/LocalLLaMA community 1mo ago

Low-level coding dataset

Hi all, I've recently been thinking about putting together a community sourced coding dataset for finetuning models, with a heavy focus on cpp and systems programming. My goal is to eventually have a model (say a finetune of Qwen3.6-27b) that is good at stuff like memory…

15
arXiv — Machine Learning research 1mo ago

From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

arXiv:2605.21558v1 Announce Type: new Abstract: Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead. While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated…

38
arXiv — NLP / Computation & Language research 1mo ago

Token-weighted Direct Preference Optimization with Attention

arXiv:2605.21883v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of…

5
arXiv — NLP / Computation & Language research 1mo ago

Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

arXiv:2605.22356v1 Announce Type: new Abstract: Large language models are increasingly used as computational tools for modeling human-like behavior. We introduce a behavioral induction framework that modifies model policies through fine-tuning on structured decision-making…

19
arXiv — NLP / Computation & Language research 1mo ago

Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

arXiv:2605.22579v1 Announce Type: new Abstract: Recent work has identified a counterintuitive phenomenon termed "Hyperfitting", where fine-tuning Large Language Models (LLMs) to near-zero training loss on small datasets surprisingly enhances open-ended generation quality and…

16
arXiv — NLP / Computation & Language research 1mo ago

Understanding Data Temporality Impact on Large Language Models Pre-training

arXiv:2605.22769v1 Announce Type: new Abstract: Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of…

14
llama.cpp releases dev-tools 1mo ago

b9276

server: expose prompt token counts in /slots endpoint (#23454) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…

15
r/LocalLLaMA community 1mo ago

LatitudeGames/Equinox-31B · Hugging Face

new model from LatitudeGames - Gemma 31B finetune https://huggingface.co/LatitudeGames/Equinox-31B-GGUF Equinox draws its name from the balance between extremes. Trained on a balanced blend of Wayfarer 2's unforgiving dark adventures and Hearthfire's quiet slice-of-life…

14
r/LocalLLaMA community 1mo ago

I'm running an agentic system with kobold.cpp as my backend. Am I losing performance?

Currently, I'm running a Hermes agent with an OpenAI v1 compatible endpoint provided by Kobold. My setup is a 24GB 3090Ti + 512GB DDR4 running Qwen3.6-35B-A3B. I plan to move to a larger MoE model once I'm satisfied with how everything is working, but I'm just wondering if I'm…

33
