News / #training
Training: 38 articles archived under #training · RSS

r/LocalLLaMA · community · 5h ago
I made a UI and server for using Anthropic's new Natural Language Autoencoders locally with llama.cpp
Anthropic's first open weight models, Natural Language Autoencoders, are just finetunes of popular open weight models. They do not modify the architecture or modeling code, so inference with llama.cpp is mostly trivial. I packaged every feature of NLAs (namely activation… · 34

Hugging Face Daily Papers · research · 7h ago
Efficient Pre-Training with Token Superposition
Abstract: Token-Superposition Training (TST) improves pre-training efficiency by combining contiguous tokens into bags during a superposition phase with a multi-hot cross-entropy objective, achieving faster training times without architectural changes. AI-generated summary… · 30

Hugging Face Daily Papers · research · 8h ago
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
Abstract: ORBIT addresses catastrophic forgetting in large language model fine-tuning for generative retrieval by tracking parameter distances and employing weight averaging to maintain model performance. AI-generated summary: Despite the rapid advancements in large language model… · 7

TechCrunch — AI · news-outlet · 11h ago
Adaption aims big with AutoScientist, an AI tool that helps models train themselves
Adaption's new AutoScientist tool is designed to let models acquire specific capabilities quickly through an automated approach to conventional fine-tuning. · 17

Hugging Face Daily Papers · research · 18h ago
L2P: Unlocking Latent Potential for Pixel Generation
Abstract: The Latent-to-Pixel transfer paradigm efficiently leverages pre-trained latent diffusion models to create pixel-space models with minimal training overhead and high-resolution generation capabilities. AI-generated summary: Pixel diffusion models have recently regained… · 14

Hugging Face Daily Papers · research · 18h ago
FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
Abstract: The training framework FocuSFT improves long-context language model performance by addressing attention allocation issues through bilevel optimization with parametric memory that focuses attention on semantically relevant content. AI-generated summary: Large language models… · 25

arXiv — Machine Learning · research · 19h ago
Rotation-Preserving Supervised Fine-Tuning
arXiv:2605.10973v1 Announce Type: new
Abstract: Supervised fine-tuning (SFT) improves in-domain performance but can degrade out-of-domain (OOD) generalization. Prior work suggests that this degradation is related to changes in dominant singular subspaces of pretrained weight… · 22
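The Rotation-Preserving SFT teaser hinges on one measurable quantity: how far fine-tuning rotates the dominant singular subspace of a pretrained weight matrix. A minimal diagnostic for that rotation, sketched in PyTorch (the function name and the rank cutoff k are illustrative choices, not taken from the paper):

```python
import torch

def dominant_subspace_overlap(W_pre: torch.Tensor, W_ft: torch.Tensor, k: int = 16) -> float:
    """Overlap in [0, 1] between the rank-k dominant left singular subspaces
    of a pretrained and a fine-tuned weight matrix; 1.0 means no rotation."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    # Squared Frobenius norm of the cross-projection equals the sum of
    # squared cosines of the principal angles between the two subspaces.
    M = U_pre[:, :k].T @ U_ft[:, :k]
    return (M.norm() ** 2 / k).item()

# Toy check: a tiny perturbation should leave the subspace nearly intact.
W = torch.randn(512, 512)
print(dominant_subspace_overlap(W, W + 1e-3 * torch.randn_like(W)))  # ~1.0
```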
arXiv — Machine Learning · research · 19h ago
$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin
arXiv:2605.10981v1 Announce Type: new
Abstract: Reference-free preference optimization has emerged as an efficient alternative to reinforcement learning from human feedback, with Simple Preference Optimization (SimPO) demonstrating strong performance by eliminating the explicit… · 23

arXiv — Machine Learning · research · 19h ago
Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training
arXiv:2605.11134v1 Announce Type: new
Abstract: Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal… · 13

arXiv — Machine Learning · research · 19h ago
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning
arXiv:2605.11235v1 Announce Type: new
Abstract: In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the… · 18

arXiv — Machine Learning · research · 19h ago
The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives
arXiv:2605.11361v1 Announce Type: new
Abstract: Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$. Since there is no canonical distributional distance for this… · 27
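For reference, the objective the tractability abstract starts from has a well-known closed form when closeness to the base law $p$ is measured by KL divergence; in conventional notation (the temperature $\lambda$ is standard usage, not the paper's):

```latex
% Among all distributions q, the maximizer of
%   E_{x ~ q}[ r(x) ] - \lambda \, KL(q \,\|\, p)
% is the exponentially tilted base distribution:
\[
  q^{\star}(x) = \frac{1}{Z}\, p(x)\, \exp\!\bigl( r(x)/\lambda \bigr),
  \qquad
  Z = \mathbb{E}_{x \sim p}\bigl[ \exp\bigl( r(x)/\lambda \bigr) \bigr].
\]
```

The abstract's question, as posed, is for which regularizers and reward classes sampling from such tilted laws remains computationally tractable.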
arXiv — Machine Learning · research · 19h ago
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
arXiv:2605.11387v1 Announce Type: new
Abstract: We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. Existing methods for RL fine-tuning of generative policies… · 17

arXiv — Machine Learning · research · 19h ago
Efficient Adjoint Matching for Fine-tuning Diffusion Models
arXiv:2605.11480v1 Announce Type: new
Abstract: Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled… · 30

arXiv — NLP / Computation & Language · research · 19h ago
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
arXiv:2605.11416v1 Announce Type: new
Abstract: Selective layer-wise updates are essential for low-cost continued pre-training of Large Language Models (LLMs), yet determining which layers to freeze or train remains an empirical black-box problem due to the lack of interpretable… · 28

arXiv — NLP / Computation & Language · research · 19h ago
A Study on Hidden Layer Distillation for Large Language Model Pre-Training
arXiv:2605.11513v1 Announce Type: new
Abstract: Knowledge Distillation (KD) is a critical tool for training Large Language Models (LLMs), yet the majority of research focuses on approaches that rely solely on output logits, neglecting semantic information in the teacher's… · 25

arXiv — NLP / Computation & Language · research · 19h ago
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
arXiv:2605.11612v1 Announce Type: new
Abstract: Backdoor vulnerabilities widely exist in the fine-tuning of large language models (LLMs). Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness. In… · 25

arXiv — NLP / Computation & Language · research · 19h ago
Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
arXiv:2605.11685v1 Announce Type: new
Abstract: Large language model (LLM) unlearning aims to remove specific data influences from a pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical… · 17

arXiv — NLP / Computation & Language · research · 19h ago
On Predicting the Post-training Potential of Pre-trained LLMs
arXiv:2605.11978v1 Announce Type: new
Abstract: The performance of Large Language Models (LLMs) on downstream tasks is fundamentally constrained by the capabilities acquired during pre-training. However, traditional benchmarks like MMLU often fail to reflect a base model's… · 11

arXiv — NLP / Computation & Language · research · 19h ago
Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding
arXiv:2605.12185v1 Announce Type: new
Abstract: Large language models accumulate extensive parametric knowledge through pre-training. However, knowledge conflicts occur when outdated or incorrect parametric knowledge conflicts with external knowledge in the context. Existing… · 27

arXiv — NLP / Computation & Language · research · 19h ago
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
arXiv:2605.12288v1 Announce Type: new
Abstract: Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions.… · 12

arXiv — NLP / Computation & Language · research · 19h ago
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation
arXiv:2605.12345v1 Announce Type: new
Abstract: Parameter-efficient fine-tuning (PEFT) techniques offer task-specific fine-tuning at a fraction of the cost of full fine-tuning, but require separate fine-tuning for every new task (combination). In this paper, we explore three… · 25
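The plug-and-play idea in that last abstract is easiest to see against what the peft library already supports: loading several LoRA/QLoRA adapters onto one base model and switching or merging them. A sketch with assumed placeholder paths and adapter names; note the paper studies composition in output space, whereas `add_weighted_adapter` combines adapters in weight space:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# "base-model" and the adapter paths/names below are hypothetical placeholders.
base = AutoModelForCausalLM.from_pretrained("base-model")
model = PeftModel.from_pretrained(base, "adapters/formal-style", adapter_name="formal")
model.load_adapter("adapters/concise-style", adapter_name="concise")

model.set_adapter("formal")  # plug in a single attribute adapter

# Built-in weight-space combination of two LoRA adapters:
model.add_weighted_adapter(
    adapters=["formal", "concise"],
    weights=[0.5, 0.5],
    adapter_name="formal_concise",
    combination_type="linear",
)
model.set_adapter("formal_concise")
```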
arXiv — NLP / Computation & Language · research · 19h ago
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
arXiv:2605.12419v1 Announce Type: new
Abstract: Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates… · 24

r/LocalLLaMA · community · 23h ago
Fine-Tuning TranslateGemma-4B to improve bi-directional English & Welsh translations on an H200 GPU!
Open source repo: https://github.com/grctest/finetuned-gemmatranslate-cy. A 5% run of the fine-tuning took 40 minutes and cost a couple of dollars, proving the process works. Looking forward to Flash Attention v4 leaving beta, to test fine-tuning performance on a B200 in the cloud,… · 16

NVIDIA Developer Blog · official-blog · 1d ago
How to Eliminate Pipeline Friction in AI Model Serving
The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a... · 17

Simon Willison · community · 1d ago
llm 0.32a2
Release: llm 0.32a2. A bunch of useful stuff in this LLM alpha, but the most important detail is this one: most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5… · 22

r/MachineLearning · community · 1d ago
TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows
TabPFN-3 was released today, the next iteration of the tabular foundation model, originally published in Nature. Quick recap for anyone new to TabPFN: TabPFN predicts on tabular data in a single forward pass: no training, no hyperparameter search, no tuning. Built on TabPFN-2.5… · 31
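The "no training" claim in that TabPFN recap is the whole interface: the model is pre-trained once, and `fit` only stores your table as in-context examples for the forward pass. A sketch against the published `tabpfn` package API; whether TabPFN-3 keeps exactly this interface and the stated 1M-row limit is an assumption here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()      # pre-trained; no task-specific training loop
clf.fit(X_train, y_train)     # stores the training table as context
print(clf.score(X_test, y_test))  # predictions come from forward passes only
```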
r/LocalLLaMA · community · 1d ago
examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp
Now you can evaluate your models at home; sounds like a perfect tool for comparing quants and finetunes. Datasets: AIME, AIME2025, GSM8K, GPQA · 15

OpenAI Python SDK releases · dev-tools · 9d ago
v2.34.0
2.34.0 (2026-05-04). Full Changelog: v2.33.0...v2.34.0. Features: api: add external_key_id to projects, email/metadata params to users, update types (2d232ee); api: add support for Admin API Keys per endpoint (b8b176a); api: admin API updates (4ae1138); api: manual updates (… · 15

ComfyUI releases · dev-tools · 16d ago
v0.20.1
What's Changed: feat: SUPIR model support (CORE-17) by @kijai in #13250; some optimizations to make Ernie inference a bit faster by @comfyanonymous in #13472; fix: append directory type annotation to internal files endpoint (CORE-71) by @Abdulrehman-PIAIC80387 in #13305; add link… · 25

NVIDIA Developer Blog · official-blog · 18d ago
Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints
DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient... · 5

Vercel — AI · dev-tools · 1mo ago
How Waldium made a blog platform work for humans and AI alike
Waldium is a two-person, YC-backed startup that built an agentic CMS for businesses. Co-founded by Amrutha Gujjar and CTO Shivam Singhal, the platform automates content research and creation, and gives every customer blog its own MCP server endpoint so AI agents can query it… · 9

OpenAI Python SDK releases · dev-tools · 1mo ago
v2.30.0
2.30.0 (2026-03-25). Full Changelog: v2.29.0...v2.30.0. Features: api: add keys field to Click/DoubleClick/Drag/Move/Scroll computer actions (ee1bbed). Bug Fixes: api: align SDK response types with expanded item schemas (f3f258a); sanitize endpoint path params (89f6698); types:… · 11

Smol AI News · news-outlet · 1mo ago
not much happened today
**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into… · 36

OpenAI Python SDK releases · dev-tools · 1mo ago
v2.29.0
2.29.0 (2026-03-17). Full Changelog: v2.28.0...v2.29.0. Features: api: 5.4 nano and mini model slugs (3b45666); api: add /v1/videos endpoint to batches create method (c0e7a16); api: add defer_loading field to ToolFunction (3167595); api: add in and nin operators to… · 21

NVIDIA Developer Blog · official-blog · 2mo ago
Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints
Alibaba has introduced the new open source Qwen3.5 series built for native multimodal agents. The first model in this series is a ~400B parameter native... · 25

Hugging Face · official-blog · 5mo ago
20x Faster TRL Fine-tuning with RapidFire AI
Published November 21, 2025, by Kamran Bigdely and Arun Kumar (RapidFire AI) and Quentin Gallouédec. Hugging Face TRL now officially integrates with… · 13

Lil'Log (Lilian Weng) · research · 31mo ago
Adversarial Attacks on LLMs
The use of large language models in the real world has been strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g. via RLHF).… · 5

Eugene Yan · research · 33mo ago
Patterns for Building LLM-based Systems & Products
Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback. · 22
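Of the patterns listed in that last post, caching is the quickest to sketch: a minimal exact-match response cache keyed on model and prompt. Here `generate` is a stand-in for any LLM call, and the known limitation of the pattern is that any rewording of the prompt misses the cache:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_generate(prompt: str, generate, model: str = "example-model") -> str:
    """Return a cached response for an identical (model, prompt) pair,
    calling `generate` only on a cache miss."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate(prompt)
    return _cache[key]

# Demo with a dummy generator; the second call never invokes it.
print(cached_generate("What is RAG?", lambda p: f"(response to: {p})"))
print(cached_generate("What is RAG?", lambda p: "(never called)"))
```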