Tag

Training

423 articles archived under #training

arXiv — NLP / Computation & Language research 1mo ago

When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models

arXiv:2605.11612v1. Abstract: Backdoor vulnerabilities widely exist in the fine-tuning of large language models (LLMs). Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness. In…

arXiv — NLP / Computation & Language research 1mo ago

Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

arXiv:2605.11685v1. Abstract: Large language model (LLM) unlearning aims to remove the influence of specific data from a pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical…

arXiv — NLP / Computation & Language research 1mo ago

On Predicting the Post-training Potential of Pre-trained LLMs

arXiv:2605.11978v1. Abstract: The performance of Large Language Models (LLMs) on downstream tasks is fundamentally constrained by the capabilities acquired during pre-training. However, traditional benchmarks like MMLU often fail to reflect a base model's…

arXiv — NLP / Computation & Language research 1mo ago

Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding

arXiv:2605.12185v1. Abstract: Large language models accumulate extensive parametric knowledge through pre-training. However, knowledge conflicts occur when outdated or incorrect parametric knowledge contradicts external knowledge in the context. Existing…

arXiv — NLP / Computation & Language research 1mo ago

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

arXiv:2605.12288v1. Abstract: Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions.…

arXiv — NLP / Computation & Language research 1mo ago

Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation

arXiv:2605.12345v1. Abstract: Parameter-efficient fine-tuning (PEFT) techniques offer task-specific fine-tuning at a fraction of the cost of full fine-tuning, but require separate fine-tuning for every new task (or combination of tasks). In this paper, we explore three…

arXiv — NLP / Computation & Language research 1mo ago

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

arXiv:2605.12419v1. Abstract: Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates…

r/LocalLLaMA community 1mo ago

Fine-Tuning TranslateGemma-4B to improve bi-directional English & Welsh translations on an H200 GPU!

Open-source repo: https://github.com/grctest/finetuned-gemmatranslate-cy. Running 5% of the fine-tuning took 40 minutes and cost a couple of dollars, enough to prove the process works. Looking forward to Flash Attention v4 leaving beta, to test fine-tuning performance on a cloud B200,…

NVIDIA Developer Blog official-blog 1mo ago

How to Eliminate Pipeline Friction in AI Model Serving

The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a...

Simon Willison community 1mo ago

llm 0.32a2

Release: llm 0.32a2. A bunch of useful stuff in this LLM alpha, but the most important detail is this one: most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5…

r/MachineLearning community 1mo ago

TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows [R][N]

TabPFN-3 was released today: the next iteration of the tabular foundation model originally published in Nature. Quick recap for anyone new to TabPFN: TabPFN predicts on tabular data in a single forward pass, with no training, no hyperparameter search, and no tuning. Built on TabPFN-2.5…

r/LocalLLaMA community 1mo ago

examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp

Now you can evaluate your models at home; sounds like a perfect tool to compare quants and finetunes. Datasets: AIME, AIME2025, GSM8K, GPQA. Submitted by /u/jacek2023.

OpenAI Python SDK releases dev-tools 1mo ago

v2.34.0

2.34.0 (2026-05-04). Full Changelog: v2.33.0...v2.34.0. Features: api: add external_key_id to projects, email/metadata params to users, update types (2d232ee); api: add support for Admin API Keys per endpoint (b8b176a); api: admin API updates (4ae1138); api: manual updates (…

ComfyUI releases dev-tools 2mo ago

v0.20.1

What's Changed: feat: SUPIR model support (CORE-17) by @kijai in #13250; some optimizations to make Ernie inference a bit faster by @comfyanonymous in #13472; fix: append directory type annotation to internal files endpoint (CORE-71) by @Abdulrehman-PIAIC80387 in #13305; Add link…

NVIDIA Developer Blog official-blog 2mo ago

Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient...

Vercel — AI dev-tools 3mo ago

How Waldium made a blog platform work for humans and AI alike

Waldium is a two-person, YC-backed startup that built an agentic CMS for businesses. Co-founded by Amrutha Gujjar and CTO Shivam Singhal, the platform automates content research and creation, and gives every customer blog its own MCP server endpoint so AI agents can query it…

OpenAI Python SDK releases dev-tools 3mo ago

v2.30.0

2.30.0 (2026-03-25). Full Changelog: v2.29.0...v2.30.0. Features: api: add keys field to Click/DoubleClick/Drag/Move/Scroll computer actions (ee1bbed). Bug Fixes: api: align SDK response types with expanded item schemas (f3f258a); sanitize endpoint path params (89f6698); types:…

Smol AI News news-outlet 3mo ago

not much happened today

Cursor's Composer 2, built on Kimi K2.5, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. Claude Code is expanding into…

OpenAI Python SDK releases dev-tools 3mo ago

v2.29.0

2.29.0 (2026-03-17). Full Changelog: v2.28.0...v2.29.0. Features: api: 5.4 nano and mini model slugs (3b45666); api: add /v1/videos endpoint to batches create method (c0e7a16); api: add defer_loading field to ToolFunction (3167595); api: add in and nin operators to…

NVIDIA Developer Blog official-blog 4mo ago

Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints

Alibaba has introduced the new open-source Qwen3.5 series built for native multimodal agents. The first model in this series is a ~400B-parameter native...

Hugging Face official-blog 7mo ago

20x Faster TRL Fine-tuning with RapidFire AI

Published November 21, 2025 by Kamran Bigdely (rapidfire-ai-inc), Arun Kumar (rapidfire-ai-inc), and Quentin Gallouédec. Hugging Face TRL now officially integrates with…

Lil'Log (Lilian Weng) research 32mo ago

Adversarial Attacks on LLMs

The use of large language models in the real world has been strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g., via RLHF).…

Eugene Yan research 35mo ago

Patterns for Building LLM-based Systems & Products

Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.

