Tag: Training
423 articles archived under #training

arXiv — NLP / Computation & Language · research · 1mo ago
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
arXiv:2605.11612v1 Announce Type: new Abstract: Backdoor vulnerabilities widely exist in the fine-tuning of large language models (LLMs). Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness. In…
25

arXiv — NLP / Computation & Language · research · 1mo ago
Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
arXiv:2605.11685v1 Announce Type: new Abstract: Large language model (LLM) unlearning aims to remove specific data influences from pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical…
17

arXiv — NLP / Computation & Language · research · 1mo ago
On Predicting the Post-training Potential of Pre-trained LLMs
arXiv:2605.11978v1 Announce Type: new Abstract: The performance of Large Language Models (LLMs) on downstream tasks is fundamentally constrained by the capabilities acquired during pre-training. However, traditional benchmarks like MMLU often fail to reflect a base model's…
11

arXiv — NLP / Computation & Language · research · 1mo ago
Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding
arXiv:2605.12185v1 Announce Type: new Abstract: Large language models accumulate extensive parametric knowledge through pre-training. However, knowledge conflicts occur when outdated or incorrect parametric knowledge conflicts with external knowledge in the context.
Existing…
27

arXiv — NLP / Computation & Language · research · 1mo ago
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
arXiv:2605.12288v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions.…
12

arXiv — NLP / Computation & Language · research · 1mo ago
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation
arXiv:2605.12345v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) techniques offer task-specific fine-tuning at a fraction of the cost of full fine-tuning, but require separate fine-tuning for every new task (combination). In this paper, we explore three…
25

arXiv — NLP / Computation & Language · research · 1mo ago
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
arXiv:2605.12419v1 Announce Type: new Abstract: Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates…
24

r/LocalLLaMA · community · 1mo ago
Fine-Tuning TranslateGemma-4B to improve bi-directional English & Welsh translations on an H200 GPU!
Open source repo: https://github.com/grctest/finetuned-gemmatranslate-cy 5% of the fine-tuning took 40 minutes and cost a couple dollars to prove the process works. Looking forwards to Flash Attention v4 to leave beta, to test fine-tuning performance on a B200 on the cloud,…
16

NVIDIA Developer Blog · official-blog · 1mo ago
How to Eliminate Pipeline Friction in AI Model Serving
The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a...
17

Simon Willison · community · 1mo ago
llm 0.32a2
Release: llm 0.32a2 A bunch of useful stuff in this LLM alpha, but the most important detail is this one: Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5…
22

r/MachineLearning · community · 1mo ago
TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows [R][N]
TabPFN-3 was released today, the next iteration of the tabular foundation model, originally published in Nature. Quick recap for anyone new to TabPFN: TabPFN predicts on tabular data in a single forward pass - no training, no hyperparameter search, no tuning. Built on TabPFN-2.5…
31

r/LocalLLaMA · community · 1mo ago
examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp
now you can evaluate your models at home, sounds like a perfect tool to compare quants and finetunes. Datasets: AIME, AIME2025, GSM8K, GPQA (submitted by /u/jacek2023)
15

OpenAI Python SDK releases · dev-tools · 1mo ago
v2.34.0
2.34.0 (2026-05-04). Full Changelog: v2.33.0...v2.34.0. Features: api: add external_key_id to projects, email/metadata params to users, update types (2d232ee); api: add support for Admin API Keys per endpoint (b8b176a); api: admin API updates (4ae1138); api: manual updates (…
15

ComfyUI releases · dev-tools · 2mo ago
v0.20.1
What's Changed: feat: SUPIR model support (CORE-17) by @kijai in #13250; Some optimizations to make Ernie inference a bit faster. by @comfyanonymous in #13472; fix: append directory type annotation to internal files endpoint (CORE-71) by @Abdulrehman-PIAIC80387 in #13305; Add link…
25

NVIDIA Developer Blog · official-blog · 2mo ago
Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints
DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient...
5

Vercel — AI · dev-tools · 3mo ago
How Waldium made a blog platform work for humans and AI alike
Waldium is a two-person, YC-backed startup that built an agentic CMS for businesses. Co-founded by Amrutha Gujjar and CTO Shivam Singhal, the platform automates content research and creation, and gives every customer blog its own MCP server endpoint so AI agents can query it…
9

OpenAI Python SDK releases · dev-tools · 3mo ago
v2.30.0
2.30.0 (2026-03-25). Full Changelog: v2.29.0...v2.30.0. Features: api: add keys field to Click/DoubleClick/Drag/Move/Scroll computer actions (ee1bbed). Bug Fixes: api: align SDK response types with expanded item schemas (f3f258a); sanitize endpoint path params (89f6698); types:…
11

Smol AI News · news-outlet · 3mo ago
not much happened today
Cursor's Composer 2, built on Kimi K2.5, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. Claude Code is expanding into…
36

OpenAI Python SDK releases · dev-tools · 3mo ago
v2.29.0
2.29.0 (2026-03-17). Full Changelog: v2.28.0...v2.29.0. Features: api: 5.4 nano and mini model slugs (3b45666); api: add /v1/videos endpoint to batches create method (c0e7a16); api: add defer_loading field to ToolFunction (3167595); api: add in and nin operators to…
21

NVIDIA Developer Blog · official-blog · 4mo ago
Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints
Alibaba has introduced the new open source Qwen3.5 series built for native multimodal agents. The first model in this series is a ~400B parameter native...
25

Hugging Face · official-blog · 7mo ago
20x Faster TRL Fine-tuning with RapidFire AI
Published November 21, 2025. By Kamran Bigdely, Arun Kumar, and Quentin Gallouédec. Hugging Face TRL now officially integrates with…
13

Lil'Log (Lilian Weng) · research · 32mo ago
Adversarial Attacks on LLMs
The use of large language models in the real world has strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g. via RLHF).…
5

Eugene Yan · research · 35mo ago
Patterns for Building LLM-based Systems & Products
Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.
22

Page 9 of 9 · 423 articles