Tag: Training
422 articles archived under #training

arXiv — NLP / Computation & Language research 1mo ago Toward LLMs Beyond English-Centric Development arXiv:2605.15613v1 Announce Type: new Abstract: Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English. While continual pre-training is commonly used to adapt LLMs to a target language,… 19

arXiv — NLP / Computation & Language research 1mo ago Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective arXiv:2605.15976v1 Announce Type: new Abstract: Production machine translation relies overwhelmingly on encoder-decoder Seq2Seq models, yet reinforcement learning approaches to MT fine-tuning have largely targeted decoder-only LLMs at $\geq$7B parameters, with limited systematic… 17

arXiv — NLP / Computation & Language research 1mo ago From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery arXiv:2605.15412v1 Announce Type: cross Abstract: Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into… 13

arXiv — NLP / Computation & Language research 1mo ago Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training arXiv:2506.01732v3 Announce Type: replace Abstract: Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains.
Such datasets often contain trillions of tokens, including large portions of copyrighted or proprietary content, which… 11

r/LocalLLaMA community 1mo ago Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals! Provided in both Safetensors and GGUFs. Safetensors: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic GGUFs: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic-GGUF:… 19

r/LocalLLaMA community 1mo ago G4-Meromero-31B-Uncensored-Heretic Is Out Now, a Finetune of Gemma 4 31B It Designed for Creative Tasks, With Kld of 0.0100 and 15/100 Refusals! Provided in both Safetensors and GGUFs. Safetensors: llmfan46/G4-MeroMero-31B-uncensored-heretic: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic GGUFs: llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF:… 29

r/LocalLLaMA community 1mo ago gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it Writing Quality with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs! Provided in both Safetensors and GGUFs. llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic… 38

r/LocalLLaMA community 1mo ago LLM Phone Home: Reliable Apps that can deliver inference from local backend Hello all, I’m wondering what suggestions there are for an ios app that can serve an openai compatible endpoint. I am using 3sparks which works GREAT for that specific use, BUT, there is no mcp, no web search, etc.
I want to show people that a local model with web search on your… 25

r/LocalLLaMA community 1mo ago Best dataset for model pre-training Well, alright, i want ~100M parameters . on a NVIDIA L4 (24GB VRAM) . any good dataset (and quanity of tokens ) to pretrain ? 15

Hugging Face Daily Papers research 1mo ago Long Context Pre-Training with Lighthouse Attention Abstract Lighthouse Attention enables efficient training of causal transformers at long sequences by using hierarchical selection-based attention that reduces computational complexity while maintaining model performance. AI-generated summary Training causal transformers at… 33

Hugging Face Daily Papers research 1mo ago Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance Abstract FEST is a few-shot demonstration-guided reinforcement learning algorithm that achieves strong performance with minimal supervised fine-tuning data by combining supervised signals, on-policy learning, and weighted training to prevent overfitting. AI-generated summary… 22

r/LocalLLaMA community 1mo ago [FOUNDING] SupraLabs - real open-source AI models for you! https://preview.redd.it/k6lub2ypva1h1.png?width=1500&format=png&auto=webp&s=cd44452c86b5216fec17113a72f43bbf169edafb Hey r/LocalLLaMA ! We founded SupraLabs , and it's huge! What we do?
We train, finetune and explore small models with good results to revolutionize small AI… 30

Hugging Face Daily Papers research 1mo ago Dynamic Latent Routing Abstract Temporal composition of sub-policies in MDPs with time-varying rewards enables optimal policy recovery through generalized Dijkstra search, which inspires a dynamic latent routing method for language model fine-tuning that outperforms traditional supervised approaches.… 34

arXiv — Machine Learning research 1mo ago Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning arXiv:2605.13936v1 Announce Type: new Abstract: The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private,… 33

arXiv — Machine Learning research 1mo ago ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization arXiv:2605.14497v1 Announce Type: new Abstract: Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the… 34

arXiv — NLP / Computation & Language research 1mo ago PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts arXiv:2605.14055v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) is widely used for adapting Large Language Models (LLMs) for various tasks.
Recently, there has been an increasing demand for fine-tuning a single LLM for multiple tasks because it requires… 4

arXiv — NLP / Computation & Language research 1mo ago To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model arXiv:2605.14291v1 Announce Type: cross Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing… 8

Hugging Face Daily Papers research 1mo ago Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? Abstract Current multimodal models struggle to match human expert aesthetic judgment in comparative image selection tasks, as demonstrated by the Visual Aesthetic Benchmark which reveals significant performance gaps and shows that fine-tuning on expert examples can improve… 14

Vercel — AI dev-tools 1mo ago Trace any Vercel request from the CLI You can now generate Session Traces through the Vercel CLI. Use the new vercel curl --trace command to generate an OpenTelemetry trace to the specified endpoint from the terminal. Use the new vercel traces get command to fetch the generated trace by request ID. Available on all… 38

r/LocalLLaMA community 1mo ago Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO) Hello I have been working on creating a LLM from ground up. It is based on deepseek architecture with heavily VRAM footprint reduced optimized(GUM+muon) Currently this is the json schema I am using which should suffice as to what currently is being pretrained. Training on a… 7

r/LocalLLaMA community 1mo ago Dropping learning rate fixed my Qlora fine-tune more than anything else i tried Been fine-tuning llama 3.1 8b with Qlora for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data.
Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha.… 35

Hugging Face Daily Papers research 1mo ago Learning Agentic Policy from Action Guidance Abstract Agentic reinforcement learning for large language models leverages action data from human interactions as reference guidance to improve exploration and reduce dependence on costly supervised fine-tuning. AI-generated summary Agentic reinforcement learning (RL) for Large… 21

Hugging Face Daily Papers research 1mo ago Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition Abstract Research identifies studio-bias in multilingual ASR fine-tuning and proposes R-MFT method to improve spontaneous speech performance while maintaining efficiency. AI-generated summary Fine-tuning multilingual ASR models like Whisper for low-resource languages often… 20

arXiv — Machine Learning research 1mo ago ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization arXiv:2605.12667v1 Announce Type: new Abstract: The alignment of Large Language Models (LLMs) utilizes Reinforcement Learning from AI Feedback (RLAIF) for non-verifiable domains such as long-form question answering and open-ended instruction following. These domains often rely… 37

arXiv — Machine Learning research 1mo ago Early Data Exposure Improves Robustness to Subsequent Fine-Tuning arXiv:2605.12705v1 Announce Type: new Abstract: How can we train models whose post-trained capabilities survive subsequent fine-tuning?
Rather than focusing on downstream interventions to mitigate forgetting of upstream capabilities, we study how upstream training choices - that… 30

arXiv — Machine Learning research 1mo ago Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning arXiv:2605.12752v1 Announce Type: new Abstract: LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable… 20

arXiv — Machine Learning research 1mo ago Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer arXiv:2605.12798v1 Announce Type: new Abstract: Fine-tuning LLMs on narrow harmful datasets can induce Emergent Misalignment (EM), where models exhibit misaligned behavior far beyond the fine-tuning distribution. We argue that emergent misalignment can be better understood as a… 19

arXiv — Machine Learning research 1mo ago Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning arXiv:2605.12906v1 Announce Type: new Abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity,… 24

arXiv — Machine Learning research 1mo ago From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning arXiv:2605.12944v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) data selection is commonly formulated as instance ranking: score each example and retain a top-$k$ subset.
However, effective SFT training subsets are often produced through ordered curation recipes,… 18

arXiv — Machine Learning research 1mo ago Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy arXiv:2605.12991v1 Announce Type: new Abstract: LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement at rates we term yield, a vulnerability widely attributed to RLHF-induced sycophancy. We test this attribution across four… 21

arXiv — Machine Learning research 1mo ago Continual Fine-Tuning of Large Language Models via Program Memory arXiv:2605.13162v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT), particularly Low-Rank Adaptation (LoRA), has become a standard approach for adapting Large Language Models (LLMs) under limited compute. However, in continual settings where models are… 33

Hugging Face Daily Papers research 1mo ago Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Abstract Long-context continued pre-training enhances vision-language models' ability to handle extended documents while maintaining performance across diverse contexts through strategic data mixture design. AI-generated summary Long-context modeling is becoming a core… 24

Hugging Face Daily Papers research 1mo ago Revisiting DAgger in the Era of LLM-Agents Abstract DAgger-style training for long-horizon language model agents combines supervised fine-tuning and reinforcement learning benefits by using teacher-student policy interpolation with on-policy interactions.
AI-generated summary Long-horizon LM agents learn from multi-turn… 20

Hacker News — AI on Front Page community 1mo ago Princeton mandates proctoring for in-person exams, upending 133 year precedent Article URL: https://www.dailyprincetonian.com/article/2026/05/princeton-news-adpol-proctoring-in-person-examinations-passed-faculty-133-years-precedent Comments URL: https://news.ycombinator.com/item?id=48126848 Points: 226 # Comments: 310 9

r/LocalLLaMA community 1mo ago I made a UI and server for using Anthropic's new Natural Language Autoencoders locally with llama.cpp Anthropic's first open weight models, Natural Language Autoencoders , are just finetunes of popular open weight models. They do not modify architecture and modeling code so inference with llama.cpp is mostly trivial. I packaged every feature of NLAs (namely activation… 34

Hugging Face Daily Papers research 1mo ago Efficient Pre-Training with Token Superposition Abstract Token-Superposition Training (TST) improves pre-training efficiency by combining contiguous tokens into bags during a superposition phase with multi-hot cross-entropy objective, achieving faster training times without architectural changes. AI-generated summary… 30

Hugging Face Daily Papers research 1mo ago ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging Abstract ORBIT addresses catastrophic forgetting in large language model fine-tuning for generative retrieval by tracking parameter distances and employing weight averaging to maintain model performance. AI-generated summary Despite the rapid advancements in large language model… 7

TechCrunch — AI news-outlet 1mo ago Adaption aims big with AutoScientist, an AI tool that helps models train themselves Adaption's new AutoScientist tool is designed to let models adapt to specific capabilities quickly through an automated approach to conventional fine-tuning.
17

Hugging Face Daily Papers research 1mo ago L2P: Unlocking Latent Potential for Pixel Generation Abstract Latent-to-Pixel transfer paradigm efficiently leverages pre-trained latent diffusion models to create pixel-space models with minimal training overhead and high-resolution generation capabilities. AI-generated summary Pixel diffusion models have recently regained… 14

Hugging Face Daily Papers research 1mo ago FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning Abstract Training framework FocuSFT improves long-context language model performance by addressing attention allocation issues through bilevel optimization with parametric memory that focuses attention on semantically relevant content. AI-generated summary Large language models… 25

arXiv — Machine Learning research 1mo ago Rotation-Preserving Supervised Fine-Tuning arXiv:2605.10973v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) improves in-domain performance but can degrade out-of-domain (OOD) generalization.
Prior work suggests that this degradation is related to changes in dominant singular subspaces of pretrained weight… 22

arXiv — Machine Learning research 1mo ago $\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin arXiv:2605.10981v1 Announce Type: new Abstract: Reference-free preference optimization has emerged as an efficient alternative to reinforcement learning from human feedback, with Simple Preference Optimization(SimPO) demonstrating strong performance by eliminating the explicit… 23

arXiv — Machine Learning research 1mo ago Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training arXiv:2605.11134v1 Announce Type: new Abstract: Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal… 13

arXiv — Machine Learning research 1mo ago Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning arXiv:2605.11235v1 Announce Type: new Abstract: In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the… 18

arXiv — Machine Learning research 1mo ago The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives arXiv:2605.11361v1 Announce Type: new Abstract: Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$.
Since there is no canonical distributional distance for this… 27

arXiv — Machine Learning research 1mo ago Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies arXiv:2605.11387v1 Announce Type: new Abstract: We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. Existing methods for RL fine-tuning of generative policies… 17

arXiv — Machine Learning research 1mo ago Efficient Adjoint Matching for Fine-tuning Diffusion Models arXiv:2605.11480v1 Announce Type: new Abstract: Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled… 30

arXiv — NLP / Computation & Language research 1mo ago Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training arXiv:2605.11416v1 Announce Type: new Abstract: Selective layer-wise updates are essential for low-cost continued pre-training of Large Language Models (LLMs), yet determining which layers to freeze or train remains an empirical black-box problem due to the lack of interpretable… 28

arXiv — NLP / Computation & Language research 1mo ago A Study on Hidden Layer Distillation for Large Language Model Pre-Training arXiv:2605.11513v1 Announce Type: new Abstract: Knowledge Distillation (KD) is a critical tool for training Large Language Models (LLMs), yet the majority of research focuses on approaches that rely solely on output logits, neglecting semantic information in the teacher's… 25

arXiv — NLP / Computation & Language research 1mo ago When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models arXiv:2605.11612v1 Announce Type: new Abstract: Backdoor vulnerabilities widely exist in the fine-tuning of large language models(LLMs).
Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness. In… 25