Tag: Reasoning · 500 articles archived under #reasoning
Hugging Face Daily Papers research 12d ago Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation Abstract ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by… 8
arXiv — NLP / Computation & Language research 12d ago Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier arXiv:2606.18284v1 Announce Type: cross Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve,… 21
arXiv — Machine Learning research 12d ago Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting.
Recent… 13
arXiv — Machine Learning research 12d ago Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards arXiv:2606.18810v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on… 11
arXiv — Machine Learning research 12d ago Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation arXiv:2606.18844v1 Announce Type: new Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target… 14
arXiv — NLP / Computation & Language research 12d ago REVES: REvision and VErification-Augmented Training for Test-Time Scaling arXiv:2606.18910v1 Announce Type: cross Abstract: Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a… 27
arXiv — Machine Learning research 12d ago EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts arXiv:2606.18967v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities.
However, rollout generation remains a dominant latency bottleneck because autoregressive… 25
arXiv — Machine Learning research 12d ago Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation arXiv:2606.19120v1 Announce Type: new Abstract: On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to… 31
arXiv — NLP / Computation & Language research 12d ago LLM Parameters for Math Across Languages: Shared or Separate? arXiv:2606.18453v1 Announce Type: new Abstract: Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that… 27
arXiv — NLP / Computation & Language research 12d ago Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications arXiv:2606.18502v1 Announce Type: new Abstract: Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to… 38
arXiv — NLP / Computation & Language research 12d ago PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding arXiv:2606.18624v1 Announce Type: new Abstract: Natural language understanding often depends on meanings that are implied rather than explicitly stated, requiring pragmatic reasoning.
Despite strong performance on math and logical reasoning, large language models (LLMs) still… 6
arXiv — NLP / Computation & Language research 12d ago TW-LegalBench: Measuring Taiwanese Legal Understanding arXiv:2606.18699v1 Announce Type: new Abstract: Large language models (LLMs) have shown impressive capabilities across diverse tasks, yet their performance on jurisdiction-specific legal reasoning remains underexplored. We present TW-LegalBench that utilizes Taiwanese legal… 22
arXiv — NLP / Computation & Language research 12d ago Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning arXiv:2606.18831v1 Announce Type: new Abstract: Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a… 36
arXiv — NLP / Computation & Language research 12d ago ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement arXiv:2606.18850v1 Announce Type: new Abstract: Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile… 28
arXiv — NLP / Computation & Language research 12d ago GraphPO: Graph-based Policy Optimization for Reasoning Models arXiv:2606.18954v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for enhancing the capability of large reasoning models.
RLVR typically samples responses independently and optimizes the policy using from final… 9
arXiv — NLP / Computation & Language research 12d ago Enhancing Multilingual Reasoning via Steerable Model Merging arXiv:2606.19002v1 Announce Type: new Abstract: Model merging is an effective technique for composing the capabilities of a multilingual model and a reasoning model. It has achieved promising generalization in multilingual reasoning tasks by aligning feature spaces of different… 36
arXiv — NLP / Computation & Language research 12d ago DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models arXiv:2606.19257v1 Announce Type: new Abstract: Block diffusion language models accelerate decoding through parallel block-wise denoising, yet whether they can be reliably scaled for long chain-of-thought (CoT) reasoning remains unresolved. To this end, we develop… 18
arXiv — NLP / Computation & Language research 12d ago Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents arXiv:2606.18947v1 Announce Type: cross Abstract: Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider… 20
arXiv — NLP / Computation & Language research 12d ago STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability arXiv:2606.19236v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training.
We conduct a… 35
arXiv — NLP / Computation & Language research 12d ago Structured Inference with Large Language Gibbs arXiv:2606.19264v1 Announce Type: cross Abstract: The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a… 8
arXiv — NLP / Computation & Language research 12d ago Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation arXiv:2606.19327v1 Announce Type: cross Abstract: Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain… 35
arXiv — NLP / Computation & Language research 12d ago Native Active Perception as Reasoning for Omni-Modal Understanding arXiv:2606.19341v1 Announce Type: cross Abstract: Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive… 12
arXiv — NLP / Computation & Language research 12d ago ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark arXiv:2505.23851v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution… 38
arXiv — NLP / Computation & Language research 12d ago UniECG: Understanding and Generating ECG in One Unified Model arXiv:2509.18588v2 Announce Type: replace Abstract: Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning.
This paper presents UniECG as a step… 38
arXiv — NLP / Computation & Language research 12d ago ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents arXiv:2603.00026v2 Announce Type: replace Abstract: Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may… 15
Hugging Face Daily Papers research 12d ago Guava: An Effective and Universal Harness for Embodied Manipulation Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale… 15
Hugging Face Daily Papers research 12d ago Sumi: Open Uniform Diffusion Language Model from Scratch Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by… 15
Hugging Face Daily Papers research 12d ago Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated… 25
arXiv — Machine Learning research 13d ago Learning to Refine Hidden States for Reliable LLM Reasoning arXiv:2606.17524v1 Announce Type: new Abstract: Large language models show strong reasoning ability, but their internal reasoning process can remain unstable in complex multi-step settings, where early hidden-state errors may propagate to incorrect predictions.
We propose ReLAR,… 35
arXiv — Machine Learning research 13d ago Continual Self-Improvement with Lightweight Experiential Latent Memories arXiv:2606.17803v1 Announce Type: new Abstract: Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate… 21
arXiv — Machine Learning research 13d ago From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined… 17
arXiv — NLP / Computation & Language research 13d ago Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing arXiv:2606.17478v1 Announce Type: new Abstract: As LLMs acquire stronger reasoning capabilities, deceptive behavior becomes an increasingly serious safety concern.
Existing deception monitors either score visible transcripts or derive scalar probe scores from representation… 23
arXiv — NLP / Computation & Language research 13d ago From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning arXiv:2606.17682v1 Announce Type: new Abstract: Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the… 28
arXiv — NLP / Computation & Language research 13d ago SuCo: Sufficiency-guided Continuous Adaptive Reasoning arXiv:2606.17687v1 Announce Type: new Abstract: Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chain-of-Thoughts (CoT), inflating computational costs even for simple queries. Existing efforts to mitigate this… 21
arXiv — NLP / Computation & Language research 13d ago Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models arXiv:2606.17890v1 Announce Type: new Abstract: Long-form chain-of-thought reasoning can improve LLM performance on complex tasks, but models often continue generating unnecessary reasoning after a correct answer has emerged. We refer to this behavior as overthinking. We study… 29
arXiv — NLP / Computation & Language research 13d ago ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions arXiv:2606.17905v1 Announce Type: new Abstract: Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear.
We introduce ChLogic, an English-Chinese aligned benchmark that tests… 10
arXiv — NLP / Computation & Language research 13d ago Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models arXiv:2606.17389v1 Announce Type: cross Abstract: Multimodal Foundation Models are increasingly used as reasoning agents, making reliability, knowing when a model may hallucinate, critical. A common intuition, which we call the Attention-Confidence Assumption, holds that… 24
arXiv — NLP / Computation & Language research 13d ago The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act arXiv:2606.18158v1 Announce Type: cross Abstract: Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the… 38
arXiv — NLP / Computation & Language research 13d ago MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps. Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent… 19
arXiv — NLP / Computation & Language research 13d ago Adaptive Activation Steering for Efficient LLM Reasoning via Closed-Loop PID Control arXiv:2506.18831v3 Announce Type: replace Abstract: Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy.
Static activation steering (e.g. SEAL) suppresses such… 6
arXiv — NLP / Computation & Language research 13d ago EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning arXiv:2511.01650v3 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning… 38
arXiv — NLP / Computation & Language research 13d ago Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning arXiv:2601.03872v2 Announce Type: replace Abstract: The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool… 27
Hugging Face Daily Papers research 13d ago ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions Abstract ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models… 37
Hugging Face Daily Papers research 13d ago TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 29
r/LocalLLaMA community 13d ago “Wait,” in reasoning models makes my eye twitch I get that it helps, I know why they do it, but it’s still annoying as hell lol submitted by /u/Borkato 11
r/LocalLLaMA community 13d ago GLM-5.2 just dropped open weights and it already looks weirdly strong for coding GLM-5.2 just released and the early numbers look pretty insane. 1M context window, open weights, MIT license, two reasoning effort modes, and it is already showing up near the top of coding arenas. I know every new model gets hyped for 24 hours, but this one actually looks worth… 28
Hugging Face Daily Papers research 13d ago ExpRL: Exploratory RL for LLM Mid-Training Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement… 23
r/LocalLLaMA community 13d ago Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance https://preview.redd.it/obgodr9dfn7h1.png?width=1796&format=png&auto=webp&s=b5fd95e2b7e6f8ed7704e3de66778e970d34a1dd We trained VibeThinker-3B to test how far verifiable reasoning can be pushed in a strict small-model regime. It gets 94.3 on AIME'26, 80.2 on LiveCodeBench v6,… 36
Hugging Face Daily Papers research 13d ago Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale Abstract Ling-2.6 and Ring-2.6 models are presented as scalable solutions for agentic intelligence, featuring architectural upgrades and specialized training methods to balance fast response times with advanced reasoning capabilities.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 34
r/LocalLLaMA community 13d ago Gemma 12b - Reasoning hardening instructions I've become quite happy with Gemma 12b QAT as a general assistant lately. It is small enough to run on my PC while still leaving plenty of VRAM free for other tasks and fast enough that I don't have to go make coffee while it thinks. I saw someone on YouTube throwing trick… 36
Page 4 of 10