Tag

Reasoning

500 articles archived under #reasoning

Hugging Face Daily Papers research 12d ago

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

Abstract ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by…

8
arXiv — NLP / Computation & Language research 12d ago

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

arXiv:2606.18284v1 Announce Type: cross Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve,…

21
arXiv — Machine Learning research 12d ago

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

arXiv:2606.18521v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful post-training paradigm that surpasses Supervised Fine-Tuning (SFT) in eliciting reasoning intelligence and resisting catastrophic forgetting. Recent…

13
arXiv — Machine Learning research 12d ago

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

arXiv:2606.18810v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on…

11
arXiv — Machine Learning research 12d ago

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

arXiv:2606.18844v1 Announce Type: new Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target…

14
arXiv — NLP / Computation & Language research 12d ago

REVES: REvision and VErification-Augmented Training for Test-Time Scaling

arXiv:2606.18910v1 Announce Type: cross Abstract: Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a…

27
arXiv — Machine Learning research 12d ago

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

arXiv:2606.18967v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bottleneck because autoregressive…

25
arXiv — Machine Learning research 12d ago

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

arXiv:2606.19120v1 Announce Type: new Abstract: On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to…

31
arXiv — NLP / Computation & Language research 12d ago

LLM Parameters for Math Across Languages: Shared or Separate?

arXiv:2606.18453v1 Announce Type: new Abstract: Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that…

27
arXiv — NLP / Computation & Language research 12d ago

Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications

arXiv:2606.18502v1 Announce Type: new Abstract: Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to…

38
arXiv — NLP / Computation & Language research 12d ago

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

arXiv:2606.18624v1 Announce Type: new Abstract: Natural language understanding often depends on meanings that are implied rather than explicitly stated, requiring pragmatic reasoning. Despite strong performance on math and logical reasoning, large language models (LLMs) still…

6
arXiv — NLP / Computation & Language research 12d ago

TW-LegalBench: Measuring Taiwanese Legal Understanding

arXiv:2606.18699v1 Announce Type: new Abstract: Large language models (LLMs) have shown impressive capabilities across diverse tasks, yet their performance on jurisdiction-specific legal reasoning remains underexplored. We present TW-LegalBench that utilizes Taiwanese legal…

22
arXiv — NLP / Computation & Language research 12d ago

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

arXiv:2606.18831v1 Announce Type: new Abstract: Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a…

36
arXiv — NLP / Computation & Language research 12d ago

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

arXiv:2606.18850v1 Announce Type: new Abstract: Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile…

28
arXiv — NLP / Computation & Language research 12d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

arXiv:2606.18954v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for enhancing the capability of large reasoning models. RLVR typically samples responses independently and optimizes the policy using from final…

9
arXiv — NLP / Computation & Language research 12d ago

Enhancing Multilingual Reasoning via Steerable Model Merging

arXiv:2606.19002v1 Announce Type: new Abstract: Model merging is an effective technique for composing the capabilities of a multilingual model and a reasoning model. It has achieved promising generalization in multilingual reasoning tasks by aligning feature spaces of different…

36
arXiv — NLP / Computation & Language research 12d ago

DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning Models

arXiv:2606.19257v1 Announce Type: new Abstract: Block diffusion language models accelerate decoding through parallel block-wise denoising, yet whether they can be reliably scaled for long chain-of-thought (CoT) reasoning remains unresolved. To this end, we develop…

18
arXiv — NLP / Computation & Language research 12d ago

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

arXiv:2606.18947v1 Announce Type: cross Abstract: Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider…

20
arXiv — NLP / Computation & Language research 12d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

arXiv:2606.19236v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a…

35
arXiv — NLP / Computation & Language research 12d ago

Structured Inference with Large Language Gibbs

arXiv:2606.19264v1 Announce Type: cross Abstract: The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a…

8
arXiv — NLP / Computation & Language research 12d ago

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

arXiv:2606.19327v1 Announce Type: cross Abstract: Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain…

35
arXiv — NLP / Computation & Language research 12d ago

Native Active Perception as Reasoning for Omni-Modal Understanding

arXiv:2606.19341v1 Announce Type: cross Abstract: Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive…

12
arXiv — NLP / Computation & Language research 12d ago

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

arXiv:2505.23851v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution…

38
arXiv — NLP / Computation & Language research 12d ago

UniECG: Understanding and Generating ECG in One Unified Model

arXiv:2509.18588v2 Announce Type: replace Abstract: Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step…

38
arXiv — NLP / Computation & Language research 12d ago

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

arXiv:2603.00026v2 Announce Type: replace Abstract: Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may…

15
Hugging Face Daily Papers research 12d ago

Guava: An Effective and Universal Harness for Embodied Manipulation

Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale…

15
Hugging Face Daily Papers research 12d ago

Sumi: Open Uniform Diffusion Language Model from Scratch

Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by…

15
Hugging Face Daily Papers research 12d ago

Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning

Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated…

25
arXiv — Machine Learning research 13d ago

Learning to Refine Hidden States for Reliable LLM Reasoning

arXiv:2606.17524v1 Announce Type: new Abstract: Large language models show strong reasoning ability, but their internal reasoning process can remain unstable in complex multi-step settings, where early hidden-state errors may propagate to incorrect predictions. We propose ReLAR,…

35
arXiv — Machine Learning research 13d ago

Continual Self-Improvement with Lightweight Experiential Latent Memories

arXiv:2606.17803v1 Announce Type: new Abstract: Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate…

21
arXiv — Machine Learning research 13d ago

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined…

17
arXiv — NLP / Computation & Language research 13d ago

Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing

arXiv:2606.17478v1 Announce Type: new Abstract: As LLMs acquire stronger reasoning capabilities, deceptive behavior becomes an increasingly serious safety concern. Existing deception monitors either score visible transcripts or derive scalar probe scores from representation…

23
arXiv — NLP / Computation & Language research 13d ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

arXiv:2606.17682v1 Announce Type: new Abstract: Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the…

28
arXiv — NLP / Computation & Language research 13d ago

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

arXiv:2606.17687v1 Announce Type: new Abstract: Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chain-of-Thoughts (CoT), inflating computational costs even for simple queries. Existing efforts to mitigate this…

21
arXiv — NLP / Computation & Language research 13d ago

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

arXiv:2606.17890v1 Announce Type: new Abstract: Long-form chain-of-thought reasoning can improve LLM performance on complex tasks, but models often continue generating unnecessary reasoning after a correct answer has emerged. We refer to this behavior as overthinking. We study…

29
arXiv — NLP / Computation & Language research 13d ago

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

arXiv:2606.17905v1 Announce Type: new Abstract: Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English-Chinese aligned benchmark that tests…

10
arXiv — NLP / Computation & Language research 13d ago

Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models

arXiv:2606.17389v1 Announce Type: cross Abstract: Multimodal Foundation Models are increasingly used as reasoning agents, making reliability, knowing when a model may hallucinate, critical. A common intuition, which we call the Attention-Confidence Assumption, holds that…

24
arXiv — NLP / Computation & Language research 13d ago

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

arXiv:2606.18158v1 Announce Type: cross Abstract: Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the…

38
arXiv — NLP / Computation & Language research 13d ago

MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks

arXiv:2503.07459v3 Announce Type: replace Abstract: Complex medical reasoning requires integrating heterogeneous clinical evidence across multiple inference steps. Large language models (LLMs) now approach this through two routes: internalized reasoning and externalized agent…

19
arXiv — NLP / Computation & Language research 13d ago

Adaptive Activation Steering for Efficient LLM Reasoning via Closed-Loop PID Control

arXiv:2506.18831v3 Announce Type: replace Abstract: Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy. Static activation steering (e.g., SEAL) suppresses such…

6
arXiv — NLP / Computation & Language research 13d ago

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

arXiv:2511.01650v3 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning…

38
arXiv — NLP / Computation & Language research 13d ago

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

arXiv:2601.03872v2 Announce Type: replace Abstract: The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool…

27
Hugging Face Daily Papers research 13d ago

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

Abstract ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models…

37
Hugging Face Daily Papers research 13d ago

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

29
r/LocalLLaMA community 13d ago

“Wait,” in reasoning models makes my eye twitch

I get that it helps, I know why they do it, but it’s still annoying as hell lol (submitted by /u/Borkato)

11
r/LocalLLaMA community 13d ago

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding

GLM-5.2 just released and the early numbers look pretty insane. 1M context window, open weights, MIT license, two reasoning effort modes, and it is already showing up near the top of coding arenas. I know every new model gets hyped for 24 hours, but this one actually looks worth…

28
Hugging Face Daily Papers research 13d ago

ExpRL: Exploratory RL for LLM Mid-Training

Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement…

23
r/LocalLLaMA community 13d ago

Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance

https://preview.redd.it/obgodr9dfn7h1.png?width=1796&format=png&auto=webp&s=b5fd95e2b7e6f8ed7704e3de66778e970d34a1dd We trained VibeThinker-3B to test how far verifiable reasoning can be pushed in a strict small-model regime. It gets 94.3 on AIME'26, 80.2 on LiveCodeBench v6,…

36
Hugging Face Daily Papers research 13d ago

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Abstract Ling-2.6 and Ring-2.6 models are presented as scalable solutions for agentic intelligence, featuring architectural upgrades and specialized training methods to balance fast response times with advanced reasoning capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

34
r/LocalLLaMA community 13d ago

Gemma 12b - Reasoning hardening instructions

I've become quite happy with Gemma 12b QAT as a general assistant lately. It is small enough to run on my PC while still leaving plenty of VRAM free for other tasks, and fast enough that I don't have to go make coffee while it thinks. I saw someone on YouTube throwing trick…

36
