Tag

Reasoning

500 articles archived under #reasoning

r/LocalLLaMA community 25d ago

[NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning

Hello again r/LocalLLaMA! Supra-50M-Reasoning (ThinkSupra-50M) is the reasoning version of Supra-50M-Instruct. It produces a full thinking chain before every answer, fine-tuned from Supra-50M-Base using a custom…

14
r/LocalLLaMA community 25d ago

Benchmark & Reality Check on Gemma 4 12B: Great model, but your local settings are probably breaking it (Fix inside)

I completed a Python bug hunting benchmark with Gemma 4 12B. I used the Unsloth Dynamic Q5 GGUF model. The model has good capabilities. Default settings in LM Studio disable the reasoning. Fix the LM Studio reasoning configuration. LM Studio looks for Qwen tokens. Gemma 4 uses…

30
Hugging Face Daily Papers research 25d ago

Multimodal Music Recommendation System using LLMs

Abstract: A multimodal framework for session-based music recommendation integrates audio, lyric, and semantic signals with LLM-based sequential reasoning to improve recommendation accuracy. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Music recommendation systems typically treat…

16
arXiv — Machine Learning research 25d ago

State commitment learning: training language models to distinguish computation from memory

arXiv:2606.05201v1 Announce Type: new Abstract: Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions. As a result, downstream…

19
arXiv — Machine Learning research 25d ago

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

arXiv:2606.05263v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing…

5
arXiv — Machine Learning research 25d ago

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

arXiv:2606.05434v1 Announce Type: new Abstract: Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We…

17
arXiv — Machine Learning research 25d ago

What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning

arXiv:2606.05533v1 Announce Type: new Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning…

13
arXiv — Machine Learning research 25d ago

Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation

arXiv:2606.05988v1 Announce Type: new Abstract: Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation. Two teachers, Qwen3.5-397B-A17B and…

30
arXiv — Machine Learning research 25d ago

HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care

arXiv:2606.05994v1 Announce Type: new Abstract: Medical knowledge graphs (MKGs) infused with clinical knowledge have been increasingly used to model electronic health records (EHRs) to support interpretable predictions in the healthcare domain. However, existing MKG-based approaches…

31
arXiv — Machine Learning research 25d ago

On Advantage Estimates for Max@K Policy Gradients

arXiv:2606.06080v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as…

19
arXiv — NLP / Computation & Language research 25d ago

Multi-Granularity Reasoning for Natural Language Inference

arXiv:2606.05181v1 Announce Type: new Abstract: Natural Language Inference (NLI) is a fundamental task in natural language understanding that requires determining the logical relationship between a premise and a hypothesis. Despite the remarkable success of transformer-based…

31
arXiv — NLP / Computation & Language research 25d ago

LoRi: Low-Rank Distillation for Implicit Reasoning

arXiv:2606.05315v1 Announce Type: new Abstract: Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure.…

36
arXiv — NLP / Computation & Language research 25d ago

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

arXiv:2606.05402v1 Announce Type: new Abstract: Large reasoning models (LRMs) produce reasoning traces with non-linear structures, such as backtracking and self-correction, that complicate the evaluation and monitoring of the reasoning process. We introduce ReasoningFlow, a…

30
arXiv — NLP / Computation & Language research 25d ago

Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems

arXiv:2606.05711v1 Announce Type: new Abstract: Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language:…

24
arXiv — NLP / Computation & Language research 25d ago

Narrative Knowledge Weaver: Narrative-Centric Retrieval-Augmented Reasoning for Long-Form Text Understanding

arXiv:2606.05724v1 Announce Type: new Abstract: Long-form narrative QA requires reasoning over evolving story worlds rather than isolated passages: answers may depend on earlier goals, changing character states, social relations, causal triggers, temporal position, and later…

24
arXiv — NLP / Computation & Language research 25d ago

MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA

arXiv:2606.05749v1 Announce Type: new Abstract: Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and…

10
arXiv — NLP / Computation & Language research 25d ago

TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization

arXiv:2606.05859v1 Announce Type: new Abstract: Latent reasoning has emerged as a promising alternative to discrete Chain-of-Thought (CoT) in large language models (LLMs), enabling more expressive reasoning by operating over continuous representations. However, the inherently…

7
arXiv — NLP / Computation & Language research 25d ago

IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval

arXiv:2606.06044v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate…

13
arXiv — NLP / Computation & Language research 25d ago

SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization

arXiv:2606.06079v1 Announce Type: new Abstract: Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem…

18
arXiv — NLP / Computation & Language research 25d ago

Harnessing Structural Context for Entity Alignment Foundation Models

arXiv:2606.06109v1 Announce Type: new Abstract: Entity alignment (EA) aims to identify equivalent entities across heterogeneous knowledge graphs (KGs) and is a key component of knowledge fusion and cross-KG reasoning. The recent EA foundation model demonstrates that alignment…

6
arXiv — NLP / Computation & Language research 25d ago

The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models

arXiv:2606.06188v1 Announce Type: new Abstract: Recent work has sought to understand Large Language Models (LLMs) reasoning, yet a principled, model-intrinsic signal that captures its layer-wise reasoning dynamics remains underexplored. We bridge this gap by demonstrating that…

38
arXiv — NLP / Computation & Language research 25d ago

Latent Reasoning with Normalizing Flows

arXiv:2606.06447v1 Announce Type: new Abstract: Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and…

15
Hugging Face Daily Papers research 25d ago

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

Abstract: RE-Edit benchmark evaluates image editing systems on five reasoning dimensions to assess logical consistency beyond visual plausibility. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Diffusion-based image editing has achieved strong visual fidelity under natural language…

6
Hugging Face Daily Papers research 25d ago

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Abstract: Inference-time scaling is enhanced through constrained optimization that allocates computational resources based on economic principles, improving performance in resource-constrained environments. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Inference-time scaling has…

9
Hugging Face Daily Papers research 25d ago

Latent Reasoning with Normalizing Flows

Abstract: Latent reasoning framework using normalizing flows preserves autoregressive generation advantages while enabling efficient, probabilistic intermediate computation in large language models. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Large language models often improve…

26
Hugging Face Daily Papers research 25d ago

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Abstract: Future-L1, an interleaved latent visual reasoning framework, improves video event prediction by maintaining visual semantics in latent space during autoregressive decoding, achieving state-of-the-art results on FutureBench and TwiFF-Bench benchmarks. Generated by…

20
Hugging Face Daily Papers research 25d ago

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

Abstract: VideoKR presents a large-scale video reasoning dataset and benchmark designed to enhance knowledge-intensive video understanding through expert-domain content and human-in-the-loop example generation. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) We introduce VideoKR,…

24
Hugging Face Daily Papers research 25d ago

Unsupervised Skill Discovery for Agentic Data Analysis

Abstract: DataCOPE is an unsupervised framework that discovers reusable data-analysis skills through verifier-guided exploration, improving analytical performance in both report-style and reasoning-style tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Inference-time skill…

28
r/LocalLLaMA community 25d ago

NVIDIA Nemotron 3 Ultra is out.

Not sure how much this is in the "local" world, but it's interesting what they are putting out. https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/ Submitted by /u/justdoitanddont.

33
r/LocalLLaMA community 25d ago

KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag)

The KV-cache quant race just got more interesting. Huawei just open-sourced KVarN, a KV-cache quantization method under Apache 2.0 that drops into vLLM with one flag. Posting because the tradeoff it's claiming is genuinely different from what's already in the stack, and I'd like to…

20
Hugging Face Daily Papers research 25d ago

DAR: Deontic Reasoning with Agentic Harnesses

Abstract: Deontic reasoning tasks require applying complex rules and policies, and an agentic approach enables models to dynamically access statutes, showing mixed performance improvements across different model strengths. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Deontic…

7
NVIDIA Developer Blog official-blog 25d ago

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete...

33
Hugging Face Daily Papers research 26d ago

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Abstract: Vision-language models demonstrate strong performance on isolated spatial reasoning tasks but fail to maintain coherent spatial understanding and reliable actions during multi-turn interactive feedback in 3D environments. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.)…

15
Hugging Face Daily Papers research 26d ago

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

Abstract: Decentralized agent economies with auction-based competition and wealth accumulation enable emergent collective intelligence without central coordination, outperforming monolithic approaches in complex reasoning and optimization tasks. Generated by…

27
Vercel — AI dev-tools 26d ago

Nemotron 3 Ultra now available on AI Gateway

Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. Nemotron 3 Ultra is an open Mixture-of-Experts reasoning model built for orchestrating long-running agent workflows, with a 1M token context window. The model targets multi-turn agent workflows: planning, tool…

37
arXiv — Machine Learning research 26d ago

From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models

arXiv:2606.04381v1 Announce Type: new Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely \emph{symbolic}, arising from pattern matching over spatial language rather than true \emph{geometric}…

34
arXiv — Machine Learning research 26d ago

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

arXiv:2606.04503v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge fully-annotated dataset. To this end, data-efficient RLVR methods have been widely…

5
arXiv — Machine Learning research 26d ago

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

arXiv:2606.04516v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from…

15
arXiv — Machine Learning research 26d ago

Rollout-Level Advantage-Prioritized Experience Replay for GRPO

arXiv:2606.04560v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards with GRPO is a standard approach for post-training reasoning LLMs. It remains sample inefficient. Each rollout is used for a single gradient update and then discarded. Naive replay is…

38
arXiv — NLP / Computation & Language research 26d ago

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

arXiv:2606.04120v1 Announce Type: new Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents…

10
arXiv — NLP / Computation & Language research 26d ago

Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs

arXiv:2606.04360v1 Announce Type: new Abstract: Symbolic regression (SR) discovers compact mathematical expressions from data, yet recent LLM-based evolutionary methods remain sample-inefficient because they rely mainly on scalar feedback such as MSE. We identify a core…

37
arXiv — NLP / Computation & Language research 26d ago

MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning

arXiv:2606.04442v1 Announce Type: new Abstract: AI systems increasingly need to combine two demanding capabilities: navigating multi-session conversation history and performing deep reading comprehension within long documents. Yet no existing benchmark evaluates both…

16
arXiv — NLP / Computation & Language research 26d ago

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

arXiv:2606.04454v1 Announce Type: new Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step…

15
arXiv — NLP / Computation & Language research 26d ago

Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning

arXiv:2606.04466v1 Announce Type: new Abstract: Post-training Small Language Models (SLMs) for reasoning typically follows an SFT-then-RL pipeline, yet existing work rarely considers what data should be learned at each stage. We argue that data strategy should be aligned with…

24
arXiv — NLP / Computation & Language research 26d ago

Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

arXiv:2606.04474v1 Announce Type: new Abstract: Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We reveal that this modality gap is not a uniform cognitive deficit. Evaluating three diverse SLLMs, we show speech-to-text (S2T)…

37
arXiv — NLP / Computation & Language research 26d ago

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

arXiv:2606.04535v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While…

16
arXiv — NLP / Computation & Language research 26d ago

GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards

arXiv:2606.04889v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all…

8
arXiv — NLP / Computation & Language research 26d ago

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

arXiv:2606.04915v1 Announce Type: new Abstract: Large language models reach 50 to 70% accuracy on causal reasoning benchmarks such as CLadder, but it is unclear whether this reflects structural reasoning or lexical pattern matching. We introduce Caliper, a controlled…

18
arXiv — NLP / Computation & Language research 26d ago

DeliChess: A Multi-party Dialogue Dataset for Deliberation in Chess Puzzle Solving

arXiv:2606.04987v1 Announce Type: new Abstract: Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of…

35
arXiv — NLP / Computation & Language research 26d ago

DAR: Deontic Reasoning with Agentic Harnesses

arXiv:2606.05009v1 Announce Type: new Abstract: Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key…

22
