Tag: Reasoning
500 articles archived under #reasoning

r/LocalLLaMA · community · 25d ago
[NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning
Hello again r/LocalLLaMA! Supra-50M-Reasoning (ThinkSupra-50M) is the reasoning version of Supra-50M-Instruct. It produces a full thinking chain before every answer, fine-tuned from Supra-50M-Base using a custom… 14

r/LocalLLaMA · community · 25d ago
Benchmark & Reality Check on Gemma 4 12B: Great model, but your local settings are probably breaking it (Fix inside)
I completed a Python bug hunting benchmark with Gemma 4 12B. I used the Unsloth Dynamic Q5 GGUF model. The model has good capabilities. Default settings in LM Studio disable the reasoning. Fix the LM Studio reasoning configuration. LM Studio looks for Qwen tokens. Gemma 4 uses… 30

Hugging Face Daily Papers · research · 25d ago
Multimodal Music Recommendation System using LLMs
Abstract: A multimodal framework for session-based music recommendation integrates audio, lyric, and semantic signals with LLM-based sequential reasoning to improve recommendation accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Music recommendation systems typically treat… 16

arXiv — Machine Learning · research · 25d ago
State commitment learning: training language models to distinguish computation from memory
arXiv:2606.05201v1 · Announce Type: new · Abstract: Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions.
As a result, downstream… 19

arXiv — Machine Learning · research · 25d ago
Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
arXiv:2606.05263v1 · Announce Type: new · Abstract: Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing… 5

arXiv — Machine Learning · research · 25d ago
Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models
arXiv:2606.05434v1 · Announce Type: new · Abstract: Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We… 17

arXiv — Machine Learning · research · 25d ago
What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning
arXiv:2606.05533v1 · Announce Type: new · Abstract: Existing robot planning systems rely on appearance-based reasoning, where visual observations are encoded into latent spaces organized around object appearances (e.g., recognizing a "cart" based on how it looks). However, planning… 13

arXiv — Machine Learning · research · 25d ago
Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation
arXiv:2606.05988v1 · Announce Type: new · Abstract: Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation.
Two teachers, Qwen3.5-397B-A17B and… 30

arXiv — Machine Learning · research · 25d ago
HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care
arXiv:2606.05994v1 · Announce Type: new · Abstract: Medical knowledge graphs (MKGs) infused with clinical knowledge have been increasingly used to model electronic health records (EHRs) to support interpretable predictions in healthcare domain. However, existing MKG-based approaches… 31

arXiv — Machine Learning · research · 25d ago
On Advantage Estimates for Max@K Policy Gradients
arXiv:2606.06080v1 · Announce Type: new · Abstract: Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as… 19

arXiv — NLP / Computation & Language · research · 25d ago
Multi-Granularity Reasoning for Natural Language Inference
arXiv:2606.05181v1 · Announce Type: new · Abstract: Natural Language Inference (NLI) is a fundamental task in natural language understanding that requires determining the logical relationship between a premise and a hypothesis. Despite the remarkable success of transformer-based… 31

arXiv — NLP / Computation & Language · research · 25d ago
LoRi: Low-Rank Distillation for Implicit Reasoning
arXiv:2606.05315v1 · Announce Type: new · Abstract: Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure.… 36

arXiv — NLP / Computation & Language · research · 25d ago
ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces
arXiv:2606.05402v1 · Announce Type: new · Abstract: Large reasoning models (LRMs) produce reasoning traces with non-linear structures, such as backtracking and self-correction, that complicate the evaluation and monitoring of the reasoning process.
We introduce ReasoningFlow, a… 30

arXiv — NLP / Computation & Language · research · 25d ago
Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems
arXiv:2606.05711v1 · Announce Type: new · Abstract: Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language:… 24

arXiv — NLP / Computation & Language · research · 25d ago
Narrative Knowledge Weaver: Narrative-Centric Retrieval-Augmented Reasoning for Long-Form Text Understanding
arXiv:2606.05724v1 · Announce Type: new · Abstract: Long-form narrative QA requires reasoning over evolving story worlds rather than isolated passages: answers may depend on earlier goals, changing character states, social relations, causal triggers, temporal position, and later… 24

arXiv — NLP / Computation & Language · research · 25d ago
MARDoc: A Memory-Aware Refinement Agent Framework for Multimodal Long Document QA
arXiv:2606.05749v1 · Announce Type: new · Abstract: Iterative retrieval-reasoning agents have recently shown promise for multimodal long-document question answering. However, most existing systems maintain a single growing context that mixes retrieval traces, observations, and… 10

arXiv — NLP / Computation & Language · research · 25d ago
TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization
arXiv:2606.05859v1 · Announce Type: new · Abstract: Latent reasoning has emerged as a promising alternative to discrete Chain-of-Thought (CoT) in large language models (LLMs), enabling more expressive reasoning by operating over continuous representations.
However, the inherently… 7

arXiv — NLP / Computation & Language · research · 25d ago
IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval
arXiv:2606.06044v1 · Announce Type: new · Abstract: Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate… 13

arXiv — NLP / Computation & Language · research · 25d ago
SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization
arXiv:2606.06079v1 · Announce Type: new · Abstract: Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem… 18

arXiv — NLP / Computation & Language · research · 25d ago
Harnessing Structural Context for Entity Alignment Foundation Models
arXiv:2606.06109v1 · Announce Type: new · Abstract: Entity alignment (EA) aims to identify equivalent entities across heterogeneous knowledge graphs (KGs) and is a key component of knowledge fusion and cross-KG reasoning. The recent EA foundation model demonstrates that alignment… 6

arXiv — NLP / Computation & Language · research · 25d ago
The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models
arXiv:2606.06188v1 · Announce Type: new · Abstract: Recent work has sought to understand Large Language Models (LLMs) reasoning, yet a principled, model-intrinsic signal that captures its layer-wise reasoning dynamics remains underexplored. We bridge this gap by demonstrating that… 38

arXiv — NLP / Computation & Language · research · 25d ago
Latent Reasoning with Normalizing Flows
arXiv:2606.06447v1 · Announce Type: new · Abstract: Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation.
However, textual CoT forces this computation through a discrete, serial, and… 15

Hugging Face Daily Papers · research · 25d ago
Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing
Abstract: RE-Edit benchmark evaluates image editing systems on five reasoning dimensions to assess logical consistency beyond visual plausibility. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Diffusion-based image editing has achieved strong visual fidelity under natural language… 6

Hugging Face Daily Papers · research · 25d ago
The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs
Abstract: Inference-time scaling is enhanced through constrained optimization that allocates computational resources based on economic principles, improving performance in resource-constrained environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Inference-time scaling has… 9

Hugging Face Daily Papers · research · 25d ago
Latent Reasoning with Normalizing Flows
Abstract: Latent reasoning framework using normalizing flows preserves autoregressive generation advantages while enabling efficient, probabilistic intermediate computation in large language models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Large language models often improve… 26

Hugging Face Daily Papers · research · 25d ago
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction
Abstract: Future-L1, an interleaved latent visual reasoning framework, improves video event prediction by maintaining visual semantics in latent space during autoregressive decoding, achieving state-of-the-art results on FutureBench and TwiFF-Bench benchmarks.
Generated by… 20

Hugging Face Daily Papers · research · 25d ago
VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
Abstract: VideoKR presents a large-scale video reasoning dataset and benchmark designed to enhance knowledge-intensive video understanding through expert-domain content and human-in-the-loop example generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. We introduce VideoKR,… 24

Hugging Face Daily Papers · research · 25d ago
Unsupervised Skill Discovery for Agentic Data Analysis
Abstract: DataCOPE is an unsupervised framework that discovers reusable data-analysis skills through verifier-guided exploration, improving analytical performance in both report-style and reasoning-style tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Inference-time skill… 28

r/LocalLLaMA · community · 25d ago
NVIDIA Nemotron 3 Ultra is out.
Not sure how much this is in the "local" world but interesting what they are putting out. https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/ submitted by /u/justdoitanddont 33

r/LocalLLaMA · community · 25d ago
KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag)
The KV-cache quant race just got more interesting. Huawei just open-sourced KVarN, a KV-cache quantization method under Apache 2.0, drops into vLLM with one flag. Posting because the tradeoff it's claiming is genuinely different from what's already in the stack, and I'd like to… 20

Hugging Face Daily Papers · research · 25d ago
DAR: Deontic Reasoning with Agentic Harnesses
Abstract: Deontic reasoning tasks require applying complex rules and policies, and an agentic approach enables models to dynamically access statutes, showing mixed performance improvements across different model strengths.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Deontic… 7

NVIDIA Developer Blog · official-blog · 25d ago
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete... 33

Hugging Face Daily Papers · research · 26d ago
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
Abstract: Vision-language models demonstrate strong performance on isolated spatial reasoning tasks but fail to maintain coherent spatial understanding and reliable actions during multi-turn interactive feedback in 3D environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 15

Hugging Face Daily Papers · research · 26d ago
Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions
Abstract: Decentralized agent economies with auction-based competition and wealth accumulation enable emergent collective intelligence without central coordination, outperforming monolithic approaches in complex reasoning and optimization tasks. Generated by… 27

Vercel — AI · dev-tools · 26d ago
Nemotron 3 Ultra now available on AI Gateway
Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. Nemotron 3 Ultra is an open Mixture-of-Experts reasoning model built for orchestrating long-running agent workflows, with a 1M token context window.
The model targets multi-turn agent workflows: planning, tool… 37

arXiv — Machine Learning · research · 26d ago
From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models
arXiv:2606.04381v1 · Announce Type: new · Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric… 34

arXiv — Machine Learning · research · 26d ago
Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots
arXiv:2606.04503v1 · Announce Type: new · Abstract: Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge fully-annotated dataset. To this end, data-efficient RLVR methods have been widely… 5

arXiv — Machine Learning · research · 26d ago
GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling
arXiv:2606.04516v1 · Announce Type: new · Abstract: Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from… 15

arXiv — Machine Learning · research · 26d ago
Rollout-Level Advantage-Prioritized Experience Replay for GRPO
arXiv:2606.04560v1 · Announce Type: new · Abstract: Reinforcement learning from verifiable rewards with GRPO is a standard approach for post-training reasoning LLMs. It remains sample inefficient. Each rollout is used for a single gradient update and then discarded. Naive replay is… 38

arXiv — NLP / Computation & Language · research · 26d ago
SaliMory: Orchestrating Cognitive Memory for Conversational Agents
arXiv:2606.04120v1 · Announce Type: new · Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions.
However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents… 10

arXiv — NLP / Computation & Language · research · 26d ago
Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs
arXiv:2606.04360v1 · Announce Type: new · Abstract: Symbolic regression (SR) discovers compact mathematical expressions from data, yet recent LLM-based evolutionary methods remain sample-inefficient because they rely mainly on scalar feedback such as MSE. We identify a core… 37

arXiv — NLP / Computation & Language · research · 26d ago
MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning
arXiv:2606.04442v1 · Announce Type: new · Abstract: AI systems increasingly need to combine two demanding capabilities: navigating multi-session conversation history and performing deep reading comprehension within long documents. Yet no existing benchmark evaluates both… 16

arXiv — NLP / Computation & Language · research · 26d ago
Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation
arXiv:2606.04454v1 · Announce Type: new · Abstract: Large language models have shown strong performance in natural language generation and downstream reasoning tasks, but they still struggle with logical consistency, factual grounding, and interpretability in complex multi-step… 15

arXiv — NLP / Computation & Language · research · 26d ago
Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning
arXiv:2606.04466v1 · Announce Type: new · Abstract: Post-training Small Language Models (SLMs) for reasoning typically follows an SFT-then-RL pipeline, yet existing work rarely considers what data should be learned at each stage.
We argue that data strategy should be aligned with… 24

arXiv — NLP / Computation & Language · research · 26d ago
Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention
arXiv:2606.04474v1 · Announce Type: new · Abstract: Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We reveal that this modality gap is not a uniform cognitive deficit. Evaluating three diverse SLLMs, we show speech-to-text (S2T)… 37

arXiv — NLP / Computation & Language · research · 26d ago
Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models
arXiv:2606.04535v1 · Announce Type: new · Abstract: Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While… 16

arXiv — NLP / Computation & Language · research · 26d ago
GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards
arXiv:2606.04889v1 · Announce Type: new · Abstract: Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all… 8

arXiv — NLP / Computation & Language · research · 26d ago
Caliper: Probing Lexical Anchors versus Causal Structure in LLMs
arXiv:2606.04915v1 · Announce Type: new · Abstract: Large language models reach 50 to 70% accuracy on causal reasoning benchmarks such as CLadder, but it is unclear whether this reflects structural reasoning or lexical pattern matching.
We introduce Caliper, a controlled… 18

arXiv — NLP / Computation & Language · research · 26d ago
DeliChess: A Multi-party Dialogue Dataset for Deliberation in Chess Puzzle Solving
arXiv:2606.04987v1 · Announce Type: new · Abstract: Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of… 35

arXiv — NLP / Computation & Language · research · 26d ago
DAR: Deontic Reasoning with Agentic Harnesses
arXiv:2606.05009v1 · Announce Type: new · Abstract: Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key… 22