Tag: Reasoning
500 articles archived under #reasoning

arXiv — NLP / Computation & Language research 26d ago
Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair
arXiv:2606.05030v1 Announce Type: new Abstract: Autoregressive chain-of-thought (CoT) reasoning in large language models (LLMs) is fundamentally forward-directed: each step conditions only on prior tokens. This unidirectional inductive bias renders even capable models… 31

arXiv — NLP / Computation & Language research 26d ago
Boosting Self-Consistency with Ranking
arXiv:2606.05054v1 Announce Type: new Abstract: Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We… 33

arXiv — NLP / Computation & Language research 26d ago
Arithmetic Pedagogy for Language Models
arXiv:2606.05106v1 Announce Type: new Abstract: We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning. Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a… 32

arXiv — NLP / Computation & Language research 26d ago
Streaming Communication in Multi-Agent Reasoning
arXiv:2606.05158v1 Announce Type: new Abstract: Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth.
We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to… 8

arXiv — NLP / Computation & Language research 26d ago
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
arXiv:2606.04244v1 Announce Type: cross Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when… 7

arXiv — NLP / Computation & Language research 26d ago
StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis
arXiv:2606.04246v1 Announce Type: cross Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel… 8

arXiv — NLP / Computation & Language research 26d ago
Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation
arXiv:2606.04435v1 Announce Type: cross Abstract: Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms… 25

r/MachineLearning community 26d ago
Best Visual Reasoning Model in 2026 (Including APIs) [D]
For example, suppose I have a one-hour video and I provide it to ChatGPT or another AI model. If I ask complex reasoning questions about the video, which models are best suited for long-horizon video understanding and reasoning?
Which models can produce the most reliable answers… 38

Hugging Face Daily Papers research 26d ago
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
Abstract ThoughtFold addresses over-thinking in large reasoning models by using fine-grained preference learning to identify and eliminate redundant explorations in chain-of-thought reasoning processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Reasoning Models (LRMs)… 13

Hugging Face Daily Papers research 26d ago
MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation
Abstract MapAgent is an industrial-grade agentic architecture that combines vision-language processing with constraint-aware reasoning to produce specification-compliant lane maps, achieving high automation rates in large-scale urban mapping. Generated by… 21

Hugging Face Daily Papers research 26d ago
Streaming Communication in Multi-Agent Reasoning
Abstract StreamMA enables efficient multi-agent reasoning by streaming intermediate results and leveraging reliable early steps to improve both latency and effectiveness across various reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-agent reasoning systems… 12

Hugging Face Daily Papers research 26d ago
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
Abstract Deep-research agents can be audited using a claim-centric framework that identifies error spans in their reasoning trajectories, improving reliability assessment beyond just final answer evaluation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep-research agents solve… 20

Hugging Face Daily Papers research 26d ago
MemTrain: Self-Supervised Context Memory Training
Abstract A self-supervised training framework called MemTrain enhances long-horizon language model agents' memory capabilities through proxy tasks optimized via GRPO, improving downstream reasoning performance.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory is an… 4

Hugging Face Daily Papers research 26d ago
Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
Abstract Wide-baseline matching presents a challenging spatial reasoning testbed for multimodal large language models, requiring systematic evaluation and training frameworks that current models lack, prompting the introduction of ReasonMatch-Bench and Dynamic Correspondence… 28

Hugging Face Daily Papers research 26d ago
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
Abstract Agentic Chain-of-Thought Steering (ACTS) formulates reasoning steering as a Markov decision process to enable efficient, controllable chain-of-thought reasoning with token savings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models improve final-answer… 19

Hugging Face Daily Papers research 26d ago
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
Abstract KVarN is a calibration-free KV-cache quantizer that uses Hadamard rotation and dual-scaling variance normalization to reduce error accumulation during autoregressive decoding in large language models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Test-time scaling is a… 28

OpenAI official-blog 26d ago
Introducing new capabilities to GPT-Rosalind
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities. 38

Hugging Face Daily Papers research 27d ago
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering
Abstract Compact task-specialized language models demonstrate superior performance in multi-hop reasoning and faithfulness compared to larger general-purpose models through a novel training pipeline and structured reasoning traces.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 32

Hugging Face Daily Papers research 27d ago
TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
Abstract TRON enables scalable and controllable reinforcement learning for visual reasoning through an online environment substrate that generates unlimited diverse training instances with verifiable answers. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning… 22

Hugging Face Daily Papers research 27d ago
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
Abstract Answer-correct long chain-of-thought traces can lead to different fine-tuning outcomes, with post-conclusion continuations identified as harmful to training, characterized by uncertainty-geometry mismatches and addressed through a lightweight boundary proxy method.… 26

Hugging Face Daily Papers research 27d ago
World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
Abstract Controlled concrete reasoning combines visual simulation with abstract reasoning through a training method that uses privileged future information to improve prediction accuracy and robustness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models and multimodal… 19

Hugging Face Daily Papers research 27d ago
Value-Aware Stochastic KV Cache Eviction for Reasoning Models
Abstract Value-aware stochastic KV cache eviction method improves reasoning model accuracy under compression by protecting large-magnitude states and promoting diverse eviction decisions.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning models improve accuracy through… 9

arXiv — Machine Learning research 27d ago
Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning
arXiv:2606.02842v1 Announce Type: new Abstract: Multimodal spatial reasoning often relies on long chains of intermediate textual and visual thoughts, where accumulating visual tokens and dense cross-modal attention incur substantial computation and memory overhead. To address… 6

arXiv — Machine Learning research 27d ago
Are we really tilting? The mechanics of reward guidance in flow and diffusion models
arXiv:2606.02884v1 Announce Type: new Abstract: Reward guidance algorithms steer a learned generative process toward the reward-tilted measure at inference time. While empirically powerful, these methods are prone to reward hacking: the guided model over-optimizes the reward at… 11

arXiv — Machine Learning research 27d ago
KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators
arXiv:2606.02963v1 Announce Type: new Abstract: Production inference increasingly targets a heterogeneous mix of accelerators. Agentic pipelines interleave reasoning, tool calls, and multi-agent coordination, each with distinct compute and memory profiles. For optimal… 19

arXiv — Machine Learning research 27d ago
MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency
arXiv:2606.03014v1 Announce Type: new Abstract: Mixture-of-Agents (MoA) systems improve reasoning accuracy by routing each query to multiple expert LLMs and aggregating their outputs. Efficiently executing this workload on limited GPU resources has bottlenecks.
Skill-based… 22

arXiv — Machine Learning research 27d ago
Libra: Efficient Resource Management for Agentic RL Post-Training
arXiv:2606.03077v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a standard post-training paradigm for large language models (LLMs), extending beyond preference alignment to complex reasoning and multi-turn agentic behaviors. In agentic RL, the rollout… 23

arXiv — Machine Learning research 27d ago
FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data
arXiv:2606.03094v1 Announce Type: new Abstract: Recent advances in language models have established reinforcement learning as the primary paradigm for eliciting self-correction and long-chain reasoning. While group relative policy optimization (GRPO) offers superior scalability… 4

arXiv — Machine Learning research 27d ago
Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning
arXiv:2606.03234v1 Announce Type: new Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has become the dominant approach for improving mathematical reasoning in large language models, yet current methods reduce each correct rollout to a single reward bit, ignoring… 21

arXiv — Machine Learning research 27d ago
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
arXiv:2606.03458v1 Announce Type: new Abstract: Test-time scaling is a powerful approach to obtain better reasoning in large language models, but it becomes memory-bottlenecked during long-horizon decoding, as the KV-cache grows. KV-cache quantization can help improve this, but… 27

arXiv — NLP / Computation & Language research 27d ago
Adaptive Latent Agentic Reasoning
arXiv:2606.02871v1 Announce Type: new Abstract: Large reasoning models improve performance by generating extended chain-of-thought (CoT) reasoning, but this behavior becomes inefficient when applied to LLM agents.
Current LLM agents often generate verbose textual reasoning at… 19

arXiv — NLP / Computation & Language research 27d ago
Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States
arXiv:2606.02907v1 Announce Type: new Abstract: Linear probing of large language model (LLM) hidden states is widely used to claim that models learn distinct representations for different reasoning types. We test this by probing Qwen3-14B on three benchmarks spanning the… 21

arXiv — NLP / Computation & Language research 27d ago
Hint-Guided Diversified Policy Optimization for LLM Reasoning
arXiv:2606.03021v1 Announce Type: new Abstract: Recent developments in Large Language Models (LLMs) have showcased impressive reasoning capabilities, with Reinforcement Learning with Verifiable Rewards (RLVR) being a promising enhancement strategy. However, existing reward… 6

arXiv — NLP / Computation & Language research 27d ago
PhotoCraft: Agentic Reasoning with Hierarchical Self-Evolving Memory for Deep Image Search
arXiv:2606.03099v1 Announce Type: new Abstract: Deep Image Search requires multi-step reasoning over rich contextual cues, such as time, location, and event relations. However, most existing LLM-based agents are stateless and reactive, lacking persistent memory to maintain… 11

arXiv — NLP / Computation & Language research 27d ago
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
arXiv:2606.03102v1 Announce Type: new Abstract: Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency.
Existing adaptive sampling methods partially mitigate this issue by dynamically… 21

arXiv — NLP / Computation & Language research 27d ago
SagaQA: A Multi-hop Reasoning Benchmark for Long-form Narrative Understanding in TV Series
arXiv:2606.03301v1 Announce Type: new Abstract: We introduce SagaQA, a long-form video benchmark for multi-hop reasoning over full-length TV series. Existing video reasoning benchmarks often emphasize local understanding of adjacent frames or clips. SagaQA addresses this gap by… 33

arXiv — NLP / Computation & Language research 27d ago
Evaluating LLMs' Effectiveness on Real-World Consumer Device Repair Questions
arXiv:2606.03331v1 Announce Type: new Abstract: Consumer device repair is an important but underexplored testbed for large language models (LLMs). Repair tasks require reasoning over incomplete problem descriptions, hardware-specific diagnostics, actionable troubleshooting, and… 38

arXiv — NLP / Computation & Language research 27d ago
The Unsampled Truth: Psychometrics in SLMs Measure Prompt Artifacts, Not Psychological Constructs
arXiv:2606.03357v1 Announce Type: new Abstract: When prompting SLMs for psychometric assessments, researchers assume the outputs reflect semantic reasoning.
We evaluate this premise across 13 open-weights models (0.6B to 14B parameters) using a prompt variation framework that… 18

arXiv — NLP / Computation & Language research 27d ago
Framing Migration News with LLMs: Structured CoT as a Support for Human Interpretation
arXiv:2606.03761v1 Announce Type: new Abstract: Frame analysis of migration news is a socially consequential task: media scholars and researchers who study how migration is narrated need tools that are not only accurate, but transparent, auditable, and accessible within the… 25

arXiv — NLP / Computation & Language research 27d ago
HybridThinker: Efficient Chain-of-Thought Reasoning via Compressed Memory and Transient Thought Steps
arXiv:2606.03768v1 Announce Type: new Abstract: Extended chain-of-thought (CoT) traces improve LLM reasoning but incur substantial computational and memory costs. While existing CoT compression methods mitigate this by condensing thought steps into compact representations via… 26

arXiv — NLP / Computation & Language research 27d ago
Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
arXiv:2606.03782v1 Announce Type: new Abstract: Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply… 14

arXiv — NLP / Computation & Language research 27d ago
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
arXiv:2606.03793v1 Announce Type: new Abstract: Multimodal Large Language Models integrate visual perception into language reasoning, introducing a continuous attack surface susceptible to adversarial attacks.
Prior work on MLLM robustness has focused largely on English-centric… 19

arXiv — NLP / Computation & Language research 27d ago
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
arXiv:2606.03965v1 Announce Type: new Abstract: Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking… 6

arXiv — NLP / Computation & Language research 27d ago
Quantifying Faithful Confidence Expression in Large Reasoning Models
arXiv:2606.03969v1 Announce Type: new Abstract: Reliable uncertainty communication is critical to the trustworthiness of LLMs, yet faithful calibration (FC)--the alignment between models' intrinsic and (linguistically) expressed confidence--is a persistent failure mode. This… 35

arXiv — NLP / Computation & Language research 27d ago
Attention Calibration for Position-Fair Dense Information Retrieval
arXiv:2606.02737v1 Announce Type: cross Abstract: Dense retrieval models exhibit positional bias: retrieval effectiveness degrades when relevant information appears later in a passage (Zeng et al., 2025). We ask whether this bias can be reduced at inference time, without… 34

arXiv — NLP / Computation & Language research 27d ago
Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection
arXiv:2606.02812v1 Announce Type: cross Abstract: Modeling patient trajectories from longitudinal electronic health records (EHRs) requires reasoning over sparse, noisy, and long-context multimodal sequences.
Existing LLM-based multi-agent systems address context length but… 38

Hugging Face Daily Papers research 27d ago
MindZero: Learning Online Mental Reasoning With Zero Annotations
Abstract MindZero presents a self-supervised reinforcement learning framework that enables multimodal large language models to perform efficient and robust online mental reasoning without requiring explicit mental state annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 35

Simon Willison community 27d ago
Microsoft's new MAI models
Microsoft announced two new text LLMs this morning - MAI-Thinking-1 (reasoning, 35B parameters, available to "select early partners") and MAI-Code-1-Flash (5B parameters, "purpose-built for GitHub Copilot and VS Code to deliver high performance and lower cost [...] rolling out… 17

r/LocalLLaMA community 27d ago
Weird issue with OpenCode and Qwen3.6
I’m using Qwen3.6-27B running on my server with llama-server for AI coding with OpenCode. Sometimes, for some reason, the response stops while it's reasoning, as if it has finished outputting the full response. I have to type “continue” and it continues working as if nothing… 30

Hugging Face Daily Papers research 27d ago
SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence
Abstract Strategic Video Intelligence requires understanding, causal reasoning, and planning capabilities that current benchmarks fail to evaluate adequately, leading to significant performance gaps in complex cognitive tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct True… 10