Tag: Reasoning. 500 articles archived under #reasoning

arXiv — NLP / Computation & Language research 6d ago A specialized reasoning large language model for accelerating rare disease diagnosis: a randomized AI physician assistance trial arXiv:2606.24510v1 Announce Type: cross Abstract: Rare diseases affect millions of individuals worldwide, yet timely diagnosis remains a major public health challenge due to scarcity of specialized clinical expertise. While large language models (LLMs) show promise to support… 28

arXiv — NLP / Computation & Language research 6d ago Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering arXiv:2403.04890v4 Announce Type: replace Abstract: In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers.… 7

arXiv — NLP / Computation & Language research 6d ago Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions arXiv:2501.11790v5 Announce Type: replace Abstract: Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliable… 29

Hugging Face Daily Papers research 6d ago Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning Abstract Text-to-image models fail to generate counterfactual scenes because they rely on tightly coupled visual-textual patterns rather than causal reasoning, demonstrating limited understanding beyond pattern matching.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-to-image… 26

Hugging Face Daily Papers research 6d ago VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct Abstract A novel framework called VeriEvol is introduced that addresses the challenge of scaling reinforcement learning for visual mathematical reasoning by ensuring reliable reward labels through a two-axis approach that separates prompt difficulty from answer reliability,… 17

Hugging Face Daily Papers research 6d ago Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning Abstract Data-centric approach using curated datasets and minimal GRPO setup significantly improves long-context reasoning in large language models, outperforming prior reinforcement learning methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-context reasoning is an… 15

Hugging Face Daily Papers research 6d ago A Verifiable Search Is Not a Learnable Chain-of-Thought Abstract Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration. Generated by… 11

Hugging Face Daily Papers research 6d ago Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding Abstract Autoregressive generation in large language models traditionally uses the final layer for token prediction, but a new decoding strategy dynamically selects more reliable intermediate layers based on entropy-guided search, improving reasoning performance with minimal… 34

r/LocalLLaMA community 7d ago Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL? To train Qwen 3.5 4B or 9B for a custom multi-tool agent workflow and would appreciate guidance from people who have done this successfully. A few questions: SFT → RL or RL-only?
- Is it still recommended to first do supervised fine-tuning (tool-calling traces, reasoning… 15

Hugging Face Daily Papers research 7d ago Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views Abstract DR-MV3D presents a map-grounded learning framework with dense rewards to improve multi-view 3D visual question answering through global map construction, view-trajectory planning, and egocentric grounding. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-view 3D… 15

Hugging Face Daily Papers research 7d ago Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation Abstract Trajectory-Augmented Policy Optimization (TAPO) enhances large language model reasoning by creating explicit corrective trajectories that preserve erroneous reasoning while incorporating natural-language diagnoses and corrections, outperforming traditional… 31

Hugging Face Daily Papers research 7d ago Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models Abstract Reinforcement learning approaches for improving LLM reasoning capabilities are enhanced by a Bayesian Manifold Curriculum framework that structures problem sampling based on task manifold relationships and endogenous non-stationarity. Generated by… 20

Hacker News — AI on Front Page community 7d ago VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO Article URL: https://arxiv.org/abs/2606.16140 Comments URL: https://news.ycombinator.com/item?id=48639240 Points: 211 # Comments: 85 26

r/LocalLLaMA community 7d ago NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests. I have been testing all popular MoE for my Mac and it seems I just found gold: 3.5/3.6 level of reasoning (if not slightly superior) at a fraction of the reasoning tokens used (wasted).
Dynamic plot with other benchmarks here: https://benchmark-yourself.streamlit.app/… 4

Hugging Face Daily Papers research 7d ago Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models Abstract Reflective Masking enables iterative local refinement in Mask Diffusion Models through lightweight post-training, supporting multi-turn reasoning without architectural changes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While reasoning on autoregressive (AR) models is… 26

r/LocalLLaMA community 8d ago 8-16 MI50s Minimax M3 @19 tps TG (peak) TL;DR Speeds are not too ugly for this old 2018 hardware but imo, not very usable for agentic coding (if you compare with qwen3.6 27B on 8 MI50 @ 50 tps TG 800 tps PP). More concerning is that the reasoning output is very very long and still didn’t check about the quality of… 27

r/LocalLLaMA community 9d ago GLM 5.2: 98% of max level intelligence with less than half of tokens usage According to this number of reasoning tokens from GLM 5.1 to GLM 5.2 more than doubled from 16.7k to 36.7k and for me as a local user with old junk Xeon setup this makes GLM 5.2 unusable to the extent where I had to shut down model after 12h of waiting it to respond to my math… 37

r/LocalLLaMA community 10d ago How do I set the right llama.cpp parameters? --n-gpu-layers all --ctx-size 0 --reasoning-budget 0 --presence-penalty 1.1 --repeat-penalty 1.1 How do I figure out the optimal llama.cpp parameters for my setup? llama.cpp + Open WebUI in Docker with an AMD GPU (16GB VRAM) running gemma 4 12b and 26b models. Is it all about… 13

Hugging Face Daily Papers research 10d ago Context-Aware RL for Agentic and Multimodal LLMs Abstract ContextRL enhances long-horizon reasoning and multimodal performance through reinforcement learning that rewards context selection for supporting query-answer pairs, achieving improvements over standard methods on diverse benchmarks.
Generated by… 21

r/LocalLLaMA community 10d ago Watching a local AI voice assistant get dumber (A 9B to 0.8B agent experiment on my RTX 5060 Ti) I wanted to find the exact floor for running an intelligent, local voice assistant agent on consumer hardware. I kept the environment, tools, and prompts identical, I stepped the model sizes down through Qwen 3.5 9B, 4B, 2B, and 0.8B to see how agentic reasoning degrades. The… 12

r/LocalLLaMA community 10d ago Has anyone here used VibeThinker-3B outside benchmarks? Just curious, given the hype and benchmark numbers. Curious about real-world behavior: debugging, coding assistance, reasoning over messy prompts, local latency, failure modes, and whether it actually feels useful versus just optimized for verifiable evals.… 23

arXiv — NLP / Computation & Language research 11d ago Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models arXiv:2606.19404v1 Announce Type: cross Abstract: Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral… 15

arXiv — Machine Learning research 11d ago Concept Flow Models: Anchoring Concept-Based Reasoning with Hierarchical Bottlenecks arXiv:2606.19489v1 Announce Type: new Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by projecting learned features into a human-understandable concept space. Recent approaches leverage vision-language models to generate concept embeddings, reducing the need… 8

arXiv — Machine Learning research 11d ago Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation arXiv:2606.19636v1 Announce Type: new Abstract: Math and science reasoning benchmarks rely on pass@k, the fraction of sampled chains that reach gold, as the canonical per-example difficulty signal.
The same signal drives RL with verifiable rewards, math data curation, synthetic… 20

arXiv — NLP / Computation & Language research 11d ago Efficiently Representing Algorithms With Chain-of-Thought Transformers arXiv:2606.19697v1 Announce Type: cross Abstract: The increasing popularity of reasoning models -- language models that output a series of reasoning or thought tokens before producing an answer -- is justified, in part, by theoretical results showing that chain-of-thought… 9

arXiv — NLP / Computation & Language research 11d ago Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models arXiv:2606.19750v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing… 15

arXiv — Machine Learning research 11d ago ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models arXiv:2606.19919v1 Announce Type: new Abstract: Large reasoning models rely on long chain-of-thought to achieve strong performance, but applying such reasoning uniformly incurs high computational cost. Existing efficiency-oriented methods attempt to shorten or mix reasoning… 11

arXiv — Machine Learning research 11d ago VIMPO: Value-Implicit Policy Optimization for LLMs arXiv:2606.20008v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment.
Group-relative… 6

arXiv — NLP / Computation & Language research 11d ago What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis arXiv:2606.20075v1 Announce Type: cross Abstract: Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome… 36

arXiv — NLP / Computation & Language research 11d ago Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models arXiv:2606.19350v1 Announce Type: new Abstract: Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their… 34

arXiv — NLP / Computation & Language research 11d ago Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning arXiv:2606.19351v1 Announce Type: new Abstract: Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG… 25

arXiv — NLP / Computation & Language research 11d ago Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling arXiv:2606.19354v1 Announce Type: new Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the… 5

arXiv — NLP / Computation & Language research 11d ago Where Does Social Reasoning Come From?
Capability Provenance in Language Models arXiv:2606.19625v1 Announce Type: new Abstract: We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how… 9

arXiv — NLP / Computation & Language research 11d ago Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability arXiv:2606.19815v1 Announce Type: new Abstract: Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning… 25

arXiv — NLP / Computation & Language research 11d ago AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts arXiv:2606.19847v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate strong reasoning and generation abilities, but their fixed context windows limit long-term information accumulation and reuse across multi-session interactions. Existing memory-augmented… 32

arXiv — NLP / Computation & Language research 11d ago GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs arXiv:2606.19946v1 Announce Type: new Abstract: Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining.
Existing methods handle only single-direction injection; when multiple semantic directions are superposed… 16

arXiv — NLP / Computation & Language research 11d ago MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization arXiv:2606.20164v1 Announce Type: new Abstract: Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and… 29

arXiv — NLP / Computation & Language research 11d ago Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning arXiv:2606.19808v1 Announce Type: cross Abstract: Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes.… 25

arXiv — NLP / Computation & Language research 11d ago Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation arXiv:2504.02885v2 Announce Type: replace Abstract: Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and for decision support. Large vision-language models (LVLMs) hold great promise for automated MRG due to their… 29

Hugging Face Daily Papers research 11d ago S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence Abstract S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-world spatial… 28

Hugging Face Daily Papers research 11d ago Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance Abstract A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While… 35

Hugging Face Daily Papers research 11d ago Thinking with Visual Grounding Abstract Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques. Generated by… 34

Hugging Face Daily Papers research 11d ago REVES: REvision and VErification--Augmented Training for Test-Time Scaling Abstract A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems. Generated by… 23

TechCrunch — AI news-outlet 11d ago General Intuition in talks to raise $300M at around $2B valuation General Intuition is in talks to raise around $300 million at a roughly $2 billion valuation from backers including Jeff Bezos. The startup trains AI agents on spatial-temporal reasoning.
14

Hugging Face Daily Papers research 11d ago From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning Abstract A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.… 6

OpenAI official-blog 11d ago Improving health intelligence in ChatGPT Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations. 7

OpenAI official-blog 11d ago Using AI to help physicians diagnose rare genetic diseases affecting children Researchers used an OpenAI reasoning model to help diagnose rare diseases, identifying 18 new diagnoses in previously unsolved cases. 17

Hugging Face Daily Papers research 11d ago SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks Abstract SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs. Generated by… 31

Hugging Face Daily Papers research 12d ago Native Active Perception as Reasoning for Omni-Modal Understanding Abstract OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing.
Generated by… 24

Hugging Face Daily Papers research 12d ago Reinforcing Dual-Path Reasoning in Spatial Vision Language Models Abstract A unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial… 9