News / #reasoning
Reasoning · 72 articles archived under #reasoning · RSS

r/LocalLLaMA · community · 3h ago · sensenova/SenseNova-U1-A3B-MoT · Hugging Face · SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture 🚀 SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental… (37)
llama.cpp releases · dev-tools · 5h ago · b9133 · server, webui: support continue generation on reasoning models (#22727) · Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the… (27)
r/LocalLLaMA · community · 9h ago · server, webui: support continue generation on reasoning models by ServeurpersoCom · Pull Request #22727 · ggml-org/llama.cpp · now you can CONTINUE · submitted by /u/jacek2023 (17)
arXiv — Machine Learning research · 15h ago · LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models (arXiv:2605.11011v1, new) · Looped computation shows promise in improving the reasoning-oriented performance of LLMs by scaling test-time compute. However, existing approaches typically require either training recurrent models from scratch or applying… (37)
arXiv — Machine Learning research · 15h ago · Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness (arXiv:2605.11019v1, new) · Although large language models rely on chain-of-thought for complex reasoning, the overthinking phenomenon severely degrades inference efficiency. Existing reinforcement learning methods compress reasoning chains by designing… (23)
arXiv — Machine Learning research · 15h ago · Latent Chain-of-Thought Improves Structured-Data Transformers (arXiv:2605.11262v1, new) · Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent… (24)
arXiv — Machine Learning research · 15h ago · Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning (arXiv:2605.11467v1, new) · Reasoning models post-hoc rationalize answers they have already committed to internally, producing chains of *reasoning theater*: deliberative-looking steps that contribute nothing to correctness. This wastes inference tokens,… (7)
arXiv — Machine Learning research · 15h ago · Understanding and Preventing Entropy Collapse in RLVR with On-Policy Entropy Flow Optimization (arXiv:2605.11491v1, new) · Reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning ability of large language models. However, widely used RLVR algorithms, such as GRPO, often suffer from entropy… (12)
arXiv — NLP / Computation & Language research · 15h ago · ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV (arXiv:2605.11143v1, new) · Reasoning benchmarks measure clinical performance on clean inputs. We evaluate the step before reasoning: retrieval over real EHR notes, where negation, temporality, and family-versus-patient attribution can flip a correct answer… (27)
arXiv — NLP / Computation & Language research · 15h ago · An Empirical Study of Automating Agent Evaluation (arXiv:2605.11378v1, new) · Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate… (5)
arXiv — NLP / Computation & Language research · 15h ago · Deep Reasoning in General Purpose Agents via Structured Meta-Cognition (arXiv:2605.11388v1, new) · Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified… (5)
arXiv — NLP / Computation & Language research · 15h ago · Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting (arXiv:2605.11538v1, new) · Group Relative Policy Optimization (GRPO) has emerged as a promising approach for improving the reasoning capabilities of large language models. However, it struggles to effectively balance the tradeoff between exploration and… (23)
arXiv — NLP / Computation & Language research · 15h ago · OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models (arXiv:2605.11629v1, new) · Recent multimodal large language models (MLLMs) have shown strong chain-of-thought (CoT) reasoning ability on vision-language tasks, but their direct deployment in real-world systems is often limited by latency and resource… (38)
arXiv — NLP / Computation & Language research · 15h ago · YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning (arXiv:2605.11906v1, new) · Preference optimization has become an important post-training paradigm for improving the reasoning abilities of large language models. Existing methods typically rely on externally constructed preference data, using preferred and… (31)
arXiv — NLP / Computation & Language research · 15h ago · Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models (arXiv:2605.12227v1, new) · Adapting large language models (LLMs) to long-context tasks requires post-training methods that remain accurate and coherent over thousands of tokens. Existing approaches are limited in several ways: 1) off-policy methods such as… (12)
arXiv — NLP / Computation & Language research · 15h ago · MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering (arXiv:2605.12361v1, new) · Evaluating large language models (LLMs) in the biomedical domain requires benchmarks that can distinguish reasoning from pattern matching and remain discriminative as model capabilities improve. Existing biomedical question… (6)
arXiv — NLP / Computation & Language research · 15h ago · Scalable Token-Level Hallucination Detection in Large Language Models (arXiv:2605.12384v1, new) · Large language models (LLMs) have demonstrated remarkable capabilities, but they still frequently produce hallucinations. These hallucinations are difficult to detect in reasoning-intensive tasks, where the content appears coherent… (35)
arXiv — NLP / Computation & Language research · 15h ago · ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging (arXiv:2605.12419v1, new) · Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in catastrophic forgetting of their general, language-based reasoning abilities. This work investigates… (24)
arXiv — NLP / Computation & Language research · 15h ago · Unlocking LLM Creativity in Science through Analogical Reasoning (arXiv:2605.11258v1, cross) · Autonomous science promises to augment scientific discovery, particularly in complex fields like biomedicine. However, this requires AI systems that can consistently generate novel and diverse solutions to open-ended problems. We… (22)
arXiv — NLP / Computation & Language research · 15h ago · LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer? (arXiv:2605.11301v1, cross) · Multimodal large language models (MLLMs) have heterogeneous strengths across OCR, chart understanding, spatial reasoning, visual question answering, cost, and latency. Effective MLLM routing therefore requires more than… (24)
arXiv — NLP / Computation & Language research · 15h ago · Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models (arXiv:2605.11374v1, cross) · Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Most modern embedding checkpoints are distilled from large LLM backbones and inherit their representation… (21)
arXiv — NLP / Computation & Language research · 15h ago · fg-expo: Frontier-Guided Exploration-Prioritized Policy Optimization via Adaptive KL and Gaussian Curriculum (arXiv:2605.11403v1, cross) · Reinforcement Learning with Verifiable Rewards (RLVR) has become the standard paradigm for LLM mathematical reasoning, with Group Relative Policy Optimization (GRPO) serving as the dominant algorithm. We identify two overlooked… (38)
arXiv — NLP / Computation & Language research · 15h ago · Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning (arXiv:2605.11458v1, cross) · On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts while conditioning on the reference solution. A design choice shared by nearly all such… (28)
Simon Willison · community · 1d ago · llm 0.32a2 · A bunch of useful stuff in this LLM alpha, but the most important detail is this one: most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5… (22)
Smol AI News · news-outlet · 1d ago · not much happened today · **Research-level reasoning benchmarks** are advancing with **439 new math problems** from **64 mathematicians** and expanded medical benchmarks in **Medmarks v1.0** covering **30 benchmarks** and **61 models**. **Google DeepMind's AI Co-Mathematician** achieves **48% on… (15)
NVIDIA Developer Blog · official-blog · 5d ago · Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo · An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return… (11)
Smol AI News · news-outlet · 6d ago · GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs · **OpenAI** released **GPT-Realtime-2**, a voice model with **GPT-5-class reasoning**, tool use, interruption handling, and extended context windows up to **128K tokens**, achieving top scores on **Big Bench Audio** and **Conversational Dynamics** benchmarks. They also launched a… (21)
MIT News — AI research · 7d ago · Games people — and machines — play: Untangling strategic reasoning to advance AI · Assistant Professor Gabriele Farina mines the foundations of decision-making in complex multi-agent scenarios. (33)
NVIDIA Developer Blog · official-blog · 8d ago · How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car · The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and… (10)
NVIDIA Developer Blog · official-blog · 14d ago · Powering AI Factories with NVIDIA Enterprise Reference Architectures · The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and… (23)
NVIDIA Developer Blog · official-blog · 15d ago · NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model · Agentic systems often reason across screens, documents, audio, video, and text within a single perception-to-action loop. However, they still rely on… (7)
Smol AI News · news-outlet · 19d ago · DeepSeek v4 · **DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6**… (13)
Vercel — AI · dev-tools · 20d ago · Deepseek V4 on AI Gateway · DeepSeek V4 is now available on Vercel AI Gateway. There are two model variants: DeepSeek V4 Pro and DeepSeek V4 Flash. A 1M-token context window is the default across both models. DeepSeek V4 Pro focuses on agentic coding, formal mathematical reasoning, and long-horizon… (27)
MIT News — AI research · 21d ago · Teaching AI models to say "I'm not sure" · A new training method improves the reliability of AI confidence estimates without sacrificing performance, addressing a root cause of hallucination in reasoning models. (34)
Smol AI News · news-outlet · 21d ago · not much happened today · **Alibaba** released **Qwen3.6-27B**, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over… (15)
OpenAI · news · 22d ago · Introducing ChatGPT Images 2.0 · ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning. (9)
NVIDIA Developer Blog · official-blog · 22d ago · Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision · As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy… (31)
Smol AI News · news-outlet · 26d ago · not much happened today · **Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though… (7)
Smol AI News · news-outlet · 27d ago · Anthropic's Claude Opus 4.7 · **Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new **xhigh** reasoning tier. Benchmarks show substantial gains, including **SWE-bench Pro… (37)
OpenAI · news · 27d ago · Introducing GPT-Rosalind for life sciences research · OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows. (32)
Marcus on AI · community · 1mo ago · Even more good news for the future of neurosymbolic AI · And vindication for Apple's unfairly maligned 2025 reasoning paper (36)
Smol AI News · news-outlet · 1mo ago · not much happened today · **Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future… (29)
Smol AI News · news-outlet · 1mo ago · not much happened today · **Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including… (35)
Vercel — AI · dev-tools · 1mo ago · Qwen 3.6 Plus on AI Gateway · Qwen 3.6 Plus from Alibaba is now available on Vercel AI Gateway. Compared to Qwen 3.5 Plus, this model adds stronger agentic coding capabilities, from frontend development to repository-level problem solving, along with improved multimodal perception and reasoning. It features… (19)
Smol AI News · news-outlet · 1mo ago · not much happened today · **Anthropic** is reportedly introducing a new AI model tier called **Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion… (38)
NVIDIA Developer Blog · official-blog · 1mo ago · Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety · Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,… (37)
Smol AI News · news-outlet · 1mo ago · not much happened today · **ARC-AGI-3** benchmark introduced by **@arcprize** and **François Chollet** resets the frontier for general agentic reasoning, with humans solving 100% of tasks versus under 1% for current models, focusing on zero-preparation generalization and human-like learning efficiency.… (4)
NVIDIA Developer Blog · official-blog · 1mo ago · How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale · Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external tools.… (6)
NVIDIA Developer Blog · official-blog · 1mo ago · NVIDIA Vera CPU Delivers High Performance, Bandwidth, and Efficiency for AI Factories · AI is evolving, and reasoning models are increasing token demand, placing new requirements on every layer of AI infrastructure. More than ever, compute must… (11)
NVIDIA Developer Blog · official-blog · 1mo ago · NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer · Artificial intelligence is token-driven. Every prompt, reasoning step, and agent interaction generates tokens. Over the past year, token consumption has grown… (33)
NVIDIA Developer Blog · official-blog · 2mo ago · Scale Synthetic Data and Physical AI Reasoning with NVIDIA Cosmos World Foundation Models · The next generation of AI-driven robots like humanoids and autonomous vehicles depends on high-fidelity, physics-aware training data. Without diverse and… (34)
NVIDIA Developer Blog · official-blog · 2mo ago · Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning · Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel at reasoning, coding, and long-context… (6)
NVIDIA Developer Blog · official-blog · 2mo ago · Building Telco Reasoning Models for Autonomous Networks with NVIDIA NeMo · Autonomous networks are quickly becoming one of the top priorities in telecommunications. According to the latest NVIDIA State of AI in Telecommunications… (25)
Smol AI News · news-outlet · 2mo ago · Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2 · **Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**, highlighting a significant reasoning improvement with **ARC-AGI-2 = 77.1%** and strong coding and agentic-tool… (10)
Smol AI News · news-outlet · 2mo ago · Claude Sonnet 4.6: clean upgrade of 4.5, mostly better with some caveats · **Anthropic** launched **Claude Sonnet 4.6**, an upgrade over Sonnet 4.5, featuring broad improvements in **coding, long-context reasoning, agent planning, knowledge work, and design**, plus a **1M-token context window (beta)**. Benchmarks show Sonnet 4.6 leading on **GDPval-AA… (4)
Smol AI News · news-outlet · 3mo ago · new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5 · **Google DeepMind** is rolling out the upgraded **Gemini 3 Deep Think V2** reasoning mode to **Google AI Ultra** subscribers and opening early access to the **Vertex AI / Gemini API** for select users. Key benchmark achievements include **ARC-AGI-2 at 84.6%**, **Humanity's Last… (31)
Ahead of AI (Sebastian Raschka) · research · 3mo ago · Categories of Inference-Time Scaling for Improved LLM Reasoning · And an Overview of Recent Inference-Scaling Papers (11)
Hugging Face · official-blog · 4mo ago · NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI · Published January 5, 2026, by Tsung-Yi Lin and Debraj Sinha (NVIDIA) · NVIDIA today released Cosmos Reason 2, the latest advancement in open, reasoning… (17)
Smol AI News · news-outlet · 4mo ago · not much happened today · **Zhipu AI's GLM-4.7** release marks a significant improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. **Xiaomi's MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model… (30)
Hugging Face · official-blog · 5mo ago · DeepMath: A lightweight math reasoning Agent with smolagents · Published December 4, 2025, by Daniel Fleischer, Moshe Berchansky, and Moshe Wasserblat (Intel AI Software Group) · DeepMath is an aligned math… (22)
Hugging Face · official-blog · 5mo ago · Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models · Published November 19, 2025, by Torsten Scholak, Oleksiy Ostapenko, Raymond Li, Luke Kumar… (ServiceNow-AI) (17)
Google DeepMind · official-blog · 11mo ago · Gemini 2.5: Updates to our family of thinking models · Explore the latest Gemini 2.5 model updates with enhanced performance and accuracy: Gemini 2.5 Pro now stable, Flash generally available, and the new Flash-Lite in preview. (32)
Google DeepMind · official-blog · 11mo ago · Gemini 2.5: Our most intelligent models are getting even better · Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We're bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro. (34)
Lil'Log (Lilian Weng) · research · 12mo ago · Why We Think · Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test-time compute (Graves et al. 2016; Ling et al. 2017; Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022; Nye et al. 2021) have led to significant… (25)
Ahead of AI (Sebastian Raschka) · research · 12mo ago · The State of Reinforcement Learning for LLM Reasoning · Understanding GRPO and New Insights from Reasoning Model Papers (25)
Google DeepMind · official-blog · 13mo ago · Introducing Gemini 2.5 Flash · Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. (23)
Ahead of AI (Sebastian Raschka) · research · 13mo ago · First Look at Reasoning From Scratch: Chapter 1 · Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely driven by statistical pattern recognition. However, new advances in reasoning methodologies now enable LLMs to tackle… (25)
Ahead of AI (Sebastian Raschka) · research · 14mo ago · The State of LLM Reasoning Model Inference · Inference-Time Compute Scaling Methods to Improve Reasoning Models (26)
Ahead of AI (Sebastian Raschka) · research · 15mo ago · Understanding Reasoning LLMs · Methods and Strategies for Building and Refining Reasoning Models (26)
Maarten Grootendorst · research · 15mo ago · A Visual Guide to Reasoning LLMs · Exploring Test-Time Compute Techniques and DeepSeek-R1 (9)
Nonint (James Betker) · research · 16mo ago · Beating ARC the hard way · ARC is a benchmark developed to test out-of-distribution reasoning and common sense in general solvers. It is specifically designed to be: easily solvable by most humans; not amenable to any kind of brute-force solver (e.g. trying every permutation of a solution); not able to be… (4)
Eugene Yan · research · 23mo ago · Prompting Fundamentals and How to Apply them Effectively · Structured input/output, prefilling, n-shot prompting, chain-of-thought, reducing hallucinations, etc. (20)
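The llama.cpp #22727 entries above describe continuing generation from a prefilled assistant message on reasoning models, which requires deciding whether the partial text ends inside an unclosed thinking block. A minimal sketch of that decision, assuming `<think>`/`</think>` delimiters; the helper name is hypothetical and this is not the PR's actual code:

```python
def resumes_in_reasoning(partial: str,
                         open_tag: str = "<think>",
                         close_tag: str = "</think>") -> bool:
    """Return True if the prefilled assistant text ends inside an unclosed
    thinking block, i.e. continuation should keep generating reasoning
    before the visible answer. str.rfind returns -1 when a tag is absent,
    so text with no tags (or a closed block) falls through to False."""
    return partial.rfind(open_tag) > partial.rfind(close_tag)


# Resuming mid-thought vs. mid-answer:
assert resumes_in_reasoning("<think>Let me check the units")          # True
assert not resumes_in_reasoning("<think>ok</think>The answer is 4")   # False
assert not resumes_in_reasoning("plain prefill, no thinking tags")    # False
```

A server would use this flag to re-open the reasoning channel around the prefill before resuming sampling, rather than rejecting the request outright.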
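The llm 0.32a2 entry above notes that reasoning-capable OpenAI models are moving from /v1/chat/completions to /v1/responses. The request shapes differ: the Responses API takes `input` rather than `messages` and `max_output_tokens` rather than `max_tokens`. A minimal sketch of that payload translation, handling only these common fields (real clients map many more):

```python
def to_responses_payload(chat_payload: dict) -> dict:
    """Map a minimal Chat Completions request body to the Responses API
    shape: 'messages' becomes 'input', and 'max_tokens' becomes
    'max_output_tokens'. Other fields are intentionally ignored here."""
    out = {
        "model": chat_payload["model"],
        "input": chat_payload["messages"],  # list of role/content messages
    }
    if "max_tokens" in chat_payload:
        out["max_output_tokens"] = chat_payload["max_tokens"]
    return out


# A Chat Completions body and its Responses-API equivalent:
chat = {"model": "gpt-5",
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 64}
resp = to_responses_payload(chat)
assert resp["input"][0]["content"] == "hi"
assert resp["max_output_tokens"] == 64
```

The translated body would then be POSTed to /v1/responses instead of /v1/chat/completions; the response shape also differs, which is why libraries like llm switch endpoints per model rather than per request.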