Tag

Reasoning

72 articles archived under #reasoning

r/LocalLLaMA community 3h ago

sensenova/SenseNova-U1-A3B-MoT · Hugging Face

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture 🚀 SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental…

37
llama.cpp releases dev-tools 5h ago

b9133

server, webui: support continue generation on reasoning models (#22727). Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the…

27
r/LocalLLaMA community 9h ago

server, webui: support continue generation on reasoning models by ServeurpersoCom · Pull Request #22727 · ggml-org/llama.cpp

now you can CONTINUE (submitted by /u/jacek2023)

17
arXiv — Machine Learning research 15h ago

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

arXiv:2605.11011v1 Announce Type: new Abstract: Looped computation shows promise in improving the reasoning-oriented performance of LLMs by scaling test-time compute. However, existing approaches typically require either training recurrent models from scratch or applying…

37
arXiv — Machine Learning research 15h ago

Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness

arXiv:2605.11019v1 Announce Type: new Abstract: Although large language models rely on chain-of-thought for complex reasoning, the overthinking phenomenon severely degrades inference efficiency. Existing reinforcement learning methods compress reasoning chains by designing…

23
arXiv — Machine Learning research 15h ago

Latent Chain-of-Thought Improves Structured-Data Transformers

arXiv:2605.11262v1 Announce Type: new Abstract: Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent…

24
arXiv — Machine Learning research 15h ago

Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning

arXiv:2605.11467v1 Announce Type: new Abstract: Reasoning models post-hoc rationalize answers they have already committed to internally, producing chains of *reasoning theater*: deliberative-looking steps that contribute nothing to correctness. This wastes inference tokens,…

7
arXiv — Machine Learning research 15h ago

Understanding and Preventing Entropy Collapse in RLVR with On-Policy Entropy Flow Optimization

arXiv:2605.11491v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning ability of large language models. However, widely used RLVR algorithms, such as GRPO, often suffer from entropy…

12
arXiv — NLP / Computation & Language research 15h ago

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

arXiv:2605.11143v1 Announce Type: new Abstract: Reasoning benchmarks measure clinical performance on clean inputs. We evaluate the step before reasoning: retrieval over real EHR notes, where negation, temporality, and family-versus-patient attribution can flip a correct answer…

27
arXiv — NLP / Computation & Language research 15h ago

An Empirical Study of Automating Agent Evaluation

arXiv:2605.11378v1 Announce Type: new Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate…

5
arXiv — NLP / Computation & Language research 15h ago

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

arXiv:2605.11388v1 Announce Type: new Abstract: Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified…

5
arXiv — NLP / Computation & Language research 15h ago

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting

arXiv:2605.11538v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) has emerged as a promising approach for improving the reasoning capabilities of large language models. However, it struggles to effectively balance the tradeoff between exploration and…

23
arXiv — NLP / Computation & Language research 15h ago

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

arXiv:2605.11629v1 Announce Type: new Abstract: Recent multimodal large language models (MLLMs) have shown strong chain-of-thought (CoT) reasoning ability on vision-language tasks, but their direct deployment in real-world systems is often limited by latency and resource…

38
arXiv — NLP / Computation & Language research 15h ago

YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

arXiv:2605.11906v1 Announce Type: new Abstract: Preference optimization has become an important post-training paradigm for improving the reasoning abilities of large language models. Existing methods typically rely on externally constructed preference data, using preferred and…

31
arXiv — NLP / Computation & Language research 15h ago

Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

arXiv:2605.12227v1 Announce Type: new Abstract: Adapting large language models (LLMs) to long-context tasks requires post-training methods that remain accurate and coherent over thousands of tokens. Existing approaches are limited in several ways: 1) off-policy methods such as…

12
arXiv — NLP / Computation & Language research 15h ago

MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering

arXiv:2605.12361v1 Announce Type: new Abstract: Evaluating large language models (LLMs) in the biomedical domain requires benchmarks that can distinguish reasoning from pattern matching and remain discriminative as model capabilities improve. Existing biomedical question…

6
arXiv — NLP / Computation & Language research 15h ago

Scalable Token-Level Hallucination Detection in Large Language Models

arXiv:2605.12384v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but they still frequently produce hallucinations. These hallucinations are difficult to detect in reasoning-intensive tasks, where the content appears coherent…

35
arXiv — NLP / Computation & Language research 15h ago

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

arXiv:2605.12419v1 Announce Type: new Abstract: Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates…

24
arXiv — NLP / Computation & Language research 15h ago

Unlocking LLM Creativity in Science through Analogical Reasoning

arXiv:2605.11258v1 Announce Type: cross Abstract: Autonomous science promises to augment scientific discovery, particularly in complex fields like biomedicine. However, this requires AI systems that can consistently generate novel and diverse solutions to open-ended problems. We…

22
arXiv — NLP / Computation & Language research 15h ago

LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?

arXiv:2605.11301v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have heterogeneous strengths across OCR, chart understanding, spatial reasoning, visual question answering, cost, and latency. Effective MLLM routing therefore requires more than…

24
arXiv — NLP / Computation & Language research 15h ago

Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models

arXiv:2605.11374v1 Announce Type: cross Abstract: Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Most modern embedding checkpoints are distilled from large LLM backbones and inherit their representation…

21
arXiv — NLP / Computation & Language research 15h ago

fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive kl and gaussian curriculum

arXiv:2605.11403v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become the standard paradigm for LLM mathematical reasoning, with Group Relative Policy Optimization (GRPO) serving as the dominant algorithm. We identify two overlooked…

38
arXiv — NLP / Computation & Language research 15h ago

Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

arXiv:2605.11458v1 Announce Type: cross Abstract: On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts while conditioning on the reference solution. A design choice shared by nearly all such…

28
Simon Willison community 1d ago

llm 0.32a2

Release: llm 0.32a2. A bunch of useful stuff in this LLM alpha, but the most important detail is this one: Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5…

22
Smol AI News news-outlet 1d ago

not much happened today

**Research-level reasoning benchmarks** are advancing with **439 new math problems** from **64 mathematicians** and expanded medical benchmarks in **Medmarks v1.0** covering **30 benchmarks** and **61 models**. **Google DeepMind's AI Co-Mathematician** achieves **48% on…

15
NVIDIA Developer Blog official-blog 5d ago

Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo

An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return...

11
Smol AI News news-outlet 6d ago

GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs

**OpenAI** released **GPT-Realtime-2**, a voice model with **GPT-5-class reasoning**, tool use, interruption handling, and extended context windows up to **128K tokens**, achieving top scores on **Big Bench Audio** and **Conversational Dynamics** benchmarks. They also launched a…

21
MIT News — AI research 7d ago

Games people — and machines — play: Untangling strategic reasoning to advance AI

Assistant Professor Gabriele Farina mines the foundations of decision-making in complex multi-agent scenarios.

33
NVIDIA Developer Blog official-blog 8d ago

How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car

The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and...

10
NVIDIA Developer Blog official-blog 14d ago

Powering AI Factories with NVIDIA Enterprise Reference Architectures

The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and...

23
NVIDIA Developer Blog official-blog 15d ago

NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model

Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on...

7
Smol AI News news-outlet 19d ago

DeepSeek v4

**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6**…

13
Vercel — AI dev-tools 20d ago

Deepseek V4 on AI Gateway

DeepSeek V4 is now available on Vercel AI Gateway. There are 2 model variants: DeepSeek V4 Pro and DeepSeek V4 Flash. A 1M token context window is the default across both models. DeepSeek V4 Pro focuses on agentic coding, formal mathematical reasoning, and long-horizon…

27
MIT News — AI research 21d ago

Teaching AI models to say “I’m not sure”

A new training method improves the reliability of AI confidence estimates without sacrificing performance, addressing a root cause of hallucination in reasoning models.

34
Smol AI News news-outlet 21d ago

not much happened today

**Alibaba** released **Qwen3.6-27B**, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over…

15
OpenAI news 22d ago

Introducing ChatGPT Images 2.0

ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning.

9
NVIDIA Developer Blog official-blog 22d ago

Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision

As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy...

31
Smol AI News news-outlet 26d ago

not much happened today

**Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though…

7
Smol AI News news-outlet 27d ago

Anthropic's Claude Opus 4.7

**Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new **xhigh** reasoning tier. Benchmarks show substantial gains, including **SWE-bench Pro…

37
OpenAI news 27d ago

Introducing GPT-Rosalind for life sciences research

OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.

32
Marcus on AI community 1mo ago

Even more good news for the future of neurosymbolic AI

And vindication for Apple’s unfairly maligned 2025 reasoning paper

36
Smol AI News news-outlet 1mo ago

not much happened today

**Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future…

29
Smol AI News news-outlet 1mo ago

not much happened today

**Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including…

35
Vercel — AI dev-tools 1mo ago

Qwen 3.6 Plus on AI Gateway

Qwen 3.6 Plus from Alibaba is now available on Vercel AI Gateway. Compared to Qwen 3.5 Plus, this model adds stronger agentic coding capabilities, from frontend development to repository-level problem solving, along with improved multimodal perception and reasoning. It features…

19
Smol AI News news-outlet 1mo ago

not much happened today

**Anthropic** is reportedly introducing a new AI model tier called **Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion…

38
NVIDIA Developer Blog official-blog 1mo ago

Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety

Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,...

37
Smol AI News news-outlet 1mo ago

not much happened today

**ARC-AGI-3** benchmark introduced by **@arcprize** and **François Chollet** resets the frontier for general agentic reasoning with humans solving 100% of tasks versus under 1% for current models, focusing on zero-preparation generalization and human-like learning efficiency.…

4
NVIDIA Developer Blog official-blog 1mo ago

How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale

Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external tools....

6
NVIDIA Developer Blog official-blog 1mo ago

NVIDIA Vera CPU Delivers High Performance, Bandwidth, and Efficiency for AI Factories

AI is evolving, and reasoning models are increasing token demand, placing new requirements on every layer of AI infrastructure. More than ever, compute must...

11
NVIDIA Developer Blog official-blog 1mo ago

NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer

Artificial intelligence is token-driven. Every prompt, reasoning step, and agent interaction generates tokens. Over the past year, token consumption has grown...

33
NVIDIA Developer Blog official-blog 2mo ago

Scale Synthetic Data and Physical AI Reasoning with NVIDIA Cosmos World Foundation Models

The next generation of AI-driven robots like humanoids and autonomous vehicles depends on high-fidelity, physics-aware training data. Without diverse and...

34
NVIDIA Developer Blog official-blog 2mo ago

Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning

Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel at reasoning, coding, and long-context...

6
NVIDIA Developer Blog official-blog 2mo ago

Building Telco Reasoning Models for Autonomous Networks with NVIDIA NeMo

Autonomous networks are quickly becoming one of the top priorities in telecommunications. According to the latest NVIDIA State of AI in Telecommunications...

25
Smol AI News news-outlet 2mo ago

Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2

**Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**, highlighting a significant reasoning improvement with **ARC-AGI-2 = 77.1%** and strong coding and agentic-tool…

10
Smol AI News news-outlet 2mo ago

Claude Sonnet 4.6: clean upgrade of 4.5, mostly better with some caveats

**Anthropic** launched **Claude Sonnet 4.6**, an upgrade over Sonnet 4.5, featuring broad improvements in **coding, long-context reasoning, agent planning, knowledge work, and design**, plus a **1M-token context window (beta)**. Benchmarks show Sonnet 4.6 leading on **GDPval-AA…

4
Smol AI News news-outlet 3mo ago

new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

**Google DeepMind** is rolling out the upgraded **Gemini 3 Deep Think V2** reasoning mode to **Google AI Ultra** subscribers and opening early access to the **Vertex AI / Gemini API** for select users. Key benchmark achievements include **ARC-AGI-2 at 84.6%**, **Humanity’s Last…

31
Ahead of AI (Sebastian Raschka) research 3mo ago

Categories of Inference-Time Scaling for Improved LLM Reasoning

And an Overview of Recent Inference-Scaling Papers

11
Hugging Face official-blog 4mo ago

NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI

Published January 5, 2026. By Tsung-Yi Lin and Debraj Sinha (NVIDIA). NVIDIA today released Cosmos Reason 2, the latest advancement in open, reasoning…

17
Smol AI News news-outlet 4mo ago

not much happened today

**Zhipu AI's GLM-4.7** release marks a significant improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. **Xiaomi's MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model…

30
Hugging Face official-blog 5mo ago

DeepMath: A lightweight math reasoning Agent with smolagents

Published December 4, 2025. By Daniel Fleischer, Moshe Berchansky, and Moshe Wasserblat (Intel AI Software Group). DeepMath is an aligned math…

22
Hugging Face official-blog 5mo ago

Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models

Published November 19, 2025. By Torsten Scholak, Oleksiy Ostapenko, and Raymond Li (ServiceNow-AI), Luke Kumar…

17
Google DeepMind official-blog 11mo ago

Gemini 2.5: Updates to our family of thinking models

Explore the latest Gemini 2.5 model updates with enhanced performance and accuracy: Gemini 2.5 Pro now stable, Flash generally available, and the new Flash-Lite in preview.

32
Google DeepMind official-blog 11mo ago

Gemini 2.5: Our most intelligent models are getting even better

Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We’re bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro.

34
Lil'Log (Lilian Weng) research 12mo ago

Why We Think

Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test-time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021) have led to significant…

25
Ahead of AI (Sebastian Raschka) research 12mo ago

The State of Reinforcement Learning for LLM Reasoning

Understanding GRPO and New Insights from Reasoning Model Papers

25
Google DeepMind official-blog 13mo ago

Introducing Gemini 2.5 Flash

Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.

23
Ahead of AI (Sebastian Raschka) research 13mo ago

First Look at Reasoning From Scratch: Chapter 1

Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely driven by statistical pattern recognition. However, new advances in reasoning methodologies now enable LLMs to tackle…

25
Ahead of AI (Sebastian Raschka) research 14mo ago

The State of LLM Reasoning Model Inference

Inference-Time Compute Scaling Methods to Improve Reasoning Models

26
Ahead of AI (Sebastian Raschka) research 15mo ago

Understanding Reasoning LLMs

Methods and Strategies for Building and Refining Reasoning Models

26
Maarten Grootendorst research 15mo ago

A Visual Guide to Reasoning LLMs

Exploring Test-Time Compute Techniques and DeepSeek-R1

9
Nonint (James Betker) research 16mo ago

Beating ARC the hard way

ARC is a benchmark developed to test out-of-distribution reasoning and common sense in general solvers. It is specifically designed to be: easily solvable by most humans; not amenable to any kind of brute-force solver (e.g. trying every permutation of a solution); not able to be…

4
Eugene Yan research 23mo ago

Prompting Fundamentals and How to Apply them Effectively

Structured input/output, prefilling, n-shots prompting, chain-of-thought, reducing hallucinations, etc.

20
