Tag: Reasoning
500 articles archived under #reasoning

Hugging Face Daily Papers · research · 13d ago
Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
Abstract: Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought… 18

Smol AI News · news-outlet · 14d ago
GLM 5.2: the top Frontend Coding model in the world, IndexShare reduces costs
Z.ai released GLM-5.2, an MIT-licensed open-weight frontier model targeting coding and long-horizon agentic tasks with a 1M-token context window and two reasoning-effort modes. It features a 744B-parameter mixture-of-experts architecture with 40B active… 14

Hugging Face Daily Papers · research · 14d ago
Implicit Reasoning for Large Language Model-based Generative Recommendation
Abstract: Large Language Models for generative recommendation face challenges with semantic IDs disrupting natural-language reasoning, prompting a lightweight implicit reasoning approach that outperforms explicit methods while reducing computational costs.
Generated by… 16

arXiv — Machine Learning · research · 14d ago
Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation
arXiv:2606.15127v1 Announce Type: new Abstract: Reasoning models are increasingly used in settings where the final answer is not the only object of review: educational tools may show students intermediate steps, decision-support systems may require human oversight, and audit… 11

arXiv — Machine Learning · research · 14d ago
Semantic Reasoning in Medicine: The Role of Knowledge Graphs Across Five Key Domains
arXiv:2606.15155v1 Announce Type: new Abstract: Knowledge graphs (KGs) have emerged as a promising solution for integrating and reasoning over complex biomedical and clinical data in healthcare. By representing structured relationships among entities such as diseases, drugs,… 17

arXiv — Machine Learning · research · 14d ago
Understanding Diversity Collapse in RLVR via the Lens of Overtraining
arXiv:2606.15455v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a key approach for enhancing the reasoning abilities of large language models. However, RLVR often suffers from diversity collapse: Pass@1 improves while… 6

arXiv — Machine Learning · research · 14d ago
Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning
arXiv:2606.15576v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards assigns a single scalar to each rollout, leaving token-level credit assignment underspecified in long reasoning traces.
On-policy self-distillation addresses this by letting the same… 6

arXiv — Machine Learning · research · 14d ago
Is Code Better Than Language for Algorithmic Reasoning
arXiv:2606.15589v1 Announce Type: new Abstract: For tool-augmented language models, comparing natural-language reasoning with code-execution pipelines is difficult because the comparison changes both the intermediate representation and the execution mechanism. We separate these… 29

arXiv — Machine Learning · research · 14d ago
Formalizing and Mitigating Structural Distortion in LLM Attention for Zero-Shot Graph Reasoning
arXiv:2606.15633v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown promise for reasoning over Text-Attributed Graphs (TAGs). However, applying LLMs to graphs requires linearizing their structure into sequences, introducing distortion rooted in the graph… 9

arXiv — Machine Learning · research · 14d ago
ReQAT: Achieving Full-Precision Reasoning Accuracy with 4-bit Floating-Point Quantization-Aware Training
arXiv:2606.15682v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) achieve strong problem-solving through long chain-of-thought, but their deployment is constrained by the high cost of full-precision inference and growing KV cache footprints. Microscaled FP4 formats… 35

arXiv — NLP / Computation & Language · research · 14d ago
CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning
arXiv:2606.14961v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning can improve LLM performance, but high answer confidence may be misleading when the accompanying CoT rationale is plausible yet incomplete or poorly supported.
We study confidence-rationale… 21

arXiv — NLP / Computation & Language · research · 14d ago
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
arXiv:2606.15007v1 Announce Type: new Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context… 19

arXiv — NLP / Computation & Language · research · 14d ago
Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models
arXiv:2606.15070v1 Announce Type: new Abstract: By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant… 23

arXiv — NLP / Computation & Language · research · 14d ago
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale
arXiv:2606.15079v1 Announce Type: new Abstract: Efficient and scalable agentic intelligence requires models that can deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this report, we present Ling-2.6… 16

arXiv — NLP / Computation & Language · research · 14d ago
AdaMame: A Training Recipe for Adaptive Multilingual Reasoning
arXiv:2606.15080v1 Announce Type: new Abstract: While Large Reasoning Models (LRMs) show strong performance in English, they often fail to reason in the language of the query, a phenomenon known as language collapse.
Existing RL-based fixes typically add a binary language… 31

arXiv — NLP / Computation & Language · research · 14d ago
Adapting Reinforcement Learning with Chain-of-Thought Supervision for Explainable Detection of Hateful and Propagandistic Memes
arXiv:2606.15307v1 Announce Type: new Abstract: Hateful and propagandistic memes exploit the interplay between images and text to convey harmful intent that neither modality reveals alone. Although thinking-based multimodal large language models (MLLMs) have advanced… 21

arXiv — NLP / Computation & Language · research · 14d ago
Let LLMs Judge Each Other: Multi-Agent Peer-Reviewed Reasoning for Medical Question Answering
arXiv:2606.15419v1 Announce Type: new Abstract: Objective: To enhance the accuracy, interpretability, and robustness of large language models (LLMs) in medical question answering (MedQA). Method: We designed a multi-agent peer-reviewed reasoning method in which multiple LLM… 22

arXiv — NLP / Computation & Language · research · 14d ago
Vernier: Probing Representational Misalignment Behind Lexical Gaps in Causal Reasoning
arXiv:2606.15733v1 Announce Type: new Abstract: Instruction-tuned language models can answer the same causal-reasoning question differently after its English variable names are replaced by type-preserving placeholders, although the structural causal model and the gold answer are… 21

arXiv — NLP / Computation & Language · research · 14d ago
ttda704 at SemEval-2026 Task 6: Structured Chain-of-Thought Prompting for Political Evasion Detection
arXiv:2606.15770v1 Announce Type: new Abstract: This paper describes our system for SemEval-2026 Task 6, which addresses the classification of political evasion strategies in English question-answer pairs extracted from U.S. presidential interviews.
We systematically compare two… 31

arXiv — NLP / Computation & Language · research · 14d ago
When Correct Edges Cannot Be Verified: A Provenance Gap in Incomplete KGQA and a Provenance-Favoring Completion Policy
arXiv:2606.15833v1 Announce Type: new Abstract: Incomplete Knowledge Graph Question Answering (IKGQA) requires completing missing edges to continue reasoning. A growing line of work verifies completed edges against retrieved text, treating textual support as a proxy for edge… 10

arXiv — NLP / Computation & Language · research · 14d ago
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks
arXiv:2606.15872v1 Announce Type: new Abstract: Frontier scientific reasoning remains a major challenge for large language models (LLMs), where even the strongest commercial systems fall short of expert-level performance. A closer look at model behavior reveals substantial… 27

arXiv — NLP / Computation & Language · research · 14d ago
Free Energy Heuristics: Fast-And-Frugal Cognition as Active Inference Under Uncertain Precision
arXiv:2606.15877v1 Announce Type: new Abstract: Chain-of-thought (CoT) improves large language models' performance in math and symbolic reasoning. But on planning, contested ethics, and tasks where the model cannot check itself, more reasoning makes things worse. Both effects… 8

arXiv — NLP / Computation & Language · research · 14d ago
Neuron Level Analysis of Large Language Model in Legal Domain Reasoning
arXiv:2606.15884v1 Announce Type: new Abstract: We presented a neuron-level analysis of legal-domain reasoning in LLMs, comparing it with other applied domain tasks across seven open-weight models.
Using neuron attribution scores to rank and suppress influential neurons, we… 4

arXiv — NLP / Computation & Language · research · 14d ago
Formalize Once, Edit the Rest: Efficient Lean-Based Answer Selection for Math Reasoning
arXiv:2606.15972v1 Announce Type: new Abstract: With large language models (LLMs) increasingly applied to mathematical reasoning, formal proof assistants such as Lean can be leveraged to verify reasoning outputs with machine-checkable rigor, enabling use cases such as answer… 30

arXiv — NLP / Computation & Language · research · 14d ago
A Large-Scale Multi-Dimensional Empirical Study of LLMs for Conversation Summarization
arXiv:2606.15974v1 Announce Type: new Abstract: Despite the significant advancement of LLMs in conversation summarization, their evaluation remains limited by insufficient scenarios, input lengths, and sample sizes. Furthermore, existing benchmarks often omit frontier reasoning… 30

arXiv — NLP / Computation & Language · research · 14d ago
From Argument Components to Graphs: A Multi-Agent Debate with Confidence Gating for Argument Relations
arXiv:2606.16047v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly assessed and utilized in the field of Argument Mining (AM), thanks to their strong general reasoning capabilities. However, standard training-free models often miss sophisticated… 7

arXiv — NLP / Computation & Language · research · 14d ago
Towards Pareto-Optimal Tool-Integrated Agents with Pareto Ranking Policy Optimization
arXiv:2606.16111v1 Announce Type: new Abstract: Recent advances in tool-integrated language agents have significantly improved their ability to solve complex reasoning tasks.
However, existing alignment methods predominantly focus on maximizing task accuracy, while overlooking… 10

arXiv — NLP / Computation & Language · research · 14d ago
GRACE: Step-Level Benchmark for Faithful Reasoning over Context
arXiv:2606.16151v1 Announce Type: new Abstract: Many reasoning tasks require models to reason over input context, from document-grounded question answering to rule-based deduction. Chain-of-Thought (CoT) prompting produces traces that appear transparent, yet individual steps can… 15

arXiv — NLP / Computation & Language · research · 14d ago
Weaving Multi-Source Evidence for Biomedical Reasoning: The BioMedHop Benchmark and BioWeave Framework
arXiv:2606.16211v1 Announce Type: new Abstract: Biomedical question answering (QA) increasingly requires reasoning over interacting entities, where supporting evidence is scattered across biomedical knowledge graphs, literature documents, and web-accessible resources. However,… 36

arXiv — NLP / Computation & Language · research · 14d ago
Creative Collision: Directorial Persona Steering and Competition in Large Language Models
arXiv:2606.16240v1 Announce Type: new Abstract: Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a single semantic direction into the residual stream.
We study the richer… 38

arXiv — NLP / Computation & Language · research · 14d ago
Tyler: Typed Latent Reasoning for Language Models -- When to Think, What to Compute, and How Much to Allocate
arXiv:2606.16360v1 Announce Type: new Abstract: Chain-of-thought (CoT) prompting improves reasoning in large language models (LLMs) by externalizing intermediate computation as discrete text tokens, but this textual interface also introduces redundancy and inference overhead.… 16

arXiv — NLP / Computation & Language · research · 14d ago
A Mechanistic Understanding of Pronoun Fidelity in LLMs
arXiv:2606.16407v1 Announce Type: new Abstract: Faithful and robust pronoun use is important for fair and coherent generations, yet large language models largely fail when multiple referents use different pronouns. To study the interplay of reasoning, repetition, and bias in… 35

Hugging Face Daily Papers · research · 14d ago
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Abstract: Nemotron 3 Ultra is a large-scale language model featuring hybrid Mamba-Attention architecture with 550 billion parameters, achieving high inference throughput and extended context length through specialized training techniques. Generated by… 5

Hugging Face Daily Papers · research · 14d ago
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models
Abstract: VibeThinker-3B demonstrates that compact models can achieve state-of-the-art performance on verifiable reasoning tasks through specialized training techniques, challenging conventional scaling assumptions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This technical… 16

arXiv — NLP / Computation & Language · research · 15d ago
SuperThoughts: Reasoning Tokens in Superposition
arXiv:2606.13862v1 Announce Type: cross Abstract: Long Chain-of-Thought (CoT) reasoning improves LLM problem-solving but is computationally expensive due to sequential token generation.
While recent works explore reasoning in continuous latent spaces to bypass discrete token… 12

arXiv — Machine Learning · research · 15d ago
EM-NeSy: Expectation Maximization for Neurosymbolic Learning
arXiv:2606.14463v1 Announce Type: new Abstract: Neurosymbolic (NeSy) models integrate neural networks and symbolic reasoning for robust and interpretable AI. State-of-the-art NeSy models require that the symbolic component is expressed in a differentiable way, often complicating… 38

arXiv — Machine Learning · research · 15d ago
When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing
arXiv:2606.14668v1 Announce Type: new Abstract: Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a… 36

arXiv — NLP / Computation & Language · research · 15d ago
Which Models Perform Better in Inheritance Reasoning?
arXiv:2606.13751v1 Announce Type: new Abstract: This paper presents the participation of team PSL in the QIAS 2026 Shared Task on Arabic Islamic inheritance reasoning. The task evaluates the ability of large language models to solve inheritance cases that require legal… 6

arXiv — NLP / Computation & Language · research · 15d ago
QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning
arXiv:2606.13756v1 Announce Type: new Abstract: This paper presents a comprehensive overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co-located with LREC 2026.
The shared task was designed to evaluate the ability of large language models to… 35

arXiv — NLP / Computation & Language · research · 15d ago
Implicit Reasoning for Large Language Model-based Generative Recommendation
arXiv:2606.14142v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key… 10

arXiv — NLP / Computation & Language · research · 15d ago
AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition
arXiv:2606.14674v1 Announce Type: new Abstract: LLM agents are increasingly built not as single model calls, but as scaffolded systems that combine reasoning, memory, reflection, action execution, and learning. While such scaffolds often improve performance, they are often… 13

arXiv — NLP / Computation & Language · research · 15d ago
CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment
arXiv:2606.14691v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has successfully elicited the reasoning capabilities of large language models, motivating its extension to multimodal scenarios. Existing methods primarily focus on improving… 34

arXiv — NLP / Computation & Language · research · 15d ago
AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization
arXiv:2606.14694v1 Announce Type: new Abstract: Large reasoning models typically follow a read-then-think paradigm: they observe the complete input, reason over a static context, and then produce the answer.
Yet many real-world scenarios are inherently dynamic, such as audio and… 4

arXiv — NLP / Computation & Language · research · 15d ago
Poker Arena: Multi-Axis Profiling of Strategic Reasoning and Memory in LLMs
arXiv:2606.13815v1 Announce Type: cross Abstract: Strategic reasoning under uncertainty underpins consequential decisions in negotiation, finance, and policy, but prevailing game-play benchmarks collapse heterogeneous reasoning dimensions into a single scalar, leaving the… 37

arXiv — NLP / Computation & Language · research · 15d ago
GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge
arXiv:2606.14470v1 Announce Type: cross Abstract: Large language model (LLM) reasoning is ephemeral: chains of thought vanish with the context window, pruned search branches leave no record, and memory buffers cannot be diffed, merged, or audited. Every other complex software… 37

arXiv — NLP / Computation & Language · research · 15d ago
ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning
arXiv:2606.14697v1 Announce Type: cross Abstract: Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where… 4

arXiv — NLP / Computation & Language · research · 15d ago
MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models
arXiv:2502.10886v3 Announce Type: replace Abstract: Entity state tracking is a necessary component of world modeling that requires maintaining coherent representations of entities over time. Previous work has benchmarked entity tracking performance in purely text-based tasks.
We… 23

arXiv — NLP / Computation & Language · research · 15d ago
Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards
arXiv:2505.04671v3 Announce Type: replace Abstract: Recent advances in large language models (LLMs) trained with reinforcement learning (RL) have improved Text-to-SQL performance. However, RL-based approaches still struggle with complex queries due to two key limitations:… 18

arXiv — NLP / Computation & Language · research · 15d ago
Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links
arXiv:2509.24102v5 Announce Type: replace Abstract: While moral reasoning has emerged as a promising research direction for large language models (LLMs), achieving robust generalization remains a critical challenge. This challenge arises from the gap between what is said and… 27

arXiv — NLP / Computation & Language · research · 15d ago
C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning
arXiv:2603.05167v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used as judges of chain-of-thought (CoT) reasoning, yet it remains unclear whether they can reliably assess process faithfulness rather than merely answer plausibility. We introduce… 20