Tag: Reasoning · 500 articles archived under #reasoning
arXiv — NLP / Computation & Language research 4d ago Zero-shot Tweet-Level Stance Detection Enhanced by External Knowledge and Reflective Chain-of-Thought Reasoning arXiv:2606.26571v1 Announce Type: new Abstract: Zero-shot tweet-level stance detection confronts two primary challenges: (1) mitigating the context sparsity inherent in short texts, and (2) establishing the relevance between implicit targets and textual content. While existing… 35
arXiv — NLP / Computation & Language research 4d ago Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification arXiv:2606.26698v1 Announce Type: new Abstract: In today's fast-paced information era, logical fallacies, defined as defective patterns of reasoning, inevitably contribute to the growth of information disorder. However, often fallacies appear in nuanced forms that complicate… 37
arXiv — NLP / Computation & Language research 4d ago Information-Aware KV Cache Compression for Long Reasoning arXiv:2606.26875v1 Announce Type: new Abstract: Reasoning capability has advanced rapidly in large language models (LLMs), leading to an increasing size of key-value (KV) cache in both prefilling and decoding stages. Existing KV cache compression methods mainly rely on attention… 15
arXiv — NLP / Computation & Language research 4d ago ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models arXiv:2606.26986v1 Announce Type: new Abstract: Open Relation Extraction (OpenRE) requires a model to extract unseen relations between head and tail entities from unstructured text for real-world applications.
The core challenge of OpenRE lies in achieving reliable… 13
arXiv — NLP / Computation & Language research 4d ago Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization arXiv:2606.27025v1 Announce Type: new Abstract: Building general-purpose role-playing agents that faithfully portray any character from a natural-language profile remains challenging. The dominant paradigm -- supervised fine-tuning -- encourages behavioral mimicry without deep,… 16
arXiv — NLP / Computation & Language research 4d ago The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans arXiv:2606.27103v1 Announce Type: new Abstract: Humans flexibly adapt their reasoning strategies to the requirements of a given problem. Large language models (LLMs) have performed well on many cognitive tasks, however, it is unclear whether this accuracy is a result of pattern… 9
arXiv — NLP / Computation & Language research 4d ago Multilingual Reasoning Cascades Need More Context arXiv:2606.27306v1 Announce Type: new Abstract: Translation cascades for reasoning translate the query from another language to English, reason in English, and translate the answer back to the original language. This is a competitive approach to multilingual reasoning, but… 7
arXiv — NLP / Computation & Language research 4d ago The Verification Horizon: No Silver Bullet for Coding Agent Rewards arXiv:2606.26300v1 Announce Type: cross Abstract: A classical intuition holds that verifying a solution is easier than producing one.
For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering… 24
arXiv — NLP / Computation & Language research 4d ago Narration-of-Thought: Inference-Time Scaffolding for Defeasible Ethical Reasoning in Large Language Models arXiv:2606.26366v1 Announce Type: cross Abstract: Standard chain-of-thought on moral dilemmas exhibits two failure modes: stakeholder collapse (the trace names at most one party with a stake in the outcome) and uncertainty suppression (no explicit unknowns or hedges before… 29
arXiv — NLP / Computation & Language research 4d ago Staying VIGILant: Mitigating Visual Laziness via Counterfactual Visual Alignment in MLLMs arXiv:2606.26387v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) extend large language models (LLMs) with visual perception, enabling joint reasoning over images and text. Despite inheriting strong reasoning capabilities from LLMs, they remain prone to… 19
arXiv — NLP / Computation & Language research 4d ago Humans Disengage, Reasoning Models Persist: Separating Difficulty Registration from Deliberation Allocation arXiv:2606.26502v1 Announce Type: cross Abstract: Large reasoning models (LRMs) take longer on harder problems, just as humans do. This surface similarity hides an opposite pattern within items. When an LRM gets a problem wrong, it spends more tokens than when it gets the same… 29
arXiv — NLP / Computation & Language research 4d ago Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation arXiv:2606.26686v1 Announce Type: cross Abstract: In order to screen a prompt or a response, the recent guardrail methods generate a chain-of-thought (CoT) before they issue a verdict. This design follows a common belief that step-by-step reasoning improves a decision.
However,… 17
arXiv — NLP / Computation & Language research 4d ago Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning arXiv:2509.01412v3 Announce Type: replace Abstract: Large language models (LLMs) show strong reasoning via chain-of-thought (CoT) prompting, but the process is opaque, which makes verification, debugging, and control difficult in high-stakes settings. We present Vis-CoT, a… 37
Hugging Face Daily Papers research 4d ago OpenBioRQ: Unsolved Biomedical Research Questions for Agents Abstract A new biomedical benchmark evaluates agentic models' ability to verify sources and avoid false citations by testing unsolved research questions with no answer keys, revealing significant failures in retrieval-grounded reasoning and tool usage. Generated by… 9
Hugging Face Daily Papers research 4d ago Confidence-Aware Tool Orchestration for Robust Video Understanding Abstract Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework that improves accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning. Generated by… 17
r/LocalLLaMA community 4d ago Qwen 3.6 27b GLM 5.2 fine-tune? Hi everyone, Since both models are open weights and GLM seems to find that secret to frontier model reasoning, why don't we see any Qwen GLM finetune yet? Is it because GLM 5.2 is recent and finetune and datasets take time or the community is just not interested in the finetune?… 28
Hugging Face Daily Papers research 4d ago Do Thinking Tokens Help with Safety?
Abstract Research reveals that reasoning models' safety outcomes are predictable from early hidden representations, with deliberation appearing but not substantially influencing final responses, and current safety interventions inadvertently suppress genuine deliberation… 25
Hugging Face Daily Papers research 4d ago Forecasting Future Behavior as a Learning Task Abstract Behavior Forecasters are trained to predict large reasoning model outputs from single trajectories, outperforming large language models while requiring significantly less computational cost. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Trust in an AI system is often… 24
Hugging Face Daily Papers research 4d ago ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy… 25
Smol AI News news-outlet 4d ago not much happened today **Z.ai's GLM-5.2** leads in coding and agent benchmarks with top scores like **1595** on Code Arena: Frontend and **34.29%** reasoning accuracy with zero failures. Databricks improved GLM-5.2 speed to **392 tok/s** using hardware and optimizations. **Ornith-1.0**, a new… 13
Hugging Face Daily Papers research 4d ago RL-Index: Reinforcement Learning for Retrieval Index Reasoning Abstract RL-Index introduces an agentic indexing framework that shifts reasoning from query time to indexing stage by using LLM-generated rationales and reinforcement learning to improve retrieval effectiveness and reduce latency.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 25
Hugging Face Daily Papers research 5d ago V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning Abstract A novel label-free framework for visual reasoning called V-Zero is presented, which uses contrastive evidence gating to improve fine-grained visual reasoning without requiring annotated answer labels, achieving faster training than traditional methods. Generated by… 12
arXiv — Machine Learning research 5d ago Holographic Memory for Zero-Shot Compositional Reasoning in Knowledge Graphs: A Mechanistic Study of Where and Why It Fails arXiv:2606.24948v1 Announce Type: new Abstract: Knowledge graph embedding (KGE) models predict single-hop links well but have no mechanism for zero-shot compositional queries: multi-hop questions whose relation chains never appeared during training. Holographic Reduced… 31
arXiv — Machine Learning research 5d ago MacroLens: A Multi-Task Benchmark for Contextual Financial Reasoning under Macroeconomic Scenarios arXiv:2606.24950v1 Announce Type: new Abstract: Financial decision-making is contextual: forecasting prices, valuing companies, and assessing event exposure weigh price history, accounting fundamentals, macroeconomic regime, and contemporaneous text.
A benchmark over these four… 25
arXiv — Machine Learning research 5d ago ExTra: Exploratory Trajectory Optimization for Language Model Reinforcement Learning arXiv:2606.24994v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) for language-model reasoning can fail at both extremes of task difficulty: easy prompts often produce all-correct, low-diversity rollout groups with little gradient signal,… 25
arXiv — Machine Learning research 5d ago Geo-Strat-RL: Learning Geological Event Reasoning from Verifiable Tasks arXiv:2606.25000v1 Announce Type: new Abstract: To evaluate whether vision-language models can reason about geological histories, it is necessary to construct observations for which the underlying process history is known. Furthermore, reasoning over geological histories is not… 6
arXiv — Machine Learning research 5d ago Multi-Stream Temporal Fusion for Financial Fraud Detection arXiv:2606.25007v1 Announce Type: new Abstract: Financial fraud detection in digital banking requires reasoning over multiple heterogeneous event streams -- transactions, login sessions, risk signals -- that individually appear benign but collectively reveal fraudulent patterns.… 14
arXiv — NLP / Computation & Language research 5d ago Do Thinking Tokens Help with Safety? arXiv:2606.25013v1 Announce Type: cross Abstract: Today's reasoning models use thinking tokens to attain stronger performance on benchmarks than their instruction-tuned counterparts.
It is also generally believed that this more "deliberative" mode should improve alignment and… 37
arXiv — NLP / Computation & Language research 5d ago Hybrid-IR: Dual-Path Hybrid Retrieval with Iterative Reasoning for Complex Medical Question Answering arXiv:2606.25338v1 Announce Type: new Abstract: Large language models (LLMs) have shown promising performance across a wide range of biomedical applications, including medical question answering (QA), yet they remain prone to hallucinations and outdated knowledge. Although… 9
arXiv — NLP / Computation & Language research 5d ago Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing arXiv:2606.25354v1 Announce Type: new Abstract: Test-time scaling improves language-model reasoning, but existing approaches often face a difficult trade-off: long chain-of-thought sampling remains single-threaded, while sentence- or solution-level search can be computationally… 22
arXiv — NLP / Computation & Language research 5d ago Riazi-8B: An Urdu Large Language Model for Mathematical Reasoning arXiv:2606.25568v1 Announce Type: new Abstract: Recent LLMs demonstrate strong mathematical reasoning capabilities, but existing gains rely heavily on English-centric training resources and benchmarks. As a result, reasoning performance degrades substantially in low-resource… 27
arXiv — NLP / Computation & Language research 5d ago OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning arXiv:2606.25757v1 Announce Type: new Abstract: Reinforcement Learning (RL) has enabled LLMs to excel in objective reasoning tasks such as mathematics and code generation.
However, applying RL to open-ended tasks, such as creative writing, remains challenging because… 22
arXiv — NLP / Computation & Language research 5d ago RAVEN: Long-Horizon Reasoning & Navigation with a Visuo-Spatio-Temporal Memory arXiv:2606.25206v1 Announce Type: cross Abstract: Long-term robot deployment requires a compact and scalable memory that preserves fine-grained visual semantics, grounds observations in space and time, and enables efficient storage and retrieval. In this paper, we propose RAVEN,… 21
arXiv — NLP / Computation & Language research 5d ago Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk,… 32
arXiv — NLP / Computation & Language research 5d ago How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations arXiv:2606.26041v1 Announce Type: cross Abstract: Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently… 29
arXiv — NLP / Computation & Language research 5d ago Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation arXiv:2509.22193v2 Announce Type: replace Abstract: Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models.
Yet reasoning traces are 5-20× longer than standard instruction fine-tuning (IFT) outputs,… 19
arXiv — NLP / Computation & Language research 5d ago Reinforcement Learning Improves Traversal of Parametric Knowledge in LLMs arXiv:2511.05933v2 Announce Type: replace Abstract: Reinforcement learning (RL) is often credited with improving language model reasoning at the expense of knowledge. We challenge this narrative by showing that reasoning models consistently outperform their instruction-tuned… 11
arXiv — NLP / Computation & Language research 5d ago ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure arXiv:2602.01472v2 Announce Type: replace Abstract: Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducible inference-time phenomenon, termed… 5
Hugging Face Daily Papers research 5d ago Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do Abstract Multimodal Chain-of-Thought reasoning shows selective effectiveness across different tasks, with limitations in maintaining visual introspection during reasoning processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Chain-of-Thought (CoT) has become a standard method… 17
Hugging Face Daily Papers research 5d ago IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation Abstract Implicit Visual Chain-of-Thought decomposes visual conditioning into structural and semantic cascades for improved structure-aware image generation with sketch supervision.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Unified multi-modal large language models (MLLMs)… 7
Hugging Face Daily Papers research 5d ago AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning Abstract Large language models face challenges in archive-grounded reasoning tasks involving evidence retrieval and synthesis across diverse document collections, with performance varying significantly across domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language… 26
arXiv — Machine Learning research 6d ago Weight-Space Geometry of Offline Reasoning Training arXiv:2606.23740v1 Announce Type: new Abstract: Offline reinforcement-learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) are widely used to distill reasoning from large teachers into smaller students, and are typically compared on downstream accuracy alone. We ask whether they… 6
arXiv — Machine Learning research 6d ago A Survey on Federated Causal Discovery and Inference arXiv:2606.23741v1 Announce Type: new Abstract: Causal reasoning, which encompasses the discovery of causal structures and the inference of causal effects, is fundamental to data-driven decision making. In practice, data for reliable causal analysis are often distributed across… 7
arXiv — NLP / Computation & Language research 6d ago Blockwise Policy-Drift Gating for On-Policy Distillation arXiv:2606.24084v1 Announce Type: cross Abstract: On-policy distillation (OPD) trains a student policy using teacher signals computed on trajectories sampled by the student itself. Recent work shows that sampled-token OPD can be fragile on long-horizon reasoning tasks and that… 30
arXiv — Machine Learning research 6d ago Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization arXiv:2606.24543v1 Announce Type: new Abstract: Large Language Models (LLMs) are traditionally viewed as autoregressive generators.
However, from the perspective of collective computation, they function as high-dimensional Dense Associative Memories that store complex reasoning… 24
arXiv — NLP / Computation & Language research 6d ago CALIBER: Calibrating Confidence Before and After Reasoning in Language Models arXiv:2606.24281v1 Announce Type: new Abstract: Reasoning language models are increasingly asked not only to answer difficult questions, but also to estimate their likelihood of success. Existing methods typically elicit confidence only once: either before thinking or after… 4
arXiv — NLP / Computation & Language research 6d ago AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning arXiv:2606.24526v1 Announce Type: new Abstract: Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across a large, messy collection of… 36
arXiv — NLP / Computation & Language research 6d ago Qwen-AgentWorld: Language World Models for General Agents arXiv:2606.24597v1 Announce Type: new Abstract: A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning.
In this work, we investigate how world modeling based on language models can… 8
arXiv — NLP / Computation & Language research 6d ago Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs arXiv:2606.23938v1 Announce Type: cross Abstract: Driving VLA models incorporating Chain-of-Thought (CoT) reasoning are attractive because they leverage pretrained VLM representations and expose intermediate decisions in natural language, yet current rationales often lack the… 4
arXiv — NLP / Computation & Language research 6d ago Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War arXiv:2606.24391v1 Announce Type: cross Abstract: We introduce Age of LLM, a turn-based 1v1 benchmark in which two LLMs face off on a 13x7 grid to destroy the enemy base. Three stressors are deliberate: fog of war, full diplomacy (messages, ceasefires, ultimatums; uranium kept… 29