arXiv — NLP / Computation & Language

500 articles archived

arXiv — NLP / Computation & Language research 1d ago

Formalizing Latent Thoughts: Four Axioms of Thought Representation in LLMs

arXiv:2606.27378v1 Announce Type: new Abstract: We introduce an axiomatic evaluation framework for latent thought representations in LLMs, comprising metrics that are independent of downstream benchmark scores and reveal representational failures that benchmark accuracy masks.…

29
arXiv — NLP / Computation & Language research 1d ago

Position: The Term "Machine Unlearning" Is Overused in LLMs

arXiv:2606.27379v1 Announce Type: new Abstract: Large language models increasingly face demands to "forget" training data, knowledge, or behaviors due to regulatory deletion obligations, copyright/licensing disputes, and safety or product-policy requirements. This position paper…

15
arXiv — NLP / Computation & Language research 1d ago

A Survey of Automated Presentation Coaching: Systems, Methods, and Open Challenges

arXiv:2606.27380v1 Announce Type: new Abstract: Automated coaching for oral presentations sits at the intersection of computer-assisted pronunciation training (CAPT), prosody modeling, and speech synthesis, yet no prior work has systematically surveyed and compared existing…

6
arXiv — NLP / Computation & Language research 1d ago

Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026

arXiv:2606.27446v1 Announce Type: new Abstract: This paper describes team HSA_CORAL's submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling…

4
arXiv — NLP / Computation & Language research 1d ago

Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

arXiv:2606.27460v1 Announce Type: new Abstract: In this study, we use a developmental approach to investigate the statistical learning and mental representation of neural language models (NLM). A series of Generative Transformer models are trained on a synthetic grammar. The…

4
arXiv — NLP / Computation & Language research 1d ago

Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents

arXiv:2606.27472v1 Announce Type: new Abstract: Large language model (LLM) agents operate over long, multi-session interactions in which facts change: a user moves, a price updates, a plan is revised. Acting correctly requires using the current value of a fact and discarding…

16
arXiv — NLP / Computation & Language research 1d ago

The Context-Ready Transformer

arXiv:2606.27538v1 Announce Type: new Abstract: We introduce the context-ready transformer, a new recurrent neural network architecture built from a D-layer transformer block that pre-contextualizes each token before it enters the block. During left-to-right generation, a…

26
arXiv — NLP / Computation & Language research 1d ago

EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction

arXiv:2606.27550v1 Announce Type: new Abstract: Multi-token prediction has been shown to increase data density during training and improve downstream text-generation quality, and it serves as the de facto approach for self-speculative decoding. Existing foundation and open source…

29
arXiv — NLP / Computation & Language research 1d ago

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

arXiv:2606.27595v1 Announce Type: new Abstract: Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated,…

32
arXiv — NLP / Computation & Language research 1d ago

Narrative-UFET: Narrative Generation for Ultra-Fine Entity Typing

arXiv:2606.27598v1 Announce Type: new Abstract: Ultra-fine entity typing (UFET) assigns highly specific types to entity mentions, but current approaches struggle with types in the long tail. We hypothesize that a key limitation is the reliance on sentence-level context, since…

14
arXiv — NLP / Computation & Language research 1d ago

Masked Language Flow Models

arXiv:2606.27617v1 Announce Type: new Abstract: Masked Diffusion Models (MDMs) promise fast, parallel language generation, but their reverse transition factorises across token positions -- an approximation that breaks down in the few-step sampling regime where parallel…

14
arXiv — NLP / Computation & Language research 1d ago

Cross-Platform Chinese Offensive Comment Detection via Dual-Threshold Hard Example Mining

arXiv:2606.27629v1 Announce Type: new Abstract: Cross-platform deployment of offensive comment detection for Chinese social media suffers performance degradation. The paper proposes a dual-threshold hard mining method to address this. First, the clean-Chinese-base RoBERTa is…

16
arXiv — NLP / Computation & Language research 1d ago

Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety

arXiv:2606.27632v1 Announce Type: new Abstract: As large language models are increasingly deployed in real-world systems, safety failures can still lead to harmful outputs and dangerous misuse. We argue that the essence of safety is adversarial: many failures arise not from…

29
arXiv — NLP / Computation & Language research 1d ago

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

arXiv:2606.27669v1 Announce Type: new Abstract: Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume…

27
arXiv — NLP / Computation & Language research 1d ago

From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models

arXiv:2606.27679v1 Announce Type: new Abstract: Probe-based uncertainty estimation (UE) has emerged as a prominent approach to detect hallucinations in Large Language Models (LLMs) by learning uncertainty from internal model signals. Yet, recent methods vary simultaneously…

22
arXiv — NLP / Computation & Language research 1d ago

Mitigating LLM-based p-Hacking by Preregistering for the Next LLM

arXiv:2606.27687v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate, classify, and annotate data whose outputs feed downstream hypothesis tests. However, LLM-based research is easy to p-hack: a researcher can tune the prompts, decoding…

32
arXiv — NLP / Computation & Language research 1d ago

Mitigating Position Bias in Transformers via Layer-Specific Positional Embedding Scaling

arXiv:2606.27705v1 Announce Type: new Abstract: Large Language Models (LLMs) still struggle with the "lost-in-the-middle" problem, where critical information located in the middle of long-context inputs is often underrepresented or lost. While existing methods attempt to…

4
arXiv — NLP / Computation & Language research 1d ago

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

arXiv:2606.27709v1 Announce Type: new Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens…

22
arXiv — NLP / Computation & Language research 1d ago

Do Speech Emphasis Models Generalize across Languages and Emotions?

arXiv:2606.27717v1 Announce Type: new Abstract: Prosodic emphasis varies across languages, emotions, and speaking styles, yet existing emphasis detection models are largely trained and evaluated on monolingual neutral read speech. We introduce MMEE (Multilingual Multi-Emotion…

12
arXiv — NLP / Computation & Language research 1d ago

Enhancing Numerical Prediction in LLMs via Smooth MMD Alignment

arXiv:2606.27731v1 Announce Type: new Abstract: Despite their strong general capabilities, large language models (LLMs) often remain unreliable when outputs must be numerically precise. A key reason is the training objective: standard cross-entropy treats numeric tokens as…

31
arXiv — NLP / Computation & Language research 1d ago

KG2Cypher: Data-Centric Pipeline for Building Enterprise Text-to-Cypher Systems

arXiv:2606.27742v1 Announce Type: new Abstract: Enterprise Knowledge Graphs (KGs) are increasingly used for internal search, analytics, and question answering, but building natural-language interfaces for private enterprise graphs remains costly. We present KG2Cypher, a…

14
arXiv — NLP / Computation & Language research 1d ago

Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study

arXiv:2606.27785v1 Announce Type: new Abstract: Training-free compression methods for large language models (LLMs) often use calibration data to guide compression decisions. ROCKET, a recent method combining sparse-dictionary factorization with multi-choice knapsack problem…

30
arXiv — NLP / Computation & Language research 1d ago

SHIFT: Gate-Modulated Activation Steering for Knowledge Conflict Mitigation in Retrieval-Augmented Generation

arXiv:2606.27786v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) enhances LLMs by incorporating external knowledge to support response generation. However, conflicts between retrieved context and parametric knowledge have emerged as a critical challenge in…

16
arXiv — NLP / Computation & Language research 1d ago

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

arXiv:2606.27791v1 Announce Type: new Abstract: Hybrid attention models that mix full and sliding-window attention across layers offer a promising approach to efficient long-context inference, but the critical question of which layers should retain full attention remains…

19
arXiv — NLP / Computation & Language research 1d ago

Position Bias Correction is Insufficient for One-Pass Attention Sorting

arXiv:2606.27793v1 Announce Type: new Abstract: Long-context language models suffer from position bias, where information in middle positions is underutilized. Attention Sorting addresses this by iteratively reordering documents based on attention patterns, but its multiple…

9
arXiv — NLP / Computation & Language research 1d ago

Learning Complementary Action Modeling from Automotive Maintenance Instructions

arXiv:2606.27808v1 Announce Type: new Abstract: A minute lexical variation can reverse the procedural meaning of an instruction even when the rest of the sentence remains unchanged. In automotive maintenance instructions, this pattern often appears when an action phrase turns an…

8
arXiv — NLP / Computation & Language research 1d ago

A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts

arXiv:2606.27881v1 Announce Type: new Abstract: Temporal variation poses a unique challenge for named entity recognition (NER) in historical texts, where entities drift in surface form and salience across time. While language models (LMs) have made progress in various NLP tasks,…

22
arXiv — NLP / Computation & Language research 1d ago

Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

arXiv:2606.27909v1 Announce Type: new Abstract: Theory-of-mind evaluations of large language models typically use dyadic social-deduction games, where every observable cue points to a single hidden side, so a model with strong language priors can score well without ever…

15
arXiv — NLP / Computation & Language research 1d ago

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

arXiv:2606.27941v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) provide useful decompositions of Transformer residual streams, but their learned features are usually named post hoc rather than directly connected to the Transformer's token vocabulary. We introduce…

35
arXiv — NLP / Computation & Language research 1d ago

An Empirical Analysis of Factual Errors in Human-Written Text and its Application

arXiv:2606.27959v1 Announce Type: new Abstract: Factual Error Detection (FED), which is the task of identifying factually incorrect spans in a given text, has long been recognized as an important research problem. However, with the rapid rise of large language models (LLMs),…

21
arXiv — NLP / Computation & Language research 1d ago

From Black-Box to Clinical Insight: A Multi-Stage Explainable Framework for Speech-Based Cognitive Impairment Detection

arXiv:2606.27973v1 Announce Type: new Abstract: Speech-based cognitive impairment detection offers a noninvasive, accessible alternative to costly biomarker assays, yet transformer-based models remain clinically uninterpretable. We propose a multi-stage explainability framework…

23
arXiv — NLP / Computation & Language research 1d ago

ToxiREX: A Dataset on Toxic REasoning in ConteXt

arXiv:2606.27981v1 Announce Type: new Abstract: We introduce a new, contextual, multilingual dataset called ToxiREX: Toxic REasoning in ConteXt. The dataset consists of threads of Reddit comments and structured characterizations of what the comments imply, following a systematic…

5
arXiv — NLP / Computation & Language research 1d ago

Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection

arXiv:2606.28002v1 Announce Type: new Abstract: Insurance fraud imposes substantial financial losses and operational inefficiencies, raising premiums and impacting trust among legitimate policyholders. Early detection at FNOL remains a persistent challenge. Existing approaches…

25
arXiv — NLP / Computation & Language research 1d ago

The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

arXiv:2606.28013v1 Announce Type: new Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean…

23
arXiv — NLP / Computation & Language research 1d ago

A Tree-of-Thoughts Inspired Hybrid Approach for Legal Case Judgement Summarization using LLMs

arXiv:2606.28044v1 Announce Type: new Abstract: In recent times, Large Language Models (LLMs) are increasingly being used for legal case judgement summarization. Most prior works have tried traditional extractive and abstractive summarization of case judgements. However, hybrid…

34
arXiv — NLP / Computation & Language research 1d ago

Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA

arXiv:2606.28050v1 Announce Type: new Abstract: LLM-as-a-Judge and self-evaluation pipelines implicitly assume that evaluation is easier than generation. We test this in a controlled in-context QA setting where a context passage is the sole information source and each model…

29
arXiv — NLP / Computation & Language research 1d ago

MultiHashFormer: Hash-based Generative Language Models

arXiv:2606.28057v1 Announce Type: new Abstract: Language models (LMs) represent tokens using embedding matrices that scale linearly with the vocabulary size. To constrain the parameter footprint, prior work proposes hashing many tokens into a single vector within encoder-only…

15
arXiv — NLP / Computation & Language research 1d ago

Mechanism-Driven Monitors for Preemptive Detection of LLM Training Instability

arXiv:2606.28116v1 Announce Type: new Abstract: Frontier large language model training consumes massive accelerator fleets and long wall-clock computation, making stability failures costly when they occur. After a numerical or a hyperparameter fault has already destabilized the…

31
arXiv — NLP / Computation & Language research 1d ago

From Tokens to States: LLMs as a Special Case of World Models and the Continuous Path Beyond

arXiv:2606.28127v1 Announce Type: new Abstract: The AI community has framed the relationship between large language models (LLMs) and world models as a dichotomy: LLMs predict tokens; world models simulate reality. Yann LeCun argues in 2022 that reaching general intelligence…

25
arXiv — NLP / Computation & Language research 1d ago

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

arXiv:2606.28186v1 Announce Type: new Abstract: Predicting human item difficulty is central to educational assessment, where reliable estimates support fairness and effective test construction. Existing methods often depend on costly human calibration or item-level textual…

35
arXiv — NLP / Computation & Language research 1d ago

Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

arXiv:2606.28273v1 Announce Type: new Abstract: Vision-language models must reconcile visual evidence with memorized world knowledge when the two conflict. How they resolve this conflict shapes the reliability of multimodal systems, yet prior work characterizes it behaviorally…

31
arXiv — NLP / Computation & Language research 1d ago

CalBrief: A Pilot Diagnostic Benchmark for Evidence-Calibrated Scientific Briefing with Large Language Models

arXiv:2606.27383v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as research assistants, yet it remains unclear whether they can calibrate research takeaways to the strength and scope of the supporting evidence. We study evidence-calibrated…

17
arXiv — NLP / Computation & Language research 1d ago

Recall Before Rerank: Benchmarking Deep Learning Models for Large-Scale Code-to-Code Retrieval

arXiv:2606.27401v1 Announce Type: cross Abstract: Semantic code search and clone detection are essential for software development, maintenance, and reuse. This paper evaluates the effectiveness, efficiency, and scalability of contemporary deep learning models for first-stage…

35
arXiv — NLP / Computation & Language research 1d ago

Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement

arXiv:2606.27409v1 Announce Type: cross Abstract: Multi-agent large language model (LLM) systems often rely on verifier and critic agents to suppress hallucinations, but verification is delayed. During this delay, false claims can propagate through the agent network. We model…

25
arXiv — NLP / Computation & Language research 1d ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

arXiv:2606.27457v1 Announce Type: cross Abstract: Efficient deployment of large language models (LLMs) in production forces a trade-off between accuracy and cost. Operators often default to a single model that is either expensive for easy queries or insufficient for hard ones.…

20
arXiv — NLP / Computation & Language research 1d ago

DMV-Bench: Diagnosing Long-Horizon Multimodal Agents' Visual Memory with Incidental Cue Injection

arXiv:2606.27499v1 Announce Type: cross Abstract: Research on agent memory has matured rapidly, but almost entirely on the text side: few existing benchmarks ask, in an interactive environment, when an agent genuinely needs to remember what it saw rather than what it could write…

11
arXiv — NLP / Computation & Language research 1d ago

Aloe-Vision: Robust Vision-Language Models for Healthcare

arXiv:2606.27500v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) specialized in healthcare are emerging as a promising research direction due to their potential impact in clinical and biomedical applications. However, progress is constrained by the scarcity…

28
arXiv — NLP / Computation & Language research 1d ago

The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching

arXiv:2606.27510v1 Announce Type: cross Abstract: Activation patching is the primary tool in mechanistic interpretability. It attributes causal responsibility for a model behavior to each of its individual components by estimating its natural indirect effect (NIE). Re-deriving…

4
arXiv — NLP / Computation & Language research 1d ago

DysLexLens: A Low-Resource LLM Framework for Analysing Dyslexic Learners Insights from Online Forums

arXiv:2606.27619v1 Announce Type: cross Abstract: Dyslexic learners increasingly use artificial intelligence (AI) tools to support reading, writing, organisation, and study-related tasks. However, their lived experiences with these tools remain largely underexamined. This paper…

23
arXiv — NLP / Computation & Language research 1d ago

Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation

arXiv:2606.27681v1 Announce Type: cross Abstract: World models in partially observed environments rely on latent representations that summarize interaction history, but in many modern LLM-based architectures predictive performance fails to reflect representation quality due to…

11
