arXiv — NLP / Computation & Language
500 articles archived
arXiv — NLP / Computation & Language research 1d ago
Formalizing Latent Thoughts: Four Axioms of Thought Representation in LLMs
arXiv:2606.27378v1 Announce Type: new Abstract: We introduce an axiomatic evaluation framework for latent thought representations in LLMs, comprising metrics that are independent of downstream benchmark scores and reveal representational failures that benchmark accuracy masks.…
Position: The Term "Machine Unlearning" Is Overused in LLMs
arXiv:2606.27379v1 Announce Type: new Abstract: Large language models increasingly face demands to "forget" training data, knowledge, or behaviors due to regulatory deletion obligations, copyright/licensing disputes, and safety or product-policy requirements. This position paper…
A Survey of Automated Presentation Coaching: Systems, Methods, and Open Challenges
arXiv:2606.27380v1 Announce Type: new Abstract: Automated coaching for oral presentations sits at the intersection of computer-assisted pronunciation training (CAPT), prosody modeling, and speech synthesis, yet no prior work has systematically surveyed and compared existing…
Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026
arXiv:2606.27446v1 Announce Type: new Abstract: This paper describes team HSA_CORAL's submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling…
Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns
arXiv:2606.27460v1 Announce Type: new Abstract: In this study, we use a developmental approach to investigate the statistical learning and mental representation of neural language models (NLM). A series of Generative Transformer models are trained on a synthetic grammar. The…
Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents
arXiv:2606.27472v1 Announce Type: new Abstract: Large language model (LLM) agents operate over long, multi-session interactions in which facts change: a user moves, a price updates, a plan is revised. Acting correctly requires using the current value of a fact and discarding…
The Context-Ready Transformer
arXiv:2606.27538v1 Announce Type: new Abstract: We introduce the context-ready transformer, a new recurrent neural network architecture built from a D-layer transformer block that pre-contextualizes each token before it enters the block. During left-to-right generation, a…
EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction
arXiv:2606.27550v1 Announce Type: new Abstract: Multi-token prediction has been shown to increase data density during training and improve downstream text-generation quality, and it serves as the de facto approach for self-speculative decoding. Existing foundation and open source…
Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents
arXiv:2606.27595v1 Announce Type: new Abstract: Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated,…
Narrative-UFET: Narrative Generation for Ultra-Fine Entity Typing
arXiv:2606.27598v1 Announce Type: new Abstract: Ultra-fine entity typing (UFET) assigns highly specific types to entity mentions, but current approaches struggle with types in the long tail. We hypothesize that a key limitation is the reliance on sentence-level context, since…
Masked Language Flow Models
arXiv:2606.27617v1 Announce Type: new Abstract: Masked Diffusion Models (MDMs) promise fast, parallel language generation, but their reverse transition factorises across token positions -- an approximation that breaks down in the few-step sampling regime where parallel…
Cross-Platform Chinese Offensive Comment Detection via Dual-Threshold Hard Example Mining
arXiv:2606.27629v1 Announce Type: new Abstract: Cross-platform deployment of offensive comment detection for Chinese social media suffers from performance degradation. The paper proposes a dual-threshold hard example mining method to address this. First, the clean-Chinese-base RoBERTa is…
Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety
arXiv:2606.27632v1 Announce Type: new Abstract: As large language models are increasingly deployed in real-world systems, safety failures can still lead to harmful outputs and dangerous misuse. We argue that the essence of safety is adversarial: many failures arise not from…
When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search
arXiv:2606.27669v1 Announce Type: new Abstract: Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume…
From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models
arXiv:2606.27679v1 Announce Type: new Abstract: Probe-based uncertainty estimation (UE) has emerged as a prominent approach to detect hallucinations in Large Language Models (LLMs) by learning uncertainty from internal model signals. Yet, recent methods vary simultaneously…
Mitigating LLM-based p-Hacking by Preregistering for the Next LLM
arXiv:2606.27687v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate, classify, and annotate data whose outputs feed downstream hypothesis tests. However, LLM-based research is easy to p-hack: a researcher can tune the prompts, decoding…
Mitigating Position Bias in Transformers via Layer-Specific Positional Embedding Scaling
arXiv:2606.27705v1 Announce Type: new Abstract: Large Language Models (LLMs) still struggle with the "lost-in-the-middle" problem, where critical information located in the middle of long-context inputs is often underrepresented or lost. While existing methods attempt to…
Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning
arXiv:2606.27709v1 Announce Type: new Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens…
Do Speech Emphasis Models Generalize across Languages and Emotions?
arXiv:2606.27717v1 Announce Type: new Abstract: Prosodic emphasis varies across languages, emotions, and speaking styles, yet existing emphasis detection models are largely trained and evaluated on monolingual neutral read speech. We introduce MMEE (Multilingual Multi-Emotion…
Enhancing Numerical Prediction in LLMs via Smooth MMD Alignment
arXiv:2606.27731v1 Announce Type: new Abstract: Despite their strong general capabilities, large language models (LLMs) often remain unreliable when outputs must be numerically precise. A key reason is the training objective: standard cross-entropy treats numeric tokens as…
KG2Cypher: Data-Centric Pipeline for Building Enterprise Text-to-Cypher Systems
arXiv:2606.27742v1 Announce Type: new Abstract: Enterprise Knowledge Graphs (KGs) are increasingly used for internal search, analytics, and question answering, but building natural-language interfaces for private enterprise graphs remains costly. We present KG2Cypher, a…
Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study
arXiv:2606.27785v1 Announce Type: new Abstract: Training-free compression methods for large language models (LLMs) often use calibration data to guide compression decisions. ROCKET, a recent method combining sparse-dictionary factorization with multi-choice knapsack problem…
SHIFT: Gate-Modulated Activation Steering for Knowledge Conflict Mitigation in Retrieval-Augmented Generation
arXiv:2606.27786v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) enhances LLMs by incorporating external knowledge to support response generation. However, conflicts between retrieved context and parametric knowledge have emerged as a critical challenge in…
NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation
arXiv:2606.27791v1 Announce Type: new Abstract: Hybrid attention models that mix full and sliding-window attention across layers offer a promising approach to efficient long-context inference, but the critical question of which layers should retain full attention remains…
Position Bias Correction is Insufficient for One-Pass Attention Sorting
arXiv:2606.27793v1 Announce Type: new Abstract: Long-context language models suffer from position bias, where information in middle positions is underutilized. Attention Sorting addresses this by iteratively reordering documents based on attention patterns, but its multiple…
Learning Complementary Action Modeling from Automotive Maintenance Instructions
arXiv:2606.27808v1 Announce Type: new Abstract: A minute lexical variation can reverse the procedural meaning of an instruction even when the rest of the sentence remains unchanged. In automotive maintenance instructions, this pattern often appears when an action phrase turns an…
A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts
arXiv:2606.27881v1 Announce Type: new Abstract: Temporal variation poses a unique challenge for named entity recognition (NER) in historical texts, where entities drift in surface form and salience across time. While language models (LMs) have made progress in various NLP tasks,…
Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs
arXiv:2606.27909v1 Announce Type: new Abstract: Theory-of-mind evaluations of large language models typically use dyadic social-deduction games, where every observable cue points to a single hidden side, so a model with strong language priors can score well without ever…
VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring
arXiv:2606.27941v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) provide useful decompositions of Transformer residual streams, but their learned features are usually named post hoc rather than directly connected to the Transformer's token vocabulary. We introduce…
An Empirical Analysis of Factual Errors in Human-Written Text and its Application
arXiv:2606.27959v1 Announce Type: new Abstract: Factual Error Detection (FED), which is the task of identifying factually incorrect spans in a given text, has long been recognized as an important research problem. However, with the rapid rise of large language models (LLMs),…
From Black-Box to Clinical Insight: A Multi-Stage Explainable Framework for Speech-Based Cognitive Impairment Detection
arXiv:2606.27973v1 Announce Type: new Abstract: Speech-based cognitive impairment detection offers a noninvasive, accessible alternative to costly biomarker assays, yet transformer-based models remain clinically uninterpretable. We propose a multi-stage explainability framework…
ToxiREX: A Dataset on Toxic REasoning in ConteXt
arXiv:2606.27981v1 Announce Type: new Abstract: We introduce a new, contextual, multilingual dataset called ToxiREX: Toxic REasoning in ConteXt. The dataset consists of threads of Reddit comments and structured characterizations of what the comments imply, following a systematic…
Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection
arXiv:2606.28002v1 Announce Type: new Abstract: Insurance fraud imposes substantial financial losses and operational inefficiencies, raising premiums and impacting trust among legitimate policyholders. Early detection at first notice of loss (FNOL) remains a persistent challenge. Existing approaches…
The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization
arXiv:2606.28013v1 Announce Type: new Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean…
A Tree-of-Thoughts Inspired Hybrid Approach for Legal Case Judgement Summarization using LLMs
arXiv:2606.28044v1 Announce Type: new Abstract: In recent times, Large Language Models (LLMs) have increasingly been used for legal case judgement summarization. Most prior work has applied traditional extractive and abstractive summarization to case judgements. However, hybrid…
Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA
arXiv:2606.28050v1 Announce Type: new Abstract: LLM-as-a-Judge and self-evaluation pipelines implicitly assume that evaluation is easier than generation. We test this in a controlled in-context QA setting where a context passage is the sole information source and each model…
MultiHashFormer: Hash-based Generative Language Models
arXiv:2606.28057v1 Announce Type: new Abstract: Language models (LMs) represent tokens using embedding matrices that scale linearly with the vocabulary size. To constrain the parameter footprint, prior work proposes hashing many tokens into a single vector within encoder-only…
Mechanism-Driven Monitors for Preemptive Detection of LLM Training Instability
arXiv:2606.28116v1 Announce Type: new Abstract: Frontier large language model training consumes massive accelerator fleets and long wall-clock computation, making stability failures costly when they occur. After a numerical or a hyperparameter fault has already destabilized the…
From Tokens to States: LLMs as a Special Case of World Models and the Continuous Path Beyond
arXiv:2606.28127v1 Announce Type: new Abstract: The AI community has framed the relationship between large language models (LLMs) and world models as a dichotomy: LLMs predict tokens; world models simulate reality. Yann LeCun argued in 2022 that reaching general intelligence…
Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction
arXiv:2606.28186v1 Announce Type: new Abstract: Predicting human item difficulty is central to educational assessment, where reliable estimates support fairness and effective test construction. Existing methods often depend on costly human calibration or item-level textual…
Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models
arXiv:2606.28273v1 Announce Type: new Abstract: Vision-language models must reconcile visual evidence with memorized world knowledge when the two conflict. How they resolve this conflict shapes the reliability of multimodal systems, yet prior work characterizes it behaviorally…
CalBrief: A Pilot Diagnostic Benchmark for Evidence-Calibrated Scientific Briefing with Large Language Models
arXiv:2606.27383v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as research assistants, yet it remains unclear whether they can calibrate research takeaways to the strength and scope of the supporting evidence. We study evidence-calibrated…
Recall Before Rerank: Benchmarking Deep Learning Models for Large-Scale Code-to-Code Retrieval
arXiv:2606.27401v1 Announce Type: cross Abstract: Semantic code search and clone detection are essential for software development, maintenance, and reuse. This paper evaluates the effectiveness, efficiency, and scalability of contemporary deep learning models for first-stage…
Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement
arXiv:2606.27409v1 Announce Type: cross Abstract: Multi-agent large language model (LLM) systems often rely on verifier and critic agents to suppress hallucinations, but verification is delayed. During this delay, false claims can propagate through the agent network. We model…
Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving
arXiv:2606.27457v1 Announce Type: cross Abstract: Efficient deployment of large language models (LLMs) in production forces a trade-off between accuracy and cost. Operators often default to a single model that is either expensive for easy queries or insufficient for hard ones.…
DMV-Bench: Diagnosing Long-Horizon Multimodal Agents' Visual Memory with Incidental Cue Injection
arXiv:2606.27499v1 Announce Type: cross Abstract: Research on agent memory has matured rapidly, but almost entirely on the text side: few existing benchmarks ask, in an interactive environment, when an agent genuinely needs to remember what it saw rather than what it could write…
Aloe-Vision: Robust Vision-Language Models for Healthcare
arXiv:2606.27500v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) specialized in healthcare are emerging as a promising research direction due to their potential impact in clinical and biomedical applications. However, progress is constrained by the scarcity…
The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching
arXiv:2606.27510v1 Announce Type: cross Abstract: Activation patching is the primary tool in mechanistic interpretability. It attributes causal responsibility for a model behavior to each of its individual components by estimating its natural indirect effect (NIE). Re-deriving…
DysLexLens: A Low-Resource LLM Framework for Analysing Dyslexic Learners Insights from Online Forums
arXiv:2606.27619v1 Announce Type: cross Abstract: Dyslexic learners increasingly use artificial intelligence (AI) tools to support reading, writing, organisation, and study-related tasks. However, their lived experiences with these tools remain largely underexamined. This paper…
Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation
arXiv:2606.27681v1 Announce Type: cross Abstract: World models in partially observed environments rely on latent representations that summarize interaction history, but in many modern LLM-based architectures predictive performance fails to reflect representation quality due to…