arXiv — NLP / Computation & Language

arXiv — NLP / Computation & Language research 4d ago

Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

arXiv:2606.27347v1 Announce Type: new Abstract: Whether political elites organise into rent-seeking coalitions that capture public resources or civic networks that sustain governance is a central question in comparative politics. Yet observing these complex, informal, and…

11
arXiv — NLP / Computation & Language research 4d ago

Neural Speaker Diarization via Multilingual Training: Evaluation on Low-Resource Nepali-Hindi Speech

arXiv:2606.26144v1 Announce Type: cross Abstract: Speaker diarization, the task of determining "who spoke when" in a multi-speaker recording, is a critical component in applications such as meeting transcription, accessibility tools, and multilingual information retrieval. While…

36
arXiv — NLP / Computation & Language research 4d ago

From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations

arXiv:2606.26277v1 Announce Type: cross Abstract: Sequential user behavior modeling is widely adopted in industrial recommender systems; however, significant gaps remain in financial services, where pre-login web interactions and authenticated in-app experiences differ…

24
arXiv — NLP / Computation & Language research 4d ago

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

arXiv:2606.26300v1 Announce Type: cross Abstract: A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering…

24
arXiv — NLP / Computation & Language research 4d ago

Axon: A Synthesizing Superoptimizer for Tensor Programs

arXiv:2606.26344v1 Announce Type: cross Abstract: Writing high-performance kernels for AI accelerators requires deep expertise in tiling, instruction selection, data layout, and operator fusion, placing a significant burden on programmers. In this paper, we focus on tile-based AI…

33
arXiv — NLP / Computation & Language research 4d ago

Narration-of-Thought: Inference-Time Scaffolding for Defeasible Ethical Reasoning in Large Language Models

arXiv:2606.26366v1 Announce Type: cross Abstract: Standard chain-of-thought on moral dilemmas exhibits two failure modes: stakeholder collapse (the trace names at most one party with a stake in the outcome) and uncertainty suppression (no explicit unknowns or hedges before…

29
arXiv — NLP / Computation & Language research 4d ago

Staying VIGILant: Mitigating Visual Laziness via Counterfactual Visual Alignment in MLLMs

arXiv:2606.26387v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) extend large language models (LLMs) with visual perception, enabling joint reasoning over images and text. Despite inheriting strong reasoning capabilities from LLMs, they remain prone to…

19
arXiv — NLP / Computation & Language research 4d ago

DualEval: Joint Model-Item Calibration for Unified LLM Evaluation

arXiv:2606.26429v1 Announce Type: cross Abstract: Current LLM evaluation relies on two complementary but often disconnected signals: static benchmarks with objective correctness labels and arena-style preference data that better reflect open-ended user interactions. We introduce…

24
arXiv — NLP / Computation & Language research 4d ago

Epiphany-Aware KV Cache Eviction Without the Attention Matrix

arXiv:2606.26472v1 Announce Type: cross Abstract: As reasoning models emit chains of thought tens of thousands of tokens long, KV cache increasingly becomes a deployment bottleneck. Existing cache eviction methods rank tokens by attention weight, which is a noisy importance…

21
arXiv — NLP / Computation & Language research 4d ago

Adaptive Evaluation of Out-of-Band Defenses Against Prompt Injection in LLM Agents

arXiv:2606.26479v1 Announce Type: cross Abstract: Recent work (2024 to 2026) has converged on a strategy for defending tool-using LLM agents against indirect prompt injection: rather than training the model to refuse malicious instructions, enforce security outside the model…

38
arXiv — NLP / Computation & Language research 4d ago

Humans Disengage, Reasoning Models Persist: Separating Difficulty Registration from Deliberation Allocation

arXiv:2606.26502v1 Announce Type: cross Abstract: Large reasoning models (LRMs) take longer on harder problems, just as humans do. This surface similarity hides an opposite pattern within items. When an LRM gets a problem wrong, it spends more tokens than when it gets the same…

29
arXiv — NLP / Computation & Language research 4d ago

Compiler-Driven Approximation Tuning for Hyperdimensional Computing

arXiv:2606.26547v1 Announce Type: cross Abstract: As Moore's law reaches its physical and economic limits, domain-specific approaches are increasingly employed to accelerate machine learning workloads. Hyperdimensional Computing (HDC) represents one such emerging paradigm,…

6
arXiv — NLP / Computation & Language research 4d ago

Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models

arXiv:2606.26566v1 Announce Type: cross Abstract: Adversarial evaluation of AI systems has matured along four largely disconnected tracks: diffusion-based attacks on text and large language models (LLMs), diffusion-based attacks on image classifiers, jailbreak pipelines against…

18
arXiv — NLP / Computation & Language research 4d ago

From Weights to Features: SAE-Guided Activation Regularization for LLM Continual Learning

arXiv:2606.26629v1 Announce Type: cross Abstract: Weight-space regularization methods such as Elastic Weight Consolidation (EWC) are the standard approach to catastrophic forgetting in continual learning. However, those methods tend to underperform when applied to large language…

15
arXiv — NLP / Computation & Language research 4d ago

Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation

arXiv:2606.26686v1 Announce Type: cross Abstract: In order to screen a prompt or a response, the recent guardrail methods generate a chain-of-thought (CoT) before they issue a verdict. This design follows a common belief that step-by-step reasoning improves a decision. However,…

17
arXiv — NLP / Computation & Language research 4d ago

HyperDFlash: MHC-Aligned Block Speculative Decoding with Gated Residual Reduction

arXiv:2606.26744v1 Announce Type: cross Abstract: We present HyperDFlash, a block-parallel speculative decoding framework tailored to the novel multi-hyper-connection (MHC) architecture proposed by DeepSeek-V4. Despite the strong initial-token drafting performance of the native…

10
arXiv — NLP / Computation & Language research 4d ago

Structure Before Collapse: Transient semantic geometry in next-token prediction

arXiv:2606.26749v1 Announce Type: cross Abstract: Neural Collapse predicts that balanced one-hot classification pushes model representations to be equally far from each other; a symmetric configuration that depends only on the output label and ignores any semantic similarity in…

29
arXiv — NLP / Computation & Language research 4d ago

Reproducibility Study of "AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models"

arXiv:2606.26783v1 Announce Type: cross Abstract: Fang et al. (2025) introduced a null-space constrained projection, named AlphaEdit, for locate-then-edit knowledge editing methods, theoretically guaranteeing that edits do not disrupt previously preserved knowledge, and reports…

20
arXiv — NLP / Computation & Language research 4d ago

AIGP: An LLM-Based Framework for Long-Term Value Alignment in E-Commerce Pricing

arXiv:2606.26787v1 Announce Type: cross Abstract: Traditional dynamic pricing models in large-scale e-commerce suffer from limited interpretability, poor utilization of unstructured information, and misalignment with long-term business objectives such as cumulative Gross…

26
arXiv — NLP / Computation & Language research 4d ago

KARLA: Knowledge-base Augmented Retrieval for Language Models

arXiv:2606.26807v1 Announce Type: cross Abstract: We propose a new method that allows an LLM to automatically pull in factual knowledge from a knowledge base during token generation. This means that (1) factual knowledge in the LLM output can be updated without retraining the…

12
arXiv — NLP / Computation & Language research 4d ago

AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems

arXiv:2606.26859v1 Announce Type: cross Abstract: Recommendation algorithm iteration is moving from an artisanal, engineer-bound process toward an industrialized research loop, but this transition remains blocked by a structural execution bottleneck: the idea-to-launch cycle…

10
arXiv — NLP / Computation & Language research 4d ago

Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries

arXiv:2606.26936v1 Announce Type: cross Abstract: With a profusion of jailbreaks for LLMs now widely known, a growing concern is that non-expert malicious actors ("the average Jane") could elicit actionable responses to malicious requests. In this work, we examine whether this…

36
arXiv — NLP / Computation & Language research 4d ago

Einstein World Models

arXiv:2606.26969v1 Announce Type: cross Abstract: Does intelligence require the ability to reason about phenomena beyond direct experience? It is natural to suspect that some complex thought cannot be captured through language alone. However, of particular concern to this work,…

28
arXiv — NLP / Computation & Language research 4d ago

Just how sure are you? Improving Verbalized Uncertainty Calibration in Medical VQA

arXiv:2606.27023v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) applied to Medical Visual Question Answering (VQA) tend to produce overconfident outputs regardless of actual correctness, and existing verbalized confidence calibration methods, developed…

15
arXiv — NLP / Computation & Language research 4d ago

HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

arXiv:2606.27187v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) have recently shown immense potential in automated content moderation, sparking growing interest in developing harmful-video benchmarks. However, we identify two primary limitations in…

25
arXiv — NLP / Computation & Language research 4d ago

Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement

arXiv:2606.27226v1 Announce Type: cross Abstract: Evaluating LLM outputs remains a major bottleneck in NLP: human evaluation is expensive and slow, lexical metrics correlate poorly with human judgments on open-ended generation, and holistic LLM judges often produce opaque scores…

14
arXiv — NLP / Computation & Language research 4d ago

The Geometry of Updates: Fisher Alignment at Vocabulary Scale

arXiv:2606.27242v1 Announce Type: cross Abstract: Training-free source selection for LLM families with shared vocabularies arises in scientific string domains such as SMILES, protein, and genomic sequences, where candidate corpora share a tokenizer but differ in prediction…

38
arXiv — NLP / Computation & Language research 4d ago

DanceOPD: On-Policy Generative Field Distillation

arXiv:2606.27377v1 Announce Type: cross Abstract: Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For…

18
arXiv — NLP / Computation & Language research 4d ago

Tuning Language Models by Mixture-of-Depths Ensemble

arXiv:2410.13077v2 Announce Type: replace Abstract: Transformer-based Large Language Models (LLMs) traditionally rely on final-layer loss for finetuning and final-layer representations for predictions, potentially overlooking the predictive power embedded in late layers.…

35
arXiv — NLP / Computation & Language research 4d ago

A Systematic Survey of Semantic Role Labeling in the Era of Pretrained Language Models

arXiv:2502.08660v4 Announce Type: replace Abstract: Semantic role labeling (SRL) is a central natural language processing task for understanding predicate-argument structures within texts and enabling downstream applications. Despite extensive research, comprehensive surveys…

14
arXiv — NLP / Computation & Language research 4d ago

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

arXiv:2506.15681v4 Announce Type: replace Abstract: Recent advancements in vision-language models (VLMs) have leveraged large language models (LLMs) to achieve performance on par with closed-source systems like GPT-4V. However, deploying these models in real-world scenarios,…

16
arXiv — NLP / Computation & Language research 4d ago

Somatic in the East, Psychological in the West?: Investigating Clinically-Grounded Cross-Cultural Depression Symptom Expression in LLMs

arXiv:2508.03247v2 Announce Type: replace Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are…

5
arXiv — NLP / Computation & Language research 4d ago

Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning

arXiv:2509.01412v3 Announce Type: replace Abstract: Large language models (LLMs) show strong reasoning via chain-of-thought (CoT) prompting, but the process is opaque, which makes verification, debugging, and control difficult in high-stakes settings. We present Vis-CoT, a…

37
arXiv — NLP / Computation & Language research 5d ago

Graph-Based Phonetic Error Correction of Noisy ASR

arXiv:2606.24889v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing…

37
arXiv — NLP / Computation & Language research 5d ago

Small edits, large models: How Wikipedia advocacy shapes LLM values

arXiv:2606.24890v1 Announce Type: new Abstract: Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily…

14
arXiv — NLP / Computation & Language research 5d ago

AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

arXiv:2606.24893v1 Announce Type: new Abstract: For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire new world knowledge and skills, retain relevant episodic experiences, and plan over long horizons. To…

17
arXiv — NLP / Computation & Language research 5d ago

Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction

arXiv:2606.24915v1 Announce Type: new Abstract: End-to-end automatic speech recognition systems frequently hallucinate rare entities and domain-specific terms, especially in low-resource languages. While retrieval-augmented generation frameworks can mitigate these errors using…

18
arXiv — NLP / Computation & Language research 5d ago

Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models

arXiv:2606.24952v1 Announce Type: new Abstract: A central aspiration of mechanistic interpretability is controllability: if we know where a behavior is represented in a model's activations, we should be able to modify it. This rests on a hidden premise -- that the direction…

26
arXiv — NLP / Computation & Language research 5d ago

Dustin: Draft-Augmented Sparse Verification for Efficient Long-Context Generation with Speculative Decoding

arXiv:2606.24957v1 Announce Type: new Abstract: While speculative decoding improves inference throughput for multi-batch long-context Large Language Models (LLMs), its efficiency is often limited by a verification bottleneck where Key-Value (KV) cache loading dominates latency.…

19
arXiv — NLP / Computation & Language research 5d ago

LLM Performance on a Real, Double-Marked GCSE Benchmark

arXiv:2606.24973v1 Announce Type: new Abstract: We introduce a dataset of 32,534 double-marked real student responses to GCSE mock exams (GCSEs are the UK's national exams, taken at age ~16), spanning 328 questions across five subjects and including handwritten work. We test…

26
arXiv — NLP / Computation & Language research 5d ago

LLM-Based Scientific Peer Review: Methods, Benchmarks, and Reliability Challenges

arXiv:2606.25057v1 Announce Type: new Abstract: The rapid growth of scientific submissions has pushed traditional peer review toward its scalability limits, motivating the exploration of large language models (LLMs) as intelligent automated evaluation assistants. Although recent…

11
arXiv — NLP / Computation & Language research 5d ago

Dream at SemEval-2026 Task 13: SALSA for Single-Pass Machine-Generated Code Detection

arXiv:2606.25102v1 Announce Type: new Abstract: Large language models have transformed code generation, raising concerns around authorship, assessment integrity, and software trust. SemEval-2026 Task 13 Subtask A operationalizes detection as binary classification over code…

28
arXiv — NLP / Computation & Language research 5d ago

The cognitive, affective, and behavioral expression of self-stigma among people who use drugs in online substance use communities

arXiv:2606.25143v1 Announce Type: new Abstract: Objectives: To develop a codebook for self-stigma across cognitive, affective, and behavioral domains, and to estimate the prevalence, co-occurrence, and temporal patterns of these indicators in Reddit posts by people who use…

12
arXiv — NLP / Computation & Language research 5d ago

Hitting a Moving Target: Test-Time Adaptation for AI Text Detection under Continual Distribution Shift

arXiv:2606.25152v1 Announce Type: new Abstract: Deployed approaches for AI text detection often rely on training-time access to labeled datasets of both human-written and AI-generated text. This approach is vulnerable to three types of distribution shifts that occur continually…

10
arXiv — NLP / Computation & Language research 5d ago

What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics

arXiv:2606.25182v1 Announce Type: new Abstract: Jailbreak attacks reveal a persistent weakness in aligned Large Language Models: carefully crafted prompts can elicit policy-violating responses despite safety training. While most defenses operate at the prompt or output level, it…

5
arXiv — NLP / Computation & Language research 5d ago

Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars

arXiv:2606.25231v1 Announce Type: new Abstract: Dictionaries are rich sources of lexical information about words that is required for many applications of natural language processing and human language technology. However, publishers prepare printed dictionaries for human usage…

8
arXiv — NLP / Computation & Language research 5d ago

Automatic Generation of Highlights for Academic Paper Via Prompt-based Learning

arXiv:2606.25253v1 Announce Type: new Abstract: Highlights provide a concise summary of the main contributions of an academic paper and help readers quickly understand its focus. However, many journals do not provide highlights, which limits their use in literature retrieval,…

17
arXiv — NLP / Computation & Language research 5d ago

Improved Large Language Diffusion Models

arXiv:2606.25331v1 Announce Type: new Abstract: Modern large language models are predominantly trained with autoregressive factorization and causal attention. We present iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention.…

10
arXiv — NLP / Computation & Language research 5d ago

Hybrid-IR: Dual-Path Hybrid Retrieval with Iterative Reasoning for Complex Medical Question Answering

arXiv:2606.25338v1 Announce Type: new Abstract: Large language models (LLMs) have shown promising performance across a wide range of biomedical applications, including medical question answering (QA), yet they remain prone to hallucinations and outdated knowledge. Although…

9
arXiv — NLP / Computation & Language research 5d ago

Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing

arXiv:2606.25354v1 Announce Type: new Abstract: Test-time scaling improves language-model reasoning, but existing approaches often face a difficult trade-off: long chain-of-thought sampling remains single-threaded, while sentence- or solution-level search can be computationally…

22
