arXiv — NLP / Computation & Language
arXiv — NLP / Computation & Language research 4d ago
Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline
arXiv:2606.27347v1 Announce Type: new Abstract: Whether political elites organise into rent-seeking coalitions that capture public resources or civic networks that sustain governance is a central question in comparative politics. Yet observing these complex, informal, and…
arXiv — NLP / Computation & Language research 4d ago
Neural Speaker Diarization via Multilingual Training: Evaluation on Low-Resource Nepali-Hindi Speech
arXiv:2606.26144v1 Announce Type: cross Abstract: Speaker diarization, the task of determining "who spoke when" in a multi-speaker recording, is a critical component in applications such as meeting transcription, accessibility tools, and multilingual information retrieval. While…
arXiv — NLP / Computation & Language research 4d ago
From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations
arXiv:2606.26277v1 Announce Type: cross Abstract: Sequential user behavior modeling is widely adopted in industrial recommender systems; however, significant gaps remain in financial services, where pre-login web interactions and authenticated in-app experiences differ…
arXiv — NLP / Computation & Language research 4d ago
The Verification Horizon: No Silver Bullet for Coding Agent Rewards
arXiv:2606.26300v1 Announce Type: cross Abstract: A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering…
arXiv — NLP / Computation & Language research 4d ago
Axon: A Synthesizing Superoptimizer for Tensor Programs
arXiv:2606.26344v1 Announce Type: cross Abstract: Writing high-performance kernels for AI accelerators requires deep expertise in tiling, instruction selection, data layout, and operator fusion, placing a significant burden on programmers. In this paper, we focus on tile-based AI…
arXiv — NLP / Computation & Language research 4d ago
Narration-of-Thought: Inference-Time Scaffolding for Defeasible Ethical Reasoning in Large Language Models
arXiv:2606.26366v1 Announce Type: cross Abstract: Standard chain-of-thought on moral dilemmas exhibits two failure modes: stakeholder collapse (the trace names at most one party with a stake in the outcome) and uncertainty suppression (no explicit unknowns or hedges before…
arXiv — NLP / Computation & Language research 4d ago
Staying VIGILant: Mitigating Visual Laziness via Counterfactual Visual Alignment in MLLMs
arXiv:2606.26387v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) extend large language models (LLMs) with visual perception, enabling joint reasoning over images and text. Despite inheriting strong reasoning capabilities from LLMs, they remain prone to…
arXiv — NLP / Computation & Language research 4d ago
DualEval: Joint Model-Item Calibration for Unified LLM Evaluation
arXiv:2606.26429v1 Announce Type: cross Abstract: Current LLM evaluation relies on two complementary but often disconnected signals: static benchmarks with objective correctness labels and arena-style preference data that better reflect open-ended user interactions. We introduce…
arXiv — NLP / Computation & Language research 4d ago
Epiphany-Aware KV Cache Eviction Without the Attention Matrix
arXiv:2606.26472v1 Announce Type: cross Abstract: As reasoning models emit chains of thought tens of thousands of tokens long, KV cache increasingly becomes a deployment bottleneck. Existing cache eviction methods rank tokens by attention weight, which is a noisy importance…
arXiv — NLP / Computation & Language research 4d ago
Adaptive Evaluation of Out-of-Band Defenses Against Prompt Injection in LLM Agents
arXiv:2606.26479v1 Announce Type: cross Abstract: Recent work (2024 to 2026) has converged on a strategy for defending tool-using LLM agents against indirect prompt injection: rather than training the model to refuse malicious instructions, enforce security outside the model…
arXiv — NLP / Computation & Language research 4d ago
Humans Disengage, Reasoning Models Persist: Separating Difficulty Registration from Deliberation Allocation
arXiv:2606.26502v1 Announce Type: cross Abstract: Large reasoning models (LRMs) take longer on harder problems, just as humans do. This surface similarity hides an opposite pattern within items. When an LRM gets a problem wrong, it spends more tokens than when it gets the same…
arXiv — NLP / Computation & Language research 4d ago
Compiler-Driven Approximation Tuning for Hyperdimensional Computing
arXiv:2606.26547v1 Announce Type: cross Abstract: As Moore's law reaches its physical and economic limits, domain-specific approaches are increasingly employed to accelerate machine learning workloads. Hyperdimensional Computing (HDC) represents one such emerging paradigm,…
arXiv — NLP / Computation & Language research 4d ago
Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models
arXiv:2606.26566v1 Announce Type: cross Abstract: Adversarial evaluation of AI systems has matured along four largely disconnected tracks: diffusion-based attacks on text and large language models (LLMs), diffusion-based attacks on image classifiers, jailbreak pipelines against…
arXiv — NLP / Computation & Language research 4d ago
From Weights to Features: SAE-Guided Activation Regularization for LLM Continual Learning
arXiv:2606.26629v1 Announce Type: cross Abstract: Weight-space regularization methods such as Elastic Weight Consolidation (EWC) are the standard approach to catastrophic forgetting in continual learning. However, those methods tend to underperform when applied to large language…
arXiv — NLP / Computation & Language research 4d ago
Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation
arXiv:2606.26686v1 Announce Type: cross Abstract: In order to screen a prompt or a response, recent guardrail methods generate a chain-of-thought (CoT) before they issue a verdict. This design follows a common belief that step-by-step reasoning improves a decision. However,…
arXiv — NLP / Computation & Language research 4d ago
HyperDFlash: MHC-Aligned Block Speculative Decoding with Gated Residual Reduction
arXiv:2606.26744v1 Announce Type: cross Abstract: We present HyperDFlash, a block-parallel speculative decoding framework tailored to the novel multi-hyper-connection (MHC) architecture proposed by DeepSeek-V4. Despite the strong initial-token drafting performance of the native…
arXiv — NLP / Computation & Language research 4d ago
Structure Before Collapse: Transient semantic geometry in next-token prediction
arXiv:2606.26749v1 Announce Type: cross Abstract: Neural Collapse predicts that balanced one-hot classification pushes model representations to be equally far from each other, a symmetric configuration that depends only on the output label and ignores any semantic similarity in…
arXiv — NLP / Computation & Language research 4d ago
Reproducibility Study of "AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models"
arXiv:2606.26783v1 Announce Type: cross Abstract: Fang et al. (2025) introduced a null-space constrained projection, named AlphaEdit, for locate-then-edit knowledge editing methods, theoretically guaranteeing that edits do not disrupt previously preserved knowledge, and report…
arXiv — NLP / Computation & Language research 4d ago
AIGP: An LLM-Based Framework for Long-Term Value Alignment in E-Commerce Pricing
arXiv:2606.26787v1 Announce Type: cross Abstract: Traditional dynamic pricing models in large-scale e-commerce suffer from limited interpretability, poor utilization of unstructured information, and misalignment with long-term business objectives such as cumulative Gross…
arXiv — NLP / Computation & Language research 4d ago
KARLA: Knowledge-base Augmented Retrieval for Language Models
arXiv:2606.26807v1 Announce Type: cross Abstract: We propose a new method that allows an LLM to automatically pull in factual knowledge from a knowledge base during token generation. This means that (1) factual knowledge in the LLM output can be updated without retraining the…
arXiv — NLP / Computation & Language research 4d ago
AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems
arXiv:2606.26859v1 Announce Type: cross Abstract: Recommendation algorithm iteration is moving from an artisanal, engineer-bound process toward an industrialized research loop, but this transition remains blocked by a structural execution bottleneck: the idea-to-launch cycle…
arXiv — NLP / Computation & Language research 4d ago
Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries
arXiv:2606.26936v1 Announce Type: cross Abstract: With a profusion of jailbreaks for LLMs now widely known, a growing concern is that non-expert malicious actors ("the average Jane") could elicit actionable responses to malicious requests. In this work, we examine whether this…
arXiv — NLP / Computation & Language research 4d ago
Einstein World Models
arXiv:2606.26969v1 Announce Type: cross Abstract: Does intelligence require the ability to reason about phenomena beyond direct experience? It is natural to suspect that some complex thought cannot be captured through language alone. However, of particular concern to this work,…
arXiv — NLP / Computation & Language research 4d ago
Just how sure are you? Improving Verbalized Uncertainty Calibration in Medical VQA
arXiv:2606.27023v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) applied to Medical Visual Question Answering (VQA) tend to produce overconfident outputs regardless of actual correctness, and existing verbalized confidence calibration methods, developed…
arXiv — NLP / Computation & Language research 4d ago
HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models
arXiv:2606.27187v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) have recently shown immense potential in automated content moderation, sparking growing interest in developing harmful-video benchmarks. However, we identify two primary limitations in…
arXiv — NLP / Computation & Language research 4d ago
Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement
arXiv:2606.27226v1 Announce Type: cross Abstract: Evaluating LLM outputs remains a major bottleneck in NLP: human evaluation is expensive and slow, lexical metrics correlate poorly with human judgments on open-ended generation, and holistic LLM judges often produce opaque scores…
arXiv — NLP / Computation & Language research 4d ago
The Geometry of Updates: Fisher Alignment at Vocabulary Scale
arXiv:2606.27242v1 Announce Type: cross Abstract: Training-free source selection for LLM families with shared vocabularies arises in scientific string domains such as SMILES, protein, and genomic sequences, where candidate corpora share a tokenizer but differ in prediction…
arXiv — NLP / Computation & Language research 4d ago
DanceOPD: On-Policy Generative Field Distillation
arXiv:2606.27377v1 Announce Type: cross Abstract: Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For…
arXiv — NLP / Computation & Language research 4d ago
Tuning Language Models by Mixture-of-Depths Ensemble
arXiv:2410.13077v2 Announce Type: replace Abstract: Transformer-based Large Language Models (LLMs) traditionally rely on final-layer loss for finetuning and final-layer representations for predictions, potentially overlooking the predictive power embedded in late layers.…
arXiv — NLP / Computation & Language research 4d ago
A Systematic Survey of Semantic Role Labeling in the Era of Pretrained Language Models
arXiv:2502.08660v4 Announce Type: replace Abstract: Semantic role labeling (SRL) is a central natural language processing task for understanding predicate-argument structures within texts and enabling downstream applications. Despite extensive research, comprehensive surveys…
arXiv — NLP / Computation & Language research 4d ago
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
arXiv:2506.15681v4 Announce Type: replace Abstract: Recent advancements in vision-language models (VLMs) have leveraged large language models (LLMs) to achieve performance on par with closed-source systems like GPT-4V. However, deploying these models in real-world scenarios,…
arXiv — NLP / Computation & Language research 4d ago
Somatic in the East, Psychological in the West?: Investigating Clinically-Grounded Cross-Cultural Depression Symptom Expression in LLMs
arXiv:2508.03247v2 Announce Type: replace Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are…
arXiv — NLP / Computation & Language research 4d ago
Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning
arXiv:2509.01412v3 Announce Type: replace Abstract: Large language models (LLMs) show strong reasoning via chain-of-thought (CoT) prompting, but the process is opaque, which makes verification, debugging, and control difficult in high-stakes settings. We present Vis-CoT, a…
arXiv — NLP / Computation & Language research 5d ago
Graph-Based Phonetic Error Correction of Noisy ASR
arXiv:2606.24889v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing…
arXiv — NLP / Computation & Language research 5d ago
Small edits, large models: How Wikipedia advocacy shapes LLM values
arXiv:2606.24890v1 Announce Type: new Abstract: Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily…
arXiv — NLP / Computation & Language research 5d ago
AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents
arXiv:2606.24893v1 Announce Type: new Abstract: For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire new world knowledge and skills, retain relevant episodic experiences, and plan over long horizons. To…
arXiv — NLP / Computation & Language research 5d ago
Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction
arXiv:2606.24915v1 Announce Type: new Abstract: End-to-end automatic speech recognition systems frequently hallucinate rare entities and domain-specific terms, especially in low-resource languages. While retrieval-augmented generation frameworks can mitigate these errors using…
arXiv — NLP / Computation & Language research 5d ago
Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models
arXiv:2606.24952v1 Announce Type: new Abstract: A central aspiration of mechanistic interpretability is controllability: if we know where a behavior is represented in a model's activations, we should be able to modify it. This rests on a hidden premise: that the direction…
arXiv — NLP / Computation & Language research 5d ago
Dustin: Draft-Augmented Sparse Verification for Efficient Long-Context Generation with Speculative Decoding
arXiv:2606.24957v1 Announce Type: new Abstract: While speculative decoding improves inference throughput for multi-batch long-context Large Language Models (LLMs), its efficiency is often limited by a verification bottleneck where Key-Value (KV) cache loading dominates latency.…
arXiv — NLP / Computation & Language research 5d ago
LLM Performance on a Real, Double-Marked GCSE Benchmark
arXiv:2606.24973v1 Announce Type: new Abstract: We introduce a dataset of 32,534 double-marked real student responses to GCSE mock exams (GCSEs are the UK's national exams, taken at age ~16), spanning 328 questions across five subjects and including handwritten work. We test…
arXiv — NLP / Computation & Language research 5d ago
LLM-Based Scientific Peer Review: Methods, Benchmarks, and Reliability Challenges
arXiv:2606.25057v1 Announce Type: new Abstract: The rapid growth of scientific submissions has pushed traditional peer review toward its scalability limits, motivating the exploration of large language models (LLMs) as intelligent automated evaluation assistants. Although recent…
arXiv — NLP / Computation & Language research 5d ago
Dream at SemEval-2026 Task 13: SALSA for Single-Pass Machine-Generated Code Detection
arXiv:2606.25102v1 Announce Type: new Abstract: Large language models have transformed code generation, raising concerns around authorship, assessment integrity, and software trust. SemEval-2026 Task 13 Subtask A operationalizes detection as binary classification over code…
arXiv — NLP / Computation & Language research 5d ago
The cognitive, affective, and behavioral expression of self-stigma among people who use drugs in online substance use communities
arXiv:2606.25143v1 Announce Type: new Abstract: Objectives: To develop a codebook for self-stigma across cognitive, affective, and behavioral domains, and to estimate the prevalence, co-occurrence, and temporal patterns of these indicators in Reddit posts by people who use…
arXiv — NLP / Computation & Language research 5d ago
Hitting a Moving Target: Test-Time Adaptation for AI Text Detection under Continual Distribution Shift
arXiv:2606.25152v1 Announce Type: new Abstract: Deployed approaches for AI text detection often rely on training-time access to labeled datasets of both human-written and AI-generated text. This approach is vulnerable to three types of distribution shifts that occur continually…
arXiv — NLP / Computation & Language research 5d ago
What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics
arXiv:2606.25182v1 Announce Type: new Abstract: Jailbreak attacks reveal a persistent weakness in aligned Large Language Models: carefully crafted prompts can elicit policy-violating responses despite safety training. While most defenses operate at the prompt or output level, it…
arXiv — NLP / Computation & Language research 5d ago
Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars
arXiv:2606.25231v1 Announce Type: new Abstract: Dictionaries are rich sources of lexical information about words that is required for many applications of natural language processing and human language technology. However, publishers prepare printed dictionaries for human usage…
arXiv — NLP / Computation & Language research 5d ago
Automatic Generation of Highlights for Academic Paper Via Prompt-based Learning
arXiv:2606.25253v1 Announce Type: new Abstract: Highlights provide a concise summary of the main contributions of an academic paper and help readers quickly understand its focus. However, many journals do not provide highlights, which limits their use in literature retrieval,…
arXiv — NLP / Computation & Language research 5d ago
Improved Large Language Diffusion Models
arXiv:2606.25331v1 Announce Type: new Abstract: Modern large language models are predominantly trained with autoregressive factorization and causal attention. We present iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention.…
arXiv — NLP / Computation & Language research 5d ago
Hybrid-IR: Dual-Path Hybrid Retrieval with Iterative Reasoning for Complex Medical Question Answering
arXiv:2606.25338v1 Announce Type: new Abstract: Large language models (LLMs) have shown promising performance across a wide range of biomedical applications, including medical question answering (QA), yet they remain prone to hallucinations and outdated knowledge. Although…
arXiv — NLP / Computation & Language research 5d ago
Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing
arXiv:2606.25354v1 Announce Type: new Abstract: Test-time scaling improves language-model reasoning, but existing approaches often face a difficult trade-off: long chain-of-thought sampling remains single-threaded, while sentence- or solution-level search can be computationally…