arXiv — NLP / Computation & Language

arXiv — NLP / Computation & Language research 6d ago

Selective Capability Unlearning in End-to-End Spoken Language Understanding

arXiv:2606.24063v1 Announce Type: new Abstract: Modern spoken language understanding (SLU) systems are increasingly deployed in real-world settings, where specific functionalities may need to be removed due to policy or safety constraints. In SLU, a functionality corresponds to…

Sentence-Level Contextual Entrainment in Large Language Models

arXiv:2606.24077v1 Announce Type: new Abstract: Contextual entrainment, which is a newly discovered phenomenon in large language models (LLMs), refers to the tendency of a model to assign higher probabilities to tokens that appear in its context. In this work, we extend this…

CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression

arXiv:2606.24083v1 Announce Type: new Abstract: "Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being…

Predicting Poets' Origins from Verse: A Computational Analysis of Regional Linguistic Fingerprints in the Complete Tang Poems

arXiv:2606.24093v1 Announce Type: new Abstract: We ask whether the geographic origin of Tang-dynasty poets leaves a detectable linguistic trace in their work. Aggregating every poem attributed to each author in the Complete Tang Poems (Quan Tang Shi) and linking poets to their…

PORTER: Language-Grounded Event Representations for Portable Structured EHR Foundation Models

arXiv:2606.24102v1 Announce Type: new Abstract: Most electronic health record (EHR) foundation models encode clinical events as discrete event tokens from a fixed vocabulary and therefore cannot directly represent events containing unseen concepts or new combinations of concepts…

Metis: Bridging Text and Code Memory for Self-Evolving Agents

arXiv:2606.24151v1 Announce Type: new Abstract: Self-evolving agents improve over time by distilling experience from past executions and reusing it in future tasks. Existing systems represent such experience either as natural-language text injected into the agent context or as…

MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

arXiv:2606.24155v1 Announce Type: new Abstract: Existing medical AI benchmarks lack process visibility, atomic skill evaluation, and integrated hallucination detection. We introduce MedBench v5, a redesigned benchmark for clinical multimodal models (language, vision-language,…

BehaviorBench: Benchmarking Foundation Models for Behavioral Science Tasks

arXiv:2606.24162v1 Announce Type: new Abstract: Foundation models have been increasingly applied to behavioral science domains such as psychology, sociology, and economics. While these models show promise in individual tasks such as survey response prediction and human-subject…

A Pāninian Foundation for Indic Language Processing

arXiv:2606.24172v1 Announce Type: new Abstract: More than a billion people communicate in Indic languages, yet the natural language processing infrastructure serving them remains fragmented and underdeveloped. The cause is structural: the field organizes its tools and benchmarks…

A Synthetic Reliability-Aware PINN Benchmark for Offshore Wind Turbine Support-Structure Monitoring with Bayesian Inverse Identification

arXiv:2606.24176v1 Announce Type: new Abstract: Reliable structural health monitoring (SHM) of offshore wind turbine (OWT) support structures requires fast state estimation from sparse measurements. Repeated high fidelity finite element or aeroelastic analyses are difficult to…

Aspect-Based Sentiment Evolution and its Correlation with Review Rounds in Multi-Round Peer Reviews: A Deep Learning Approach

arXiv:2606.24188v1 Announce Type: new Abstract: Mining sentiment information from the textual content of peer review comments offers valuable insights into the scientific evaluation process. However, previous studies are often constrained by coarse-grained analysis and the lack…

MMed-Bench-IR: A Heterogeneous Benchmark for Multilingual Medical Information Retrieval

arXiv:2606.24200v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) in clinical settings increasingly requires multilingual retrieval against predominantly English evidence corpora. Multilingual medical retrieval demands three capabilities: cross-lingual…

Decoherence as Defence and the Magnitude of Noise Regularisation: A Rigorous N-Qubit Theory of Stochastic Quantum Neural Networks for Adversarially Robust Network Intrusion Detection

arXiv:2606.24219v1 Announce Type: new Abstract: Stochastic quantum neural networks (SQNNs) encode neuronal activations as qubits, synaptic topology as entanglement, and neural noise through a Lindblad master equation. A recent conference study applied a ring-entangled SQNN to…

SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

arXiv:2606.24259v1 Announce Type: new Abstract: Fine-tuned encoders deployed across heterogeneous NLP tasks face three compounding problems: mismatched inductive biases, class-imbalance corruption of feature statistics, and no mechanism to condition attention on external lexical…

Pigeonholing: Bad prompts hurt models to collapse and make mistakes

arXiv:2606.24267v1 Announce Type: new Abstract: While in-context learning is generally shown to be effective in Large Language Models (LLMs), bad contexts can cause performance degradation and mode collapse, a phenomenon we call "pigeonholing." **Unintentionally bad** contexts…

CALIBER: Calibrating Confidence Before and After Reasoning in Language Models

arXiv:2606.24281v1 Announce Type: new Abstract: Reasoning language models are increasingly asked not only to answer difficult questions, but also to estimate their likelihood of success. Existing methods typically elicit confidence only once: either before thinking or after…

AVOC: Enhancing Hour-Level Audio-Video Understanding in Omni-Modal LLMs via Retrieval-Inspired Token Compression

arXiv:2606.24286v1 Announce Type: new Abstract: Multimodal Large Language Models have achieved remarkable progress in short-form audio-video understanding, yet long-form audio-video comprehension remains challenged by limited context windows and severe information redundancy. To…

Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme

arXiv:2606.24324v1 Announce Type: new Abstract: The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially…

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

arXiv:2606.24331v1 Announce Type: new Abstract: Transformer-based language models have become the default substrate for natural language processing and the pace of new releases has made it hard for practitioners to separate durable ideas from the noise of incremental…

Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

arXiv:2606.24337v1 Announce Type: new Abstract: Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD…

Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet

arXiv:2606.24359v1 Announce Type: new Abstract: This paper proposed an algorithm for part-of-speech (POS) tagging senses of a bilingual dictionary. The algorithm is applied on the Al-Mawrid Arabic-English dictionary. The tagging task is accomplished by transferring the POS tags…

MorfFlex: Handling Rich Morphology

arXiv:2606.24366v1 Announce Type: new Abstract: We present MorfFlex, a morphological dictionary architecture suitable for languages with extensive regularity in both inflection and derivation. As the primary example of MorfFlex in use we introduce MorfFlex CZ, a morphological…

On the Stability of Prompt Ranking in Large Language Model Evaluation

arXiv:2606.24381v1 Announce Type: new Abstract: Prompt-based interaction has become a dominant paradigm for using large language models (LLMs), where multiple candidate prompts are evaluated and the top-ranked one is selected for downstream use. This workflow implicitly assumes…

AutoSpecNER: A Fine-Grained Named Entity Recognition Dataset for Vehicle Specification Extraction

arXiv:2606.24387v1 Announce Type: new Abstract: Vehicle advertisements contain rich specification information, but automotive NER resources remain limited. We introduce AutoSpecNER, an expert-annotated dataset for fine-grained entity recognition in vehicle listings. The dataset…

Beyond Logprobs: A Multi-Signal Confidence Engine for LLM-Based Document Field Extraction

arXiv:2606.24420v1 Announce Type: new Abstract: In high-stakes document processing pipelines, including financial reconciliation, compliance verification, and procurement automation, an LLM extraction that is silently wrong is more dangerous than one that is visibly absent. The…

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

arXiv:2606.24428v1 Announce Type: new Abstract: Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent…

The African Language Tax: Quantifying the Cost, Latency, and Context Penalty of Tokenizing African Languages in Frontier LLMs

arXiv:2606.24460v1 Announce Type: new Abstract: Commercial large language models bill, scale latency, and budget context per token. Yet tokenizers assign more subword tokens to the same meaning in some languages than in others, so speakers of languages with high token-fertility…

UOL@IDEM at BEA 2026 Shared Task 1: Neural Fusion and Feature-Rich Modeling for L1-Aware Vocabulary Difficulty Prediction

arXiv:2606.24501v1 Announce Type: new Abstract: This paper describes UOL@IDEM's closed-track submission to the BEA 2026 shared task on L1-aware vocabulary difficulty prediction. We model the task as regression and train separate systems for Spanish, German, and Mandarin…

Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

arXiv:2606.24523v1 Announce Type: new Abstract: Scam phone calls exploit vulnerable communities worldwide, yet research on detection has focused almost exclusively on English and other high-resource languages. In low-resource settings such as Turkish, detection is especially…

AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

arXiv:2606.24526v1 Announce Type: new Abstract: Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across a large, messy collection of…

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

arXiv:2606.24530v1 Announce Type: new Abstract: We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction toward discovery on real…

Cross-Lingual Exploration for Parametric Knowledge

arXiv:2606.24579v1 Announce Type: new Abstract: Parametric knowledge in Large Language Models is not equally accessible across languages. As a result, standard inference techniques often struggle to surface localized facts, leading to failures in cross-lingual knowledge transfer…

MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

arXiv:2606.24595v1 Announce Type: new Abstract: Long-term memory promises LLM agents that grow more capable across sessions, maintaining an accurate, evolving understanding of the user that interaction forms. In practice, however, this memory is evaluated mostly through…

To Compare, or Not to Compare: On Methodological Practices in Evaluating Social Bias

arXiv:2606.24596v1 Announce Type: new Abstract: As Large Language Models are increasingly deployed in critical applications, robustly evaluating their social biases is paramount. However, the current literature suffers from widespread methodological fragmentation, which yields…

Qwen-AgentWorld: Language World Models for General Agents

arXiv:2606.24597v1 Announce Type: new Abstract: A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can…

Same Lesson, Different Story: Cross-Lingual Reconstruction of Cultural Narratives in Large Language Models

arXiv:2606.24610v1 Announce Type: new Abstract: The evaluation of cultural grounding context becomes complex when multiple cultures convey the same moral lesson. This challenge is particularly relevant to large language models (LLMs), which produce narratives across a wide range…

Privacy-Preserving RAG via Multi-Agent Semantic Rewriting: Achieving Confidentiality Without Compromising Contextual Fidelity

arXiv:2606.24623v1 Announce Type: new Abstract: Retrieval-Augmented Generation enhances large language models by incorporating external knowledge, but deploying it in sensitive scenarios risks privacy leakage via malicious prompts. To address this, we propose a multi-agent…

The Warrant Gap: Claim-Conditioned Re-scoring for Fact-Checking

arXiv:2606.24627v1 Announce Type: new Abstract: Fact-checking systems built on LLMs achieve high verdict accuracy on standard benchmarks, yet routinely output Supports labels whose cited evidence does not license the claim. Structured decomposition is the natural way to inspect…

Measuring User's Mental Models of Speech Translation in Human-AI Collaboration

arXiv:2606.24644v1 Announce Type: new Abstract: Millions of people use machine translation (MT) tools daily, yet little is known about their perception of what systems can and cannot do. This paper studies users' mental models of speech translation systems through a new…

Harmonic: Hierarchical State Space Models for Efficient Long-Context Language Modeling

arXiv:2606.24650v1 Announce Type: new Abstract: We present Harmonic, a hierarchical state space model (SSM) for language modeling. The architecture stacks three recurrent levels at progressively slower timescales; each level receives the prediction error of the level below as…

AI-PAVE-Br: Leveraging Large Language Models for Enhanced Product Attribute Value Extraction through a Golden Set Approach

arXiv:2606.24655v1 Announce Type: new Abstract: The explosive growth and complexity of product data within the dynamic Brazilian e-commerce landscape demand robust and specialized methods for structured information extraction. Traditional approaches to Product Attribute Value…

DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

arXiv:2606.24667v1 Announce Type: new Abstract: Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are…

CN-NewsTTS Bench: a target-level automatic benchmark for raw-input Chinese news TTS pronunciation

arXiv:2606.24714v1 Announce Type: new Abstract: Chinese news text contains dense written forms such as scores, hyphenated model names, ranges, unit symbols, percentages, English abbreviations, and mixed Chinese-Latin-digit names. These forms are frequent in real listening…

Task Decomposition for Efficient Annotation

arXiv:2606.24734v1 Announce Type: new Abstract: High-quality annotations of structured representations are expensive to collect over large corpora. Manual annotation of structure is laborious, and model-based annotation, although cheaper to generate, requires expensive…

CANDLE: Character-level Arabic Noise Deduplication using Lightweight Encoder

arXiv:2606.24758v1 Announce Type: new Abstract: Handling repeated characters in text can be tricky, since they can represent either the correct spelling of a word or informal character elongation often seen in social media posts. We present CANDLE, a lightweight system for…

Posterior Refinement: Fast Language Generation via Any-Order Flow Maps

arXiv:2606.24773v1 Announce Type: new Abstract: Non-autoregressive generation offers a powerful paradigm for iterative refinement, allowing models to recursively critique, erase and regenerate arbitrary subsets of tokens. However, existing non-autoregressive models fail to…

Are We Ready For An Agent-Native Memory System?

arXiv:2606.24775v1 Announce Type: new Abstract: Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic…

Paying to Know: Micro-Transaction Markets for Verified Product Information in Agentic E-Commerce

arXiv:2606.24783v1 Announce Type: new Abstract: Commercial NLP treats the shopping chatbot as a recommender or a conversion tool: its job is to match a user to a catalogue entry and close a sale. We argue that the arrival of agent-native micro-payment rails (e.g., x402, AP2)…

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

arXiv:2606.24820v1 Announce Type: new Abstract: LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval…

L3Cube-MahaPOS: A Marathi Part-of-Speech Tagging Dataset and BERT Models

arXiv:2606.24825v1 Announce Type: new Abstract: Part-of-Speech (POS) tagging is a foundational NLP task underpinning machine translation, information extraction, and syntactic parsing. Despite Marathi being spoken by over 83 million people and ranking among the top twenty most…
