arXiv — NLP / Computation & Language
500 articles archived
-
arXiv — NLP / Computation & Language research 6d ago
Selective Capability Unlearning in End-to-End Spoken Language Understanding
arXiv:2606.24063v1 Announce Type: new Abstract: Modern spoken language understanding (SLU) systems are increasingly deployed in real-world settings, where specific functionalities may need to be removed due to policy or safety constraints. In SLU, a functionality corresponds to…
23 -
arXiv — NLP / Computation & Language research 6d ago
Sentence-Level Contextual Entrainment in Large Language Models
arXiv:2606.24077v1 Announce Type: new Abstract: Contextual entrainment, which is a newly discovered phenomenon in large language models (LLMs), refers to the tendency of a model to assign higher probabilities to tokens that appear in its context. In this work, we extend this…
32 -
arXiv — NLP / Computation & Language research 6d ago
CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression
arXiv:2606.24083v1 Announce Type: new Abstract: "Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being…
25 -
arXiv — NLP / Computation & Language research 6d ago
Predicting Poets' Origins from Verse: A Computational Analysis of Regional Linguistic Fingerprints in the Complete Tang Poems
arXiv:2606.24093v1 Announce Type: new Abstract: We ask whether the geographic origin of Tang-dynasty poets leaves a detectable linguistic trace in their work. Aggregating every poem attributed to each author in the Complete Tang Poems (Quan Tang Shi) and linking poets to their…
4 -
arXiv — NLP / Computation & Language research 6d ago
PORTER: Language-Grounded Event Representations for Portable Structured EHR Foundation Models
arXiv:2606.24102v1 Announce Type: new Abstract: Most electronic health record (EHR) foundation models encode clinical events as discrete event tokens from a fixed vocabulary and therefore cannot directly represent events containing unseen concepts or new combinations of concepts…
35 -
arXiv — NLP / Computation & Language research 6d ago
Metis: Bridging Text and Code Memory for Self-Evolving Agents
arXiv:2606.24151v1 Announce Type: new Abstract: Self-evolving agents improve over time by distilling experience from past executions and reusing it in future tasks. Existing systems represent such experience either as natural-language text injected into the agent context or as…
38 -
arXiv — NLP / Computation & Language research 6d ago
MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models
arXiv:2606.24155v1 Announce Type: new Abstract: Existing medical AI benchmarks lack process visibility, atomic skill evaluation, and integrated hallucination detection. We introduce MedBench v5, a redesigned benchmark for clinical multimodal models (language, vision-language,…
38 -
arXiv — NLP / Computation & Language research 6d ago
BehaviorBench: Benchmarking Foundation Models for Behavioral Science Tasks
arXiv:2606.24162v1 Announce Type: new Abstract: Foundation models have been increasingly applied to behavioral science domains such as psychology, sociology, and economics. While these models show promise in individual tasks such as survey response prediction and human-subject…
27 -
arXiv — NLP / Computation & Language research 6d ago
A Pāninian Foundation for Indic Language Processing
arXiv:2606.24172v1 Announce Type: new Abstract: More than a billion people communicate in Indic languages, yet the natural language processing infrastructure serving them remains fragmented and underdeveloped. The cause is structural: the field organizes its tools and benchmarks…
24 -
arXiv — NLP / Computation & Language research 6d ago
A Synthetic Reliability-Aware PINN Benchmark for Offshore Wind Turbine Support-Structure Monitoring with Bayesian Inverse Identification
arXiv:2606.24176v1 Announce Type: new Abstract: Reliable structural health monitoring (SHM) of offshore wind turbine (OWT) support structures requires fast state estimation from sparse measurements. Repeated high fidelity finite element or aeroelastic analyses are difficult to…
8 -
arXiv — NLP / Computation & Language research 6d ago
Aspect-Based Sentiment Evolution and its Correlation with Review Rounds in Multi-Round Peer Reviews: A Deep Learning Approach
arXiv:2606.24188v1 Announce Type: new Abstract: Mining sentiment information from the textual content of peer review comments offers valuable insights into the scientific evaluation process. However, previous studies are often constrained by coarse-grained analysis and the lack…
19 -
arXiv — NLP / Computation & Language research 6d ago
MMed-Bench-IR: A Heterogeneous Benchmark for Multilingual Medical Information Retrieval
arXiv:2606.24200v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) in clinical settings increasingly requires multilingual retrieval against predominantly English evidence corpora. Multilingual medical retrieval demands three capabilities: cross-lingual…
36 -
arXiv — NLP / Computation & Language research 6d ago
Decoherence as Defence and the Magnitude of Noise Regularisation: A Rigorous N-Qubit Theory of Stochastic Quantum Neural Networks for Adversarially Robust Network Intrusion Detection
arXiv:2606.24219v1 Announce Type: new Abstract: Stochastic quantum neural networks (SQNNs) encode neuronal activations as qubits, synaptic topology as entanglement, and neural noise through a Lindblad master equation. A recent conference study applied a ring-entangled SQNN to…
19 -
arXiv — NLP / Computation & Language research 6d ago
SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization
arXiv:2606.24259v1 Announce Type: new Abstract: Fine-tuned encoders deployed across heterogeneous NLP tasks face three compounding problems: mismatched inductive biases, class-imbalance corruption of feature statistics, and no mechanism to condition attention on external lexical…
4 -
arXiv — NLP / Computation & Language research 6d ago
Pigeonholing: Bad prompts hurt models to collapse and make mistakes
arXiv:2606.24267v1 Announce Type: new Abstract: While in-context learning is generally shown to be effective in Large Language Models (LLMs), bad contexts can cause performance degradation and mode collapse, a phenomenon we call "pigeonholing." Unintentionally bad contexts…
26 -
arXiv — NLP / Computation & Language research 6d ago
CALIBER: Calibrating Confidence Before and After Reasoning in Language Models
arXiv:2606.24281v1 Announce Type: new Abstract: Reasoning language models are increasingly asked not only to answer difficult questions, but also to estimate their likelihood of success. Existing methods typically elicit confidence only once: either before thinking or after…
4 -
arXiv — NLP / Computation & Language research 6d ago
AVOC: Enhancing Hour-Level Audio-Video Understanding in Omni-Modal LLMs via Retrieval-Inspired Token Compression
arXiv:2606.24286v1 Announce Type: new Abstract: Multimodal Large Language Models have achieved remarkable progress in short-form audio-video understanding, yet long-form audio-video comprehension remains challenged by limited context windows and severe information redundancy. To…
15 -
arXiv — NLP / Computation & Language research 6d ago
Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme
arXiv:2606.24324v1 Announce Type: new Abstract: The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially…
12 -
arXiv — NLP / Computation & Language research 6d ago
Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment
arXiv:2606.24331v1 Announce Type: new Abstract: Transformer-based language models have become the default substrate for natural language processing and the pace of new releases has made it hard for practitioners to separate durable ideas from the noise of incremental…
17 -
arXiv — NLP / Computation & Language research 6d ago
Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies
arXiv:2606.24337v1 Announce Type: new Abstract: Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being an order of magnitude larger than most other UD…
20 -
arXiv — NLP / Computation & Language research 6d ago
Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet
arXiv:2606.24359v1 Announce Type: new Abstract: This paper proposes an algorithm for part-of-speech (POS) tagging of the senses of a bilingual dictionary. The algorithm is applied to the Al-Mawrid Arabic-English dictionary. The tagging task is accomplished by transferring the POS tags…
21 -
arXiv — NLP / Computation & Language research 6d ago
MorfFlex: Handling Rich Morphology
arXiv:2606.24366v1 Announce Type: new Abstract: We present MorfFlex, a morphological dictionary architecture suitable for languages with extensive regularity in both inflection and derivation. As the primary example of MorfFlex in use we introduce MorfFlex CZ, a morphological…
27 -
arXiv — NLP / Computation & Language research 6d ago
On the Stability of Prompt Ranking in Large Language Model Evaluation
arXiv:2606.24381v1 Announce Type: new Abstract: Prompt-based interaction has become a dominant paradigm for using large language models (LLMs), where multiple candidate prompts are evaluated and the top-ranked one is selected for downstream use. This workflow implicitly assumes…
34 -
arXiv — NLP / Computation & Language research 6d ago
AutoSpecNER: A Fine-Grained Named Entity Recognition Dataset for Vehicle Specification Extraction
arXiv:2606.24387v1 Announce Type: new Abstract: Vehicle advertisements contain rich specification information, but automotive NER resources remain limited. We introduce AutoSpecNER, an expert-annotated dataset for fine-grained entity recognition in vehicle listings. The dataset…
6 -
arXiv — NLP / Computation & Language research 6d ago
Beyond Logprobs: A Multi-Signal Confidence Engine for LLM-Based Document Field Extraction
arXiv:2606.24420v1 Announce Type: new Abstract: In high-stakes document processing pipelines, including financial reconciliation, compliance verification, and procurement automation, an LLM extraction that is silently wrong is more dangerous than one that is visibly absent. The…
28 -
arXiv — NLP / Computation & Language research 6d ago
Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
arXiv:2606.24428v1 Announce Type: new Abstract: Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent…
17 -
arXiv — NLP / Computation & Language research 6d ago
The African Language Tax: Quantifying the Cost, Latency, and Context Penalty of Tokenizing African Languages in Frontier LLMs
arXiv:2606.24460v1 Announce Type: new Abstract: Commercial large language models bill, scale latency, and budget context per token. Yet tokenizers assign more subword tokens to the same meaning in some languages than in others, so speakers of languages with high token-fertility…
20 -
arXiv — NLP / Computation & Language research 6d ago
UOL@IDEM at BEA 2026 Shared Task 1: Neural Fusion and Feature-Rich Modeling for L1-Aware Vocabulary Difficulty Prediction
arXiv:2606.24501v1 Announce Type: new Abstract: This paper describes UOL@IDEM's closed-track submission to the BEA 2026 shared task on L1-aware vocabulary difficulty prediction. We model the task as regression and train separate systems for Spanish, German, and Mandarin…
32 -
arXiv — NLP / Computation & Language research 6d ago
Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams
arXiv:2606.24523v1 Announce Type: new Abstract: Scam phone calls exploit vulnerable communities worldwide, yet research on detection has focused almost exclusively on English and other high-resource languages. In low-resource settings such as Turkish, detection is especially…
11 -
arXiv — NLP / Computation & Language research 6d ago
AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning
arXiv:2606.24526v1 Announce Type: new Abstract: Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across a large, messy collection of…
36 -
arXiv — NLP / Computation & Language research 6d ago
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
arXiv:2606.24530v1 Announce Type: new Abstract: We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move beyond reproduction toward discovery on real…
21 -
arXiv — NLP / Computation & Language research 6d ago
Cross-Lingual Exploration for Parametric Knowledge
arXiv:2606.24579v1 Announce Type: new Abstract: Parametric knowledge in Large Language Models is not equally accessible across languages. As a result, standard inference techniques often struggle to surface localized facts, leading to failures in cross-lingual knowledge transfer…
28 -
arXiv — NLP / Computation & Language research 6d ago
MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery
arXiv:2606.24595v1 Announce Type: new Abstract: Long-term memory promises LLM agents that grow more capable across sessions, maintaining an accurate, evolving understanding of the user that interaction forms. In practice, however, this memory is evaluated mostly through…
32 -
arXiv — NLP / Computation & Language research 6d ago
To Compare, or Not to Compare: On Methodological Practices in Evaluating Social Bias
arXiv:2606.24596v1 Announce Type: new Abstract: As Large Language Models are increasingly deployed in critical applications, robustly evaluating their social biases is paramount. However, the current literature suffers from widespread methodological fragmentation, which yields…
22 -
arXiv — NLP / Computation & Language research 6d ago
Qwen-AgentWorld: Language World Models for General Agents
arXiv:2606.24597v1 Announce Type: new Abstract: A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can…
8 -
arXiv — NLP / Computation & Language research 6d ago
Same Lesson, Different Story: Cross-Lingual Reconstruction of Cultural Narratives in Large Language Models
arXiv:2606.24610v1 Announce Type: new Abstract: The evaluation of cultural grounding context becomes complex when multiple cultures convey the same moral lesson. This challenge is particularly relevant to large language models (LLMs), which produce narratives across a wide range…
10 -
arXiv — NLP / Computation & Language research 6d ago
Privacy-Preserving RAG via Multi-Agent Semantic Rewriting: Achieving Confidentiality Without Compromising Contextual Fidelity
arXiv:2606.24623v1 Announce Type: new Abstract: Retrieval-Augmented Generation enhances large language models by incorporating external knowledge, but deploying it in sensitive scenarios risks privacy leakage via malicious prompts. To address this, we propose a multi-agent…
30 -
arXiv — NLP / Computation & Language research 6d ago
The Warrant Gap: Claim-Conditioned Re-scoring for Fact-Checking
arXiv:2606.24627v1 Announce Type: new Abstract: Fact-checking systems built on LLMs achieve high verdict accuracy on standard benchmarks, yet routinely output Supports labels whose cited evidence does not license the claim. Structured decomposition is the natural way to inspect…
4 -
arXiv — NLP / Computation & Language research 6d ago
Measuring User's Mental Models of Speech Translation in Human-AI Collaboration
arXiv:2606.24644v1 Announce Type: new Abstract: Millions of people use machine translation (MT) tools daily, yet little is known about their perception of what systems can and cannot do. This paper studies users' mental models of speech translation systems through a new…
13 -
arXiv — NLP / Computation & Language research 6d ago
Harmonic: Hierarchical State Space Models for Efficient Long-Context Language Modeling
arXiv:2606.24650v1 Announce Type: new Abstract: We present Harmonic, a hierarchical state space model (SSM) for language modeling. The architecture stacks three recurrent levels at progressively slower timescales; each level receives the prediction error of the level below as…
21 -
arXiv — NLP / Computation & Language research 6d ago
AI-PAVE-Br: Leveraging Large Language Models for Enhanced Product Attribute Value Extraction through a Golden Set Approach
arXiv:2606.24655v1 Announce Type: new Abstract: The explosive growth and complexity of product data within the dynamic Brazilian e-commerce landscape demand robust and specialized methods for structured information extraction. Traditional approaches to Product Attribute Value…
5 -
arXiv — NLP / Computation & Language research 6d ago
DREAM: Dense Retrieval Embeddings via Autoregressive Modeling
arXiv:2606.24667v1 Announce Type: new Abstract: Dense retrieval embedding models are a fundamental component of modern retrieval-based AI systems. Most dense retrievers are trained with contrastive objectives, which require labeled positive and negative document pairs that are…
23 -
arXiv — NLP / Computation & Language research 6d ago
CN-NewsTTS Bench: a target-level automatic benchmark for raw-input Chinese news TTS pronunciation
arXiv:2606.24714v1 Announce Type: new Abstract: Chinese news text contains dense written forms such as scores, hyphenated model names, ranges, unit symbols, percentages, English abbreviations, and mixed Chinese-Latin-digit names. These forms are frequent in real listening…
33 -
arXiv — NLP / Computation & Language research 6d ago
Task Decomposition for Efficient Annotation
arXiv:2606.24734v1 Announce Type: new Abstract: High-quality annotations of structured representations are expensive to collect over large corpora. Manual annotation of structure is laborious, and model-based annotation, although cheaper to generate, requires expensive…
24 -
arXiv — NLP / Computation & Language research 6d ago
CANDLE: Character-level Arabic Noise Deduplication using Lightweight Encoder
arXiv:2606.24758v1 Announce Type: new Abstract: Handling repeated characters in text can be tricky, since they can represent either the correct spelling of a word or informal character elongation often seen in social media posts. We present CANDLE, a lightweight system for…
10 -
arXiv — NLP / Computation & Language research 6d ago
Posterior Refinement: Fast Language Generation via Any-Order Flow Maps
arXiv:2606.24773v1 Announce Type: new Abstract: Non-autoregressive generation offers a powerful paradigm for iterative refinement, allowing models to recursively critique, erase and regenerate arbitrary subsets of tokens. However, existing non-autoregressive models fail to…
37 -
arXiv — NLP / Computation & Language research 6d ago
Are We Ready For An Agent-Native Memory System?
arXiv:2606.24775v1 Announce Type: new Abstract: Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic…
8 -
arXiv — NLP / Computation & Language research 6d ago
Paying to Know: Micro-Transaction Markets for Verified Product Information in Agentic E-Commerce
arXiv:2606.24783v1 Announce Type: new Abstract: Commercial NLP treats the shopping chatbot as a recommender or a conversion tool: its job is to match a user to a catalogue entry and close a sale. We argue that the arrival of agent-native micro-payment rails (e.g., x402, AP2)…
23 -
arXiv — NLP / Computation & Language research 6d ago
SHERLOC: Structured Diagnostic Localization for Code Repair Agents
arXiv:2606.24820v1 Announce Type: new Abstract: LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval…
20 -
arXiv — NLP / Computation & Language research 6d ago
L3Cube-MahaPOS: A Marathi Part-of-Speech Tagging Dataset and BERT Models
arXiv:2606.24825v1 Announce Type: new Abstract: Part-of-Speech (POS) tagging is a foundational NLP task underpinning machine translation, information extraction, and syntactic parsing. Despite Marathi being spoken by over 83 million people and ranking among the top twenty most…
16