arXiv — NLP / Computation & Language
-
arXiv — NLP / Computation & Language research 5d ago
Memory Makes the Difference: Evaluating How Different Memory Roles Shape Conversational Agents
arXiv:2606.25361v1 Announce Type: new Abstract: Prior research on memory mechanism in RAG-based conversational system has emphasized how memory is stored and retrieved. However, far less is known about how memories with different functional roles influence response quality.…
27 -
arXiv — NLP / Computation & Language research 5d ago
Neural Machine Translation for Low-Resource Tangkhul-English
arXiv:2606.25365v1 Announce Type: new Abstract: We present a study on low-resource machine translation for the Tangkhul-English (nmf-en) language pair. Tangkhul is a severely under-resourced Tibeto-Burman language spoken primarily in Manipur, India, with virtually no prior…
16 -
arXiv — NLP / Computation & Language research 5d ago
Three Buddhist Vocabularies: Computational Stylometry of the English Pali Canon across Sutta, Vinaya, and Abhidhamma
arXiv:2606.25372v1 Announce Type: new Abstract: We present a computational stylometric analysis of the Tipitaka across all three Pitakas in English translation, extending earlier work on the Sutta Pitaka alone. The corpus spans 134,831 segments from Bhikkhu Sujato's Sutta Pitaka…
18 -
arXiv — NLP / Computation & Language research 5d ago
Story Operators: Decomposing the Original $\to$ Sequel Transformation in Embedding Space
arXiv:2606.25379v1 Announce Type: new Abstract: I treat a book as a point in a sentence-embedding space and a literary transformation as an operation on points. Given an original novel and its sequel, I ask what it takes, geometrically, to turn the first into the second. Using…
18 -
arXiv — NLP / Computation & Language research 5d ago
A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models
arXiv:2606.25380v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for…
38 -
arXiv — NLP / Computation & Language research 5d ago
Introducing corpora Hlava Cor and Hlava AD: Human Label Variation in Coreference and Discourse Relations
arXiv:2606.25383v1 Announce Type: new Abstract: As previous research on annotator disagreement in discourse phenomena has shown, understanding text coherence varies considerably from one individual to another. To explore this phenomenon, we created two corpora with multiple…
28 -
arXiv — NLP / Computation & Language research 5d ago
Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making
arXiv:2606.25421v1 Announce Type: new Abstract: Recent studies on world modeling for Large Language Model (LLM) agents typically formulate the learning objective as next-observation prediction. However, this objective ties supervision to what a transition happens to reveal,…
32 -
arXiv — NLP / Computation & Language research 5d ago
PolicyAlign: Direct Policy-Based Safety Alignment for Large Language Models
arXiv:2606.25442v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) typically depends on high-quality supervision data, such as safe demonstrations or preference pairs. However, in real-world deployment, emerging safety requirements are often…
29 -
arXiv — NLP / Computation & Language research 5d ago
Reclaim Evaluation: A Lossy Memory Is Worse Than an Empty One
arXiv:2606.25449v1 Announce Type: new Abstract: A language model's memory can be worse than having no memory at all. Give a model a memory that kept a wrong conclusion but dropped the work behind it, and it emits that stale value as a confident answer; give the same model an…
30 -
arXiv — NLP / Computation & Language research 5d ago
Probing in the Wild: A Case Study of Self-Supervised Speech Representations on Mandarin Sub-dialects with Unsupervised Articulatory Analysis
arXiv:2606.25459v1 Announce Type: new Abstract: While self-supervised speech models have achieved strong performance across speech tasks, relatively little is known about how their internal phonetic representations behave under fine-grained dialect variation. Existing probing…
11 -
arXiv — NLP / Computation & Language research 5d ago
Optimizing Abstractive Summarization With Fine-Tuned PEGASUS
arXiv:2606.25462v1 Announce Type: new Abstract: Abstractive text summarization is the technique of generating a short and concise summary comprising the salient ideas of a source text without making a subset of the salient sentences from the source text. The introduction of…
22 -
arXiv — NLP / Computation & Language research 5d ago
A Red Teaming Framework for Large Language Models: A Case Study on Faithfulness Evaluation
arXiv:2606.25476v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable performance across natural language processing tasks, yet their deployment in high-stakes applications raises critical concerns regarding reliability, safety, and…
36 -
arXiv — NLP / Computation & Language research 5d ago
How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring
arXiv:2606.25487v1 Announce Type: new Abstract: Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number is assigned not by people but by an automated judge: either a safety classifier trained for the task, or a general chat…
23 -
arXiv — NLP / Computation & Language research 5d ago
Spam and Sentiment Detection in Arabic Tweets Using MARBERT Model
arXiv:2606.25495v1 Announce Type: new Abstract: Saudi Telecom Company (STC) is among the most popular companies in Saudi Arabia, with many customers. Yet, there is still a big room for improvement in users' satisfaction. Social media is the most robust platform to gauge users'…
37 -
arXiv — NLP / Computation & Language research 5d ago
Fault of Our Stars: Behavioral Drivers of Rating-Sentiment Incongruence
arXiv:2606.25518v1 Announce Type: new Abstract: When people share experiences online, they often express thoughts in two ways: a star rating and a written review. In sentiment analysis, ratings are widely used as convenient weak labels for textual sentiment, yet whether the two…
20 -
arXiv — NLP / Computation & Language research 5d ago
SFL-MTSC: Leveraging Semantic Frame-Level Multi-Task Self-Consistency for Robust Multi-Intent Spoken Language Understanding
arXiv:2606.25552v1 Announce Type: new Abstract: Prompt-based spoken language understanding (SLU) with large language models (LLMs) often suffers from inconsistent intent-slot structures due to decoding stochasticity, particularly in multi-intent scenarios. In view of this, we…
28 -
arXiv — NLP / Computation & Language research 5d ago
BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents
arXiv:2606.25556v1 Announce Type: new Abstract: Stepwise group-based RL is an attractive way to train long-horizon LLM agents without a learned critic: it reuses multiple sampled rollouts to estimate local advantages. Its weakness is less visible but more fundamental: every…
11 -
arXiv — NLP / Computation & Language research 5d ago
Riazi-8B: An Urdu Large Language Model for Mathematical Reasoning
arXiv:2606.25568v1 Announce Type: new Abstract: Recent LLMs demonstrate strong mathematical reasoning capabilities, but existing gains rely heavily on English-centric training resources and benchmarks. As a result, reasoning performance degrades substantially in low-resource…
27 -
arXiv — NLP / Computation & Language research 5d ago
Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints
arXiv:2606.25605v1 Announce Type: new Abstract: Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed…
10 -
arXiv — NLP / Computation & Language research 5d ago
Staying In Character: Perspective-Bounded Memory For Book-Based Role-Playing Agents
arXiv:2606.25632v1 Announce Type: new Abstract: Recent LLM role-playing systems build character agents from novels by extracting characters, scenes, and relations. Yet long-narrative role-playing suffers from two failures: Factual Overreach, where shared retrieval or parametric…
30 -
arXiv — NLP / Computation & Language research 5d ago
MedGuards: Multi-Agent System for Reliable Medical Error Detection and Correction
arXiv:2606.25651v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly deployed in healthcare settings, accurate error detection and correction in generated or existing text becomes critical, as even minor mistakes can pose risks to patient safety.…
34 -
arXiv — NLP / Computation & Language research 5d ago
Is GraphRAG Needed? From Basic RAG to Graph-/Agentic Solutions with Context Optimization
arXiv:2606.25656v1 Announce Type: new Abstract: As advanced RAG variants like GraphRAG and Agentic RAG emerge, one leading question is when and how to use them. Here, we introduce a framework for different RAG scenarios evaluation and comparison on semi-structured knowledge…
21 -
arXiv — NLP / Computation & Language research 5d ago
BitNet Text Embeddings
arXiv:2606.25674v1 Announce Type: new Abstract: LLM-based text embedders have substantially improved retrieval and semantic representation quality, but their deployment remains costly: large backbone models slow down embedding inference, while high-dimensional full-precision…
32 -
arXiv — NLP / Computation & Language research 5d ago
OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning
arXiv:2606.25757v1 Announce Type: new Abstract: Reinforcement Learning (RL) has enabled LLMs to excel in objective reasoning tasks such as mathematics and code generation. However, applying RL to open-ended tasks, such as creative writing, remains challenging because…
22 -
arXiv — NLP / Computation & Language research 5d ago
Do Encoders Suffice? A Systematic Comparison of Encoder and Decoder Safety Judges for LLM Adversarial Evaluation
arXiv:2606.25782v1 Announce Type: new Abstract: With the widespread adoption of large language models (LLMs) in chatbots and everyday applications, companies increasingly need guardrails that are effective while remaining low-cost and low-latency. Safety evaluation of LLM…
18 -
arXiv — NLP / Computation & Language research 5d ago
Beyond Function Calling: Benchmarking Tool-Using Agents under Tool-Environment Unreliability
arXiv:2606.25819v1 Announce Type: new Abstract: Large language models are increasingly deployed as agents that solve tasks by interacting with external tool environments. Although recent tool-use benchmarks increasingly cover complex task settings, they still largely assume…
26 -
arXiv — NLP / Computation & Language research 5d ago
SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment
arXiv:2606.25821v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) architectures have emerged as an increasingly influential paradigm as they offer a strategic balance between parameter scalability and computational efficiency. However, low-resource languages, which…
21 -
arXiv — NLP / Computation & Language research 5d ago
Overview of HIPE-2026: Person-Place Relation Extraction from Multilingual Historical Texts
arXiv:2606.25935v1 Announce Type: new Abstract: Was this person ever at that place, and if so, when? Answering such questions from noisy, multilingual historical documents is the central challenge of HIPE-2026, the third edition of the HIPE evaluation series. Moving from named…
14 -
arXiv — NLP / Computation & Language research 5d ago
Weave of Formal Thought
arXiv:2606.25987v1 Announce Type: new Abstract: Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. While existing…
18 -
arXiv — NLP / Computation & Language research 5d ago
SpeechEQ: Benchmarking Emotional Intelligence Quotient in Socially Aware Voice Conversational Models
arXiv:2606.25990v1 Announce Type: new Abstract: As multimodal conversational systems increasingly engage in spoken interaction, their ability to navigate paralinguistic social cues has become a critical bottleneck for natural human-AI communication. However, existing evaluations…
29 -
arXiv — NLP / Computation & Language research 5d ago
Dziri Voicebot: An End-to-End Low-Resource Speech-to-Speech Conversational System for Algerian Dialect
arXiv:2606.26003v1 Announce Type: new Abstract: Automatic speech and language technologies are still heavily biased toward high-resource languages, limiting their applicability to dialectal and low-resource settings such as Algerian Dialect. This language presents additional…
28 -
arXiv — NLP / Computation & Language research 5d ago
The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar
arXiv:2606.26015v1 Announce Type: new Abstract: Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low resource languages such as Tatar have…
10 -
arXiv — NLP / Computation & Language research 5d ago
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
arXiv:2606.26027v1 Announce Type: new Abstract: Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited…
17 -
arXiv — NLP / Computation & Language research 5d ago
Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning
arXiv:2606.26036v1 Announce Type: new Abstract: Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model…
26 -
arXiv — NLP / Computation & Language research 5d ago
AI translation of literary texts is "fine", but readers still prefer human translations
arXiv:2606.26040v1 Announce Type: new Abstract: AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by…
35 -
arXiv — NLP / Computation & Language research 5d ago
When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance
arXiv:2606.26062v1 Announce Type: new Abstract: Can a statistically significant, large-effect-size finding in computational social science be entirely an artifact of the measurement instrument? We present a case where the answer appears to be yes. Analyzing 85 interviews across…
20 -
arXiv — NLP / Computation & Language research 5d ago
Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models
arXiv:2606.26079v1 Announce Type: new Abstract: Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI…
31 -
arXiv — NLP / Computation & Language research 5d ago
Real-Time Voice AI Hears but Does Not Listen
arXiv:2606.26083v1 Announce Type: new Abstract: Speech conveys information through both words and vocal delivery. We evaluate four leading production realtime voice systems (OpenAI's GPT Realtime 2, Google's Gemini 3.1 Flash Live, and Alibaba's Qwen3.5 Omni Plus and Omni Flash) on…
34 -
arXiv — NLP / Computation & Language research 5d ago
Invisible to humans, visible to machines: a preregistered audit of Unicode fidelity across four biomedical bibliographic APIs
arXiv:2606.24897v1 Announce Type: cross Abstract: Biomedical text mining, scientometrics, and the construction of training corpora for biomedical large language models (LLMs) all assume that the abstract text returned by a bibliographic API faithfully reproduces the published…
8 -
arXiv — NLP / Computation & Language research 5d ago
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
arXiv:2606.24937v1 Announce Type: cross Abstract: The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central…
25 -
arXiv — NLP / Computation & Language research 5d ago
Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity
arXiv:2606.24954v1 Announce Type: cross Abstract: Vibration-based health monitoring of rotating machinery requires reliable fault diagnosis under operational data constraints, yet condition assessment remains challenged by structural scarcity of fault events and heterogeneous…
30 -
arXiv — NLP / Computation & Language research 5d ago
Why Do Accumulated Transformations Extrapolate?
arXiv:2606.24975v1 Announce Type: cross Abstract: PaTH Attention showed that replacing RoPE's position-indexed rotations with accumulated data-dependent Householder reflections yields strong length extrapolation, though performance degrades at extreme context lengths. We ask…
22 -
arXiv — NLP / Computation & Language research 5d ago
Diagnosing and Mitigating Compounding Failures in Agentic Persuasion via Taxonomic Strategy Retrieval
arXiv:2606.24976v1 Announce Type: cross Abstract: Foundation-model agents in multi-step, open-ended environments frequently suffer from compounding errors, where early mistakes contaminate long-horizon trajectories. While Multi-Agent Debate (MAD) succeeds in deterministic…
10 -
arXiv — NLP / Computation & Language research 5d ago
Learning Diachronic Representations of Ancient Greek Letterforms
arXiv:2606.24984v1 Announce Type: cross Abstract: Learning representations that remain robust across centuries of variation in handwriting is a key challenge in diachronic representation learning. Taking one of the longest continuously used writing systems, ancient Greek, as a…
27 -
arXiv — NLP / Computation & Language research 5d ago
Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients
arXiv:2606.25008v1 Announce Type: cross Abstract: Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute. This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third…
13 -
arXiv — NLP / Computation & Language research 5d ago
Emergent Capabilities Arise Randomly from Learning Sparse Attention Patterns
arXiv:2606.25010v1 Announce Type: cross Abstract: Neural scaling laws for transformer language models predict smooth improvements in pretraining loss with increasing parameters, but downstream capabilities such as in-context learning are known to emerge abruptly past a certain…
19 -
arXiv — NLP / Computation & Language research 5d ago
Do Thinking Tokens Help with Safety?
arXiv:2606.25013v1 Announce Type: cross Abstract: Today's reasoning models use thinking tokens to attain stronger performance on benchmarks than their instruction-tuned counterparts. It is also generally believed that this more "deliberative" mode should improve alignment and…
37 -
arXiv — NLP / Computation & Language research 5d ago
LLM-ACES: Closed-Loop Discovery of Dynamical Systems with LLM-Guided Adaptive Search
arXiv:2606.25039v1 Announce Type: cross Abstract: Recovering governing Ordinary Differential Equations (ODEs) from data is a central challenge in modeling dynamical systems across scientific domains. Existing approaches cast discovery as a static inference problem over fixed…
35 -
arXiv — NLP / Computation & Language research 5d ago
To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG
arXiv:2606.25191v1 Announce Type: cross Abstract: Multi-agent document assessment for retrieval-augmented generation is computationally expensive, driving practitioners toward smaller, deployable models whose assessment mechanisms remain poorly understood. We conduct a…
29 -
arXiv — NLP / Computation & Language research 5d ago
RAVEN: Long-Horizon Reasoning & Navigation with a Visuo-Spatio-Temporal Memory
arXiv:2606.25206v1 Announce Type: cross Abstract: Long-term robot deployment requires a compact and scalable memory that preserves fine-grained visual semantics, grounds observations in space and time, and enables efficient storage and retrieval. In this paper, we propose RAVEN,…
21