arXiv — NLP / Computation & Language


arXiv — NLP / Computation & Language research 5d ago

Memory Makes the Difference: Evaluating How Different Memory Roles Shape Conversational Agents

arXiv:2606.25361v1 Announce Type: new Abstract: Prior research on memory mechanisms in RAG-based conversational systems has emphasized how memory is stored and retrieved. However, far less is known about how memories with different functional roles influence response quality.…


Neural Machine Translation for Low-Resource Tangkhul-English

arXiv:2606.25365v1 Announce Type: new Abstract: We present a study on low-resource machine translation for the Tangkhul-English (nmf-en) language pair. Tangkhul is a severely under-resourced Tibeto-Burman language spoken primarily in Manipur, India, with virtually no prior…


Three Buddhist Vocabularies: Computational Stylometry of the English Pali Canon across Sutta, Vinaya, and Abhidhamma

arXiv:2606.25372v1 Announce Type: new Abstract: We present a computational stylometric analysis of the Tipitaka across all three Pitakas in English translation, extending earlier work on the Sutta Pitaka alone. The corpus spans 134,831 segments from Bhikkhu Sujato's Sutta Pitaka…


Story Operators: Decomposing the Original $\to$ Sequel Transformation in Embedding Space

arXiv:2606.25379v1 Announce Type: new Abstract: I treat a book as a point in a sentence-embedding space and a literary transformation as an operation on points. Given an original novel and its sequel, I ask what it takes, geometrically, to turn the first into the second. Using…


A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

arXiv:2606.25380v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for…


Introducing corpora Hlava Cor and Hlava AD: Human Label Variation in Coreference and Discourse Relations

arXiv:2606.25383v1 Announce Type: new Abstract: As previous research on annotator disagreement in discourse phenomena has shown, understanding text coherence varies considerably from one individual to another. To explore this phenomenon, we created two corpora with multiple…


Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making

arXiv:2606.25421v1 Announce Type: new Abstract: Recent studies on world modeling for Large Language Model (LLM) agents typically formulate the learning objective as next-observation prediction. However, this objective ties supervision to what a transition happens to reveal,…


PolicyAlign: Direct Policy-Based Safety Alignment for Large Language Models

arXiv:2606.25442v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) typically depends on high-quality supervision data, such as safe demonstrations or preference pairs. However, in real-world deployment, emerging safety requirements are often…


Reclaim Evaluation: A Lossy Memory Is Worse Than an Empty One

arXiv:2606.25449v1 Announce Type: new Abstract: A language model's memory can be worse than having no memory at all. Give a model a memory that kept a wrong conclusion but dropped the work behind it, and it emits that stale value as a confident answer; give the same model an…


Probing in the Wild: A Case Study of Self-Supervised Speech Representations on Mandarin Sub-dialects with Unsupervised Articulatory Analysis

arXiv:2606.25459v1 Announce Type: new Abstract: While self-supervised speech models have achieved strong performance across speech tasks, relatively little is known about how their internal phonetic representations behave under fine-grained dialect variation. Existing probing…


Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

arXiv:2606.25462v1 Announce Type: new Abstract: Abstractive text summarization is the technique of generating a short, concise summary of the salient ideas of a source text without simply extracting a subset of its salient sentences. The introduction of…


A Red Teaming Framework for Large Language Models: A Case Study on Faithfulness Evaluation

arXiv:2606.25476v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable performance across natural language processing tasks, yet their deployment in high-stakes applications raises critical concerns regarding reliability, safety, and…


How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring

arXiv:2606.25487v1 Announce Type: new Abstract: Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number is assigned not by people but by an automated judge: either a safety classifier trained for the task, or a general chat…


Spam and Sentiment Detection in Arabic Tweets Using MARBERT Model

arXiv:2606.25495v1 Announce Type: new Abstract: Saudi Telecom Company (STC) is among the most popular companies in Saudi Arabia, with many customers. Yet there is still considerable room for improvement in users' satisfaction. Social media is the most robust platform to gauge users'…


Fault of Our Stars: Behavioral Drivers of Rating-Sentiment Incongruence

arXiv:2606.25518v1 Announce Type: new Abstract: When people share experiences online, they often express thoughts in two ways: a star rating and a written review. In sentiment analysis, ratings are widely used as convenient weak labels for textual sentiment, yet whether the two…


SFL-MTSC: Leveraging Semantic Frame-Level Multi-Task Self-Consistency for Robust Multi-Intent Spoken Language Understanding

arXiv:2606.25552v1 Announce Type: new Abstract: Prompt-based spoken language understanding (SLU) with large language models (LLMs) often suffers from inconsistent intent-slot structures due to decoding stochasticity, particularly in multi-intent scenarios. In view of this, we…


BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents

arXiv:2606.25556v1 Announce Type: new Abstract: Stepwise group-based RL is an attractive way to train long-horizon LLM agents without a learned critic: it reuses multiple sampled rollouts to estimate local advantages. Its weakness is less visible but more fundamental: every…


Riazi-8B: An Urdu Large Language Model for Mathematical Reasoning

arXiv:2606.25568v1 Announce Type: new Abstract: Recent LLMs demonstrate strong mathematical reasoning capabilities, but existing gains rely heavily on English-centric training resources and benchmarks. As a result, reasoning performance degrades substantially in low-resource…


Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

arXiv:2606.25605v1 Announce Type: new Abstract: Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed…


Staying In Character: Perspective-Bounded Memory For Book-Based Role-Playing Agents

arXiv:2606.25632v1 Announce Type: new Abstract: Recent LLM role-playing systems build character agents from novels by extracting characters, scenes, and relations. Yet long-narrative role-playing suffers from two failures: Factual Overreach, where shared retrieval or parametric…


MedGuards: Multi-Agent System for Reliable Medical Error Detection and Correction

arXiv:2606.25651v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly deployed in healthcare settings, accurate error detection and correction in generated or existing text becomes critical, as even minor mistakes can pose risks to patient safety.…


Is GraphRAG Needed? From Basic RAG to Graph-/Agentic Solutions with Context Optimization

arXiv:2606.25656v1 Announce Type: new Abstract: As advanced RAG variants like GraphRAG and Agentic RAG emerge, one leading question is when and how to use them. Here, we introduce a framework for evaluating and comparing different RAG scenarios on semi-structured knowledge…


BitNet Text Embeddings

arXiv:2606.25674v1 Announce Type: new Abstract: LLM-based text embedders have substantially improved retrieval and semantic representation quality, but their deployment remains costly: large backbone models slow down embedding inference, while high-dimensional full-precision…


OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

arXiv:2606.25757v1 Announce Type: new Abstract: Reinforcement Learning (RL) has enabled LLMs to excel in objective reasoning tasks such as mathematics and code generation. However, applying RL to open-ended tasks, such as creative writing, remains challenging because…


Do Encoders Suffice? A Systematic Comparison of Encoder and Decoder Safety Judges for LLM Adversarial Evaluation

arXiv:2606.25782v1 Announce Type: new Abstract: With the widespread adoption of large language models (LLMs) in chatbots and everyday applications, companies increasingly need guardrails that are effective while remaining low-cost and low-latency. Safety evaluation of LLM…


Beyond Function Calling: Benchmarking Tool-Using Agents under Tool-Environment Unreliability

arXiv:2606.25819v1 Announce Type: new Abstract: Large language models are increasingly deployed as agents that solve tasks by interacting with external tool environments. Although recent tool-use benchmarks increasingly cover complex task settings, they still largely assume…


SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment

arXiv:2606.25821v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) architectures have emerged as an increasingly influential paradigm as they offer a strategic balance between parameter scalability and computational efficiency. However, low-resource languages, which…


Overview of HIPE-2026: Person-Place Relation Extraction from Multilingual Historical Texts

arXiv:2606.25935v1 Announce Type: new Abstract: Was this person ever at that place, and if so, when? Answering such questions from noisy, multilingual historical documents is the central challenge of HIPE-2026, the third edition of the HIPE evaluation series. Moving from named…


Weave of Formal Thought

arXiv:2606.25987v1 Announce Type: new Abstract: Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. While existing…


SpeechEQ: Benchmarking Emotional Intelligence Quotient in Socially Aware Voice Conversational Models

arXiv:2606.25990v1 Announce Type: new Abstract: As multimodal conversational systems increasingly engage in spoken interaction, their ability to navigate paralinguistic social cues has become a critical bottleneck for natural human-AI communication. However, existing evaluations…


Dziri Voicebot: An End-to-End Low-Resource Speech-to-Speech Conversational System for Algerian Dialect

arXiv:2606.26003v1 Announce Type: new Abstract: Automatic speech and language technologies are still heavily biased toward high-resource languages, limiting their applicability to dialectal and low-resource settings such as Algerian Dialect. This language presents additional…


The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

arXiv:2606.26015v1 Announce Type: new Abstract: Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low-resource languages such as Tatar have…


Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

arXiv:2606.26027v1 Announce Type: new Abstract: Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited…


Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

arXiv:2606.26036v1 Announce Type: new Abstract: Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model…


AI translation of literary texts is "fine", but readers still prefer human translations

arXiv:2606.26040v1 Announce Type: new Abstract: AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by…


When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance

arXiv:2606.26062v1 Announce Type: new Abstract: Can a statistically significant, large-effect-size finding in computational social science be entirely an artifact of the measurement instrument? We present a case where the answer appears to be yes. Analyzing 85 interviews across…


Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models

arXiv:2606.26079v1 Announce Type: new Abstract: Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by emerging AI…


Real-Time Voice AI Hears but Does Not Listen

arXiv:2606.26083v1 Announce Type: new Abstract: Speech conveys information through both words and vocal delivery. We evaluate four leading production realtime voice systems (OpenAI's GPT Realtime 2, Google's Gemini 3.1 Flash Live, and Alibaba's Qwen3.5 Omni Plus and Omni Flash) on…


Invisible to humans, visible to machines: a preregistered audit of Unicode fidelity across four biomedical bibliographic APIs

arXiv:2606.24897v1 Announce Type: cross Abstract: Biomedical text mining, scientometrics, and the construction of training corpora for biomedical large language models (LLMs) all assume that the abstract text returned by a bibliographic API faithfully reproduces the published…


The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

arXiv:2606.24937v1 Announce Type: cross Abstract: The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central…


Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity

arXiv:2606.24954v1 Announce Type: cross Abstract: Vibration-based health monitoring of rotating machinery requires reliable fault diagnosis under operational data constraints, yet condition assessment remains challenged by structural scarcity of fault events and heterogeneous…


Why Do Accumulated Transformations Extrapolate?

arXiv:2606.24975v1 Announce Type: cross Abstract: PaTH Attention showed that replacing RoPE's position-indexed rotations with accumulated data-dependent Householder reflections yields strong length extrapolation, though performance degrades at extreme context lengths. We ask…


Diagnosing and Mitigating Compounding Failures in Agentic Persuasion via Taxonomic Strategy Retrieval

arXiv:2606.24976v1 Announce Type: cross Abstract: Foundation-model agents in multi-step, open-ended environments frequently suffer from compounding errors, where early mistakes contaminate long-horizon trajectories. While Multi-Agent Debate (MAD) succeeds in deterministic…


Learning Diachronic Representations of Ancient Greek Letterforms

arXiv:2606.24984v1 Announce Type: cross Abstract: Learning representations that remain robust across centuries of variation in handwriting is a key challenge in diachronic representation learning. Taking one of the longest continuously used writing systems, ancient Greek, as a…


Neural Scaling Universality: If Exponents Are Fixed, Time to Understand Coefficients

arXiv:2606.25008v1 Announce Type: cross Abstract: Neural scaling laws describe how pre-training loss decays as power laws with training time, model size, and compute. This position paper argues that the exponents of these power laws are fixed by generic mechanisms: a one-third…


Emergent Capabilities Arise Randomly from Learning Sparse Attention Patterns

arXiv:2606.25010v1 Announce Type: cross Abstract: Neural scaling laws for transformer language models predict smooth improvements in pretraining loss with increasing parameters, but downstream capabilities such as in-context learning are known to emerge abruptly past a certain…


Do Thinking Tokens Help with Safety?

arXiv:2606.25013v1 Announce Type: cross Abstract: Today's reasoning models use thinking tokens to attain stronger performance on benchmarks than their instruction-tuned counterparts. It is also generally believed that this more "deliberative" mode should improve alignment and…


LLM-ACES: Closed-Loop Discovery of Dynamical Systems with LLM-Guided Adaptive Search

arXiv:2606.25039v1 Announce Type: cross Abstract: Recovering governing Ordinary Differential Equations (ODEs) from data is a central challenge in modeling dynamical systems across scientific domains. Existing approaches cast discovery as a static inference problem over fixed…


To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG

arXiv:2606.25191v1 Announce Type: cross Abstract: Multi-agent document assessment for retrieval-augmented generation is computationally expensive, driving practitioners toward smaller, deployable models whose assessment mechanisms remain poorly understood. We conduct a…


RAVEN: Long-Horizon Reasoning & Navigation with a Visuo-Spatio-Temporal Memory

arXiv:2606.25206v1 Announce Type: cross Abstract: Long-term robot deployment requires a compact and scalable memory that preserves fine-grained visual semantics, grounds observations in space and time, and enables efficient storage and retrieval. In this paper, we propose RAVEN,…
