arXiv — NLP / Computation & Language

arXiv — NLP / Computation & Language research 1d ago

Joint Transcription and Decryption of Images of Encrypted Handwritten Documents: A Comparison with the Traditional Pipeline

arXiv:2606.27700v1 Announce Type: cross Abstract: Historical encrypted manuscripts present a challenging problem at the intersection of cryptology, linguistics, paleography, and computer vision. Current automatic decipherment approaches usually rely on a two-stage pipeline:…

7
arXiv — NLP / Computation & Language research 1d ago

Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing

arXiv:2606.27926v1 Announce Type: cross Abstract: Geometry Problem Solving has increasingly adopted the neuro-symbolic paradigm, combining neural intuition with symbolic rigor. However, current frameworks suffer from severe bottlenecks in two core stages: autoformalization, which…

28
arXiv — NLP / Computation & Language research 1d ago

AI Persuasive Framing in Collective Dilemmas

arXiv:2606.27951v1 Announce Type: cross Abstract: AI agents are promising tools that can act as flexible behavioral nudges to enhance human cooperation in addressing large-scale societal problems. However, evidence on whether AI agents can effectively boost cooperation remains…

32
arXiv — NLP / Computation & Language research 1d ago

DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions

arXiv:2606.28048v1 Announce Type: cross Abstract: Insurance fraud remains costly and operationally difficult, particularly in call-centre workflows where many customer interactions begin at FNOL. While recent fraud detection methods mainly rely on structured data, text, or…

19
arXiv — NLP / Computation & Language research 1d ago

Single and Multi Truth Data Fusion using Large Language Models

arXiv:2606.28062v1 Announce Type: cross Abstract: Data fusion, also known as truth discovery, is a data integration problem that aims to determine the correct value or set of values for each attribute of an object when presented with potentially conflicting values from multiple…

27
arXiv — NLP / Computation & Language research 1d ago

Scaling limit of the Random Language Model

arXiv:2606.28105v1 Announce Type: cross Abstract: We develop a quantitative theory of the Random Language Model (RLM), an ensemble of stochastic context-free grammars, in a scaling limit where the number of hidden symbols $N \to \infty$ while the grammar temperature…

10
arXiv — NLP / Computation & Language research 1d ago

HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

arXiv:2606.28249v1 Announce Type: cross Abstract: Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting…

20
arXiv — NLP / Computation & Language research 1d ago

Towards Automating Scientific Review with Google's Paper Assistant Tool

arXiv:2606.28277v1 Announce Type: cross Abstract: Artificial intelligence is driving a revolution in scientific discovery, accelerating everything from hypothesis generation to mathematical theorem proving. However, this rapid acceleration is creating a systemic challenge:…

24
arXiv — NLP / Computation & Language research 1d ago

Continual Memorization of Factoids in Language Models

arXiv:2411.07175v3 Announce Type: replace Abstract: As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown…

27
arXiv — NLP / Computation & Language research 1d ago

ReFreeKV: Towards Threshold-Free KV Cache Compression

arXiv:2502.16886v4 Announce Type: replace Abstract: To reduce memory consumption during LLM inference, a handful of methods have been proposed for KV cache pruning. While these techniques can accomplish lossless memory reduction on many datasets, they often hinge on an…

28
arXiv — NLP / Computation & Language research 1d ago

On the Effect of Uncertainty on Layer-wise Inference Dynamics

arXiv:2507.06722v2 Announce Type: replace Abstract: Understanding how large language models (LLMs) internally represent and process their predictions is central to detecting uncertainty and preventing hallucinations. While several studies have shown that models encode…

33
arXiv — NLP / Computation & Language research 1d ago

Training-free Truthfulness Detection via Sparse MLP Value Vectors

arXiv:2509.17932v2 Announce Type: replace Abstract: Large language models (LLMs) are prone to generating factually incorrect content, motivating methods for assessing truthfulness from internal model signals. While supervised probing approaches can be effective, they require…

5
arXiv — NLP / Computation & Language research 1d ago

Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety

arXiv:2510.16492v4 Announce Type: replace Abstract: As Large Language Model (LLM) agents increasingly operate in complex environments with real-world consequences, their safety becomes critical. While uncertainty quantification is well-studied for single-turn tasks, multi-turn…

20
arXiv — NLP / Computation & Language research 1d ago

Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification

arXiv:2511.03217v2 Announce Type: replace Abstract: Large language models (LLMs) excel in generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet…

4
arXiv — NLP / Computation & Language research 1d ago

Safe Language Generation in the Limit

arXiv:2601.08648v2 Announce Type: replace Abstract: Recent results in learning a language in the limit have shown that, although language identification is impossible, language generation is tractable. As this foundational area expands, we need to consider the implications of…

5
arXiv — NLP / Computation & Language research 1d ago

Learning to Evict from Key-Value Cache

arXiv:2602.10238v2 Announce Type: replace Abstract: The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but…

25
arXiv — NLP / Computation & Language research 1d ago

Measuring the Redundancy of Decoder Layers in SpeechLLMs

arXiv:2603.05121v2 Announce Type: replace Abstract: Speech Large Language Models route speech encoder representations into an LLM decoder that typically accounts for over 90% of total parameters. We study how much of this decoder capacity is actually needed for speech tasks.…

36
arXiv — NLP / Computation & Language research 1d ago

LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks

arXiv:2604.13072v2 Announce Type: replace Abstract: OpenClaw-style personal assistants extend LLM agents from isolated tool use to open-ended, stateful, and personalized software environments. Evaluating these assistants is fundamentally a fidelity problem: benchmarks must be…

28
arXiv — NLP / Computation & Language research 1d ago

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

arXiv:2604.17633v2 Announce Type: replace Abstract: Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual…

29
arXiv — NLP / Computation & Language research 1d ago

Subject-level Inference for Realistic Text Anonymization Evaluation

arXiv:2604.21211v2 Announce Type: replace Abstract: Current text anonymization evaluation relies on span-based metrics that fail to capture what an adversary could actually infer, and assumes a single data subject, ignoring multi-subject scenarios. To address these limitations,…

6
arXiv — NLP / Computation & Language research 1d ago

Characterizing the Expressivity of Local Attention in Transformers

arXiv:2605.00768v3 Announce Type: replace Abstract: The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before…

16
arXiv — NLP / Computation & Language research 1d ago

ELF: Embedded Language Flows

arXiv:2605.10938v2 Announce Type: replace Abstract: Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling.…

22
arXiv — NLP / Computation & Language research 1d ago

Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

arXiv:2606.02004v2 Announce Type: replace Abstract: Consumer-price measurement increasingly draws on alternative data sources -- scanner, web-scraped, and transaction/receipt data -- whose product descriptions are short, noisy, and carry no standard product code, so each item…

4
arXiv — NLP / Computation & Language research 1d ago

Self-Stigma Is Not a Monolith, but Generic Empathy Is: Persona-Conditioned LLM Support for People Who Use Drugs

arXiv:2606.23387v2 Announce Type: replace Abstract: Self-stigma predicts treatment avoidance and disengagement among people who use drugs (PWUD), yet conversational systems aiming to provide support typically treat self-stigma expression as a uniform signal. We present a…

9
arXiv — NLP / Computation & Language research 1d ago

SIGNER: Temporally Grounded Sign Language Generation via Time-Resolved Conditioning

arXiv:2506.07460v2 Announce Type: replace-cross Abstract: Sign language generation (SLG), also known as text-to-sign generation, aims to bridge the communication gap between signers and non-signers. Unlike many other generative tasks, SLG must satisfy two fundamental linguistic…

16
arXiv — NLP / Computation & Language research 1d ago

PRISON: Unmasking the Criminal Potential of Large Language Models

arXiv:2506.16150v4 Announce Type: replace-cross Abstract: As large language models (LLMs) advance, concerns about their misconduct in complex social contexts intensify. Existing research overlooked the systematic understanding and assessment of their criminal capability in…

37
arXiv — NLP / Computation & Language research 1d ago

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

arXiv:2510.18874v3 Announce Type: replace-cross Abstract: Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities -- a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines…

38
arXiv — NLP / Computation & Language research 1d ago

Psychometric Comparability of LLM-Based Digital Twins

arXiv:2601.14264v2 Announce Type: replace-cross Abstract: Large language models (LLMs) act as digital twins for human respondents, yet their psychometric comparability remains uncertain. We propose a construct validity framework spanning construct representation and the…

23
arXiv — NLP / Computation & Language research 1d ago

EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning

arXiv:2603.09731v3 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) are increasingly considered as a foundation for embodied agents, yet it remains unclear whether they can reliably reason about the long-term physical consequences of actions from…

34
arXiv — NLP / Computation & Language research 1d ago

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

arXiv:2605.06675v2 Announce Type: replace-cross Abstract: Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache…

5
arXiv — NLP / Computation & Language research 1d ago

Auto-Configuring Scientific Simulators with Lightweight Coding-Agent Adapters

arXiv:2606.09774v2 Announce Type: replace-cross Abstract: Configuring an advanced scientific simulator, translating a modeling goal into a valid, runnable input deck, is a persistent bottleneck that costs domain scientists hours to days. Input decks are executable interfaces:…

33
arXiv — NLP / Computation & Language research 1d ago

Multimodal Evaluator Preference Collapse: Cross-Modal Coupling in Self-Evolving Agents

arXiv:2606.16682v3 Announce Type: replace-cross Abstract: When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using…

4
arXiv — NLP / Computation & Language research 1d ago

SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning

arXiv:2606.22873v3 Announce Type: replace-cross Abstract: Vision-language models (VLMs) are increasingly deployed in consumer, medical, financial, and enterprise applications. This broad deployment expands the safety surface: risks can arise from multimodal question answering,…

31
arXiv — NLP / Computation & Language research 4d ago

HierBias: Context-Conditioned Hierarchical Media Bias Detection with Multi-Task Type Classification

arXiv:2606.26100v1 Announce Type: new Abstract: Media bias detection is a critical task for ensuring fair and balanced information dissemination, yet existing sentence-level approaches classify each sentence independently, ignoring inter-sentence contextual signals that human…

17
arXiv — NLP / Computation & Language research 4d ago

Know2Guess: A Contamination-Aware Multi-Zone Benchmark for Knowledge-Boundary Evaluation in Large Language Models

arXiv:2606.26101v1 Announce Type: new Abstract: Reliable evaluation of large language models should separate supported answering from unsupported guessing without conflating either with data contamination, prompt idiosyncrasy, or generic refusal behavior. We present a…

21
arXiv — NLP / Computation & Language research 4d ago

Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training

arXiv:2606.26102v1 Announce Type: new Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate…

22
arXiv — NLP / Computation & Language research 4d ago

Investigating LLM's Problem Solving Capability -- a Study on Statics Questions

arXiv:2606.26103v1 Announce Type: new Abstract: Large Language Models (LLMs) have rapidly influenced many aspects of society, particularly education, due to their demonstrated ability to complete assignments and examinations across a wide range of subjects. Although prior…

35
arXiv — NLP / Computation & Language research 4d ago

Assert, don't describe: Linguistic features that shift LLM reasoning about animal welfare

arXiv:2606.26104v1 Announce Type: new Abstract: Animal-welfare advocates produce a lot of writing, and increasingly that writing trains the language models that millions of people then ask about animal welfare. Using vocabulary-matched stance-contrast probes on a held-out…

19
arXiv — NLP / Computation & Language research 4d ago

Context Recycling for Long-Horizon LLM Inference

arXiv:2606.26105v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce…

27
arXiv — NLP / Computation & Language research 4d ago

Reducing Conversational Escalation in Large Language Model Dialogue with Nonviolent Communication Constraints

arXiv:2606.26106v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in emotionally charged situations involving interpersonal conflict, frustration, and distress. While prior safety research has focused on preventing explicit harms such as toxic or…

26
arXiv — NLP / Computation & Language research 4d ago

Low Resource Multimodal Translation of Nepali Spoken Words into Emotion-Conditioned Sign Language Avatars

arXiv:2606.26107v1 Announce Type: new Abstract: Sign language communication systems that integrate emotional expression remain underexplored, particularly for low-resource languages. This pilot study presents NEST-V1 (Nepali Emotion and Speech Transformer - Version 1), a…

37
arXiv — NLP / Computation & Language research 4d ago

Where Larger Models Excel: The Primacy of Constraint-Guided Reasoning

arXiv:2606.26108v1 Announce Type: new Abstract: Larger language models consistently outperform smaller ones on reasoning benchmarks, yet the reasoning differences underlying this gap remain underexplored. Across benchmarks in mathematics, physics, chemistry, and programming, we…

35
arXiv — NLP / Computation & Language research 4d ago

From Lexicon to AI: A Structured-Data Pipeline for Specialized Conversational Systems in Low-Resource Languages

arXiv:2606.26112v1 Announce Type: new Abstract: Low-resource languages face a critical challenge in AI development: creating specialized conversational systems without access to massive training corpora. We present a systematic methodology for transforming structured linguistic…

36
arXiv — NLP / Computation & Language research 4d ago

Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM

arXiv:2606.26120v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) offer a promising alternative to autoregressive models, excelling in text generation tasks due to their bidirectional attention mechanisms. However, their computational complexity scales on…

15
arXiv — NLP / Computation & Language research 4d ago

Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods

arXiv:2606.26130v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used to guide research methodology, yet their default methodological tendencies under minimal prompting remain unclear. Here, we prompt GPT-5.1, Gemini 3 Pro, and DeepSeek-V3.2 with an…

38
arXiv — NLP / Computation & Language research 4d ago

From Structure to Synergy: A Survey of Vision-Language Perception Paradigm Evolution in Multimodal Large Language Models

arXiv:2606.26196v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have recently made remarkable progress in unifying vision-language understanding and reasoning, especially following the introduction of models such as OpenAI's O-series and DeepSeek's…

12
arXiv — NLP / Computation & Language research 4d ago

Phonetic and semantic analyses of spoken corpora of Beijing and Taiwan Mandarin indicate that the neutral tone is a lexical tone

arXiv:2606.26360v1 Announce Type: new Abstract: The neutral, or floating, tone of Mandarin Chinese is a tone with an enigmatic set of properties. It has been described as a reduced tone, or as a tone that sometimes is lexically fixed but that can also be toneless. In…

4
arXiv — NLP / Computation & Language research 4d ago

Charting the Growth of Social-Physical HRI (spHRI): A Systematic Review Pipeline Augmented by Small Language Models

arXiv:2606.26382v1 Announce Type: new Abstract: Social-physical human-robot interaction (spHRI) has grown rapidly across robotics, human-computer interaction, human-robot interaction, and haptics. Yet, fragmented terminology and inconsistent methodologies make systematic…

35
arXiv — NLP / Computation & Language research 4d ago

ProfileFoundry: A Synthetic Person-Object Substrate for Privacy, Memory, and Tool-Use Evaluation in LLM Agent

arXiv:2606.26403v1 Announce Type: new Abstract: Foundation-model research increasingly needs data about people: user state, personal histories, relationships, contact-like fields, documents, and longitudinal updates. Real user data is difficult to share, perturb, audit, or…

34
arXiv — NLP / Computation & Language research 4d ago

ConflictScore: Identifying and Measuring How Language Models Handle Conflicting Evidence

arXiv:2606.26437v1 Announce Type: new Abstract: Existing metrics for factuality and faithfulness evaluate whether an answer is supported or contradicted by its grounding documents, but they fail to capture when both supporting and contradicting evidence coexist. We introduce…

6
