arXiv — NLP / Computation & Language

500 articles archived

arXiv — NLP / Computation & Language research 5d ago

ASAP: Agent-System Co-Design for Wall-Clock-Centered Auto HPO Research for ML Experiments

arXiv:2606.25207v1 Announce Type: cross Abstract: Hyperparameter Optimization (HPO) is essential for maximizing machine learning model performance, and its core challenge is sample efficiency: finding strong configurations within a limited budget. Because every HPO tool relies…

27
arXiv — NLP / Computation & Language research 5d ago

Multilingual Hematology Visual Question Answering Dataset

arXiv:2606.25246v1 Announce Type: cross Abstract: Vision Language Models (VLMs) have shown promising capabilities in medical image analysis by jointly understanding visual and textual information for tasks such as Visual Question Answering. However, existing hematology…

5
arXiv — NLP / Computation & Language research 5d ago

Measuring Research Difficulty of Academic Papers: A Case Study in Natural Language Processing

arXiv:2606.25307v1 Announce Type: cross Abstract: With the rapid growth of the number of academic papers, systematically evaluating the difficulty of research and its relationship to academic impact offers important significance for research topic selection and resource…

25
arXiv — NLP / Computation & Language research 5d ago

Data-Driven Evolution of Library and Information Science Research Methods (1990-2022): A Perspective Based on Fine-grained Method Entities

arXiv:2606.25320v1 Announce Type: cross Abstract: Since the 1990s, advancements in big data and information technology have increasingly driven data-centric research in the field of Library and Information Science (LIS). To assess the influence of this data-driven research…

16
arXiv — NLP / Computation & Language research 5d ago

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

arXiv:2606.25369v1 Announce Type: cross Abstract: While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique…

36
arXiv — NLP / Computation & Language research 5d ago

Adaptive Oscillatory Inductive Bias for Modeling Sharp Prosodic Dynamics in Diffusion-Based TTS

arXiv:2606.25424v1 Announce Type: cross Abstract: Diffusion-based text-to-speech (TTS) models have achieved significant improvements in speech quality. However, modeling sharp prosodic transitions and rapid pitch variations in expressive speech remains challenging. Existing…

37
arXiv — NLP / Computation & Language research 5d ago

Evaluating Japanese Dialect Robustness Across Speech and Text-based Large Language Models

arXiv:2606.25436v1 Announce Type: cross Abstract: Dialogue systems based on large language models (LLMs) have advanced significantly in recent years. However, dialectal variation remains a major challenge, particularly for systems that process spoken input. LLM-based speech…

34
arXiv — NLP / Computation & Language research 5d ago

Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs?

arXiv:2606.25444v1 Announce Type: cross Abstract: Connecting a pre-trained speech encoder to a Large Language Model (LLM) is the standard architecture for building Speech LLMs. However, a structural misalignment exists between the encoder and the LLM. Unlike encoders based on…

23
arXiv — NLP / Computation & Language research 5d ago

The Interplay of Harness Design and Post-Training in LLM Agents

arXiv:2606.25447v1 Announce Type: cross Abstract: Tool-integrated LLM agents are often wrapped within a harness: the scaffolding that determines which tools are exposed, how they are described, and what auxiliary information accompanies each per-step observation. While agents…

15
arXiv — NLP / Computation & Language research 5d ago

The Generalization Spectrum: A Chromatographic Approach to Evaluating Learning Algorithms

arXiv:2606.25450v1 Announce Type: cross Abstract: Traditional evaluations measure a learning algorithm's final performance on an i.i.d. test set, reducing learning to a single aggregate score. This approach obscures a fundamental question: to what extent does learning from a…

12
arXiv — NLP / Computation & Language research 5d ago

Fully Differentiable Neural Forced Alignment via Soft Dynamic Programming

arXiv:2606.25460v1 Announce Type: cross Abstract: Recent advances in sequence modeling have significantly improved ASR systems, bringing them close to human-level recognition accuracy and enhancing robustness across diverse acoustic conditions and languages. In contrast, Forced…

24
arXiv — NLP / Computation & Language research 5d ago

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk,…

32
arXiv — NLP / Computation & Language research 5d ago

Evaluating LLMs on Real-World Software Performance Optimization

arXiv:2606.25530v1 Announce Type: cross Abstract: Software performance optimization is a notoriously complex and manual task. Despite the growing use of Large Language Models (LLMs) for code refinement, we still lack benchmarks that capture how optimization actually happens in…

17
arXiv — NLP / Computation & Language research 5d ago

Security and Privacy in Retrieval-Augmented Generation: Architectures, Threats, Defenses, and Future Directions for Building Trustworthy Systems

arXiv:2606.25533v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) has emerged as a dominant paradigm for enhancing large language models with external knowledge. By coupling retrieval mechanisms with generative models, RAG systems improve factual grounding…

31
arXiv — NLP / Computation & Language research 5d ago

Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

arXiv:2606.25721v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate model outputs through malicious retrieved documents. Existing detection methods typically rely on auxiliary classifiers or…

30
arXiv — NLP / Computation & Language research 5d ago

RAS: Measuring LLM Safety Through Refusal Alignment

arXiv:2606.25750v1 Announce Type: cross Abstract: Safety evaluation of large language models (LLMs) is commonly performed by querying models with unsafe or jailbreak prompts and judging whether their outputs violate a safety policy. Although useful, output-level evaluation is…

27
arXiv — NLP / Computation & Language research 5d ago

Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets

arXiv:2606.25760v1 Announce Type: cross Abstract: Computer-use agents turn vision-language model (VLM) predictions into executable GUI clicks, so reliable uncertainty estimates are essential for rejection, calibration, miss-severity ranking, and spatial safety regions. Yet…

14
arXiv — NLP / Computation & Language research 5d ago

Space-Efficient Language Generation in the Limit

arXiv:2606.25777v1 Announce Type: cross Abstract: We initiate a resource-aware theory of language generation in the limit under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language $K$ and…

7
arXiv — NLP / Computation & Language research 5d ago

How Large Language Models Source Brand Reputation Across Languages and Markets

arXiv:2606.25787v1 Announce Type: cross Abstract: When a large language model (LLM) answers a question about a company, it grounds the answer in retrieved web sources, and those sources decide what the model says. Most analysis of AI brand visibility looks at the answer text.…

37
arXiv — NLP / Computation & Language research 5d ago

Autodata: An agentic data scientist to create high quality synthetic data

arXiv:2606.25996v1 Announce Type: cross Abstract: We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to…

30
arXiv — NLP / Computation & Language research 5d ago

How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations

arXiv:2606.26041v1 Announce Type: cross Abstract: Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and are increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently…

29
arXiv — NLP / Computation & Language research 5d ago

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

arXiv:2606.26050v1 Announce Type: cross Abstract: Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step…

4
arXiv — NLP / Computation & Language research 5d ago

Learning to Erase Private Knowledge from Multi-Documents for Retrieval-Augmented Large Language Models

arXiv:2504.09910v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) is a promising technique for applying LLMs to proprietary domains. However, retrieved documents may contain sensitive knowledge, posing risks of privacy leakage in generative results. Thus,…

4
arXiv — NLP / Computation & Language research 5d ago

A Systematic Analysis of Hybrid Linear Attention

arXiv:2507.06457v2 Announce Type: replace Abstract: Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall…

34
arXiv — NLP / Computation & Language research 5d ago

Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation

arXiv:2509.22193v2 Announce Type: replace Abstract: Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20× longer than standard instruction fine-tuning (IFT) outputs,…

19
arXiv — NLP / Computation & Language research 5d ago

Robustness assessment of large audio language models in multiple-choice evaluation

arXiv:2510.04584v2 Announce Type: replace Abstract: Recent advances in large audio language models (LALMs) have primarily been assessed using a multiple-choice question answering (MCQA) framework. However, subtle changes, such as shifting the order of choices, result in…

13
arXiv — NLP / Computation & Language research 5d ago

How Pragmatics Shape Articulation: A Computational Case Study in STEM ASL Discourse

arXiv:2510.23842v2 Announce Type: replace Abstract: Most state-of-the-art sign language models are trained on interpreter or isolated vocabulary data, which overlooks the variability that characterizes natural dialogue. However, human communication dynamically adapts to contexts…

31
arXiv — NLP / Computation & Language research 5d ago

Reinforcement Learning Improves Traversal of Parametric Knowledge in LLMs

arXiv:2511.05933v2 Announce Type: replace Abstract: Reinforcement learning (RL) is often credited with improving language model reasoning at the expense of knowledge. We challenge this narrative by showing that reasoning models consistently outperform their instruction-tuned…

11
arXiv — NLP / Computation & Language research 5d ago

Constituency Structure over Eojeol in Korean Treebanks

arXiv:2512.22487v2 Announce Type: replace Abstract: The design of Korean constituency treebanks raises a central representational question concerning the choice of terminal units. Although Korean words are morphologically complex, treating morphemes as constituency terminals can…

14
arXiv — NLP / Computation & Language research 5d ago

Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents

arXiv:2601.03785v3 Announce Type: replace Abstract: Long-term human-agent dialogues are organized by topic continuity: adjacent turns often develop the same goal, plan, problem, or event, while related activities may recur across distant sessions. Yet many LLM agent memory…

25
arXiv — NLP / Computation & Language research 5d ago

Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme-Based Analysis of Climate Discourse

arXiv:2601.13317v2 Announce Type: replace Abstract: Climate discourse online shapes public understanding of climate change and informs political and policy debate, yet it unfolds across structurally different environments: paid advertising platforms host targeted,…

9
arXiv — NLP / Computation & Language research 5d ago

ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure

arXiv:2602.01472v2 Announce Type: replace Abstract: Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducible inference-time phenomenon, termed…

5
arXiv — NLP / Computation & Language research 5d ago

Am I More Pointwise or Pairwise? Revealing Position Bias in Rubric-Based LLM-as-a-Judge

arXiv:2602.02219v2 Announce Type: replace Abstract: Large language models are widely employed as evaluators, a paradigm commonly referred to as LLM-as-a-judge. Prior research has predominantly examined point-wise or pair-wise evaluation protocols; in contrast, our focus is on…

8
arXiv — NLP / Computation & Language research 6d ago

EXPO-SQL: Execution-based Clause-level Policy Optimization for Text-to-SQL

arXiv:2606.23693v1 Announce Type: new Abstract: Text-to-SQL enables users to query databases using natural language by generating executable SQL queries. Recent methods have increasingly adopted Large Language Model-based reinforcement learning (RL) to leverage execution…

18
arXiv — NLP / Computation & Language research 6d ago

ModTGCN: Modularity-aware Graph Neural Networks for Text Classification

arXiv:2606.23694v1 Announce Type: new Abstract: Graph-based text classification models typically rely on local neighborhood aggregation and overlook global community structure, despite semantic document graphs exhibiting strong class-consistent clustering. Ignoring this can blur…

22
arXiv — NLP / Computation & Language research 6d ago

Quantifying Prior Dominance in RAG Systems

arXiv:2606.23695v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) grounds Large Language Models in external knowledge, yet current evaluations rely on discrete heuristics that suffer from "epistemic blindness" - failing to distinguish genuine contextual…

28
arXiv — NLP / Computation & Language research 6d ago

Self-Recognition Finetuning can Prevent and Reverse Emergent Misalignment

arXiv:2606.23700v1 Announce Type: new Abstract: Emergent misalignment (EM) has been linked to the activation of misaligned persona vectors and evil character traits, suggesting that EM operates through disruption of the model's aligned character rather than direct learning of…

8
arXiv — NLP / Computation & Language research 6d ago

Evaluating LLM Usage for Efficient and Explainable Numerical and Classified Implicit Sentiment Analysis of Product Desirability

arXiv:2606.23701v1 Announce Type: new Abstract: Qualitative product feedback can reveal nuanced user experiences, but its implicit sentiment is difficult to measure. This paper presents a scalable and interpretable framework that uses large language models (LLMs) to quantify…

32
arXiv — NLP / Computation & Language research 6d ago

Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification

arXiv:2606.23881v1 Announce Type: new Abstract: Knowledge-Based Visual Question Answering (KB-VQA) requires grounding visual queries to external knowledge beyond directly observable content in images. While recent multimodal large language models (MLLMs) show strong perceptual…

13
arXiv — NLP / Computation & Language research 6d ago

One Year Later...The Harms Persist, But So Do We!

arXiv:2606.23884v1 Announce Type: new Abstract: General-purpose large language models (LLMs) are increasingly used for mental health-related conversations, yet safety safeguards remain inadequate and inconsistent across clinical conditions. This study evaluates six proprietary…

26
arXiv — NLP / Computation & Language research 6d ago

Do LLM Attribution Metrics Transfer? Auditing Retrieval-Augmented Generation Evaluation Across Datasets and Constructs

arXiv:2606.23915v1 Announce Type: new Abstract: Practice often treats automatic metrics for attribution in LLM retrieval-augmented generation as interchangeable. We audit eight automatic scorers -- lexical, embedding, and BERTScore baselines alongside…

28
arXiv — NLP / Computation & Language research 6d ago

When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents

arXiv:2606.23937v1 Announce Type: new Abstract: Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B…

11
arXiv — NLP / Computation & Language research 6d ago

QuechuaTok: Morphological Boundary Accuracy as a Necessary Metric for Tokenizer Evaluation in Agglutinative Low-Resource Languages

arXiv:2606.23943v1 Announce Type: new Abstract: Tokenization is a foundational step in NLP pipelines, yet standard evaluation metrics such as fertility rate fail to capture morphological correctness for agglutinative languages. We present QuechuaTok, a systematic benchmark…

32
arXiv — NLP / Computation & Language research 6d ago

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

arXiv:2606.23948v1 Announce Type: new Abstract: Self-supervised and supervised speech models are increasingly used to investigate which linguistic information their internal representations encode, and at what level of abstraction they encode it. One underexplored phenomenon is…

6
arXiv — NLP / Computation & Language research 6d ago

Does My Embedding Reflect That $A = B$? Evaluating Mathematical Equivalence in Embedding Models

arXiv:2606.23959v1 Announce Type: new Abstract: Because mathematics is highly abstract, a single statement can take very different forms depending on what subfield it is framed in. There are many examples where breakthroughs occurred after researchers discovered that a question…

25
arXiv — NLP / Computation & Language research 6d ago

Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization

arXiv:2606.23989v1 Announce Type: new Abstract: End-to-end large language models (LLMs) produce fluent multi-document summaries but remain prone to hallucination, and the attributions they offer are typically coarse (whole documents or passages) and generated post hoc, leaving…

33
arXiv — NLP / Computation & Language research 6d ago

RASC+: Retrieval-Constrained LLM Adjudication for Clinical Value Set Authoring

arXiv:2606.23992v1 Announce Type: new Abstract: Clinical value sets define the standardized terminology codes used in quality measurement, phenotyping, cohort construction, and clinical decision support. The recently introduced Retrieval-Augmented Set Completion (RASC) benchmark…

32
arXiv — NLP / Computation & Language research 6d ago

Towards Spec Learning: Inference-Time Alignment from Preference Pairs

arXiv:2606.24004v1 Announce Type: new Abstract: Steering a large language model (LLM) toward a desired behavior typically relies on an iterative process of hand-crafting a prompt based on a careful inspection of the model's responses. This is an involved, brittle, and…

28
arXiv — NLP / Computation & Language research 6d ago

Towards Version-aware Operations and Transaction Memories for Multi-layer MeMo

arXiv:2606.24040v1 Announce Type: new Abstract: MeMo proposes language models with explicit multi-layer correlation matrix memories (CMMs), where memorization, retrieval, and forgetting are architectural operations. This paper asks how such memories can reduce the need for…

36
arXiv — NLP / Computation & Language research 6d ago

Best Preprocessing Techniques for Sentiment Analysis

arXiv:2606.24055v1 Announce Type: new Abstract: Sentiment analysis in Twitter datasets is important because it enables monitoring public opinion on products and analysis of political and social movements. One critical step is preprocessing: the automated processing of text for…

23
