arXiv — NLP / Computation & Language
500 articles archived
-
arXiv — NLP / Computation & Language research 5d ago
ASAP: Agent-System Co-Design for Wall-Clock-Centered Auto HPO Research for ML Experiments
arXiv:2606.25207v1 Announce Type: cross Abstract: Hyperparameter Optimization (HPO) is essential for maximizing machine learning model performance, and its core challenge is sample efficiency: finding strong configurations within a limited budget. Because every HPO tool relies…
27 -
arXiv — NLP / Computation & Language research 5d ago
Multilingual Hematology Visual Question Answering Dataset
arXiv:2606.25246v1 Announce Type: cross Abstract: Vision Language Models (VLMs) have shown promising capabilities in medical image analysis by jointly understanding visual and textual information for tasks such as Visual Question Answering. However, existing hematology…
5 -
arXiv — NLP / Computation & Language research 5d ago
Measuring Research Difficulty of Academic Papers: A Case Study in Natural Language Processing
arXiv:2606.25307v1 Announce Type: cross Abstract: With the rapid growth of the number of academic papers, systematically evaluating the difficulty of research and its relationship to academic impact offers important significance for research topic selection and resource…
25 -
arXiv — NLP / Computation & Language research 5d ago
Data-Driven Evolution of Library and Information Science Research Methods (1990-2022): A Perspective Based on Fine-grained Method Entities
arXiv:2606.25320v1 Announce Type: cross Abstract: Since the 1990s, advancements in big data and information technology have increasingly driven data-centric research in the field of Library and Information Science (LIS). To assess the influence of this data-driven research…
16 -
arXiv — NLP / Computation & Language research 5d ago
Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis
arXiv:2606.25369v1 Announce Type: cross Abstract: While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique…
36 -
arXiv — NLP / Computation & Language research 5d ago
Adaptive Oscillatory Inductive Bias for Modeling Sharp Prosodic Dynamics in Diffusion-Based TTS
arXiv:2606.25424v1 Announce Type: cross Abstract: Diffusion-based text-to-speech (TTS) models have achieved significant improvements in speech quality. However, modeling sharp prosodic transitions and rapid pitch variations in expressive speech remains challenging. Existing…
37 -
arXiv — NLP / Computation & Language research 5d ago
Evaluating Japanese Dialect Robustness Across Speech and Text-based Large Language Models
arXiv:2606.25436v1 Announce Type: cross Abstract: Dialogue systems based on large language models (LLMs) have advanced significantly in recent years. However, dialectal variation remains a major challenge, particularly for systems that process spoken input. LLM-based speech…
34 -
arXiv — NLP / Computation & Language research 5d ago
Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs?
arXiv:2606.25444v1 Announce Type: cross Abstract: Connecting a pre-trained speech encoder to a Large Language Model (LLM) is the standard architecture for building Speech LLMs. However, a structural misalignment exists between the encoder and the LLM. Unlike encoders based on…
23 -
arXiv — NLP / Computation & Language research 5d ago
The Interplay of Harness Design and Post-Training in LLM Agents
arXiv:2606.25447v1 Announce Type: cross Abstract: Tool-integrated LLM agents are often wrapped within a harness: the scaffolding that determines which tools are exposed, how they are described, and what auxiliary information accompanies each per-step observation. While agents…
15 -
arXiv — NLP / Computation & Language research 5d ago
The Generalization Spectrum: A Chromatographic Approach to Evaluating Learning Algorithms
arXiv:2606.25450v1 Announce Type: cross Abstract: Traditional evaluations measure a learning algorithm's final performance on an i.i.d. test set, reducing learning to a single aggregate score. This approach obscures a fundamental question: to what extent does learning from a…
12 -
arXiv — NLP / Computation & Language research 5d ago
Fully Differentiable Neural Forced Alignment via Soft Dynamic Programming
arXiv:2606.25460v1 Announce Type: cross Abstract: Recent advances in sequence modeling have significantly improved ASR systems, bringing them close to human-level recognition accuracy and enhancing robustness across diverse acoustic conditions and languages. In contrast, Forced…
24 -
arXiv — NLP / Computation & Language research 5d ago
Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning
arXiv:2606.25524v1 Announce Type: cross Abstract: Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk,…
32 -
arXiv — NLP / Computation & Language research 5d ago
Evaluating LLMs on Real-World Software Performance Optimization
arXiv:2606.25530v1 Announce Type: cross Abstract: Software performance optimization is a notoriously complex and manual task. Despite the growing use of Large Language Models (LLMs) for code refinement, we still lack benchmarks that capture how optimization actually happens in…
17 -
arXiv — NLP / Computation & Language research 5d ago
Security and Privacy in Retrieval-Augmented Generation: Architectures, Threats, Defenses, and Future Directions for Building Trustworthy Systems
arXiv:2606.25533v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) has emerged as a dominant paradigm for enhancing large language models with external knowledge. By coupling retrieval mechanisms with generative models, RAG systems improve factual grounding…
31 -
arXiv — NLP / Computation & Language research 5d ago
Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution
arXiv:2606.25721v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems are vulnerable to corpus poisoning attacks that manipulate model outputs through malicious retrieved documents. Existing detection methods typically rely on auxiliary classifiers or…
30 -
arXiv — NLP / Computation & Language research 5d ago
RAS: Measuring LLM Safety Through Refusal Alignment
arXiv:2606.25750v1 Announce Type: cross Abstract: Safety evaluation of large language models (LLMs) is commonly performed by querying models with unsafe or jailbreak prompts and judging whether their outputs violate a safety policy. Although useful, output-level evaluation is…
27 -
arXiv — NLP / Computation & Language research 5d ago
Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets
arXiv:2606.25760v1 Announce Type: cross Abstract: Computer-use agents turn vision-language model (VLM) predictions into executable GUI clicks, so reliable uncertainty estimates are essential for rejection, calibration, miss-severity ranking, and spatial safety regions. Yet…
14 -
arXiv — NLP / Computation & Language research 5d ago
Space-Efficient Language Generation in the Limit
arXiv:2606.25777v1 Announce Type: cross Abstract: We initiate a resource-aware theory of language generation in the limit under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language $K$ and…
7 -
arXiv — NLP / Computation & Language research 5d ago
How Large Language Models Source Brand Reputation Across Languages and Markets
arXiv:2606.25787v1 Announce Type: cross Abstract: When a large language model (LLM) answers a question about a company, it grounds the answer in retrieved web sources, and those sources decide what the model says. Most analysis of AI brand visibility looks at the answer text.…
37 -
arXiv — NLP / Computation & Language research 5d ago
Autodata: An agentic data scientist to create high quality synthetic data
arXiv:2606.25996v1 Announce Type: cross Abstract: We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to…
30 -
arXiv — NLP / Computation & Language research 5d ago
How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations
arXiv:2606.26041v1 Announce Type: cross Abstract: Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently…
29 -
arXiv — NLP / Computation & Language research 5d ago
Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
arXiv:2606.26050v1 Announce Type: cross Abstract: Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step…
4 -
arXiv — NLP / Computation & Language research 5d ago
Learning to Erase Private Knowledge from Multi-Documents for Retrieval-Augmented Large Language Models
arXiv:2504.09910v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) is a promising technique for applying LLMs to proprietary domains. However, retrieved documents may contain sensitive knowledge, posing risks of privacy leakage in generative results. Thus,…
4 -
arXiv — NLP / Computation & Language research 5d ago
A Systematic Analysis of Hybrid Linear Attention
arXiv:2507.06457v2 Announce Type: replace Abstract: Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall…
34 -
arXiv — NLP / Computation & Language research 5d ago
Scale or Reason? A Compute-Equivalent Analysis of Reasoning Distillation
arXiv:2509.22193v2 Announce Type: replace Abstract: Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20× longer than standard instruction fine-tuning (IFT) outputs,…
19 -
arXiv — NLP / Computation & Language research 5d ago
Robustness assessment of large audio language models in multiple-choice evaluation
arXiv:2510.04584v2 Announce Type: replace Abstract: Recent advances in large audio language models (LALMs) have primarily been assessed using a multiple-choice question answering (MCQA) framework. However, subtle changes, such as shifting the order of choices, result in…
13 -
arXiv — NLP / Computation & Language research 5d ago
How Pragmatics Shape Articulation: A Computational Case Study in STEM ASL Discourse
arXiv:2510.23842v2 Announce Type: replace Abstract: Most state-of-the-art sign language models are trained on interpreter or isolated vocabulary data, which overlooks the variability that characterizes natural dialogue. However, human communication dynamically adapts to contexts…
31 -
arXiv — NLP / Computation & Language research 5d ago
Reinforcement Learning Improves Traversal of Parametric Knowledge in LLMs
arXiv:2511.05933v2 Announce Type: replace Abstract: Reinforcement learning (RL) is often credited with improving language model reasoning at the expense of knowledge. We challenge this narrative by showing that reasoning models consistently outperform their instruction-tuned…
11 -
arXiv — NLP / Computation & Language research 5d ago
Constituency Structure over Eojeol in Korean Treebanks
arXiv:2512.22487v2 Announce Type: replace Abstract: The design of Korean constituency treebanks raises a central representational question concerning the choice of terminal units. Although Korean words are morphologically complex, treating morphemes as constituency terminals can…
14 -
arXiv — NLP / Computation & Language research 5d ago
Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents
arXiv:2601.03785v3 Announce Type: replace Abstract: Long-term human-agent dialogues are organized by topic continuity: adjacent turns often develop the same goal, plan, problem, or event, while related activities may recur across distant sessions. Yet many LLM agent memory…
25 -
arXiv — NLP / Computation & Language research 5d ago
Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme-Based Analysis of Climate Discourse
arXiv:2601.13317v2 Announce Type: replace Abstract: Climate discourse online shapes public understanding of climate change and informs political and policy debate, yet it unfolds across structurally different environments: paid advertising platforms host targeted,…
9 -
arXiv — NLP / Computation & Language research 5d ago
ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure
arXiv:2602.01472v2 Announce Type: replace Abstract: Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducible inference-time phenomenon, termed…
5 -
arXiv — NLP / Computation & Language research 5d ago
Am I More Pointwise or Pairwise? Revealing Position Bias in Rubric-Based LLM-as-a-Judge
arXiv:2602.02219v2 Announce Type: replace Abstract: Large language models are widely employed as evaluators, a paradigm commonly referred to as LLM-as-a-judge. Prior research has predominantly examined point-wise or pair-wise evaluation protocols; in contrast, our focus is on…
8 -
arXiv — NLP / Computation & Language research 6d ago
EXPO-SQL: Execution-based Clause-level Policy Optimization for Text-to-SQL
arXiv:2606.23693v1 Announce Type: new Abstract: Text-to-SQL enables users to query databases using natural language by generating executable SQL queries. Recent methods have increasingly adopted Large Language Models based reinforcement learning (RL) to leverage execution…
18 -
arXiv — NLP / Computation & Language research 6d ago
ModTGCN: Modularity-aware Graph Neural Networks for Text Classification
arXiv:2606.23694v1 Announce Type: new Abstract: Graph-based text classification models typically rely on local neighborhood aggregation and overlook global community structure, despite semantic document graphs exhibiting strong class-consistent clustering. Ignoring this can blur…
22 -
arXiv — NLP / Computation & Language research 6d ago
Quantifying Prior Dominance in RAG Systems
arXiv:2606.23695v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) grounds Large Language Models in external knowledge, yet current evaluations rely on discrete heuristics that suffer from "epistemic blindness" - failing to distinguish genuine contextual…
28 -
arXiv — NLP / Computation & Language research 6d ago
Self-Recognition Finetuning can Prevent and Reverse Emergent Misalignment
arXiv:2606.23700v1 Announce Type: new Abstract: Emergent misalignment (EM) has been linked to the activation of misaligned persona vectors and evil character traits, suggesting that EM operates through disruption of the model's aligned character rather than direct learning of…
8 -
arXiv — NLP / Computation & Language research 6d ago
Evaluating LLM Usage for Efficient and Explainable Numerical and Classified Implicit Sentiment Analysis of Product Desirability
arXiv:2606.23701v1 Announce Type: new Abstract: Qualitative product feedback can reveal nuanced user experiences, but its implicit sentiment is difficult to measure. This paper presents a scalable and interpretable framework that uses large language models (LLMs) to quantify…
32 -
arXiv — NLP / Computation & Language research 6d ago
Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification
arXiv:2606.23881v1 Announce Type: new Abstract: Knowledge-Based Visual Question Answering (KB-VQA) requires grounding visual queries to external knowledge beyond directly observable content in images. While recent multimodal large language models (MLLMs) show strong perceptual…
13 -
arXiv — NLP / Computation & Language research 6d ago
One Year Later...The Harms Persist, But So Do We!
arXiv:2606.23884v1 Announce Type: new Abstract: General-purpose large language models (LLMs) are increasingly used for mental health-related conversations, yet safety safeguards remain inadequate and inconsistent across clinical conditions. This study evaluates six proprietary…
26 -
arXiv — NLP / Computation & Language research 6d ago
Do LLM Attribution Metrics Transfer? Auditing Retrieval-Augmented Generation Evaluation Across Datasets and Constructs
arXiv:2606.23915v1 Announce Type: new Abstract: Practice often treats automatic metrics for attribution in LLM retrieval-augmented generation as interchangeable. We audit eight automatic scorers -- lexical, embedding, and BERTScore baselines alongside…
28 -
arXiv — NLP / Computation & Language research 6d ago
When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents
arXiv:2606.23937v1 Announce Type: new Abstract: Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B…
11 -
arXiv — NLP / Computation & Language research 6d ago
QuechuaTok: Morphological Boundary Accuracy as a Necessary Metric for Tokenizer Evaluation in Agglutinative Low-Resource Languages
arXiv:2606.23943v1 Announce Type: new Abstract: Tokenization is a foundational step in NLP pipelines, yet standard evaluation metrics such as fertility rate fail to capture morphological correctness for agglutinative languages. We present QuechuaTok, a systematic benchmark…
32 -
arXiv — NLP / Computation & Language research 6d ago
Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English
arXiv:2606.23948v1 Announce Type: new Abstract: Self-supervised and supervised speech models are increasingly used to investigate which linguistic information their internal representations encode, and at what level of abstraction they encode it. One underexplored phenomenon is…
6 -
arXiv — NLP / Computation & Language research 6d ago
Does My Embedding Reflect That $A = B$? Evaluating Mathematical Equivalence in Embedding Models
arXiv:2606.23959v1 Announce Type: new Abstract: Because mathematics is highly abstract, a single statement can take very different forms depending on what subfield it is framed in. There are many examples where breakthroughs occurred after researchers discovered that a question…
25 -
arXiv — NLP / Computation & Language research 6d ago
Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization
arXiv:2606.23989v1 Announce Type: new Abstract: End-to-end large language models (LLMs) produce fluent multi-document summaries but remain prone to hallucination, and the attributions they offer are typically coarse (whole documents or passages) and generated post hoc, leaving…
33 -
arXiv — NLP / Computation & Language research 6d ago
RASC+: Retrieval-Constrained LLM Adjudication for Clinical Value Set Authoring
arXiv:2606.23992v1 Announce Type: new Abstract: Clinical value sets define the standardized terminology codes used in quality measurement, phenotyping, cohort construction, and clinical decision support. The recently introduced Retrieval-Augmented Set Completion (RASC) benchmark…
32 -
arXiv — NLP / Computation & Language research 6d ago
Towards Spec Learning: Inference-Time Alignment from Preference Pairs
arXiv:2606.24004v1 Announce Type: new Abstract: Steering a large language model (LLM) toward a desired behavior typically relies on an iterative process of hand-crafting a prompt based on a careful inspection of the model's responses. This is an involved, brittle, and…
28 -
arXiv — NLP / Computation & Language research 6d ago
Towards Version-aware Operations and Transaction Memories for Multi-layer MeMo
arXiv:2606.24040v1 Announce Type: new Abstract: MeMo proposes language models with explicit multi-layer correlation matrix memories (CMMs), where memorization, retrieval, and forgetting are architectural operations. This paper asks how such memories can reduce the need for…
36 -
arXiv — NLP / Computation & Language research 6d ago
Best Preprocessing Techniques for Sentiment Analysis
arXiv:2606.24055v1 Announce Type: new Abstract: Sentiment analysis in Twitter datasets is important because it enables monitoring public opinion on products and analysis of political and social movements. One critical step is preprocessing: the automated processing of text for…
23