arXiv — NLP / Computation & Language
500 articles archived
-
arXiv — NLP / Computation & Language research 6d ago
Less is More: Quality-Aware Training Data Selection for Scientific Summarization
arXiv:2606.24828v1 Announce Type: new Abstract: Scientific long-document summarization datasets commonly treat author-written abstracts as gold reference summaries, although their quality and alignment with the source article vary. At the same time, publicly available scientific…
38 -
arXiv — NLP / Computation & Language research 6d ago
EvidenceLens: A Claim-Evidence Matrix for Auditing Financial Question Answering
arXiv:2606.23724v1 Announce Type: cross Abstract: Large language models are increasingly used to answer questions over annual reports, earnings decks, and analyst notes, yet their outputs remain difficult to verify in high-stakes financial workflows. A fluent answer can blend…
32 -
arXiv — NLP / Computation & Language research 6d ago
From Task-Guided Conversational Graphs to Goal-Oriented Dialogue Runtimes
arXiv:2606.23797v1 Announce Type: cross Abstract: Graph and multi-agent orchestration frameworks make production large language model (LLM) workflows practical, but they do not by themselves solve conversational continuity when users maintain several interdependent objectives.…
16 -
arXiv — NLP / Computation & Language research 6d ago
ESBMC-PLC+: A Unified IEC 61131-3 Formal Verification Framework as a PLCverif Successor
arXiv:2606.23870v1 Announce Type: cross Abstract: PLCverif is the most mature open-source platform for PLC formal verification, developed at CERN and in production use since 2019. Yet it has two fundamental limitations: no support for Ladder Diagram (LD) programs, the dominant…
35 -
arXiv — NLP / Computation & Language research 6d ago
Mind the Heads: Topological Representation Alignment for Multimodal LLMs
arXiv:2606.23885v1 Announce Type: cross Abstract: Representation alignment has emerged as an effective approach to improve Multimodal Large Language Models (MLLMs) by regularizing their internal representations toward those of an external vision encoder. However, existing…
17 -
arXiv — NLP / Computation & Language research 6d ago
Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs
arXiv:2606.23938v1 Announce Type: cross Abstract: Driving VLA models incorporating Chain-of-Thought (CoT) reasoning are attractive because they leverage pretrained VLM representations and expose intermediate decisions in natural language, yet current rationales often lack the…
4 -
arXiv — NLP / Computation & Language research 6d ago
Reinforcement Learning Towards Broadly and Persistently Beneficial Models
arXiv:2606.24014v1 Announce Type: cross Abstract: As AI systems are deployed across increasingly diverse and high-stakes settings, model alignment must generalize beyond the tasks and domains seen during training. This is especially important for reinforcement learning (RL),…
5 -
arXiv — NLP / Computation & Language research 6d ago
RoPE-Aware Bit Allocation for KV-Cache Quantization
arXiv:2606.24033v1 Announce Type: cross Abstract: Existing low-bit KV-cache quantizers often treat each cached key as a flat vector. Under RoPE, however, a key's contribution to a future attention logit decomposes into a position-dependent sum over two-dimensional frequency…
5 -
arXiv — NLP / Computation & Language research 6d ago
VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency
arXiv:2606.24066v1 Announce Type: cross Abstract: Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to…
25 -
arXiv — NLP / Computation & Language research 6d ago
Blockwise Policy-Drift Gating for On-Policy Distillation
arXiv:2606.24084v1 Announce Type: cross Abstract: On-policy distillation (OPD) trains a student policy using teacher signals computed on trajectories sampled by the student itself. Recent work shows that sampled-token OPD can be fragile on long-horizon reasoning tasks and that…
30 -
arXiv — NLP / Computation & Language research 6d ago
Exploring Academic Influence of Algorithms by Co-occurrence Network Based on Full-text of Academic Papers
arXiv:2606.24099v1 Announce Type: cross Abstract: Algorithms have become central to scientific research in the era of artificial intelligence (AI). Although algorithm mentions in papers are often used to indicate popularity and influence, existing studies usually evaluate…
14 -
arXiv — NLP / Computation & Language research 6d ago
When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs
arXiv:2606.24119v1 Announce Type: cross Abstract: Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning.…
12 -
arXiv — NLP / Computation & Language research 6d ago
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
arXiv:2606.24133v1 Announce Type: cross Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data…
13 -
arXiv — NLP / Computation & Language research 6d ago
Progressive Alignment Objectives for Aligner-Encoder based ASR
arXiv:2606.24147v1 Announce Type: cross Abstract: Aligner-Encoders are recently proposed seq2seq end-to-end ASR models that replace decoder attention by predicting the u-th token directly from the u-th encoder position, so the encoder must learn the alignment internally without…
23 -
arXiv — NLP / Computation & Language research 6d ago
CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking
arXiv:2606.24163v1 Announce Type: cross Abstract: Reliable provenance for LLM outputs requires multi-bit watermarks that remain robust under editing while maintaining strict false-positive control. Existing ECC-based LLM watermarks rely largely on hard-decision decoding,…
8 -
arXiv — NLP / Computation & Language research 6d ago
Agon: An Autonomous Large-Scale Omnidisciplinary Research System Built on Prompt Economy
arXiv:2606.24177v1 Announce Type: cross Abstract: Large language models are making research production scalable, shifting the bottleneck from producing artifacts to judging claims. We present Agon, a research orchestrator that validates what can be checked inside the…
24 -
arXiv — NLP / Computation & Language research 6d ago
Co-occurring associated retained concepts in Diffusion Unlearning
arXiv:2606.24192v1 Announce Type: cross Abstract: Unlearning has emerged as a key technique to mitigate harmful content generation in diffusion models. However, existing methods often remove not only the target concept, but also benign co-occurring concepts. As illustrated in…
25 -
arXiv — NLP / Computation & Language research 6d ago
Dialogue to Discovery: Attribute-Aware Preference Elicitation for Conversational Product Search Assistants
arXiv:2606.24194v1 Announce Type: cross Abstract: Conversational product search assistants offer a more expressive, natural, and interactive alternative to traditional keyword-based product search. With limited screen space, showing only a few items increases the need for…
36 -
arXiv — NLP / Computation & Language research 6d ago
PETRA: Transforming Web Text for Petroleum-Engineering Domain Adaptation
arXiv:2606.24346v1 Announce Type: cross Abstract: Petroleum-engineering search exposes a supervision gap for strong general retrievers: relevant evidence exists in public web text, but domain relevance labels are scarce. To address this gap, we propose PETRA, a large-scale…
28 -
arXiv — NLP / Computation & Language research 6d ago
ComputeFHE: A Privacy-Preserving General-Purpose Computation Library
arXiv:2606.24379v1 Announce Type: cross Abstract: Fully Homomorphic Encryption (FHE) enables computations to be performed directly on encrypted data while preserving data confidentiality. However, its practical applications remain limited by high computational costs and…
6 -
arXiv — NLP / Computation & Language research 6d ago
Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War
arXiv:2606.24391v1 Announce Type: cross Abstract: We introduce Age of LLM, a turn-based 1v1 benchmark in which two LLMs face off on a 13x7 grid to destroy the enemy base. Three stressors are deliberate: fog of war, full diplomacy (messages, ceasefires, ultimatums; uranium kept…
29 -
arXiv — NLP / Computation & Language research 6d ago
Bayesian control for coding agents
arXiv:2606.24453v1 Announce Type: cross Abstract: Modern coding agents pair LLM generators with various tools, including cheap diagnostics and expensive verifiers. The tool-use decisions are typically governed by orchestrators that often use fixed rules and ignore uncertainty.…
21 -
arXiv — NLP / Computation & Language research 6d ago
An LLM-based Two-Stage Transformer Framework for Cross-Domain Bearing Fault Diagnosis with Limited Data
arXiv:2606.24459v1 Announce Type: cross Abstract: Bearing fault diagnosis faces critical challenges when dataset heterogeneity, operating condition variations, and limited labeled data occur simultaneously in industrial environments. Existing approaches address these issues in…
30 -
arXiv — NLP / Computation & Language research 6d ago
A specialized reasoning large language model for accelerating rare disease diagnosis: a randomized AI physician assistance trial
arXiv:2606.24510v1 Announce Type: cross Abstract: Rare diseases affect millions of individuals worldwide, yet timely diagnosis remains a major public health challenge due to scarcity of specialized clinical expertise. While large language models (LLMs) show promise to support…
28 -
arXiv — NLP / Computation & Language research 6d ago
AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability
arXiv:2606.24589v1 Announce Type: cross Abstract: Scaling adversarial evaluation of large language models requires both a method for generating hard inputs and a reliable way to confirm that resulting failures are real. We present AdversaBench, an end-to-end red-teaming pipeline…
25 -
arXiv — NLP / Computation & Language research 6d ago
ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge
arXiv:2606.24648v1 Announce Type: cross Abstract: Large Audio-Language Models (LALMs) have been widely used as judge models for the automatic evaluation of generated speech. However, prior approaches predominantly focus on holistic naturalness, leaving fine-grained…
15 -
arXiv — NLP / Computation & Language research 6d ago
Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models
arXiv:2606.24841v1 Announce Type: cross Abstract: Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across…
18 -
arXiv — NLP / Computation & Language research 6d ago
Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering
arXiv:2403.04890v4 Announce Type: replace Abstract: In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers.…
7 -
arXiv — NLP / Computation & Language research 6d ago
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
arXiv:2409.11363v2 Announce Type: replace Abstract: AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially,…
20 -
arXiv — NLP / Computation & Language research 6d ago
Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions
arXiv:2501.11790v5 Announce Type: replace Abstract: Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliable…
29 -
arXiv — NLP / Computation & Language research 6d ago
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
arXiv:2503.13505v3 Announce Type: replace Abstract: Generative Pretrained Transformers (GPTs) are foundational Large Language Models (LLMs) for text generation. However, individual LLMs often produce inconsistent outputs and exhibit biases, limiting their representation of…
10 -
arXiv — NLP / Computation & Language research 6d ago
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
arXiv:2504.17768v3 Announce Type: replace Abstract: Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its efficiency-accuracy trade-offs remain unclear due to the lack of comprehensive evaluation. We address this gap with…
29 -
arXiv — NLP / Computation & Language research 6d ago
Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs
arXiv:2505.18542v4 Announce Type: replace Abstract: Extracting structured procedural knowledge from unstructured business documents is a critical yet unresolved bottleneck in process automation. While prior work has focused on extracting linear action flows from instructional…
32 -
arXiv — NLP / Computation & Language research 11d ago
Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation
arXiv:2606.19344v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate due to the stochastic nature of text generation. Standard auditing methods rely on a single output inspection or static…
38 -
arXiv — NLP / Computation & Language research 11d ago
Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts
arXiv:2606.19345v1 Announce Type: new Abstract: The rapid increase in scientific publications makes manual study screening in systematic literature reviews (SLRs) increasingly resource-consuming, inefficient, and inconsistent. Classifying studies that clearly…
25 -
arXiv — NLP / Computation & Language research 11d ago
Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer
arXiv:2606.19346v1 Announce Type: new Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B–671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and…
6 -
arXiv — NLP / Computation & Language research 11d ago
How LLMs Fail and Generalize in RTL Coding for Hardware Design?
arXiv:2606.19347v1 Announce Type: new Abstract: Translating sequential programming priors into the parallel temporal logic of hardware design remains a crucial bottleneck for large language models (LLMs). To investigate this, we introduce a new error taxonomy grounded in problem…
9 -
arXiv — NLP / Computation & Language research 11d ago
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
arXiv:2606.19348v1 Announce Type: new Abstract: We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) --…
11 -
arXiv — NLP / Computation & Language research 11d ago
Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics
arXiv:2606.19349v1 Announce Type: new Abstract: While In-Context Learning (ICL) is extensively studied in Autoregressive (AR) LLMs, its mechanism within Diffusion Large Language Models (dLLMs) remains largely unexplored. Unlike AR models restricted by unidirectional causal…
33 -
arXiv — NLP / Computation & Language research 11d ago
Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models
arXiv:2606.19350v1 Announce Type: new Abstract: Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their…
34 -
arXiv — NLP / Computation & Language research 11d ago
Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning
arXiv:2606.19351v1 Announce Type: new Abstract: Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG…
25 -
arXiv — NLP / Computation & Language research 11d ago
Sign-Language Datasets at Scale: A Comprehensive Survey on Resources, Benchmarks, and Annotation Standards
arXiv:2606.19352v1 Announce Type: new Abstract: Sign languages are expressive visual languages used by Deaf and Hard-of-Hearing (DHH) communities. Despite substantial progress in sign-language recognition, translation, and production, advances remain constrained by fragmented…
16 -
arXiv — NLP / Computation & Language research 11d ago
Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence
arXiv:2606.19353v1 Announce Type: new Abstract: In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context,…
32 -
arXiv — NLP / Computation & Language research 11d ago
Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling
arXiv:2606.19354v1 Announce Type: new Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the…
5 -
arXiv — NLP / Computation & Language research 11d ago
Trustworthy Multi-Agent Systems: Mitigating Semantic Drift with the Argent Signaling Protocol
arXiv:2606.19356v1 Announce Type: new Abstract: When multi-agent LLM systems produce bad answers, not all failures are equal: some answers are grounded in the right material but incomplete, while others are simply ungrounded and should be stopped. Current retry strategies treat…
35 -
arXiv — NLP / Computation & Language research 11d ago
Characterizing Narrative Content in Web-scale LLM Pretraining Data
arXiv:2606.19468v1 Announce Type: new Abstract: The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a…
32 -
arXiv — NLP / Computation & Language research 11d ago
Reliability without Validity: A Systematic, Large-Scale Evaluation of LLM-as-a-Judge Models Across Agreement, Consistency, and Bias
arXiv:2606.19544v1 Announce Type: new Abstract: LLM-as-a-Judge has become the dominant evaluation paradigm for language models, but judge validation in practice relies on exact-match agreement, a metric that does not correct for chance and systematically overstates…
34 -
arXiv — NLP / Computation & Language research 11d ago
LaViSA: A Language and Vision Structural Ambiguity Benchmark
arXiv:2606.19552v1 Announce Type: new Abstract: Structural ambiguity arises when a single sentence admits multiple valid interpretations due to its syntactic structure, posing a fundamental challenge for language understanding. Visual scenes serve as useful cues for resolving…
22 -
arXiv — NLP / Computation & Language research 11d ago
A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization
arXiv:2606.19591v1 Announce Type: new Abstract: In this technical report, we focus on solving the challenge of Vietnamese multi-document abstractive summarization, introduced in the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. We choose to…
9 -
arXiv — NLP / Computation & Language research 11d ago
Where Does Social Reasoning Come From? Capability Provenance in Language Models
arXiv:2606.19625v1 Announce Type: new Abstract: We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how…
9