arXiv — NLP / Computation & Language


arXiv — NLP / Computation & Language research 6d ago

Less is More: Quality-Aware Training Data Selection for Scientific Summarization

arXiv:2606.24828v1 Announce Type: new Abstract: Scientific long-document summarization datasets commonly treat author-written abstracts as gold reference summaries, although their quality and alignment with the source article vary. At the same time, publicly available scientific…

38
arXiv — NLP / Computation & Language research 6d ago

EvidenceLens: A Claim-Evidence Matrix for Auditing Financial Question Answering

arXiv:2606.23724v1 Announce Type: cross Abstract: Large language models are increasingly used to answer questions over annual reports, earnings decks, and analyst notes, yet their outputs remain difficult to verify in high-stakes financial workflows. A fluent answer can blend…

32
arXiv — NLP / Computation & Language research 6d ago

From Task-Guided Conversational Graphs to Goal-Oriented Dialogue Runtimes

arXiv:2606.23797v1 Announce Type: cross Abstract: Graph and multi-agent orchestration frameworks make production large language model (LLM) workflows practical, but they do not by themselves solve conversational continuity when users maintain several interdependent objectives.…

16
arXiv — NLP / Computation & Language research 6d ago

ESBMC-PLC+: A Unified IEC 61131-3 Formal Verification Framework as a PLCverif Successor

arXiv:2606.23870v1 Announce Type: cross Abstract: PLCverif is the most mature open-source platform for PLC formal verification, developed at CERN and in production use since 2019. Yet it has two fundamental limitations: no support for Ladder Diagram (LD) programs, the dominant…

35
arXiv — NLP / Computation & Language research 6d ago

Mind the Heads: Topological Representation Alignment for Multimodal LLMs

arXiv:2606.23885v1 Announce Type: cross Abstract: Representation alignment has emerged as an effective approach to improve Multimodal Large Language Models (MLLMs) by regularizing their internal representations toward those of an external vision encoder. However, existing…

17
arXiv — NLP / Computation & Language research 6d ago

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

arXiv:2606.23938v1 Announce Type: cross Abstract: Driving VLA models incorporating Chain-of-Thought (CoT) reasoning are attractive because they leverage pretrained VLM representations and expose intermediate decisions in natural language, yet current rationales often lack the…

4
arXiv — NLP / Computation & Language research 6d ago

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

arXiv:2606.24014v1 Announce Type: cross Abstract: As AI systems are deployed across increasingly diverse and high-stakes settings, model alignment must generalize beyond the tasks and domains seen during training. This is especially important for reinforcement learning (RL),…

5
arXiv — NLP / Computation & Language research 6d ago

RoPE-Aware Bit Allocation for KV-Cache Quantization

arXiv:2606.24033v1 Announce Type: cross Abstract: Existing low-bit KV-cache quantizers often treat each cached key as a flat vector. Under RoPE, however, a key's contribution to a future attention logit decomposes into a position-dependent sum over two-dimensional frequency…

5
arXiv — NLP / Computation & Language research 6d ago

VieSpeaker: A Large-Scale Vietnamese Speaker Recognition Dataset Beyond Visual Dependency

arXiv:2606.24066v1 Announce Type: cross Abstract: Speaker recognition has advanced rapidly with large-scale training datasets, yet Vietnamese remains under-resourced, with existing corpora limited in scale and acoustic diversity. Most large-scale datasets rely on facial cues to…

25
arXiv — NLP / Computation & Language research 6d ago

Blockwise Policy-Drift Gating for On-Policy Distillation

arXiv:2606.24084v1 Announce Type: cross Abstract: On-policy distillation (OPD) trains a student policy using teacher signals computed on trajectories sampled by the student itself. Recent work shows that sampled-token OPD can be fragile on long-horizon reasoning tasks and that…

30
arXiv — NLP / Computation & Language research 6d ago

Exploring Academic Influence of Algorithms by Co-occurrence Network Based on Full-text of Academic Papers

arXiv:2606.24099v1 Announce Type: cross Abstract: Algorithms have become central to scientific research in the era of artificial intelligence (AI). Although algorithm mentions in papers are often used to indicate popularity and influence, existing studies usually evaluate…

14
arXiv — NLP / Computation & Language research 6d ago

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

arXiv:2606.24119v1 Announce Type: cross Abstract: Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning.…

12
arXiv — NLP / Computation & Language research 6d ago

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

arXiv:2606.24133v1 Announce Type: cross Abstract: The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data…

13
arXiv — NLP / Computation & Language research 6d ago

Progressive Alignment Objectives for Aligner-Encoder based ASR

arXiv:2606.24147v1 Announce Type: cross Abstract: Aligner-Encoders are recently proposed seq2seq end-to-end ASR models that replace decoder attention by predicting the u-th token directly from the u-th encoder position, so the encoder must learn the alignment internally without…

23
arXiv — NLP / Computation & Language research 6d ago

CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking

arXiv:2606.24163v1 Announce Type: cross Abstract: Reliable provenance for LLM outputs requires multi-bit watermarks that remain robust under editing while maintaining strict false-positive control. Existing ECC-based LLM watermarks rely largely on hard-decision decoding,…

8
arXiv — NLP / Computation & Language research 6d ago

Agon: An Autonomous Large-Scale Omnidisciplinary Research System Built on Prompt Economy

arXiv:2606.24177v1 Announce Type: cross Abstract: Large language models are making research production scalable, shifting the bottleneck from producing artifacts to judging claims. We present Agon, a research orchestrator that validates what can be checked inside the…

24
arXiv — NLP / Computation & Language research 6d ago

Co-occurring associated retained concepts in Diffusion Unlearning

arXiv:2606.24192v1 Announce Type: cross Abstract: Unlearning has emerged as a key technique to mitigate harmful content generation in diffusion models. However, existing methods often remove not only the target concept, but also benign co-occurring concepts. As illustrated in…

25
arXiv — NLP / Computation & Language research 6d ago

Dialogue to Discovery: Attribute-Aware Preference Elicitation for Conversational Product Search Assistants

arXiv:2606.24194v1 Announce Type: cross Abstract: Conversational product search assistants offer a more expressive, natural, and interactive alternative to traditional keyword-based product search. With limited screen space, showing only a few items increases the need for…

36
arXiv — NLP / Computation & Language research 6d ago

PETRA: Transforming Web Text for Petroleum-Engineering Domain Adaptation

arXiv:2606.24346v1 Announce Type: cross Abstract: Petroleum-engineering search exposes a supervision gap for strong general retrievers: relevant evidence exists in public web text, but domain relevance labels are scarce. To address this gap, we propose PETRA, a large-scale…

28
arXiv — NLP / Computation & Language research 6d ago

ComputeFHE: A Privacy-Preserving General-Purpose Computation Library

arXiv:2606.24379v1 Announce Type: cross Abstract: Fully Homomorphic Encryption (FHE) enables computations to be performed directly on encrypted data while preserving data confidentiality. However, its practical applications remain limited by high computational costs and…

6
arXiv — NLP / Computation & Language research 6d ago

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

arXiv:2606.24391v1 Announce Type: cross Abstract: We introduce Age of LLM, a turn-based 1v1 benchmark in which two LLMs face off on a 13x7 grid to destroy the enemy base. Three stressors are deliberate: fog of war, full diplomacy (messages, ceasefires, ultimatums; uranium kept…

29
arXiv — NLP / Computation & Language research 6d ago

Bayesian control for coding agents

arXiv:2606.24453v1 Announce Type: cross Abstract: Modern coding agents pair LLM generators with various tools, including cheap diagnostics and expensive verifiers. The tool-use decisions are typically governed by orchestrators that often use fixed rules and ignore uncertainty.…

21
arXiv — NLP / Computation & Language research 6d ago

An LLM-based Two-Stage Transformer Framework for Cross-Domain Bearing Fault Diagnosis with Limited Data

arXiv:2606.24459v1 Announce Type: cross Abstract: Bearing fault diagnosis faces critical challenges when dataset heterogeneity, operating condition variations, and limited labeled data occur simultaneously in industrial environments. Existing approaches address these issues in…

30
arXiv — NLP / Computation & Language research 6d ago

A specialized reasoning large language model for accelerating rare disease diagnosis: a randomized AI physician assistance trial

arXiv:2606.24510v1 Announce Type: cross Abstract: Rare diseases affect millions of individuals worldwide, yet timely diagnosis remains a major public health challenge due to scarcity of specialized clinical expertise. While large language models (LLMs) show promise to support…

28
arXiv — NLP / Computation & Language research 6d ago

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability

arXiv:2606.24589v1 Announce Type: cross Abstract: Scaling adversarial evaluation of large language models requires both a method for generating hard inputs and a reliable way to confirm that resulting failures are real. We present AdversaBench, an end-to-end red-teaming pipeline…

25
arXiv — NLP / Computation & Language research 6d ago

ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge

arXiv:2606.24648v1 Announce Type: cross Abstract: Large Audio-Language Models (LALMs) have been widely used as judge models for the automatic evaluation of generated speech. However, prior approaches predominantly focus on holistic naturalness, leaving fine-grained…

15
arXiv — NLP / Computation & Language research 6d ago

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

arXiv:2606.24841v1 Announce Type: cross Abstract: Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across…

18
arXiv — NLP / Computation & Language research 6d ago

Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering

arXiv:2403.04890v4 Announce Type: replace Abstract: In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers.…

7
arXiv — NLP / Computation & Language research 6d ago

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

arXiv:2409.11363v2 Announce Type: replace Abstract: AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially,…

20
arXiv — NLP / Computation & Language research 6d ago

Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions

arXiv:2501.11790v5 Announce Type: replace Abstract: Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliable…

29
arXiv — NLP / Computation & Language research 6d ago

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

arXiv:2503.13505v3 Announce Type: replace Abstract: Generative Pretrained Transformers (GPTs) are foundational Large Language Models (LLMs) for text generation. However, individual LLMs often produce inconsistent outputs and exhibit biases, limiting their representation of…

10
arXiv — NLP / Computation & Language research 6d ago

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

arXiv:2504.17768v3 Announce Type: replace Abstract: Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its efficiency-accuracy trade-offs remain unclear due to the lack of comprehensive evaluation. We address this gap with…

29
arXiv — NLP / Computation & Language research 6d ago

Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs

arXiv:2505.18542v4 Announce Type: replace Abstract: Extracting structured procedural knowledge from unstructured business documents is a critical yet unresolved bottleneck in process automation. While prior work has focused on extracting linear action flows from instructional…

32
arXiv — NLP / Computation & Language research 11d ago

Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

arXiv:2606.19344v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate due to the stochastic nature of text generation. Standard auditing methods rely on a single output inspection or static…

38
arXiv — NLP / Computation & Language research 11d ago

Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts

arXiv:2606.19345v1 Announce Type: new Abstract: The rapid increase in scientific publications leads to the fact that manual study screening in systematic literature reviews (SLRs) is increasingly resource consuming, inefficient, and inconsistent. Classifying studies that clearly…

25
arXiv — NLP / Computation & Language research 11d ago

Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer

arXiv:2606.19346v1 Announce Type: new Abstract: We study cross-lingual transfer by fine-tuning seven large language models (4B to 671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and…

6
arXiv — NLP / Computation & Language research 11d ago

How LLMs Fail and Generalize in RTL Coding for Hardware Design?

arXiv:2606.19347v1 Announce Type: new Abstract: Translating sequential programming priors into the parallel temporal logic of hardware design remains a crucial bottleneck for large language models (LLMs). To investigate this, we introduce a new error taxonomy grounded in problem…

9
arXiv — NLP / Computation & Language research 11d ago

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

arXiv:2606.19348v1 Announce Type: new Abstract: We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models (DeepSeek-V4-Pro with 1.6T parameters, 49B activated, and DeepSeek-V4-Flash with 284B parameters, 13B activated),…

11
arXiv — NLP / Computation & Language research 11d ago

Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics

arXiv:2606.19349v1 Announce Type: new Abstract: While In-Context Learning (ICL) is extensively studied in Autoregressive (AR) LLMs, its mechanism within Diffusion Large Language Models (dLLMs) remains largely unexplored. Unlike AR models restricted by unidirectional causal…

33
arXiv — NLP / Computation & Language research 11d ago

Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

arXiv:2606.19350v1 Announce Type: new Abstract: Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their…

34
arXiv — NLP / Computation & Language research 11d ago

Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

arXiv:2606.19351v1 Announce Type: new Abstract: Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG…

25
arXiv — NLP / Computation & Language research 11d ago

Sign-Language Datasets at Scale: A Comprehensive Survey on Resources, Benchmarks, and Annotation Standards

arXiv:2606.19352v1 Announce Type: new Abstract: Sign languages are expressive visual languages used by Deaf and Hard-of-Hearing (DHH) communities. Despite substantial progress in sign-language recognition, translation, and production, advances remain constrained by fragmented…

16
arXiv — NLP / Computation & Language research 11d ago

Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence

arXiv:2606.19353v1 Announce Type: new Abstract: In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context,…

32
arXiv — NLP / Computation & Language research 11d ago

Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

arXiv:2606.19354v1 Announce Type: new Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the…

5
arXiv — NLP / Computation & Language research 11d ago

Trustworthy Multi-Agent Systems: Mitigating Semantic Drift with the Argent Signaling Protocol

arXiv:2606.19356v1 Announce Type: new Abstract: When multi-agent LLM systems produce bad answers, not all failures are equal: some answers are grounded in the right material but incomplete, while others are simply ungrounded and should be stopped. Current retry strategies treat…

35
arXiv — NLP / Computation & Language research 11d ago

Characterizing Narrative Content in Web-scale LLM Pretraining Data

arXiv:2606.19468v1 Announce Type: new Abstract: The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a…

32
arXiv — NLP / Computation & Language research 11d ago

Reliability without Validity: A Systematic, Large-Scale Evaluation of LLM-as-a-Judge Models Across Agreement, Consistency, and Bias

arXiv:2606.19544v1 Announce Type: new Abstract: LLM-as-a-Judge has become the dominant evaluation paradigm for language models, but judge validation in practice relies on exact-match agreement, a metric that does not correct for chance and systematically overstates…

34
arXiv — NLP / Computation & Language research 11d ago

LaViSA: A Language and Vision Structural Ambiguity Benchmark

arXiv:2606.19552v1 Announce Type: new Abstract: Structural ambiguity arises when a single sentence admits multiple valid interpretations due to its syntactic structure, posing a fundamental challenge for language understanding. Visual scenes serve as useful cues for resolving…

22
arXiv — NLP / Computation & Language research 11d ago

A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization

arXiv:2606.19591v1 Announce Type: new Abstract: In this technical report, we focus on solving the challenge of Vietnamese multi-document abstractive summarization, introduced in the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. We choose to…

9
arXiv — NLP / Computation & Language research 11d ago

Where Does Social Reasoning Come From? Capability Provenance in Language Models

arXiv:2606.19625v1 Announce Type: new Abstract: We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how…

9