Research papers

500 articles archived under #paper

arXiv — NLP / Computation & Language research 4h ago

To Reason or to Fabricate: Reasoning Without Shortcuts via Hint-Anchored Pairwise Aggregation

arXiv:2606.29481v1 Announce Type: new Abstract: While reinforcement learning (RL) significantly enhances LLM reasoning, its efficacy is severely undermined by Pre-RL data overlap, where RL datasets overlap with pretraining or SFT corpora, causing models to exploit shortcuts by…

11
arXiv — NLP / Computation & Language research 4h ago

Which Tokens Need Context? A Reference-Based Analysis of Translation Responsibility Using Fertility and Entropy

arXiv:2606.29489v1 Announce Type: new Abstract: When humans translate, not every word depends equally on the surrounding context. Some tokens, particularly function words like pronouns and auxiliaries, rely heavily on preceding or following sentences, while others, such as…

32
arXiv — NLP / Computation & Language research 4h ago

The Verbose Context Problem in Medical Records

arXiv:2606.29503v1 Announce Type: new Abstract: The verbose context problem occurs when structured concepts have token-inefficient textual representations. This bottleneck is acute in population health: cohort-level analysis of longitudinal patient records requires reasoning…

30
arXiv — NLP / Computation & Language research 4h ago

Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs

arXiv:2606.29534v1 Announce Type: new Abstract: Popular ASR test sets adopt inconsistent conventions for numbers, disfluencies, entities, and casing, while standard normalizers erase the format distinctions users care about. Current benchmarks therefore cannot measure whether a…

23
arXiv — NLP / Computation & Language research 4h ago

AURORA: Asymmetry and Update-Induced Rotation for Robust Hallucination Detection in Large Language Models

arXiv:2606.29545v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. However, their tendency to generate hallucinations, namely factually incorrect or unfaithful outputs,…

27
arXiv — NLP / Computation & Language research 4h ago

Coverage-Driven KV Cache Eviction for Efficient and Improved Inference of LLM

arXiv:2606.29563v1 Announce Type: new Abstract: Large language models (LLMs) excel at complex tasks like question answering and summarization, thanks to their ability to handle long-context inputs. However, deploying LLMs is costly, not only due to the high computational demands…

7
arXiv — NLP / Computation & Language research 4h ago

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

arXiv:2606.29571v1 Announce Type: new Abstract: The standard way to compare two text embeddings is cosine similarity. Scattered studies report that a different metric does better, but never pin down the geometric condition that decides when, or why. We settle both with a…

10
arXiv — NLP / Computation & Language research 4h ago

MAM-AI: An On-Device Medical Retrieval-Augmented Generation System for Nurses and Midwives in Zanzibar

arXiv:2606.29580v1 Announce Type: new Abstract: Maternal and newborn mortality remain among the highest in sub-Saharan Africa, where midwifery care is often delivered by nurses who lack midwifery training to international standards, and consulting authoritative guidance at the…

7
arXiv — NLP / Computation & Language research 4h ago

How much of an LLM-generated clinical corpus is actually new? A production-scale measurement of content redundancy for provenance classification

arXiv:2606.29605v1 Announce Type: new Abstract: Clinical machine learning increasingly relies on training corpora generated by large language models (LLMs) rather than annotated by clinicians, and such corpora are described and reused largely on the basis of their reported…

12
arXiv — NLP / Computation & Language research 4h ago

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model

arXiv:2606.29614v1 Announce Type: new Abstract: This study examines whether supervised fine-tuning remains necessary for Turkish sentiment analysis in the era of large language models. We compare classical machine learning methods, fine-tuned pretrained language models, and…

35
arXiv — NLP / Computation & Language research 4h ago

Two-Stage Prompt Optimization for Few-Shot Relation Extraction: From Reasoning-Guided Search to Gradient-Guided Refinement

arXiv:2606.29639v1 Announce Type: new Abstract: Automatic prompt optimization is still underexplored for episodic few-shot relation extraction with smaller language models. We propose a two-stage framework that combines reasoning-based prompt optimization with gradient-based…

7
arXiv — NLP / Computation & Language research 4h ago

Hybrid Retriever Evolution for Multimodal Document Reasoning Agents

arXiv:2606.29648v1 Announce Type: new Abstract: Different retrievers, including lexical, semantic, and multimodal approaches, provide highly complementary strengths for multimodal document understanding, yet most systems combine them through fixed pipelines that cannot adapt to…

33
arXiv — NLP / Computation & Language research 4h ago

Resolution Thresholds in VLM Detection of Harmful ASCII Art Across Construction Modes and Languages

arXiv:2606.29649v1 Announce Type: new Abstract: Large Vision-Language Models (VLMs) are increasingly deployed as content moderation tools, yet they remain vulnerable to jailbreak attacks in which harmful text is visually encoded as ASCII art. This can allow inappropriate or…

31
arXiv — NLP / Computation & Language research 4h ago

How LLMs See Creativity: Zero-Shot Scoring of Visual Creativity with Interpretable Reasoning

arXiv:2606.29672v1 Announce Type: new Abstract: Evaluating the originality of visual images poses enduring challenges for creativity assessment. Automated scoring using AI models has proven effective in the verbal domain, yet key questions remain about evaluating visual…

11
arXiv — NLP / Computation & Language research 4h ago

Can MLLMs Critique Like Humans? Evaluating Open-Ended Aesthetic Reasoning in Multimodal Large Language Models

arXiv:2606.29689v1 Announce Type: new Abstract: Open-ended aesthetic critique is a challenge for multimodal large language models (MLLMs): unlike multiple-choice aesthetic benchmarks, it has no single correct answer, and most aesthetic evaluation has measured models against…

8
arXiv — NLP / Computation & Language research 4h ago

Why Struggle with Continuous Latents? Interpretable Discrete Latent Reasoning via Rendered Compression

arXiv:2606.29712v1 Announce Type: new Abstract: Large language models achieve high reasoning performance via explicit chain-of-thought and reinforcement learning, but require long output sequences and extended inference time. Latent reasoning reduces this cost by shifting…

22
arXiv — NLP / Computation & Language research 4h ago

SEVA: Self-Evolving Verification Agent with Process Reward for Fact Attribution

arXiv:2606.29713v1 Announce Type: new Abstract: Hallucination is the reliability bottleneck for LLM-based agents, and fact attribution verifiers are the last line of defense -- yet today's verifiers emit only opaque binary labels, leaving agents unable to self-correct and…

24
arXiv — NLP / Computation & Language research 4h ago

How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

arXiv:2606.29733v1 Announce Type: new Abstract: Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest,…

16
arXiv — NLP / Computation & Language research 4h ago

Fast Numbers, Slow Language: Bridging Quantitative and Qualitative Earnings Signals

arXiv:2606.29734v1 Announce Type: new Abstract: Earnings announcements release two types of information sequentially: quantitative surprise (numeric earnings-per-share (EPS)/revenue versus analyst estimate) arrives first in press releases and financial news, processed by…

12
arXiv — NLP / Computation & Language research 4h ago

Managing Map Cardinality in Automatic Disease Classification Mapping: Balancing Precision, Recall and Coverage

arXiv:2606.29750v1 Announce Type: new Abstract: Automatic mapping between disease classification systems, such as the International Classification of Diseases (ICD), is a challenging yet essential task for integrating health data and conducting longitudinal data analysis.…

32
arXiv — NLP / Computation & Language research 4h ago

Are Humans Evolved Instruction Followers? An Underlying Inductive Bias Enables Rapid Instructed Task Learning

arXiv:2606.29792v1 Announce Type: new Abstract: Human adults can often perform a novel task correctly on the first attempt after only receiving verbal or written instructions. This rapid instructed task learning (RITL) is a hallmark of human cognitive flexibility, yet its…

14
arXiv — NLP / Computation & Language research 4h ago

Fund2Persona: A Framework for Building and Refining Financial Advisor Personas from Fund Disclosure Data

arXiv:2606.29793v1 Announce Type: new Abstract: Demand for personalized financial advising is growing, but consistent advisor expertise is difficult to obtain, scale, and encode in LLM systems. Simple persona prompts rarely specify how a financial advisor should reason and often…

11
arXiv — NLP / Computation & Language research 4h ago

How Far Can You Get Without a GPU? A Systematic Benchmark of Lightweight Hallucination Detection Across Question Answering, Dialogue, and Summarisation

arXiv:2606.29809v1 Announce Type: new Abstract: Hallucination detection has become a pressing requirement for trustworthy AI deployment at scale. The most accurate detection methods depend on GPU-intensive inference, proprietary API calls, or white-box access to the generating…

27
arXiv — NLP / Computation & Language research 4h ago

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models

arXiv:2606.29815v1 Announce Type: new Abstract: Evaluating code large language models (Code LLMs) requires reliable detection of data leakage, where benchmark performance is artificially inflated by exposure to benchmark data during pre-training. Existing approaches either…

7
arXiv — NLP / Computation & Language research 4h ago

Neural Procedural Memory: Empowering LLM Agents with Implicit Activation Steering

arXiv:2606.29824v1 Announce Type: new Abstract: While Large Language Models (LLMs) excel as static solvers, transforming them into autonomous agents remains challenging. This transition requires continuous environmental interaction, yet current agents lack the necessary…

17
arXiv — NLP / Computation & Language research 4h ago

Revealing the Technology Development of Natural Language Processing: A Scientific Entity-Centric Perspective

arXiv:2606.29836v1 Announce Type: new Abstract: Most studies on technology development have been conducted from a thematic perspective, but the topics are coarse-grained and insufficient to accurately represent technology. The development of automatic entity recognition…

18
arXiv — NLP / Computation & Language research 4h ago

MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers

arXiv:2606.29844v1 Announce Type: new Abstract: The quadratic computational cost of traditional attention mechanisms poses a major bottleneck to the scalability and practical deployment of large language models (LLMs), particularly in long-context scenarios. To improve…

15
arXiv — NLP / Computation & Language research 4h ago

Smooth Scaling Laws Hide Stepwise Token Learning

arXiv:2606.29858v1 Announce Type: new Abstract: Language model loss follows remarkably regular scaling laws over model and data size, yet it remains unclear why the aggregate loss should exhibit a power-law form. Existing explanations often attribute this regularity to a…

7
arXiv — NLP / Computation & Language research 4h ago

Exploring Motivations for Algorithm Mention in the Domain of Natural Language Processing: A Deep Learning Approach

arXiv:2606.29859v1 Announce Type: new Abstract: With the rise of data-intensive science, algorithms have become central to scientific research. In academic papers, algorithms are mentioned for different purposes, such as describing, using, comparing, or improving methods for…

16
arXiv — NLP / Computation & Language research 4h ago

KbSD: Knowledge Boundary aware Self-Distillation for Behavioral Calibration in Agentic Search

arXiv:2606.29863v1 Announce Type: new Abstract: Agentic search equips large language models with dynamic retrieval abilities, but existing reinforcement learning methods remain limited by reward sparsity in knowledge boundary calibration -- deciding when to trust parametric…

38
arXiv — NLP / Computation & Language research 4h ago

ARKD: Adaptive Reinforcement Learning-Guided Bidirectional KL Divergence Distillation for Text Generation

arXiv:2606.29869v1 Announce Type: new Abstract: Knowledge distillation (KD) is a key technique for compressing Large Language Models (LLMs), yet methods relying on a single KL objective often fail to balance primary distribution fitting with long-tail probability modeling,…

14
arXiv — NLP / Computation & Language research 4h ago

Clinical Reasoning Graphs: Structured Evaluation of LLM Diagnostic Reasoning Reveals Competence Without Consistency

arXiv:2606.29876v1 Announce Type: new Abstract: Modern large language models (LLMs) reach 60-70% diagnostic accuracy on complex clinical case benchmarks, but accuracy alone cannot distinguish stable clinically-grounded reasoning from pattern matching. We introduce clinical…

10
arXiv — NLP / Computation & Language research 4h ago

Timesteps of Mamba Align with Human Reading Times

arXiv:2606.29904v1 Announce Type: new Abstract: This study demonstrates an alignment of per-word processing time in a popular state-space language model Mamba and human readers. In Mamba, the recurrent state transition at each layer conceptually takes some duration of time, the…

12
arXiv — NLP / Computation & Language research 4h ago

MemDelta: Controlled Baselines and Hidden Confounds in Agent Memory Evaluation

arXiv:2606.29914v1 Announce Type: new Abstract: Agent memory systems are increasingly evaluated against RAG and full-context baselines, but reported gains often mix changes in the memory method with changes in the language model, embedding model, or retrieval pipeline, making it…

4
arXiv — NLP / Computation & Language research 4h ago

Can LLM-as-a-Judge Reliably Verify Rubrics in Agentic Scenarios?

arXiv:2606.29920v1 Announce Type: new Abstract: Rubric-based scoring has become a widely used paradigm in model evaluation, typically with LLM-as-a-Judge (LaaJ) for rubric scoring. However, the reliability of LaaJ for rubric scoring remains underexplored. This concern is…

17
arXiv — NLP / Computation & Language research 4h ago

Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization

arXiv:2606.29933v1 Announce Type: new Abstract: The alignment of language models is typically studied through the lens of capability benchmarks, but the dynamics of how models change during post-training remain poorly understood. We argue that the physical sciences, and…

16
arXiv — NLP / Computation & Language research 4h ago

LatentRevise: Learning from Zero-Hit Reasoning

arXiv:2606.29938v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) is bottlenecked by hard prompts on which correct trajectories have low probability, so sampling misses them within a practical budget and leaves the policy update with little…

10
arXiv — NLP / Computation & Language research 4h ago

IHDec: Divergence-Steered Contrastive Decoding for Securing Multi-Turn Instruction Hierarchies

arXiv:2606.29960v1 Announce Type: new Abstract: Large Language Models (LLMs) often fail to maintain instruction hierarchies (IH) when processing multi-source inputs with varying role-level priorities, paradoxically adhering to lower-priority directives during conflicts. While…

29
arXiv — NLP / Computation & Language research 4h ago

Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning

arXiv:2606.29985v1 Announce Type: new Abstract: Diversity in LLM mathematical reasoning is critical for exploration, but common diversity metrics mostly capture surface-level variation rather than differences in how a problem is solved. We address this gap by introducing…

27
arXiv — NLP / Computation & Language research 4h ago

LLM Agents Are Latent Context Managers: Eliciting Self-Managed Context via a Proprioceptive Dashboard

arXiv:2606.30005v1 Announce Type: new Abstract: Long-horizon tool agents are bottlenecked by how their context grows toward the limits of the context window. Recent systems make context management agent- or system-controlled, but they either learn a compression policy that…

34
arXiv — NLP / Computation & Language research 4h ago

Node-to-Neighborhood Semantic Consistency: Text-Topology Alignment for TAGs Anomaly Detection

arXiv:2606.30009v1 Announce Type: new Abstract: Graph anomaly detection (GAD) on text-attributed graphs (TAGs) is vital for applications such as fraud detection and academic integrity verification. Existing approaches generally fall into two paradigms. GNN-based methods…

36
arXiv — NLP / Computation & Language research 4h ago

Parametric Skills

arXiv:2606.30015v1 Announce Type: new Abstract: Since intelligence fundamentally relies on efficient skill acquisition (Chollet, 2019), the ability to leverage skills is critical. For LLMs, skills, manually authored or extracted from task trajectories, are textual recipes…

16
arXiv — NLP / Computation & Language research 4h ago

Little Brains, Big Feats: Exploring Compact Language Models

arXiv:2606.30062v1 Announce Type: new Abstract: While large language models have been dominating the research landscape recently, small language models remain highly relevant across various domains; yet, they receive far less attention. In this study, we investigate how smaller…

30
arXiv — NLP / Computation & Language research 4h ago

Not-quite-human tastes: the stylized omnivorousness of LLM survey surrogates

arXiv:2606.30085v1 Announce Type: new Abstract: Large-language models have proven to be remarkable if inconsistent parrots of public attitudes and opinions. The extent to which LLMs are able to produce reasonable approximations of cultural taste remains an open empirical…

6
arXiv — NLP / Computation & Language research 4h ago

Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs

arXiv:2606.30093v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) mitigates hallucinations in Large Language Models (LLMs) by grounding the generation process on external knowledge. However, standard RAG approaches struggle with multi-hop reasoning. While…

10
arXiv — NLP / Computation & Language research 4h ago

Information Dynamics of Language Communication

arXiv:2606.30096v1 Announce Type: new Abstract: Quantifying how meaning propagates through communicative exchanges remains underdeveloped in computational linguistics. Here we introduce an information-theoretic framework that quantifies the directed flow of semantic content…

31
arXiv — NLP / Computation & Language research 4h ago

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

arXiv:2606.30152v1 Announce Type: new Abstract: Contextual language models conflate grammatical gender and social semantic bias in gendered languages such as Spanish. Existing gender debiasing approaches only operate on static word embeddings leaving contextual representations…

26
arXiv — NLP / Computation & Language research 4h ago

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

arXiv:2606.30175v1 Announce Type: new Abstract: The continuous evolution of large language models drives escalating demands on data scale and quality, and as different training stages impose increasingly tailored data requirements, systematic organization of high-quality corpora…

22
arXiv — NLP / Computation & Language research 4h ago

DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning

arXiv:2606.30189v1 Announce Type: new Abstract: Current multimodal fusion approaches, particularly those based on static Mixture-of-Experts (MoE) architectures, often struggle to provide the adaptive and efficient collaborative reasoning required by complex real-world…

14
arXiv — NLP / Computation & Language research 4h ago

Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector

arXiv:2606.30196v1 Announce Type: new Abstract: This paper offers an in-depth analysis of non-sequential multimodal sentence-level embeddings, with a particular focus on the SONAR model. We demonstrate that certain embedding dimensions are sensitive to perturbations and can…

25
