#paper tag: Research papers · 500 articles archived under #paper

All items on this page: arXiv (NLP / Computation & Language), research, posted 4h ago.

To Reason or to Fabricate: Reasoning Without Shortcuts via Hint-Anchored Pairwise Aggregation
arXiv:2606.29481v1 · Announce Type: new
Abstract: While reinforcement learning (RL) significantly enhances LLM reasoning, its efficacy is severely undermined by Pre-RL data overlap, where RL datasets overlap with pretraining or SFT corpora, causing models to exploit shortcuts by…

Which Tokens Need Context? A Reference-Based Analysis of Translation Responsibility Using Fertility and Entropy
arXiv:2606.29489v1 · Announce Type: new
Abstract: When humans translate, not every word depends equally on the surrounding context. Some tokens, particularly function words like pronouns and auxiliaries, rely heavily on preceding or following sentences, while others, such as…

The Verbose Context Problem in Medical Records
arXiv:2606.29503v1 · Announce Type: new
Abstract: The verbose context problem occurs when structured concepts have token-inefficient textual representations. This bottleneck is acute in population health: cohort-level analysis of longitudinal patient records requires reasoning…

Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs
arXiv:2606.29534v1 · Announce Type: new
Abstract: Popular ASR test sets adopt inconsistent conventions for numbers, disfluencies, entities, and casing, while standard normalizers erase the format distinctions users care about. Current benchmarks therefore cannot measure whether a…

AURORA: Asymmetry and Update-Induced Rotation for Robust Hallucination Detection in Large Language Models
arXiv:2606.29545v1 · Announce Type: new
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. However, their tendency to generate hallucinations, namely factually incorrect or unfaithful outputs,…

Coverage-Driven KV Cache Eviction for Efficient and Improved Inference of LLM
arXiv:2606.29563v1 · Announce Type: new
Abstract: Large language models (LLMs) excel at complex tasks like question answering and summarization, thanks to their ability to handle long-context inputs. However, deploying LLMs is costly, not only due to the high computational demands…

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings
arXiv:2606.29571v1 · Announce Type: new
Abstract: The standard way to compare two text embeddings is cosine similarity. Scattered studies report that a different metric does better, but never pin down the geometric condition that decides when, or why. We settle both with a…

MAM-AI: An On-Device Medical Retrieval-Augmented Generation System for Nurses and Midwives in Zanzibar
arXiv:2606.29580v1 · Announce Type: new
Abstract: Maternal and newborn mortality remain among the highest in sub-Saharan Africa, where midwifery care is often delivered by nurses who lack midwifery training to international standards, and consulting authoritative guidance at the…

How much of an LLM-generated clinical corpus is actually new? A production-scale measurement of content redundancy for provenance classification
arXiv:2606.29605v1 · Announce Type: new
Abstract: Clinical machine learning increasingly relies on training corpora generated by large language models (LLMs) rather than annotated by clinicians, and such corpora are described and reused largely on the basis of their reported…

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model
arXiv:2606.29614v1 · Announce Type: new
Abstract: This study examines whether supervised fine-tuning remains necessary for Turkish sentiment analysis in the era of large language models. We compare classical machine learning methods, fine-tuned pretrained language models, and…

Two-Stage Prompt Optimization for Few-Shot Relation Extraction: From Reasoning-Guided Search to Gradient-Guided Refinement
arXiv:2606.29639v1 · Announce Type: new
Abstract: Automatic prompt optimization is still underexplored for episodic few-shot relation extraction with smaller language models. We propose a two-stage framework that combines reasoning-based prompt optimization with gradient-based…

Hybrid Retriever Evolution for Multimodal Document Reasoning Agents
arXiv:2606.29648v1 · Announce Type: new
Abstract: Different retrievers, including lexical, semantic, and multimodal approaches, provide highly complementary strengths for multimodal document understanding, yet most systems combine them through fixed pipelines that cannot adapt to…

Resolution Thresholds in VLM Detection of Harmful ASCII Art Across Construction Modes and Languages
arXiv:2606.29649v1 · Announce Type: new
Abstract: Large Vision-Language Models (VLMs) are increasingly deployed as content moderation tools, yet they remain vulnerable to jailbreak attacks in which harmful text is visually encoded as ASCII art. This can allow inappropriate or…

How LLMs See Creativity: Zero-Shot Scoring of Visual Creativity with Interpretable Reasoning
arXiv:2606.29672v1 · Announce Type: new
Abstract: Evaluating the originality of visual images poses enduring challenges for creativity assessment. Automated scoring using AI models has proven effective in the verbal domain, yet key questions remain about evaluating visual…

Can MLLMs Critique Like Humans? Evaluating Open-Ended Aesthetic Reasoning in Multimodal Large Language Models
arXiv:2606.29689v1 · Announce Type: new
Abstract: Open-ended aesthetic critique is a challenge for multimodal large language models (MLLMs): unlike multiple-choice aesthetic benchmarks, it has no single correct answer, and most aesthetic evaluation has measured models against…

Why Struggle with Continuous Latents? Interpretable Discrete Latent Reasoning via Rendered Compression
arXiv:2606.29712v1 · Announce Type: new
Abstract: Large language models achieve high reasoning performance via explicit chain-of-thought and reinforcement learning, but require long output sequences and extended inference time. Latent reasoning reduces this cost by shifting…

SEVA: Self-Evolving Verification Agent with Process Reward for Fact Attribution
arXiv:2606.29713v1 · Announce Type: new
Abstract: Hallucination is the reliability bottleneck for LLM-based agents, and fact attribution verifiers are the last line of defense -- yet today's verifiers emit only opaque binary labels, leaving agents unable to self-correct and…

How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD
arXiv:2606.29733v1 · Announce Type: new
Abstract: Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest,…

Fast Numbers, Slow Language: Bridging Quantitative and Qualitative Earnings Signals
arXiv:2606.29734v1 · Announce Type: new
Abstract: Earnings announcements release two types of information sequentially: quantitative surprise (numeric earnings-per-share (EPS)/revenue versus analyst estimate) arrives first in press releases and financial news, processed by…

Managing Map Cardinality in Automatic Disease Classification Mapping: Balancing Precision, Recall and Coverage
arXiv:2606.29750v1 · Announce Type: new
Abstract: Automatic mapping between disease classification systems, such as the International Classification of Diseases (ICD), is a challenging yet essential task for integrating health data and conducting longitudinal data analysis.…

Are Humans Evolved Instruction Followers? An Underlying Inductive Bias Enables Rapid Instructed Task Learning
arXiv:2606.29792v1 · Announce Type: new
Abstract: Human adults can often perform a novel task correctly on the first attempt after only receiving verbal or written instructions. This rapid instructed task learning (RITL) is a hallmark of human cognitive flexibility, yet its…

Fund2Persona: A Framework for Building and Refining Financial Advisor Personas from Fund Disclosure Data
arXiv:2606.29793v1 · Announce Type: new
Abstract: Demand for personalized financial advising is growing, but consistent advisor expertise is difficult to obtain, scale, and encode in LLM systems. Simple persona prompts rarely specify how a financial advisor should reason and often…

How Far Can You Get Without a GPU? A Systematic Benchmark of Lightweight Hallucination Detection Across Question Answering, Dialogue, and Summarisation
arXiv:2606.29809v1 · Announce Type: new
Abstract: Hallucination detection has become a pressing requirement for trustworthy AI deployment at scale. The most accurate detection methods depend on GPU-intensive inference, proprietary API calls, or white-box access to the generating…

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models
arXiv:2606.29815v1 · Announce Type: new
Abstract: Evaluating code large language models (Code LLMs) requires reliable detection of data leakage, where benchmark performance is artificially inflated by exposure to benchmark data during pre-training. Existing approaches either…

Neural Procedural Memory: Empowering LLM Agents with Implicit Activation Steering
arXiv:2606.29824v1 · Announce Type: new
Abstract: While Large Language Models (LLMs) excel as static solvers, transforming them into autonomous agents remains challenging. This transition requires continuous environmental interaction, yet current agents lack the necessary…

Revealing the Technology Development of Natural Language Processing: A Scientific Entity-Centric Perspective
arXiv:2606.29836v1 · Announce Type: new
Abstract: Most studies on technology development have been conducted from a thematic perspective, but the topics are coarse-grained and insufficient to accurately represent technology. The development of automatic entity recognition…

MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers
arXiv:2606.29844v1 · Announce Type: new
Abstract: The quadratic computational cost of traditional attention mechanisms poses a major bottleneck to the scalability and practical deployment of large language models (LLMs), particularly in long-context scenarios. To improve…

Smooth Scaling Laws Hide Stepwise Token Learning
arXiv:2606.29858v1 · Announce Type: new
Abstract: Language model loss follows remarkably regular scaling laws over model and data size, yet it remains unclear why the aggregate loss should exhibit a power-law form. Existing explanations often attribute this regularity to a…

Exploring Motivations for Algorithm Mention in the Domain of Natural Language Processing: A Deep Learning Approach
arXiv:2606.29859v1 · Announce Type: new
Abstract: With the rise of data-intensive science, algorithms have become central to scientific research. In academic papers, algorithms are mentioned for different purposes, such as describing, using, comparing, or improving methods for…

KbSD: Knowledge Boundary aware Self-Distillation for Behavioral Calibration in Agentic Search
arXiv:2606.29863v1 · Announce Type: new
Abstract: Agentic search equips large language models with dynamic retrieval abilities, but existing reinforcement learning methods remain limited by reward sparsity in knowledge boundary calibration -- deciding when to trust parametric…

ARKD: Adaptive Reinforcement Learning-Guided Bidirectional KL Divergence Distillation for Text Generation
arXiv:2606.29869v1 · Announce Type: new
Abstract: Knowledge distillation (KD) is a key technique for compressing Large Language Models (LLMs), yet methods relying on a single KL objective often fail to balance primary distribution fitting with long-tail probability modeling,…

Clinical Reasoning Graphs: Structured Evaluation of LLM Diagnostic Reasoning Reveals Competence Without Consistency
arXiv:2606.29876v1 · Announce Type: new
Abstract: Modern large language models (LLMs) reach 60-70% diagnostic accuracy on complex clinical case benchmarks, but accuracy alone cannot distinguish stable clinically-grounded reasoning from pattern matching. We introduce clinical…

Timesteps of Mamba Align with Human Reading Times
arXiv:2606.29904v1 · Announce Type: new
Abstract: This study demonstrates an alignment of per-word processing time in a popular state-space language model Mamba and human readers. In Mamba, the recurrent state transition at each layer conceptually takes some duration of time, the…

MemDelta: Controlled Baselines and Hidden Confounds in Agent Memory Evaluation
arXiv:2606.29914v1 · Announce Type: new
Abstract: Agent memory systems are increasingly evaluated against RAG and full-context baselines, but reported gains often mix changes in the memory method with changes in the language model, embedding model, or retrieval pipeline, making it…

Can LLM-as-a-Judge Reliably Verify Rubrics in Agentic Scenarios?
arXiv:2606.29920v1 · Announce Type: new
Abstract: Rubric-based scoring has become a widely used paradigm in model evaluation, typically with LLM-as-a-Judge (LaaJ) for rubric scoring. However, the reliability of LaaJ for rubric scoring remains underexplored. This concern is…

Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization
arXiv:2606.29933v1 · Announce Type: new
Abstract: The alignment of language models is typically studied through the lens of capability benchmarks, but the dynamics of how models change during post-training remain poorly understood. We argue that the physical sciences, and…

LatentRevise: Learning from Zero-Hit Reasoning
arXiv:2606.29938v1 · Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) is bottlenecked by hard prompts on which correct trajectories have low probability, so sampling misses them within a practical budget and leaves the policy update with little…

IHDec: Divergence-Steered Contrastive Decoding for Securing Multi-Turn Instruction Hierarchies
arXiv:2606.29960v1 · Announce Type: new
Abstract: Large Language Models (LLMs) often fail to maintain instruction hierarchies (IH) when processing multi-source inputs with varying role-level priorities, paradoxically adhering to lower-priority directives during conflicts. While…

Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning
arXiv:2606.29985v1 · Announce Type: new
Abstract: Diversity in LLM mathematical reasoning is critical for exploration, but common diversity metrics mostly capture surface-level variation rather than differences in how a problem is solved. We address this gap by introducing…

LLM Agents Are Latent Context Managers: Eliciting Self-Managed Context via a Proprioceptive Dashboard
arXiv:2606.30005v1 · Announce Type: new
Abstract: Long-horizon tool agents are bottlenecked by how their context grows toward the limits of the context window. Recent systems make context management agent- or system-controlled, but they either learn a compression policy that…

Node-to-Neighborhood Semantic Consistency: Text-Topology Alignment for TAGs Anomaly Detection
arXiv:2606.30009v1 · Announce Type: new
Abstract: Graph anomaly detection (GAD) on text-attributed graphs (TAGs) is vital for applications such as fraud detection and academic integrity verification. Existing approaches generally fall into two paradigms. GNN-based methods…

Parametric Skills
arXiv:2606.30015v1 · Announce Type: new
Abstract: Since intelligence fundamentally relies on efficient skill acquisition (Chollet, 2019), the ability to leverage skills is critical. For LLMs, skills, manually authored or extracted from task trajectories, are textual recipes…

Little Brains, Big Feats: Exploring Compact Language Models
arXiv:2606.30062v1 · Announce Type: new
Abstract: While large language models have been dominating the research landscape recently, small language models remain highly relevant across various domains; yet, they receive far less attention. In this study, we investigate how smaller…

Not-quite-human tastes: the stylized omnivorousness of LLM survey surrogates
arXiv:2606.30085v1 · Announce Type: new
Abstract: Large-language models have proven to be remarkable if inconsistent parrots of public attitudes and opinions. The extent to which LLMs are able to produce reasonable approximations of cultural taste remains an open empirical…

Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs
arXiv:2606.30093v1 · Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) mitigates hallucinations in Large Language Models (LLMs) by grounding the generation process on external knowledge. However, standard RAG approaches struggle with multi-hop reasoning. While…

Information Dynamics of Language Communication
arXiv:2606.30096v1 · Announce Type: new
Abstract: Quantifying how meaning propagates through communicative exchanges remains underdeveloped in computational linguistics. Here we introduce an information-theoretic framework that quantifies the directed flow of semantic content…

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts
arXiv:2606.30152v1 · Announce Type: new
Abstract: Contextual language models conflate grammatical gender and social semantic bias in gendered languages such as Spanish. Existing gender debiasing approaches only operate on static word embeddings leaving contextual representations…

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph
arXiv:2606.30175v1 · Announce Type: new
Abstract: The continuous evolution of large language models drives escalating demands on data scale and quality, and as different training stages impose increasingly tailored data requirements, systematic organization of high-quality corpora…

DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning
arXiv:2606.30189v1 · Announce Type: new
Abstract: Current multimodal fusion approaches, particularly those based on static Mixture-of-Experts (MoE) architectures, often struggle to provide the adaptive and efficient collaborative reasoning required by complex real-world…

Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector
arXiv:2606.30196v1 · Announce Type: new
Abstract: This paper offers an in-depth analysis of non-sequential multimodal sentence-level embeddings, with a particular focus on the SONAR model. We demonstrate that certain embedding dimensions are sensitive to perturbations and can…

Page 4 of 10