Research papers

500 articles archived under #paper

arXiv — NLP / Computation & Language research 3h ago

Generating in the Limit with Infinitely Many Hallucinations

arXiv:2606.28354v1 Announce Type: new Abstract: The classic paradigm of language identification in the limit models learning as a game between an adversary, who reveals strings from an unknown target language, and a learner tasked with identifying that language. The recently…

10
arXiv — NLP / Computation & Language research 3h ago

Extracting Knowledge from an Arabic-English Machine-Readable Dictionary Using Information Extraction

arXiv:2606.28457v1 Announce Type: new Abstract: Natural language processing (NLP) applications need a large and rich body of linguistic knowledge. Furthermore, electronic language sources such as dictionaries, encyclopedias, and corpora have become available. So, automatic methods are…

28
arXiv — NLP / Computation & Language research 3h ago

Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models

arXiv:2606.28524v1 Announce Type: new Abstract: Recent work suggests that Large Language Models (LLMs) are sensitive to the belief states of agents described by text, as measured by the false belief task (FBT), yet persistent concerns of construct validity remain. We adopt a…

25
arXiv — NLP / Computation & Language research 3h ago

A French OSCE Dialogue Dataset and Controllable Virtual Patient System for Clinical Training

arXiv:2606.28526v1 Announce Type: new Abstract: The clinical and communication skills of medical students are commonly assessed through Objective Structured Clinical Examinations (OSCEs), which consist of brief scenario-driven simulations of doctor-patient interactions. However,…

36
arXiv — NLP / Computation & Language research 3h ago

Legal Domain Adaptation of Modern BERT Models

arXiv:2606.28538v1 Announce Type: new Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x…

26
arXiv — NLP / Computation & Language research 3h ago

Turn-Averaged SAEs for Feature Discovery and Long-Context Attribution

arXiv:2606.28548v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) have become a useful tool for extracting interpretable features in language models. However, standard SAE architectures operate on individual token activations, meaning that the number of active features…

25
arXiv — NLP / Computation & Language research 3h ago

Depth-Staggered Fibonacci Spacing for Sparse Attention: Static Schedules Beat Learned Dilation and Extrapolate Where Dense Attention Fails

arXiv:2606.28560v1 Announce Type: new Abstract: We study sparse self-attention in which each query attends to a dense local window plus a set of Fibonacci-spaced offsets, with a per-layer scalar alpha that compresses or expands the spacing. Across 21 language models trained…

20
arXiv — NLP / Computation & Language research 3h ago

SEAD: Competence-Aware On-Policy Distillation via Entropy-Guided Supervision

arXiv:2606.28562v1 Announce Type: new Abstract: On-policy distillation (OPD) has a property absent in offline distillation and RL: teacher supervision quality depends on student competence. Incoherent rollouts yield noisy gradients; already-mastered tokens yield redundant ones.…

10
arXiv — NLP / Computation & Language research 3h ago

Correct codes for the wrong reasons? validating LLMs as measurement instruments for theoretical constructs

arXiv:2606.28574v1 Announce Type: new Abstract: When a large language model (LLM) codes a construct in text as a human annotator would, that agreement makes the LLM a reliable coder. Yet reliability leaves construct validity untouched. The instrument may be theory-naive,…

35
arXiv — NLP / Computation & Language research 3h ago

Phonological Perception of Sign Language Models

arXiv:2606.28667v1 Announce Type: new Abstract: Sign languages are compositional systems where meaning arises by combining sublexical phonological parameters, such as handshape, location, and movement. While deep learning models for Sign Language Recognition (SLR) have achieved…

38
arXiv — NLP / Computation & Language research 3h ago

AnTenA: Actionable and Explainable Tensor Analysis System with Large Language Models

arXiv:2606.28708v1 Announce Type: new Abstract: Accurately explaining hidden patterns in multi-aspect data has typically been done by leveraging labels and/or accompanying auxiliary metadata. However, labels and auxiliary data may be inaccurate (e.g. nonstandard, inconsistent),…

21
arXiv — NLP / Computation & Language research 3h ago

SEATauBench: Adapting Tool-Agent-User Evaluation Into Low-Resource Southeast Asian Languages

arXiv:2606.28715v1 Announce Type: new Abstract: While AI development and evaluation for Southeast Asia (SEA) has grown rapidly, agent capabilities in regional languages are still poorly understood despite their importance to sovereign AI. To fill this gap, we introduce…

28
arXiv — NLP / Computation & Language research 3h ago

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

arXiv:2606.28725v1 Announce Type: new Abstract: Automated toxicity moderation systems operate in dynamic online environments where harmful behavior evolves through coded language, shifting targets, and strategic adaptation to enforcement. Existing drift detection methods often…

12
arXiv — NLP / Computation & Language research 3h ago

5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control

arXiv:2606.28737v1 Announce Type: new Abstract: We introduce 5ting, our system for SemEval-2026 Task 8 (MTRAGEval), which evaluates multi-turn Retrieval-Augmented Generation (RAG) systems. Multi-turn RAG involves context drift, underspecification, and hallucination risk. Our…

5
arXiv — NLP / Computation & Language research 3h ago

Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain

arXiv:2606.28772v1 Announce Type: new Abstract: Hate speech annotation pipelines routinely collapse annotator disagreement into majority vote labels before training. We show that this aggregation is not neutral: 42.6% of all annotator disagreement in HateXplain concentrates…

28
arXiv — NLP / Computation & Language research 3h ago

Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

arXiv:2606.28796v1 Announce Type: new Abstract: Government documents in India are predominantly issued in regional languages such as Marathi, creating substantial accessibility barriers for non-native readers, interstate administrative bodies, and policy analysts. Although…

30
arXiv — NLP / Computation & Language research 3h ago

Labeling Training Data for Entity Matching Using Large Language Models

arXiv:2606.28823v1 Announce Type: new Abstract: Recent large language models (LLMs) achieve strong performance on entity matching without requiring task-specific training data. However, applying these models to large sets of candidate pairs remains slow and costly. In contrast,…

9
arXiv — NLP / Computation & Language research 3h ago

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

arXiv:2606.28843v1 Announce Type: new Abstract: Fine-tuning a large language model is a ubiquitous method for enhancing its capability on a specific downstream task. However, prior work has shown that this increase in capability comes with a cost: it can increase a model's…

18
arXiv — NLP / Computation & Language research 3h ago

Open but Incompatible: A License Compatibility Analysis of Corpora for Low-Resource African Languages

arXiv:2606.28867v1 Announce Type: new Abstract: Creative Commons licenses dominate African NLP corpus releases, but their compatibility rules are rarely applied. CC-BY-SA and CC-BY-NC cannot be combined in a single published dataset; a NoDerivs clause silently prohibits…

28
arXiv — NLP / Computation & Language research 3h ago

Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory

arXiv:2606.28876v1 Announce Type: new Abstract: Long-context language models often conflate two different goals: compressing history into an efficient state, and maintaining reliable long-term memory. Linear, recurrent, and sparse attention reduce the cost of processing long…

14
arXiv — NLP / Computation & Language research 3h ago

PASTA: A Paraphrasing And Self-Training Approach for Knowledge Updating in LLMs

arXiv:2606.28898v1 Announce Type: new Abstract: Knowledge updating in pre-trained Large Language Models (LLMs) remains an important challenge. While continual training provides a potential avenue for knowledge updating, it continues to present substantial technical difficulties.…

20
arXiv — NLP / Computation & Language research 3h ago

Latent Bridges for Multi-Table Question Answering

arXiv:2606.28916v1 Announce Type: new Abstract: We introduce GRAB, a constructor-encoder-bridge pipeline for table question answering. Our method lifts relational data into a heterogeneous graph, encodes it via message passing, and transfers the signals to an LLM through a…

16
arXiv — NLP / Computation & Language research 3h ago

FinInvest-GTCN: Explainable Graph-Temporal-Causal Modeling for Risk-Aware Investment Decision Optimization

arXiv:2606.28933v1 Announce Type: new Abstract: Venture capital (VC) investment decisions face distinct challenges, such as multi-source heterogeneous data, non-stationary time series, and the demand for explainable predictions in high-stakes, low-data settings. To overcome…

16
arXiv — NLP / Computation & Language research 3h ago

EVLA: An Electro-Aware Multimodal Assistant for Physically-Grounded Driving Reasoning and Control

arXiv:2606.28938v1 Announce Type: new Abstract: Modern vision-language models (VLMs) for driving assistants typically treat vehicle dynamics as a black box, resulting in decisions that lack awareness of the vehicle's real-time electro-mechanical state. To bridge this gap, we…

26
arXiv — NLP / Computation & Language research 3h ago

A3M: Adaptive, Adversarial and Multi-Objective Learning for Strategic Bidding in Repeated Auctions

arXiv:2606.28943v1 Announce Type: new Abstract: Learning to bid in repeated multi-unit auctions with bandit feedback poses a fundamental challenge. Existing methods often rely on rigid explore-then-exploit schedules, assume stationary adversaries, and optimize solely for bidder…

9
arXiv — NLP / Computation & Language research 3h ago

Beyond the Mean: Three-Axis Fidelity for Aligning LLM-Based Survey Simulators from Small Pilot Data

arXiv:2606.28963v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to simulate social survey responses, yet their outputs exhibit systematic biases: marginal distributions are skewed, response variance is poorly calibrated, and predictor-outcome…

20
arXiv — NLP / Computation & Language research 3h ago

Can LLMs Hire Fairly? Racial Bias in Resume Screening

arXiv:2606.28978v1 Announce Type: new Abstract: We audit fourteen mainstream large language models (LLMs) for hiring discrimination using the paired-resume methodology of Kline, Rose, and Walters (2022). The sole 2023-vintage model reproduces the pro-White callback gap…

23
arXiv — NLP / Computation & Language research 3h ago

Fine-Tuning General-Purpose Large Language Models for Agricultural Applications: A Reproducible Framework and Evaluation Protocol Based on Qwen3-8B

arXiv:2606.28992v1 Announce Type: new Abstract: General-purpose large language models (LLMs) have demonstrated strong abilities in open-domain question answering, information extraction, and text generation. Agricultural applications, however, are domain-specific,…

20
arXiv — NLP / Computation & Language research 3h ago

BERTomelo: Your Portuguese Encoder Best Friend

arXiv:2606.28999v1 Announce Type: new Abstract: Encoders have become the state of the art for multiple NLP tasks, especially those requiring deep contextual understanding. While multilingual models offer broad coverage, dedicated monolingual encoders are essential for capturing…

16
arXiv — NLP / Computation & Language research 3h ago

Conversational Domain Adaptation of IndicTrans2 across 21 Indic Languages via Experience Replay and Model Soups

arXiv:2606.29024v1 Announce Type: new Abstract: IndicTrans2 is the strongest open English to Indic translation system, but like most systems it is trained on general text and tends to sound stiff on casual, conversational input. We adapt IndicTrans2-1B to conversational register…

31
arXiv — NLP / Computation & Language research 3h ago

How to Leverage Synthetic Speech for LLM-Based ASR Systems?

arXiv:2606.29031v1 Announce Type: new Abstract: In regulated domains such as banking and healthcare, where privacy constraints make real speech costly to collect and retain, synthetic speech from modern text-to-speech (TTS) is an appealing alternative for training automatic…

15
arXiv — NLP / Computation & Language research 3h ago

The strength of clinical evidence is recoverable from language model representations but not from their stated grades

arXiv:2606.29034v1 Announce Type: new Abstract: Large language models (LLMs) increasingly summarize clinical evidence, where a claim's weight depends on how strongly it is supported. Yet these models convey confidence poorly, and properties they never state, such as truth, are…

17
arXiv — NLP / Computation & Language research 3h ago

Masked Diffusion Decoding as $x$-Prediction Flow

arXiv:2606.29066v1 Announce Type: new Abstract: Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens, but their standard decoder reduces each step to a binary action: a position is either committed to a single token or left fully masked, with no…

22
arXiv — NLP / Computation & Language research 3h ago

ThinkProbe: Beyond Accuracy -- Structural Profiling of Open-Ended LLM Reasoning Traces via Non-Generative Thought Graphs

arXiv:2606.29067v1 Announce Type: new Abstract: We present ThinkProbe, a framework for structural analysis of LLM reasoning traces. ThinkProbe converts each trace into a Thought Graph (a directed graph with cycles, 8 node types, and 6 edge types) and derives a 19-metric…

32
arXiv — NLP / Computation & Language research 3h ago

A Comparative Study on Affective Cues in Text Embeddings Across Psychological Emotion Theories

arXiv:2606.29068v1 Announce Type: new Abstract: Text encoders are known for their utility in natural language processing, as they are able to efficiently compress inputs into dense vectors while preserving semantics. These models have been applied to affective computing, in…

19
arXiv — NLP / Computation & Language research 3h ago

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

arXiv:2606.29082v1 Announce Type: new Abstract: Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on…

4
arXiv — NLP / Computation & Language research 3h ago

AB-RAG: Adaptive Budgeted Retrieval-Augmented Generation for Reliable Question Answering

arXiv:2606.29090v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has become the standard way to ground large language models in external knowledge, yet most systems retrieve a fixed number of passages for every question regardless of its difficulty. This…

11
arXiv — NLP / Computation & Language research 3h ago

Knowing in Advance When an Evolutionary Outer Loop Will Not Help: A Pre-Registered Cheap-Baseline Screening Rule

arXiv:2606.29119v1 Announce Type: new Abstract: We introduce a pre-registered screening rule that decides, before any implementation, whether an evolutionary / population / lifecycle outer loop over neural-network parameters or structure is worth building. Such outer loops cost…

14
arXiv — NLP / Computation & Language research 3h ago

How Anthropomorphic Language Impacts Public Perceptions of AI

arXiv:2606.29121v1 Announce Type: new Abstract: Public discourse about artificial intelligence (AI) often uses anthropomorphic language: language that attributes human capabilities and characteristics to the system. This practice has been criticized for setting misleading…

10
arXiv — NLP / Computation & Language research 3h ago

DistilledGemma: Balanced Efficiency-Accuracy for Person-Place Relation Extraction from Multilingual Historical Articles

arXiv:2606.29130v1 Announce Type: new Abstract: We present DistilledGemma, an efficient and accurate system for the HIPE-2026 shared task on person-place relation extraction from multilingual historical newspaper articles in English, German, and French. Our approach adopts a…

13
arXiv — NLP / Computation & Language research 3h ago

Can OCR-VLMs Read Devanagari? A Stress-Test Benchmark and Post-Correction Study

arXiv:2606.29213v1 Announce Type: new Abstract: OCR systems, ranging from classical engines to specialised OCR vision-language models (OCR-VLMs) and frontier multimodal LLMs, report strong results on English and Chinese document benchmarks, yet their behaviour on Indic scripts…

30
arXiv — NLP / Computation & Language research 3h ago

Understanding Evaluation Illusion in Diffusion Large Language Models

arXiv:2606.29228v1 Announce Type: new Abstract: Despite the capability of parallel decoding, diffusion large language models (dLLMs) require many denoising steps to maintain generation quality, motivating recent research on efficient decoding strategies. However, existing…

23
arXiv — NLP / Computation & Language research 3h ago

Travel-Oriented Reasoning Large Language Model via Domain-Specific Knowledge Graphs

arXiv:2606.29254v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate broad reasoning abilities but struggle with accuracy and reliability in specialized domains such as travel, where reasoning depends on precise definitions, rules, and expert-defined…

12
arXiv — NLP / Computation & Language research 3h ago

MIThinker: A Plug-and-Play Policy-Optimized Thinker For Motivational Interviewing Counseling

arXiv:2606.29265v1 Announce Type: new Abstract: Reasoning large language models (LLMs) have recently made much progress in complex problem-solving, leveraging internal reasoning (or thought) to guide their solution generation. However, existing LLM-based counseling agents,…

17
arXiv — NLP / Computation & Language research 3h ago

A Hybrid Framework for Song Lyric Annotation Based on Human-LLM Alignment

arXiv:2606.29273v1 Announce Type: new Abstract: Emotion recognition of song lyrics is a challenging task since lyrics may not necessarily align with the overall emotion of a song. As a result, lyrics annotation remains largely underexplored. Drawing inspiration from research in…

34
arXiv — NLP / Computation & Language research 3h ago

TriageRA-CCF: Source-Side Clinical Confidence and Coverage Signals for Adaptive Rank Budgeting in Medical LLMs

arXiv:2606.29375v1 Announce Type: new Abstract: Medical large language models are commonly adapted with a fixed low-rank budget, even though medical questions differ substantially in confidence, clinical coverage, and cross-domain difficulty. We study adaptive rank budgeting for…

15
arXiv — NLP / Computation & Language research 3h ago

Cross-Temporal Sinhala OCR: Page-Level Adaptation and Diachronic Analysis

arXiv:2606.29378v1 Announce Type: new Abstract: Sinhala is a morphologically rich abugida spoken by roughly 16 million people in Sri Lanka, and to date, there are no publicly available real-world datasets for page-level Sinhala OCR. All previous studies for assessing Sinhala OCR…

19
arXiv — NLP / Computation & Language research 3h ago

LC-ICL: Label-Guided Contrastive In-Context Learning for Robust Information Extraction

arXiv:2606.29407v1 Announce Type: new Abstract: There has been increasing interest in exploring the capabilities of advanced large language models (LLMs) in the field of information extraction (IE), specifically focusing on tasks related to named entity recognition (NER) and…

28
arXiv — NLP / Computation & Language research 3h ago

EntroRouter: Learning Efficient Model Routing via Entropy Regulation

arXiv:2606.29424v1 Announce Type: new Abstract: Model routing balances solution accuracy and computational cost by selecting among models of varying capabilities. While recent multi-round frameworks interleave reasoning and planning, we identify a structural failure mode termed…

28
arXiv — NLP / Computation & Language research 3h ago

mamabench and mamaretrieval: Benchmarks for Evaluating Medical Retrieval-Augmented Generation in Maternal, Neonatal, and Reproductive Health

arXiv:2606.29467v1 Announce Type: new Abstract: Medical question-answering benchmarks rarely cover the maternal, neonatal, child, and reproductive-health questions a nurse-midwife asks, and, to our knowledge, no public chunk-level relevance benchmark exists for maternal-health…

25
