Research papers

500 articles archived under #paper

arXiv — NLP / Computation & Language research 1d ago

DysLexLens: A Low-Resource LLM Framework for Analysing Dyslexic Learners Insights from Online Forums

arXiv:2606.27619v1 Announce Type: cross Abstract: Dyslexic learners increasingly use artificial intelligence (AI) tools to support reading, writing, organisation, and study-related tasks. However, their lived experiences with these tools remain largely underexamined. This paper…

23
arXiv — Machine Learning research 1d ago

Physics-Guided Robotic Radiation Source Localization along Arbitrary Measurement Paths in Unstructured Environments

arXiv:2606.27624v1 Announce Type: cross Abstract: Using robots to estimate the location of the radiation source is an effective way to improve efficiency and safety. Existing methods focus on planning the robot's path to achieve precise estimation, typically approaching the…

19
arXiv — NLP / Computation & Language research 1d ago

A Survey of Automated Presentation Coaching: Systems, Methods, and Open Challenges

arXiv:2606.27380v1 Announce Type: new Abstract: Automated coaching for oral presentations sits at the intersection of computer-assisted pronunciation training (CAPT), prosody modeling, and speech synthesis, yet no prior work has systematically surveyed and compared existing…

6
arXiv — NLP / Computation & Language research 1d ago

Causal Connections: Leveraging Multilingual Fine-Tuning for Financial QA@FinCausal 2026

arXiv:2606.27446v1 Announce Type: new Abstract: This paper describes team HSA_CORAL's submission to the FinCausal 2026 shared task on extracting cause-effect relations from financial narratives via extractive question answering in English and Spanish. We compare three modeling…

4
arXiv — NLP / Computation & Language research 1d ago

Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

arXiv:2606.27460v1 Announce Type: new Abstract: In this study, we use a developmental approach to investigate the statistical learning and mental representation of neural language models (NLM). A series of Generative Transformer models are trained on a synthetic grammar. The…

4
arXiv — NLP / Computation & Language research 1d ago

The Context-Ready Transformer

arXiv:2606.27538v1 Announce Type: new Abstract: We introduce the context-ready transformer, a new recurrent neural network architecture built from a D-layer transformer block that pre-contextualizes each token before it enters the block. During left-to-right generation, a…

26
arXiv — NLP / Computation & Language research 1d ago

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

arXiv:2606.27595v1 Announce Type: new Abstract: Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated,…

32
arXiv — NLP / Computation & Language research 1d ago

Narrative-UFET: Narrative Generation for Ultra-Fine Entity Typing

arXiv:2606.27598v1 Announce Type: new Abstract: Ultra-fine entity typing (UFET) assigns highly specific types to entity mentions, but current approaches struggle with types in the long tail. We hypothesize that a key limitation is the reliance on sentence-level context, since…

14
arXiv — NLP / Computation & Language research 1d ago

Cross-Platform Chinese Offensive Comment Detection via Dual-Threshold Hard Example Mining

arXiv:2606.27629v1 Announce Type: new Abstract: Cross-platform deployment of offensive comment detection for Chinese social media suffers performance degradation. The paper proposes a dual-threshold hard mining method to address this. First, the clean-Chinese-base RoBERTa is…

16
arXiv — NLP / Computation & Language research 1d ago

Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety

arXiv:2606.27632v1 Announce Type: new Abstract: As large language models are increasingly deployed in real-world systems, safety failures can still lead to harmful outputs and dangerous misuse. We argue that the essence of safety is adversarial: many failures arise not from…

29
arXiv — NLP / Computation & Language research 1d ago

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

arXiv:2606.27669v1 Announce Type: new Abstract: Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume…

27
arXiv — NLP / Computation & Language research 1d ago

From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models

arXiv:2606.27679v1 Announce Type: new Abstract: Probe-based uncertainty estimation (UE) has emerged as a prominent approach to detect hallucinations in Large Language Models (LLMs) by learning uncertainty from internal model signals. Yet, recent methods vary simultaneously…

22
arXiv — NLP / Computation & Language research 1d ago

Mitigating LLM-based p-Hacking by Preregistering for the Next LLM

arXiv:2606.27687v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate, classify, and annotate data whose outputs feed downstream hypothesis tests. However, LLM-based research is easy to p-hack: a researcher can tune the prompts, decoding…

32
arXiv — NLP / Computation & Language research 1d ago

Mitigating Position Bias in Transformers via Layer-Specific Positional Embedding Scaling

arXiv:2606.27705v1 Announce Type: new Abstract: Large Language Models (LLMs) still struggle with the "lost-in-the-middle" problem, where critical information located in the middle of long-context inputs is often underrepresented or lost. While existing methods attempt to…

4
arXiv — NLP / Computation & Language research 1d ago

Low-Agreeableness Persona Conditioning for Safe LLM Fine-Tuning

arXiv:2606.27709v1 Announce Type: new Abstract: Recent work has shown that fine-tuning large language models (LLMs) for social warmth degrades factual reliability and increases sycophancy. We investigate a related but distinct failure mode: warmth fine-tuning also weakens…

22
arXiv — NLP / Computation & Language research 1d ago

Do Speech Emphasis Models Generalize across Languages and Emotions?

arXiv:2606.27717v1 Announce Type: new Abstract: Prosodic emphasis varies across languages, emotions, and speaking styles, yet existing emphasis detection models are largely trained and evaluated on monolingual neutral read speech. We introduce MMEE (Multilingual Multi-Emotion…

12
arXiv — NLP / Computation & Language research 1d ago

Enhancing Numerical Prediction in LLMs via Smooth MMD Alignment

arXiv:2606.27731v1 Announce Type: new Abstract: Despite their strong general capabilities, large language models (LLMs) often remain unreliable when outputs must be numerically precise. A key reason is the training objective: standard cross-entropy treats numeric tokens as…

31
arXiv — NLP / Computation & Language research 1d ago

KG2Cypher: Data-Centric Pipeline for Building Enterprise Text-to-Cypher Systems

arXiv:2606.27742v1 Announce Type: new Abstract: Enterprise Knowledge Graphs (KGs) are increasingly used for internal search, analytics, and question answering, but building natural-language interfaces for private enterprise graphs remains costly. We present KG2Cypher, a…

14
arXiv — NLP / Computation & Language research 1d ago

Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study

arXiv:2606.27785v1 Announce Type: new Abstract: Training-free compression methods for large language models (LLMs) often use calibration data to guide compression decisions. ROCKET, a recent method combining sparse-dictionary factorization with multi-choice knapsack problem…

30
arXiv — NLP / Computation & Language research 1d ago

SHIFT: Gate-Modulated Activation Steering for Knowledge Conflict Mitigation in Retrieval-Augmented Generation

arXiv:2606.27786v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) enhances LLMs by incorporating external knowledge to support response generation. However, conflicts between retrieved context and parametric knowledge have emerged as a critical challenge in…

16
arXiv — NLP / Computation & Language research 1d ago

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

arXiv:2606.27791v1 Announce Type: new Abstract: Hybrid attention models that mix full and sliding-window attention across layers offer a promising approach to efficient long-context inference, but the critical question of which layers should retain full attention remains…

19
arXiv — NLP / Computation & Language research 1d ago

Position Bias Correction is Insufficient for One-Pass Attention Sorting

arXiv:2606.27793v1 Announce Type: new Abstract: Long-context language models suffer from position bias, where information in middle positions is underutilized. Attention Sorting addresses this by iteratively reordering documents based on attention patterns, but its multiple…

9
arXiv — NLP / Computation & Language research 1d ago

Learning Complementary Action Modeling from Automotive Maintenance Instructions

arXiv:2606.27808v1 Announce Type: new Abstract: A minute lexical variation can reverse the procedural meaning of an instruction even when the rest of the sentence remains unchanged. In automotive maintenance instructions, this pattern often appears when an action phrase turns an…

8
arXiv — NLP / Computation & Language research 1d ago

A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts

arXiv:2606.27881v1 Announce Type: new Abstract: Temporal variation poses a unique challenge for named entity recognition (NER) in historical texts, where entities drift in surface form and salience across time. While language models (LMs) have made progress in various NLP tasks,…

22
arXiv — NLP / Computation & Language research 1d ago

Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

arXiv:2606.27909v1 Announce Type: new Abstract: Theory-of-mind evaluations of large language models typically use dyadic social-deduction games, where every observable cue points to a single hidden side, so a model with strong language priors can score well without ever…

15
arXiv — NLP / Computation & Language research 1d ago

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

arXiv:2606.27941v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) provide useful decompositions of Transformer residual streams, but their learned features are usually named post hoc rather than directly connected to the Transformer's token vocabulary. We introduce…

35
arXiv — NLP / Computation & Language research 1d ago

An Empirical Analysis of Factual Errors in Human-Written Text and its Application

arXiv:2606.27959v1 Announce Type: new Abstract: Factual Error Detection (FED), which is the task of identifying factually incorrect spans in a given text, has long been recognized as an important research problem. However, with the rapid rise of large language models (LLMs),…

21
arXiv — NLP / Computation & Language research 1d ago

From Black-Box to Clinical Insight: A Multi-Stage Explainable Framework for Speech-Based Cognitive Impairment Detection

arXiv:2606.27973v1 Announce Type: new Abstract: Speech-based cognitive impairment detection offers a noninvasive, accessible alternative to costly biomarker assays, yet transformer-based models remain clinically uninterpretable. We propose a multi-stage explainability framework…

23
arXiv — NLP / Computation & Language research 1d ago

ToxiREX: A Dataset on Toxic REasoning in ConteXt

arXiv:2606.27981v1 Announce Type: new Abstract: We introduce a new, contextual, multilingual dataset called ToxiREX: Toxic REasoning in ConteXt. The dataset consists of threads of Reddit comments and structured characterizations of what the comments imply, following a systematic…

5
arXiv — NLP / Computation & Language research 1d ago

Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection

arXiv:2606.28002v1 Announce Type: new Abstract: Insurance fraud imposes substantial financial losses and operational inefficiencies, raising premiums and impacting trust among legitimate policyholders. Early detection at FNOL remains a persistent challenge. Existing approaches…

25
arXiv — NLP / Computation & Language research 1d ago

The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

arXiv:2606.28013v1 Announce Type: new Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean…

23
arXiv — NLP / Computation & Language research 1d ago

A Tree-of-Thoughts Inspired Hybrid Approach for Legal Case Judgement Summarization using LLMs

arXiv:2606.28044v1 Announce Type: new Abstract: In recent times, Large Language Models (LLMs) are increasingly being used for legal case judgement summarization. Most prior works have tried traditional extractive and abstractive summarization of case judgements. However, hybrid…

34
arXiv — NLP / Computation & Language research 1d ago

Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA

arXiv:2606.28050v1 Announce Type: new Abstract: LLM-as-a-Judge and self-evaluation pipelines implicitly assume that evaluation is easier than generation. We test this in a controlled in-context QA setting where a context passage is the sole information source and each model…

29
arXiv — NLP / Computation & Language research 1d ago

MultiHashFormer: Hash-based Generative Language Models

arXiv:2606.28057v1 Announce Type: new Abstract: Language models (LMs) represent tokens using embedding matrices that scale linearly with the vocabulary size. To constrain the parameter footprint, prior work proposes hashing many tokens into a single vector within encoder-only…

15
arXiv — NLP / Computation & Language research 1d ago

Mechanism-Driven Monitors for Preemptive Detection of LLM Training Instability

arXiv:2606.28116v1 Announce Type: new Abstract: Frontier large language model training consumes massive accelerator fleets and long wall-clock computation, making stability failures costly when they occur. After a numerical or a hyperparameter fault has already destabilized the…

31
arXiv — NLP / Computation & Language research 1d ago

From Tokens to States: LLMs as a Special Case of World Models and the Continuous Path Beyond

arXiv:2606.28127v1 Announce Type: new Abstract: The AI community has framed the relationship between large language models (LLMs) and world models as a dichotomy: LLMs predict tokens; world models simulate reality. Yann LeCun argues in 2022 that reaching general intelligence…

25
arXiv — NLP / Computation & Language research 1d ago

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

arXiv:2606.28186v1 Announce Type: new Abstract: Predicting human item difficulty is central to educational assessment, where reliable estimates support fairness and effective test construction. Existing methods often depend on costly human calibration or item-level textual…

35
arXiv — NLP / Computation & Language research 1d ago

Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

arXiv:2606.28273v1 Announce Type: new Abstract: Vision-language models must reconcile visual evidence with memorized world knowledge when the two conflict. How they resolve this conflict shapes the reliability of multimodal systems, yet prior work characterizes it behaviorally…

31
arXiv — NLP / Computation & Language research 1d ago

CalBrief: A Pilot Diagnostic Benchmark for Evidence-Calibrated Scientific Briefing with Large Language Models

arXiv:2606.27383v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as research assistants, yet it remains unclear whether they can calibrate research takeaways to the strength and scope of the supporting evidence. We study evidence-calibrated…

17
arXiv — NLP / Computation & Language research 1d ago

Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

arXiv:2606.27457v1 Announce Type: cross Abstract: Efficient deployment of large language models (LLMs) in production forces a trade-off between accuracy and cost. Operators often default to a single model that is either expensive for easy queries or insufficient for hard ones.…

20
arXiv — NLP / Computation & Language research 1d ago

DMV-Bench: Diagnosing Long-Horizon Multimodal Agents' Visual Memory with Incidental Cue Injection

arXiv:2606.27499v1 Announce Type: cross Abstract: Research on agent memory has matured rapidly, but almost entirely on the text side: few existing benchmarks ask, in an interactive environment, when an agent genuinely needs to remember what it saw rather than what it could write…

11
arXiv — NLP / Computation & Language research 1d ago

Aloe-Vision: Robust Vision-Language Models for Healthcare

arXiv:2606.27500v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) specialized in healthcare are emerging as a promising research direction due to their potential impact in clinical and biomedical applications. However, progress is constrained by the scarcity…

28
arXiv — NLP / Computation & Language research 1d ago

Joint Transcription and Decryption of Images of Encrypted Handwritten Documents: A Comparison with the Traditional Pipeline

arXiv:2606.27700v1 Announce Type: cross Abstract: Historical encrypted manuscripts present a challenging problem at the intersection of cryptology, linguistics, paleography, and computer vision. Current automatic decipherment approaches usually rely on a two-stage pipeline:…

7
arXiv — NLP / Computation & Language research 1d ago

Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing

arXiv:2606.27926v1 Announce Type: cross Abstract: Geometry Problem Solving has increasingly adopted the neuro-symbolic paradigm, combining neural intuition with symbolic rigor. However, current frameworks suffer from severe bottlenecks in two core stages: autoformalization, which…

28
arXiv — NLP / Computation & Language research 1d ago

AI Persuasive Framing in Collective Dilemmas

arXiv:2606.27951v1 Announce Type: cross Abstract: AI agents are promising tools that can act as flexible behavioral nudges to enhance human cooperation in addressing large-scale societal problems. However, evidence on whether AI agents can effectively boost cooperation remains…

32
arXiv — NLP / Computation & Language research 1d ago

DG^VoiC: Speaker Clustering for Fraud Investigation under Real Call-Centre Conditions

arXiv:2606.28048v1 Announce Type: cross Abstract: Insurance fraud remains costly and operationally difficult, particularly in call-centre workflows where many customer interactions begin at FNOL. While recent fraud detection methods mainly rely on structured data, text, or…

19
arXiv — NLP / Computation & Language research 1d ago

Single and Multi Truth Data Fusion using Large Language Models

arXiv:2606.28062v1 Announce Type: cross Abstract: Data fusion, also known as truth discovery, is a data integration problem that aims to determine the correct value or set of values for each attribute of an object when presented with potentially conflicting values from multiple…

27
arXiv — NLP / Computation & Language research 1d ago

Scaling limit of the Random Language Model

arXiv:2606.28105v1 Announce Type: cross Abstract: We develop a quantitative theory of the Random Language Model (RLM), an ensemble of stochastic context-free grammars, in a scaling limit where the number of hidden symbols $N \to \infty$ while the grammar temperature…

10
arXiv — NLP / Computation & Language research 1d ago

HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

arXiv:2606.28249v1 Announce Type: cross Abstract: Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting…

20
arXiv — NLP / Computation & Language research 1d ago

Continual Memorization of Factoids in Language Models

arXiv:2411.07175v3 Announce Type: replace Abstract: As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown…

27
