Research papers

500 articles archived under #paper

arXiv — NLP / Computation & Language research 1d ago

ReFreeKV: Towards Threshold-Free KV Cache Compression

arXiv:2502.16886v4 Announce Type: replace Abstract: To reduce memory consumption during LLM inference, a handful of methods have been proposed for KV cache pruning. While these techniques can accomplish lossless memory reduction on many datasets, they often hinge on an…

28
arXiv — NLP / Computation & Language research 1d ago

On the Effect of Uncertainty on Layer-wise Inference Dynamics

arXiv:2507.06722v2 Announce Type: replace Abstract: Understanding how large language models (LLMs) internally represent and process their predictions is central to detecting uncertainty and preventing hallucinations. While several studies have shown that models encode…

33
arXiv — NLP / Computation & Language research 1d ago

Training-free Truthfulness Detection via Sparse MLP Value Vectors

arXiv:2509.17932v2 Announce Type: replace Abstract: Large language models (LLMs) are prone to generating factually incorrect content, motivating methods for assessing truthfulness from internal model signals. While supervised probing approaches can be effective, they require…

5
arXiv — NLP / Computation & Language research 1d ago

Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety

arXiv:2510.16492v4 Announce Type: replace Abstract: As Large Language Model (LLM) agents increasingly operate in complex environments with real-world consequences, their safety becomes critical. While uncertainty quantification is well-studied for single-turn tasks, multi-turn…

20
arXiv — NLP / Computation & Language research 1d ago

Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification

arXiv:2511.03217v2 Announce Type: replace Abstract: Large language models (LLMs) excel in generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet…

4
arXiv — NLP / Computation & Language research 1d ago

Safe Language Generation in the Limit

arXiv:2601.08648v2 Announce Type: replace Abstract: Recent results in learning a language in the limit have shown that, although language identification is impossible, language generation is tractable. As this foundational area expands, we need to consider the implications of…

5
arXiv — NLP / Computation & Language research 1d ago

Learning to Evict from Key-Value Cache

arXiv:2602.10238v2 Announce Type: replace Abstract: The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but…

25
arXiv — NLP / Computation & Language research 1d ago

Measuring the Redundancy of Decoder Layers in SpeechLLMs

arXiv:2603.05121v2 Announce Type: replace Abstract: Speech Large Language Models route speech encoder representations into an LLM decoder that typically accounts for over 90% of total parameters. We study how much of this decoder capacity is actually needed for speech tasks.…

36
arXiv — NLP / Computation & Language research 1d ago

LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks

arXiv:2604.13072v2 Announce Type: replace Abstract: OpenClaw-style personal assistants extend LLM agents from isolated tool use to open-ended, stateful, and personalized software environments. Evaluating these assistants is fundamentally a fidelity problem: benchmarks must be…

28
arXiv — NLP / Computation & Language research 1d ago

Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

arXiv:2604.17633v2 Announce Type: replace Abstract: Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual…

29
arXiv — NLP / Computation & Language research 1d ago

Subject-level Inference for Realistic Text Anonymization Evaluation

arXiv:2604.21211v2 Announce Type: replace Abstract: Current text anonymization evaluation relies on span-based metrics that fail to capture what an adversary could actually infer, and assumes a single data subject, ignoring multi-subject scenarios. To address these limitations,…

6
arXiv — NLP / Computation & Language research 1d ago

Characterizing the Expressivity of Local Attention in Transformers

arXiv:2605.00768v3 Announce Type: replace Abstract: The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before…

16
arXiv — NLP / Computation & Language research 1d ago

ELF: Embedded Language Flows

arXiv:2605.10938v2 Announce Type: replace Abstract: Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling.…

22
arXiv — NLP / Computation & Language research 1d ago

Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

arXiv:2606.02004v2 Announce Type: replace Abstract: Consumer-price measurement increasingly draws on alternative data sources -- scanner, web-scraped, and transaction/receipt data -- whose product descriptions are short, noisy, and carry no standard product code, so each item…

4
arXiv — NLP / Computation & Language research 1d ago

Self-Stigma Is Not a Monolith, but Generic Empathy Is: Persona-Conditioned LLM Support for People Who Use Drugs

arXiv:2606.23387v2 Announce Type: replace Abstract: Self-stigma predicts treatment avoidance and disengagement among people who use drugs (PWUD), yet conversational systems aiming to provide support typically treat self-stigma expression as a uniform signal. We present a…

9
arXiv — NLP / Computation & Language research 1d ago

SIGNER: Temporally Grounded Sign Language Generation via Time-Resolved Conditioning

arXiv:2506.07460v2 Announce Type: replace-cross Abstract: Sign language generation (SLG), also known as text-to-sign generation, aims to bridge the communication gap between signers and non-signers. Unlike many other generative tasks, SLG must satisfy two fundamental linguistic…

16
arXiv — NLP / Computation & Language research 1d ago

PRISON: Unmasking the Criminal Potential of Large Language Models

arXiv:2506.16150v4 Announce Type: replace-cross Abstract: As large language models (LLMs) advance, concerns about their misconduct in complex social contexts intensify. Existing research overlooked the systematic understanding and assessment of their criminal capability in…

37
arXiv — NLP / Computation & Language research 1d ago

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

arXiv:2510.18874v3 Announce Type: replace-cross Abstract: Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities -- a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines…

38
arXiv — NLP / Computation & Language research 1d ago

Psychometric Comparability of LLM-Based Digital Twins

arXiv:2601.14264v2 Announce Type: replace-cross Abstract: Large language models (LLMs) act as digital twins for human respondents, yet their psychometric comparability remains uncertain. We propose a construct validity framework spanning construct representation and the…

23
arXiv — NLP / Computation & Language research 1d ago

EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning

arXiv:2603.09731v3 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) are increasingly considered as a foundation for embodied agents, yet it remains unclear whether they can reliably reason about the long-term physical consequences of actions from…

34
arXiv — NLP / Computation & Language research 1d ago

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

arXiv:2605.06675v2 Announce Type: replace-cross Abstract: Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache…

5
arXiv — NLP / Computation & Language research 1d ago

Auto-Configuring Scientific Simulators with Lightweight Coding-Agent Adapters

arXiv:2606.09774v2 Announce Type: replace-cross Abstract: Configuring an advanced scientific simulator, translating a modeling goal into a valid, runnable input deck, is a persistent bottleneck that costs domain scientists hours to days. Input decks are executable interfaces:…

33
arXiv — NLP / Computation & Language research 1d ago

Multimodal Evaluator Preference Collapse: Cross-Modal Coupling in Self-Evolving Agents

arXiv:2606.16682v3 Announce Type: replace-cross Abstract: When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using…

4
arXiv — NLP / Computation & Language research 1d ago

SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning

arXiv:2606.22873v3 Announce Type: replace-cross Abstract: Vision-language models (VLMs) are increasingly deployed in consumer, medical, financial, and enterprise applications. This broad deployment expands the safety surface: risks can arise from multimodal question answering,…

31
arXiv — Machine Learning research 4d ago

Physics-guided Convolutional Neural Network for Domain Growth Prediction in Systems with Conserved Kinetics

arXiv:2606.26128v1 Announce Type: new Abstract: The spatiotemporal evolution of many physical, chemical, and biological systems is described by nonlinear partial differential equations (PDEs). Recently, deep neural network-based surrogate models have gained increasing interest…

16
arXiv — Machine Learning research 4d ago

\chisao{}: A GPU-Native Parallel Optimizer for Multimodal Black-Box Functions via Convergence-Anticonvergence Oscillation

arXiv:2606.26164v1 Announce Type: new Abstract: Finding all modes of a multimodal black-box function is a fundamental challenge in optimization, Bayesian inference, and scientific computing. Existing approaches -- basin-hopping, CMA-ES, multistart gradient descent -- operate…

26
arXiv — Machine Learning research 4d ago

Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration

arXiv:2606.26168v1 Announce Type: new Abstract: Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run–tumble process driven by stimulus–response rules. However, such descriptions…

30
arXiv — Machine Learning research 4d ago

Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis

arXiv:2606.26169v1 Announce Type: new Abstract: Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in…

15
arXiv — Machine Learning research 4d ago

KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction

arXiv:2606.26179v1 Announce Type: new Abstract: While WGS-based AMR prediction has reached high accuracy, existing models lack a mechanism to ground neural attributions in established biological pathways. We present KG-TRACE, a novel neuro-symbolic framework that integrates the…

36
arXiv — Machine Learning research 4d ago

Necessary but Not Sufficient: Temperature Control and Reproducibility in LLM-as-Judge Safety Evaluations

arXiv:2606.26185v1 Announce Type: new Abstract: LLM-as-judge ("grader") components are now standard in evaluation harnesses, including safety evaluations where a pass/fail verdict may gate downstream deployment decisions. A widespread assumption is that setting the grader's…

4
arXiv — Machine Learning research 4d ago

Clue-Guided Money Laundering Group Discovery

arXiv:2606.26189v1 Announce Type: new Abstract: Money Laundering Group Discovery (MLGD) aims to identify hidden criminal groups and recover their complete structures in large-scale financial networks. Existing graph anomaly detection methods mainly produce node-level risk…

17
arXiv — Machine Learning research 4d ago

Federated Hash Projected Latent Factor Learning

arXiv:2606.26192v1 Announce Type: new Abstract: Hash Learning (HL) is an efficient representation learning approach that maps real-valued data into compact binary representations. Traditional HL methods typically require users to upload personal data to a central server, which…

13
arXiv — Machine Learning research 4d ago

Statistical and Structural Approaches to Algorithmic Fairness

arXiv:2606.26200v1 Announce Type: new Abstract: Modern machine learning systems have outgrown their origins as isolated predictive constructs, evolving into complex socio-technical architectures that actively mediate human opportunity. As algorithms increasingly determine access…

29
arXiv — Machine Learning research 4d ago

Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery

arXiv:2606.26204v1 Announce Type: new Abstract: Floods frequently impact regions around the world. Rapid and accurate flood detection is crucial for emergency response and timely mitigation of human and economic loss. The expanding availability of satellite data and advances in…

16
arXiv — Machine Learning research 4d ago

A General Framework for Learning Algebraic Properties from Cayley Graphs using Graph Neural Networks

arXiv:2606.26212v1 Announce Type: new Abstract: A Graph Neural Network (GNN) framework for predicting the solvability of finite groups from their Cayley graph representations was introduced in [1]. In the present work, we generalize this approach and develop a…

18
arXiv — Machine Learning research 4d ago

Fast LeWorldModel

arXiv:2606.26217v1 Announce Type: new Abstract: Joint-Embedding Predictive Architectures (JEPAs), including recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action…

32
arXiv — Machine Learning research 4d ago

Dataset Usage Inference without Shadow Models or Held-out Data

arXiv:2606.26257v1 Announce Type: new Abstract: How much of my data was used to train a machine learning model? Dataset Usage Inference (DUI) aims to answer this by estimating what fraction of a dataset contributed to a model's training. However, existing DUI methods rely on…

27
arXiv — Machine Learning research 4d ago

Equivariance and Augmentation for Bayesian Neural Networks

arXiv:2606.26273v1 Announce Type: new Abstract: Symmetries are important for many deep learning tasks, ranging from applications in the sciences to medical imaging. However, there is an ongoing debate about whether to impose symmetry constraints on the neural network…

33
arXiv — Machine Learning research 4d ago

SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning

arXiv:2606.26290v1 Announce Type: new Abstract: While parameter-efficient fine-tuning (PEFT) typically targets attention projectors, its efficacy for tasks requiring sequential state accumulation remains under-explored. We examine if PEFT for such tasks can benefit from state…

18
arXiv — Machine Learning research 4d ago

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

arXiv:2606.26294v1 Announce Type: new Abstract: Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier,…

25
arXiv — Machine Learning research 4d ago

High-Probability PL-SGD with Markovian Noise: Optimal Mixing and Tail Dependence

arXiv:2606.26316v1 Announce Type: new Abstract: We study first-order methods for smooth objectives satisfying the Polyak-Łojasiewicz (PL) condition when gradient samples are generated by an exogenous Markov chain. In the light-tailed setting, prior uniform-in-time…

7
arXiv — Machine Learning research 4d ago

EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning

arXiv:2606.26327v1 Announce Type: new Abstract: In actor-critic reinforcement learning, network architectures are typically manually designed. Automating this design is challenging because each candidate must be trained before evaluation, and the design space is open-ended. To…

29
arXiv — Machine Learning research 4d ago

Mesh-RL: Coupled subgrid reinforcement learning

arXiv:2606.26333v1 Announce Type: new Abstract: Reinforcement learning in large or sparse-reward environments suffers from slow temporal-difference reward propagation, as value information spreads only locally across the state space. We propose Mesh-RL, a spatial…

27
arXiv — Machine Learning research 4d ago

EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening

arXiv:2606.26337v1 Announce Type: new Abstract: Gradient Boosted Decision Trees (GBDT), exemplified by LightGBM, spend a dominant fraction of training time -- typically 65-70% -- constructing per-feature histograms. Existing approaches such as random feature subsampling…

23
arXiv — Machine Learning research 4d ago

Does Aurora Encode Atmospheric Structure? Latent Regime Analysis and Attribution

arXiv:2606.26361v1 Announce Type: new Abstract: ML foundation models are able to emulate atmospheric dynamics accurately and efficiently but operate as opaque "black boxes". We investigate the internal representations of the Aurora model using spatially pooled PCA and…

35
arXiv — Machine Learning research 4d ago

SOLAR: AI-Powered Speed-of-Light Performance Analysis

arXiv:2606.26383v1 Announce Type: new Abstract: How fast could a deep-learning model run on target hardware, and how far is today's implementation from that limit? These questions are central to software, hardware, and algorithm optimizations. Speed-of-Light (SOL) analysis…

11
arXiv — Machine Learning research 4d ago

At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization

arXiv:2606.26396v1 Announce Type: new Abstract: Pre-trained transformers have demonstrated remarkable generalization abilities, at times extending beyond the scope of their training data. Yet, real-world deployments often face unexpected or adversarial data that diverges from…

34
arXiv — Machine Learning research 4d ago

Deterministic Pareto-Optimal Policy Synthesis for Multi-Objective Reinforcement Learning

arXiv:2606.26397v1 Announce Type: new Abstract: Real-world decision-making often requires balancing multiple conflicting objectives, a challenge that standard Reinforcement Learning (RL) frequently addresses by aggregating rewards into a single scalar signal. While effective for…

14
arXiv — Machine Learning research 4d ago

Beyond Feedforward Networks: Reentry Neural Systems as the Fundamental Basis of Subjecthood and Intrinsic Safety of Next-Generation AGI

arXiv:2606.26406v1 Announce Type: new Abstract: We propose a complete architectural blueprint for safe artificial general intelligence based on a closed reentry loop (D I cycle). In contrast to feedforward networks, which are directed acyclic graphs (C=0, S=0) incapable of…

37
arXiv — Machine Learning research 4d ago

Otter Weather: Skillful and Computationally Efficient Medium-Range Weather Forecasting

arXiv:2606.26421v1 Announce Type: new Abstract: State-of-the-art medium-range AI weather models can outperform traditional Numerical Weather Prediction (NWP) but require massive training budgets. This restricts usage for under-resourced groups and severely limits fast model…

4
