Research papers: 500 articles archived under #paper. arXiv — NLP / Computation & Language research 1d ago ReFreeKV: Towards Threshold-Free KV Cache Compression arXiv:2502.16886v4 Announce Type: replace Abstract: To reduce memory consumption during LLM inference, a handful of methods have been proposed for KV cache pruning. While these techniques can accomplish lossless memory reduction on many datasets, they often hinge on an… 28 arXiv — NLP / Computation & Language research 1d ago On the Effect of Uncertainty on Layer-wise Inference Dynamics arXiv:2507.06722v2 Announce Type: replace Abstract: Understanding how large language models (LLMs) internally represent and process their predictions is central to detecting uncertainty and preventing hallucinations. While several studies have shown that models encode… 33 arXiv — NLP / Computation & Language research 1d ago Training-free Truthfulness Detection via Sparse MLP Value Vectors arXiv:2509.17932v2 Announce Type: replace Abstract: Large language models (LLMs) are prone to generating factually incorrect content, motivating methods for assessing truthfulness from internal model signals. While supervised probing approaches can be effective, they require… 5 arXiv — NLP / Computation & Language research 1d ago Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety arXiv:2510.16492v4 Announce Type: replace Abstract: As Large Language Model (LLM) agents increasingly operate in complex environments with real-world consequences, their safety becomes critical. 
While uncertainty quantification is well-studied for single-turn tasks, multi-turn… 20 arXiv — NLP / Computation & Language research 1d ago Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification arXiv:2511.03217v2 Announce Type: replace Abstract: Large language models (LLMs) excel in generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet… 4 arXiv — NLP / Computation & Language research 1d ago Safe Language Generation in the Limit arXiv:2601.08648v2 Announce Type: replace Abstract: Recent results in learning a language in the limit have shown that, although language identification is impossible, language generation is tractable. As this foundational area expands, we need to consider the implications of… 5 arXiv — NLP / Computation & Language research 1d ago Learning to Evict from Key-Value Cache arXiv:2602.10238v2 Announce Type: replace Abstract: The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but… 25 arXiv — NLP / Computation & Language research 1d ago Measuring the Redundancy of Decoder Layers in SpeechLLMs arXiv:2603.05121v2 Announce Type: replace Abstract: Speech Large Language Models route speech encoder representations into an LLM decoder that typically accounts for over 90% of total parameters. 
We study how much of this decoder capacity is actually needed for speech tasks.… 36 arXiv — NLP / Computation & Language research 1d ago LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks arXiv:2604.13072v2 Announce Type: replace Abstract: OpenClaw-style personal assistants extend LLM agents from isolated tool use to open-ended, stateful, and personalized software environments. Evaluating these assistants is fundamentally a fidelity problem: benchmarks must be… 28 arXiv — NLP / Computation & Language research 1d ago Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining arXiv:2604.17633v2 Announce Type: replace Abstract: Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual… 29 arXiv — NLP / Computation & Language research 1d ago Subject-level Inference for Realistic Text Anonymization Evaluation arXiv:2604.21211v2 Announce Type: replace Abstract: Current text anonymization evaluation relies on span-based metrics that fail to capture what an adversary could actually infer, and assumes a single data subject, ignoring multi-subject scenarios. To address these limitations,… 6 arXiv — NLP / Computation & Language research 1d ago Characterizing the Expressivity of Local Attention in Transformers arXiv:2605.00768v3 Announce Type: replace Abstract: The transformer is the most popular neural architecture for language modeling. 
The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before… 16 arXiv — NLP / Computation & Language research 1d ago ELF: Embedded Language Flows arXiv:2605.10938v2 Announce Type: replace Abstract: Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling.… 22 arXiv — NLP / Computation & Language research 1d ago Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling arXiv:2606.02004v2 Announce Type: replace Abstract: Consumer-price measurement increasingly draws on alternative data sources -- scanner, web-scraped, and transaction/receipt data -- whose product descriptions are short, noisy, and carry no standard product code, so each item… 4 arXiv — NLP / Computation & Language research 1d ago Self-Stigma Is Not a Monolith, but Generic Empathy Is: Persona-Conditioned LLM Support for People Who Use Drugs arXiv:2606.23387v2 Announce Type: replace Abstract: Self-stigma predicts treatment avoidance and disengagement among people who use drugs (PWUD), yet conversational systems aiming to provide support typically treat self-stigma expression as a uniform signal. We present a… 9 arXiv — NLP / Computation & Language research 1d ago SIGNER: Temporally Grounded Sign Language Generation via Time-Resolved Conditioning arXiv:2506.07460v2 Announce Type: replace-cross Abstract: Sign language generation (SLG), also known as text-to-sign generation, aims to bridge the communication gap between signers and non-signers. 
Unlike many other generative tasks, SLG must satisfy two fundamental linguistic… 16 arXiv — NLP / Computation & Language research 1d ago PRISON: Unmasking the Criminal Potential of Large Language Models arXiv:2506.16150v4 Announce Type: replace-cross Abstract: As large language models (LLMs) advance, concerns about their misconduct in complex social contexts intensify. Existing research overlooked the systematic understanding and assessment of their criminal capability in… 37 arXiv — NLP / Computation & Language research 1d ago Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting arXiv:2510.18874v3 Announce Type: replace-cross Abstract: Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities -- a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines… 38 arXiv — NLP / Computation & Language research 1d ago Psychometric Comparability of LLM-Based Digital Twins arXiv:2601.14264v2 Announce Type: replace-cross Abstract: Large language models (LLMs) act as digital twins for human respondents, yet their psychometric comparability remains uncertain. 
We propose a construct validity framework spanning construct representation and the… 23 arXiv — NLP / Computation & Language research 1d ago EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning arXiv:2603.09731v3 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) are increasingly considered as a foundation for embodied agents, yet it remains unclear whether they can reliably reason about the long-term physical consequences of actions from… 34 arXiv — NLP / Computation & Language research 1d ago RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory arXiv:2605.06675v2 Announce Type: replace-cross Abstract: Large language models cache all previously computed key-value (KV) pairs during generation, and this KV cache grows linearly with sequence length, making it a primary memory bottleneck for serving. Quantizing the KV cache… 5 arXiv — NLP / Computation & Language research 1d ago Auto-Configuring Scientific Simulators with Lightweight Coding-Agent Adapters arXiv:2606.09774v2 Announce Type: replace-cross Abstract: Configuring an advanced scientific simulator, translating a modeling goal into a valid, runnable input deck, is a persistent bottleneck that costs domain scientists hours to days. Input decks are executable interfaces:… 33 arXiv — NLP / Computation & Language research 1d ago Multimodal Evaluator Preference Collapse: Cross-Modal Coupling in Self-Evolving Agents arXiv:2606.16682v3 Announce Type: replace-cross Abstract: When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. 
Using… 4 arXiv — NLP / Computation & Language research 1d ago SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning arXiv:2606.22873v3 Announce Type: replace-cross Abstract: Vision-language models (VLMs) are increasingly deployed in consumer, medical, financial, and enterprise applications. This broad deployment expands the safety surface: risks can arise from multimodal question answering,… 31 arXiv — Machine Learning research 4d ago Physics-guided Convolutional Neural Network for Domain Growth Prediction in Systems with Conserved Kinetics arXiv:2606.26128v1 Announce Type: new Abstract: The spatiotemporal evolution of many physical, chemical, and biological systems is described by nonlinear partial differential equations (PDEs). Recently, deep neural network-based surrogate models have gained increasing interest… 16 arXiv — Machine Learning research 4d ago \chisao{}: A GPU-Native Parallel Optimizer for Multimodal Black-Box Functions via Convergence-Anticonvergence Oscillation arXiv:2606.26164v1 Announce Type: new Abstract: Finding all modes of a multimodal black-box function is a fundamental challenge in optimization, Bayesian inference, and scientific computing. Existing approaches -- basin-hopping, CMA-ES, multistart gradient descent -- operate… 26 arXiv — Machine Learning research 4d ago Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration arXiv:2606.26168v1 Announce Type: new Abstract: Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run–tumble process driven by stimulus–response rules. 
However, such descriptions… 30 arXiv — Machine Learning research 4d ago Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis arXiv:2606.26169v1 Announce Type: new Abstract: Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in… 15 arXiv — Machine Learning research 4d ago KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction arXiv:2606.26179v1 Announce Type: new Abstract: While WGS-based AMR prediction has reached high accuracy, existing models lack a mechanism to ground neural attributions in established biological pathways. We present KG-TRACE, a novel neuro-symbolic framework that integrates the… 36 arXiv — Machine Learning research 4d ago Necessary but Not Sufficient: Temperature Control and Reproducibility in LLM-as-Judge Safety Evaluations arXiv:2606.26185v1 Announce Type: new Abstract: LLM-as-judge ("grader") components are now standard in evaluation harnesses, including safety evaluations where a pass/fail verdict may gate downstream deployment decisions. A widespread assumption is that setting the grader's… 4 arXiv — Machine Learning research 4d ago Clue-Guided Money Laundering Group Discovery arXiv:2606.26189v1 Announce Type: new Abstract: Money Laundering Group Discovery (MLGD) aims to identify hidden criminal groups and recover their complete structures in large-scale financial networks. Existing graph anomaly detection methods mainly produce node-level risk… 17 arXiv — Machine Learning research 4d ago Federated Hash Projected Latent Factor Learning arXiv:2606.26192v1 Announce Type: new Abstract: Hash Learning (HL) is an efficient representation learning approach that maps real-valued data into compact binary representations. 
Traditional HL methods typically require users to upload personal data to a central server, which… 13 arXiv — Machine Learning research 4d ago Statistical and Structural Approaches to Algorithmic Fairness arXiv:2606.26200v1 Announce Type: new Abstract: Modern machine learning systems have outgrown their origins as isolated predictive constructs, evolving into complex socio-technical architectures that actively mediate human opportunity. As algorithms increasingly determine access… 29 arXiv — Machine Learning research 4d ago Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery arXiv:2606.26204v1 Announce Type: new Abstract: Floods frequently impact regions around the world. Rapid and accurate flood detection is crucial for emergency response and timely mitigation of human and economic loss. The expanding availability of satellite data and advances in… 16 arXiv — Machine Learning research 4d ago A General Framework for Learning Algebraic Properties from Cayley Graphs using Graph Neural Networks arXiv:2606.26212v1 Announce Type: new Abstract: A Graph Neural Network (GNN) framework for predicting the solvability of finite groups from their Cayley graph representations was introduced in [1]. In the present work, we generalize this approach and develop a… 18 arXiv — Machine Learning research 4d ago Fast LeWorldModel arXiv:2606.26217v1 Announce Type: new Abstract: Joint-Embedding Predictive Architectures (JEPAs), including recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action… 32 arXiv — Machine Learning research 4d ago Dataset Usage Inference without Shadow Models or Held-out Data arXiv:2606.26257v1 Announce Type: new Abstract: How much of my data was used to train a machine learning model? 
Dataset Usage Inference (DUI) aims to answer this by estimating what fraction of a dataset contributed to a model's training. However, existing DUI methods rely on… 27 arXiv — Machine Learning research 4d ago Equivariance and Augmentation for Bayesian Neural Networks arXiv:2606.26273v1 Announce Type: new Abstract: Symmetries are important for many deep learning tasks, ranging from applications in the sciences to medical imaging. However, there is an ongoing debate about whether to impose symmetry constraints on the neural network… 33 arXiv — Machine Learning research 4d ago SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning arXiv:2606.26290v1 Announce Type: new Abstract: While parameter-efficient fine-tuning (PEFT) typically targets attention projectors, its efficacy for tasks requiring sequential state accumulation remains under-explored. We examine if PEFT for such tasks can benefit from state… 18 arXiv — Machine Learning research 4d ago The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators arXiv:2606.26294v1 Announce Type: new Abstract: Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier,… 25 arXiv — Machine Learning research 4d ago High-Probability PL-SGD with Markovian Noise: Optimal Mixing and Tail Dependence arXiv:2606.26316v1 Announce Type: new Abstract: We study first-order methods for smooth objectives satisfying the Polyak-Łojasiewicz (PL) condition when gradient samples are generated by an exogenous Markov chain. 
In the light-tailed setting, prior uniform-in-time… 7 arXiv — Machine Learning research 4d ago EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning arXiv:2606.26327v1 Announce Type: new Abstract: In actor-critic reinforcement learning, network architectures are typically manually designed. Automating this design is challenging because each candidate must be trained before evaluation, and the design space is open-ended. To… 29 arXiv — Machine Learning research 4d ago Mesh-RL: Coupled subgrid reinforcement learning arXiv:2606.26333v1 Announce Type: new Abstract: Reinforcement learning in large or sparse-reward environments suffers from slow temporal-difference reward propagation, as value information spreads only locally across the state space. We propose Mesh-RL, a spatial… 27 arXiv — Machine Learning research 4d ago EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening arXiv:2606.26337v1 Announce Type: new Abstract: Gradient Boosted Decision Trees (GBDT), exemplified by LightGBM, spend a dominant fraction of training time -- typically 65-70% -- constructing per-feature histograms. Existing approaches such as random feature subsampling… 23 arXiv — Machine Learning research 4d ago Does Aurora Encode Atmospheric Structure? Latent Regime Analysis and Attribution arXiv:2606.26361v1 Announce Type: new Abstract: ML foundation models are able to emulate atmospheric dynamics accurately and efficiently but operate as opaque "black boxes". We investigate the internal representations of the Aurora model using spatially pooled PCA and… 35 arXiv — Machine Learning research 4d ago SOLAR: AI-Powered Speed-of-Light Performance Analysis arXiv:2606.26383v1 Announce Type: new Abstract: How fast could a deep-learning model run on target hardware, and how far is today's implementation from that limit? These questions are central to software, hardware, and algorithm optimizations. 
Speed-of-Light (SOL) analysis… 11 arXiv — Machine Learning research 4d ago At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization arXiv:2606.26396v1 Announce Type: new Abstract: Pre-trained transformers have demonstrated remarkable generalization abilities, at times extending beyond the scope of their training data. Yet, real-world deployments often face unexpected or adversarial data that diverges from… 34 arXiv — Machine Learning research 4d ago Deterministic Pareto-Optimal Policy Synthesis for Multi-Objective Reinforcement Learning arXiv:2606.26397v1 Announce Type: new Abstract: Real-world decision-making often requires balancing multiple conflicting objectives, a challenge that standard Reinforcement Learning (RL) frequently addresses by aggregating rewards into a single scalar signal. While effective for… 14 arXiv — Machine Learning research 4d ago Beyond Feedforward Networks: Reentry Neural Systems as the Fundamental Basis of Subjecthood and Intrinsic Safety of Next-Generation AGI arXiv:2606.26406v1 Announce Type: new Abstract: We propose a complete architectural blueprint for safe artificial general intelligence based on a closed reentry loop (D I cycle). In contrast to feedforward networks, which are directed acyclic graphs (C=0, S=0) incapable of… 37 arXiv — Machine Learning research 4d ago Otter Weather: Skillful and Computationally Efficient Medium-Range Weather Forecasting arXiv:2606.26421v1 Announce Type: new Abstract: State-of-the-art medium-range AI weather models can outperform traditional Numerical Weather Prediction (NWP) but require massive training budgets. This restricts usage for under-resourced groups and severely limits fast model… 4