#paper Tag · Research papers · 224 articles archived under #paper r/MachineLearning community 4h ago Have the "on-hold" durations been getting longer for arXiv submissions? [D] I have a paper that has been "on-hold" for about 2 weeks now. I understand that it might take a little longer now because of the inundation of AI-generated low-effort papers, but my papers have gone from "on-hold" to "submitted" within a couple of days in the past. Wondering if… 13 Google DeepMind official-blog 10h ago GPT-5 paper drops on arXiv — scaling laws revisited OpenAI researchers released a 47-page preprint examining how scaling laws hold up at trillion-parameter regimes, with new evidence for compute-optimal training. 27 2 r/LocalLLaMA community 10h ago AIDC-AI/Ovis2.6-80B-A3B · Hugging Face We introduce Ovis2.6-80B-A3B, the latest advancement in the Ovis series of Multimodal Large Language Models (MLLMs). Building on the strong foundation of Ovis2.5, Ovis2.6 upgrades the LLM backbone to a Mixture-of-Experts (MoE) architecture, delivering superior multimodal… 31 NVIDIA Developer Blog official-blog 12h ago Google DeepMind paper: reinforcement learning at scale New work demonstrates RL fine-tuning at unprecedented scale, with concrete benchmarks on reasoning tasks. 14 Hugging Face Daily Papers research 14h ago Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation Abstract Pion is a spectrum-preserving optimizer for large language model training that uses orthogonal equivalence transformations to maintain singular values during weight updates, offering stable performance comparable to standard optimizers. 
AI-generated summary We introduce… 34 Hacker News — AI on Front Page community 19h ago Deterministic Fully-Static Whole-Binary Translation Without Heuristics Article URL: https://arxiv.org/abs/2605.08419 Comments URL: https://news.ycombinator.com/item?id=48117810 Points: 227 # Comments: 53 27 Hugging Face Daily Papers research 19h ago AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward Abstract AlphaGRPO enhances multimodal generation by applying Group Relative Policy Optimization to AR-Diffusion Unified Multimodal Models through self-reflective refinement and decompositional verifiable reward mechanisms. AI-generated summary In this paper, we propose… 26 arXiv — Machine Learning research 19h ago Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation arXiv:2605.10947v1 Announce Type: new Abstract: EEG microstate analysis segments continuous brain electrical activity into brief, quasi-stable topographic configurations that reflect discrete functional brain states. Conventional approaches such as Modified K-Means operate… 22 arXiv — Machine Learning research 19h ago QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization arXiv:2605.10959v1 Announce Type: new Abstract: There is currently no unified metric for evaluating the efficiency of quantized neural networks. We propose QuIDE, built around the Intelligence Index I = (C x P)/log_2(T+1), which collapses the compression-accuracy-latency… 22 arXiv — Machine Learning research 19h ago Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models arXiv:2605.10971v1 Announce Type: new Abstract: Discrete diffusion language models (DLMs) generate text by iteratively denoising all positions in parallel, offering an alternative to autoregressive models. 
Controlled generation methods for DLMs, imported from autoregressive… 4 arXiv — Machine Learning research 19h ago Rotation-Preserving Supervised Fine-Tuning arXiv:2605.10973v1 Announce Type: new Abstract: Supervised fine-tuning (SFT) improves in-domain performance but can degrade out-of-domain (OOD) generalization. Prior work suggests that this degradation is related to changes in dominant singular subspaces of pretrained weight… 22 arXiv — Machine Learning research 19h ago Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization arXiv:2605.10974v1 Announce Type: new Abstract: Certified verification of transformer attention requires bounding the softmax function over interval constraints on the pre-softmax scores. Existing verifiers relax softmax independently of the downstream objective, leaving… 26 arXiv — Machine Learning research 19h ago Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation arXiv:2605.10975v1 Announce Type: new Abstract: Graphs with heterophily, where adjacent nodes carry different labels, are prevalent in real-world applications, from social networks to molecular interactions. However, existing spectral Graph Neural Network (GNN) approaches… 24 arXiv — Machine Learning research 19h ago LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection arXiv:2605.10980v1 Announce Type: new Abstract: Diffusion Language Models (dLLMs) have garnered significant attention for their potential in highly parallel processing. 
The parallel capabilities of existing dLLMs stem from the assumption of conditional independence at high… 35 arXiv — Machine Learning research 19h ago $\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin arXiv:2605.10981v1 Announce Type: new Abstract: Reference-free preference optimization has emerged as an efficient alternative to reinforcement learning from human feedback, with Simple Preference Optimization (SimPO) demonstrating strong performance by eliminating the explicit… 23 arXiv — Machine Learning research 19h ago TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment arXiv:2605.10983v1 Announce Type: new Abstract: Reinforcement learning (RL) has shown extraordinary potential in aligning diffusion models to downstream tasks, yet most of them still suffer from significant reward hacking, which degrades generative diversity and quality by… 10 arXiv — Machine Learning research 19h ago Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning arXiv:2605.10985v1 Announce Type: new Abstract: Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret as structural and evolutionary signals are… 17 arXiv — Machine Learning research 19h ago AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines arXiv:2605.10987v1 Announce Type: new Abstract: Modern machine learning deployments increasingly compose specialized models into dynamic inference pipelines, where upstream components produce intermediate predictions that determine the workload and inputs of downstream… 21 arXiv — Machine Learning research 19h ago Seeing the Needle in the Haystack: Towards Weakly-Supervised Log Instance Anomaly Localization via Counterfactual Perturbation arXiv:2605.10988v1 Announce Type: new Abstract: Log anomaly detection is a critical task for 
system operations and security assurance. However, in networked systems at scale, log data are generated at massive scale while instance-level annotations are prohibitively expensive,… 29 arXiv — Machine Learning research 19h ago SURGE: Surrogate Gradient Adaptation in Binary Neural Networks arXiv:2605.10989v1 Announce Type: new Abstract: The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., sign function). However, prevailing methods including the Straight-Through… 11 arXiv — Machine Learning research 19h ago Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures arXiv:2605.10991v1 Announce Type: new Abstract: Existing approaches to LLM personalization focus on constructing better personalized models or inputs, while treating inference as a single-shot process. In this work, we study Test-Time Personalization (TTP) along an unexplored… 11 arXiv — Machine Learning research 19h ago SkillGen: Verified Inference-Time Agent Skill Synthesis arXiv:2605.10999v1 Announce Type: new Abstract: Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a… 33 arXiv — Machine Learning research 19h ago Finite Volume-Informed Neural Network Framework for 2D Shallow Water Equations: Rugged Loss Landscapes and the Importance of Data Guidance arXiv:2605.11001v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) are a simple surrogate-modelling paradigm for partial differential equations, but their standard strong-form residual formulation is ill suited to the shallow water equations (SWE). 
It… 20 arXiv — Machine Learning research 19h ago DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism arXiv:2605.11005v1 Announce Type: new Abstract: Mixture-of-experts (MoE) architectures enable trillion-parameter LLMs with sparsely activated experts. Expert parallelism (EP) is a widely adopted MoE training strategy, but it suffers from severe all-to-all communication… 25 arXiv — Machine Learning research 19h ago RT-Transformer: The Transformer Block as a Spherical State Estimator arXiv:2605.11007v1 Announce Type: new Abstract: We show that the core components of the Transformer block -- attention, residual connections, and normalization -- arise naturally from a single geometric estimation problem. Modeling the latent state as a direction on the… 19 arXiv — Machine Learning research 19h ago When and How to Canonize: A Generalization Perspective arXiv:2605.11008v1 Announce Type: new Abstract: While invariant architectures are standard for processing symmetric data, there is growing interest in achieving invariance by applying group averaging or canonization to non-invariant backbones. However, the theoretical… 12 arXiv — Machine Learning research 19h ago ACSAC: Adaptive Chunk Size Actor-Critic with Causal Transformer Q-Network arXiv:2605.11009v1 Announce Type: new Abstract: Long-horizon, sparse-reward tasks pose a fundamental challenge for reinforcement learning, since single-step TD learning suffers from bootstrapping error accumulation across successive Bellman updates. Actor-critic methods with… 34 arXiv — Machine Learning research 19h ago A Comparative Study of Federated Learning Aggregation Strategies under Homogeneous and Heterogeneous Data Distributions arXiv:2605.11010v1 Announce Type: new Abstract: Federated Learning has emerged as a transformative paradigm for collaborative machine learning across distributed environments. 
However, its performance is strongly influenced by the aggregation strategy used to combine local model… 17 arXiv — Machine Learning research 19h ago LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models arXiv:2605.11011v1 Announce Type: new Abstract: Looped computation shows promise in improving the reasoning-oriented performance of LLMs by scaling test-time compute. However, existing approaches typically require either training recurrent models from scratch or applying… 37 arXiv — Machine Learning research 19h ago Backbone-Equated Diffusion OOD via Sparse Internal Snapshots arXiv:2605.11014v1 Announce Type: new Abstract: Fair comparison between diffusion-based OOD detectors is challenging, as conclusions can vary with backbone choice, corruption parameterization, and test-time budget. We address this issue through a Mutualized Backbone-Equated… 30 arXiv — Machine Learning research 19h ago Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics arXiv:2605.11017v1 Announce Type: new Abstract: Behavioral curve modeling -- fitting parametric functions to engagement-versus-exposure data -- is standard practice in recommendation, advertising, and clinical dosing. We show that aggregation introduces a systematic distortion:… 13 arXiv — Machine Learning research 19h ago Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness arXiv:2605.11019v1 Announce Type: new Abstract: Although large language models rely on chain-of-thought for complex reasoning, the overthinking phenomenon severely degrades inference efficiency. 
Existing reinforcement learning methods compress reasoning chains by designing… 23 arXiv — Machine Learning research 19h ago Trust Region Inverse Reinforcement Learning: Explicit Dual Ascent using Local Policy Updates arXiv:2605.11020v1 Announce Type: new Abstract: Inverse reinforcement learning (IRL) is typically formulated as maximizing entropy subject to matching the distribution of expert trajectories. Classical (dual-ascent) IRL guarantees monotonic performance improvement but requires… 14 arXiv — Machine Learning research 19h ago A Switching System Theory of Q-Learning with Linear Function Approximation arXiv:2605.11021v1 Announce Type: new Abstract: This paper develops a switching-system interpretation of Q-learning with linear function approximation (LFA) based on the joint spectral radius (JSR). We derive an exact linear switched model for the mean dynamics and relate… 11 arXiv — Machine Learning research 19h ago ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder arXiv:2605.11091v1 Announce Type: new Abstract: Automated ASD screening tools remain limited by single-architecture evaluations, axis-restricted assessment, and near-exclusive focus on adult cohorts, obscuring age-specific diagnostic patterns critical for early intervention. We… 4 arXiv — Machine Learning research 19h ago Enabling Performant and Flexible Model-Internal Observability for LLM Inference arXiv:2605.11093v1 Announce Type: new Abstract: Today's inference-time workloads increasingly depend on timely access to a model's internal states. 
We present DMI-Lib, a high-speed deep model inspector that treats internal observability as a first-class systems primitive,… 18 arXiv — Machine Learning research 19h ago Newton's Lantern: A Reinforcement Learning Framework for Finetuning AC Power Flow Warm Start Models arXiv:2605.11102v1 Announce Type: new Abstract: Neural warm starts can sharply reduce the number of Newton-Raphson iterations required to solve the AC power flow problem, but existing supervised approaches generalize poorly on heavily loaded instances near voltage collapse. We… 11 arXiv — Machine Learning research 19h ago GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms arXiv:2605.11117v1 Announce Type: new Abstract: Scientific discovery can be modeled as a sequence of probabilistic decisions that map physical problems to numerical solutions. Recent agentic AI systems automate individual scientific tasks by orchestrating LLM-driven planners,… 22 arXiv — Machine Learning research 19h ago Language Modeling with Hyperspherical Flows arXiv:2605.11125v1 Announce Type: new Abstract: Discrete Diffusion Language Models progressed rapidly as an alternative to autoregressive (AR) models, motivated by their parallel generation abilities. However, for tractability, discrete diffusion models sample from a factorized… 17 arXiv — Machine Learning research 19h ago HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series arXiv:2605.11130v1 Announce Type: new Abstract: Critical events in multivariate time series, from turbine failures to cardiac arrhythmias, demand accurate prediction, yet labeled data is scarce because such events are rare and costly to annotate. We introduce HEPA… 16 arXiv — Machine Learning research 19h ago Steerable Neural ODEs on Homogeneous Spaces arXiv:2605.11133v1 Announce Type: new Abstract: We introduce steerable neural ordinary differential equations on homogeneous spaces $M=G/H$. 
These models constitute a novel geometric extension of manifold neural ordinary differential equations (NODEs) that transport associated… 33 arXiv — Machine Learning research 19h ago Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training arXiv:2605.11134v1 Announce Type: new Abstract: Preference learning methods such as Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal… 13 arXiv — Machine Learning research 19h ago Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models arXiv:2605.11142v1 Announce Type: new Abstract: Graph representation learning has become a standard approach for analyzing networked data, with latent embeddings widely used for link prediction, community detection, and related tasks. Yet a basic design choice, the latent… 36 arXiv — Machine Learning research 19h ago CORE: Cyclic Orthotope Relation Embedding for Knowledge Graph Completion arXiv:2605.11159v1 Announce Type: new Abstract: Knowledge graph completion (KGC) aims to automatically infer missing facts in multi-relational data by mapping entities and relations into continuous representation spaces. Recent region-based embedding models have shown great… 16 arXiv — Machine Learning research 19h ago Interpretability Can Be Actionable arXiv:2605.11161v1 Announce Type: new Abstract: Interpretability aims to explain the behavior of deep neural networks. 
Despite rapid growth, there is mounting concern that much of this work has not translated into practical impact, raising questions about its relevance and… 37 arXiv — Machine Learning research 19h ago COSMOS: Model-Agnostic Personalized Federated Learning with Clustered Server Models and Pseudo-Label-Only Communication arXiv:2605.11165v1 Announce Type: new Abstract: Federated learning (FL) in heterogeneous environments remains challenging because client models often differ in both architecture and data distribution. While recent approaches attempt to address this challenge through client… 36 arXiv — Machine Learning research 19h ago Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data arXiv:2605.11170v1 Announce Type: new Abstract: Noise-based certified machine unlearning currently faces a hard ceiling: the noise magnitude required to certify unlearning typically destroys model utility, particularly for large-scale deletion requests. While leveraging public… 12 arXiv — Machine Learning research 19h ago Optimistic Dual Averaging Unifies Modern Optimizers arXiv:2605.11172v1 Announce Type: new Abstract: We introduce SODA, a generalization of Optimistic Dual Averaging, which provides a common perspective on state-of-the-art optimizers like Muon, Lion, AdEMAMix and NAdam, showing that they can all be viewed as optimistic instances… 31 arXiv — Machine Learning research 19h ago Oversmoothing as Representation Degeneracy in Neural Sheaf Diffusion arXiv:2605.11178v1 Announce Type: new Abstract: Neural Sheaf Diffusion (NSD) generalizes diffusion-based Graph Neural Networks by replacing scalar graph Laplacians with sheaf Laplacians whose learned restriction maps define a task-adapted geometry. 
While the diffusion limit of… 25 arXiv — Machine Learning research 19h ago Muon is Not That Special: Random or Inverted Spectra Work Just as Well arXiv:2605.11181v1 Announce Type: new Abstract: The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we… 8 arXiv — Machine Learning research 19h ago CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration arXiv:2605.11186v1 Announce Type: new Abstract: Auto-regressive decoding in Large Language Models (LLMs) is inherently memory-bound: every generation step requires loading the model weights and intermediate results from memory (e.g., High-Bandwidth Memory (HBM) for GPU servers),… 19 arXiv — Machine Learning research 19h ago Deep Learning for Protein Complex Prediction and Design arXiv:2605.11189v1 Announce Type: new Abstract: Accurately modeling and designing protein complex structures is a central problem in computational structural biology, with broad implications for understanding cellular function and developing therapeutics. This thesis… 16 arXiv — Machine Learning research 19h ago Variational Linear Attention: Stable Associative Memory for Long-Context Transformers arXiv:2605.11196v1 Announce Type: new Abstract: Linear attention reduces the quadratic cost of softmax attention to $\mathcal{O}(T)$, but its memory state grows as $\mathcal{O}(T)$ in Frobenius norm, causing progressive interference between stored associations. We introduce… 13 arXiv — Machine Learning research 19h ago FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry arXiv:2605.11203v1 Announce Type: new Abstract: Intermediate feature representations represent the backbone for the expressivity and adaptability of deep neural networks. 
However, their geometric structure remains poorly understood. In this submission, we provide indirect… 20 arXiv — Machine Learning research 19h ago The Scaling Law of Evaluation Failure: Why Simple Averaging Collapses Under Data Sparsity and Item Difficulty Gaps, and How Item Response Theory Recovers Ground Truth Across Domains arXiv:2605.11205v1 Announce Type: new Abstract: Benchmark evaluation across AI and safety-critical domains overwhelmingly relies on simple averaging. We demonstrate that this practice produces substantially misleading rankings when two conditions co-occur: (1) the evaluation… 34 arXiv — Machine Learning research 19h ago Measuring Five-Nines Reliability: Sample-Efficient LLM Evaluation in Saturated Benchmarks arXiv:2605.11209v1 Announce Type: new Abstract: While existing benchmarks demonstrate the near-perfect performance of large language models (LLMs) on various tasks, this apparent saturation often obscures the need for rigorous evaluation of their reliability. In real-world… 36 arXiv — Machine Learning research 19h ago Enforcing Constraints in Generative Sampling via Adaptive Correction Scheduling arXiv:2605.11214v1 Announce Type: new Abstract: Hard constraints in generative sampling are typically enforced by projection, applied either once at the end of sampling or after every update. This binary framing overlooks a fundamental issue: projection changes the distribution… 17 arXiv — Machine Learning research 19h ago Leveraging RAG for Training-Free Alignment of LLMs arXiv:2605.11217v1 Announce Type: new Abstract: Large language model (LLM) alignment algorithms typically consist of post-training over preference pairs. 
While such algorithms are widely used to enable safety guardrails and align LLMs with general human preferences, we show that… 36 arXiv — Machine Learning research 19h ago ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models arXiv:2605.11222v1 Announce Type: new Abstract: Quantization is an effective strategy to reduce the storage and computation footprint of large language models (LLMs). Post-training quantization (PTQ) is a leading approach for compressing LLMs. Popular weight quantization… 5 arXiv — Machine Learning research 19h ago LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection arXiv:2605.11231v1 Announce Type: new Abstract: Synthetic data is useful only when the added samples fill missing parts of the training distribution that matter for the downstream task. We introduce LiBaGS, a lightweight, generator-agnostic method for targeted synthetic training… 30 arXiv — Machine Learning research 19h ago A Comparative Study of Model Selection Criteria for Symbolic Regression arXiv:2605.11233v1 Announce Type: new Abstract: Effective model selection is critical in symbolic regression (SR) to identify mathematical expressions that balance accuracy and complexity, and have low expected error on unseen data. Many modern implementations of genetic… 38 arXiv — Machine Learning research 19h ago Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning arXiv:2605.11235v1 Announce Type: new Abstract: In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. 
Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the… 18 arXiv — Machine Learning research 19h ago DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift arXiv:2605.11237v1 Announce Type: new Abstract: Despite the burgeoning body of work on distribution shifts, provenance shift, where the relationship between data source and label changes at deployment, remains poorly understood and under-addressed. In this paper, we establish a… 13 arXiv — Machine Learning research 19h ago Extending Kernel Trick to Influence Functions arXiv:2605.11239v1 Announce Type: new Abstract: In this paper, we present a dual representation of the influence functions, whose computational complexity scales with dataset size rather than model size. Both analytically and experimentally, we show that this representation can… 7 arXiv — Machine Learning research 19h ago Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization arXiv:2605.11246v1 Announce Type: new Abstract: Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. 
Existing approaches… 13 arXiv — Machine Learning research 19h ago A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling arXiv:2605.11247v1 Announce Type: new Abstract: This paper presents a proof-of-concept digital twin framework for simulation-driven diabetes modeling using benchmark clinical data, synthetic temporal augmentation, and illustrative continuous glucose monitoring (CGM) analysis.… 27 arXiv — Machine Learning research 19h ago Curriculum Learning-Guided Progressive Distillation in Large Language Models arXiv:2605.11260v1 Announce Type: new Abstract: Knowledge distillation is a key technique for transferring the capabilities of large language models (LLMs) into smaller, more efficient student models. Existing distillation approaches often overlook two critical factors: the… 26 arXiv — Machine Learning research 19h ago Latent Chain-of-Thought Improves Structured-Data Transformers arXiv:2605.11262v1 Announce Type: new Abstract: Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent… 24 arXiv — Machine Learning research 19h ago Localization Boosting for Growth Markets: Mitigating Cross-Locale Behavioral Bias in Learning-to-Rank arXiv:2605.11272v1 Announce Type: new Abstract: Adobe Express is expanding internationally, but the US has a disproportionately large content supply and interaction volume. Learning-to-rank (LTR) models trained primarily on behavioral feedback inherit this imbalance: templates… 20 arXiv — Machine Learning research 19h ago Beyond Similarity: Temporal Operator Attention for Time Series Analysis arXiv:2605.11287v1 Announce Type: new Abstract: A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. 
We argue that this gap arises from a mismatch in the sequence-modeling primitive: while… 18 arXiv — Machine Learning research 19h ago Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning arXiv:2605.11289v1 Announce Type: new Abstract: Average-reward reinforcement learning requires estimating the gain and the bias, which is defined only up to an additive constant. This makes direct distributional analogues ill-posed on the real line. We introduce a quotient-space… 27 arXiv — Machine Learning research 19h ago Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets arXiv:2605.11291v1 Announce Type: new Abstract: In this paper, we provide a computable characterization of the geometry of optimal representations in Contrastive Learning (CL) when the classes are imbalanced. When classes are balanced and the representation dimension is greater… 27 arXiv — Machine Learning research 19h ago Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling arXiv:2605.11299v1 Announce Type: new Abstract: Code generation is typically trained in the primal space of programs: a model produces a candidate solution and receives sparse execution feedback, often a single pass/fail bit. Test-time scaling enriches the inference procedure by… 32 arXiv — Machine Learning research 19h ago A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse arXiv:2605.11302v1 Announce Type: new Abstract: We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As in [arXiv:2504.14370, arXiv:2511.05295], we aim for \emph{breadth}, but impose an additional… 20 arXiv — Machine Learning research 19h ago Couple to Control: Joint Initial Noise Design in Diffusion Models arXiv:2605.11311v1 Announce Type: new Abstract: Diffusion models typically generate image batches from independent Gaussian initial noises. 
We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specify… 11 arXiv — Machine Learning research 19h ago Error whitening: Why Gauss-Newton outperforms Newton arXiv:2605.11316v1 Announce Type: new Abstract: The Gauss-Newton matrix is widely viewed as a positive semidefinite approximation of the Hessian, yet mounting empirical evidence shows that Gauss-Newton descent outperforms Newton's method. We adopt a function space perspective to… 5 arXiv — Machine Learning research 19h ago $\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search arXiv:2605.11324v1 Announce Type: new Abstract: We study the fixed-budget max-min action identification problem in depth-2 max-min trees, an important special case of Monte Carlo Tree Search. A learner sequentially allocates $T$ samples to leaves and then recommends a subtree… 17 arXiv — Machine Learning research 19h ago Neural Statistical Functions arXiv:2605.11327v1 Announce Type: new Abstract: Classical deep learning typically operates on individual cases. Despite its success, real-world usage often requires repeated inference to estimate statistical quantities for complex decision-making tasks involving uncertainty or… 24 arXiv — Machine Learning research 19h ago Epistemic Uncertainty for Test-Time Discovery arXiv:2605.11328v1 Announce Type: new Abstract: Automated scientific discovery using large language models relies on identifying genuinely novel solutions. 
Standard reinforcement learning penalizes high-variance mutations, which leads the policy to prioritize familiar patterns.… 31 arXiv — Machine Learning research 19h ago Physics-Informed Teacher-Student Ensemble Learning for Traffic State Estimation with a Varying Speed Limit Scenario arXiv:2605.11346v1 Announce Type: new Abstract: Physics-informed deep learning (PIDL) neural networks have shown their capability as a useful instrument for transportation practitioners in utilizing the underlying relationship between the state variables for traffic state… 11 arXiv — Machine Learning research 19h ago Gradient-Free Noise Optimization for Reward Alignment in Generative Models arXiv:2605.11347v1 Announce Type: new Abstract: Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but… 38 arXiv — Machine Learning research 19h ago gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods arXiv:2605.11355v1 Announce Type: new Abstract: Inventory-policy comparisons are often difficult to interpret because performance depends on the evaluation contract as much as on the policy itself. Differences in topology, demand regime, information access, feasibility… 32 arXiv — Machine Learning research 19h ago The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives arXiv:2605.11361v1 Announce Type: new Abstract: Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$. 
Since there is no canonical distributional distance for this… 27 arXiv — Machine Learning research 19h ago Causal Fairness for Survival Analysis arXiv:2605.11362v1 Announce Type: new Abstract: In the data-driven era, large-scale datasets are routinely collected and analyzed using machine learning (ML) and artificial intelligence (AI) to inform decisions in high-stakes domains such as healthcare, employment, and criminal… 31 arXiv — Machine Learning research 19h ago LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows arXiv:2605.11368v1 Announce Type: new Abstract: We study the application of recent Edit Flows for inference-time reward control for DNA sequence generation. Unlike most reward-guided DNA generation frameworks, which operate on fixed-length sequence spaces, Edit Flows have a… 6 arXiv — Machine Learning research 19h ago TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning arXiv:2605.11380v1 Announce Type: new Abstract: Learning transferable representations for electroencephalography (EEG) remains challenging because EEG signals are inherently multi-channel and non-stationary. Channels observed at the same time provide coupled measurements of… 25 arXiv — Machine Learning research 19h ago Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies arXiv:2605.11387v1 Announce Type: new Abstract: We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. 
Existing methods for RL fine-tuning of generative policies… 17 arXiv — Machine Learning research 19h ago MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization arXiv:2605.11396v1 Announce Type: new Abstract: The Muon optimizer has emerged as a compelling alternative to Adam for training large language models, achieving remarkable computational savings through gradient orthogonalization. However, Muon's optimizer state is more sensitive… 21 arXiv — Machine Learning research 19h ago More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting arXiv:2605.11402v1 Announce Type: new Abstract: Deep learning-based website fingerprinting has emerged as an effective technique for inferring the websites users visit. Although existing methods achieve strong performance on closed-world datasets, they often fail to generalize… 23 arXiv — Machine Learning research 19h ago 20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone arXiv:2605.11405v1 Announce Type: new Abstract: Data curation has shifted the quality-compute frontier for language-model and contrastive image-text pretraining, but its role for vision-language models (VLMs) is far less established. 
We ask how far data curation alone can take… 33 arXiv — Machine Learning research 19h ago A Boundary-Aware Non-parametric Granular-Ball Classifier Based on Minimum Description Length arXiv:2605.11406v1 Announce Type: new Abstract: Existing granular-ball classification methods are often driven by handcrafted quality measures, neighborhood rules, or heuristic splitting and stopping criteria, which may reduce the transparency of local construction decisions and… 6 arXiv — Machine Learning research 19h ago Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer arXiv:2605.11414v1 Announce Type: new Abstract: While traditional time-series classifiers assume full sequences at inference, practical constraints (latency and cost) often limit inputs to partial prefixes. The absence of class-discriminative patterns in partial data can… 29 arXiv — Machine Learning research 19h ago FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling arXiv:2605.11428v1 Announce Type: new Abstract: Exploratory analysis of high-dimensional data rarely stops at a single embedding. In practice, analysts rerun dimensionality reduction after changing preprocessing, subsets, or hyperparameters, and standard nonlinear methods can… 26 arXiv — Machine Learning research 19h ago Deep Minds and Shallow Probes arXiv:2605.11448v1 Announce Type: new Abstract: Neural representations are not unique objects. Even when two systems realize the same downstream computation, their hidden coordinates may differ by reparameterization. A probe family intended to reveal structure already present in… 18 arXiv — Machine Learning research 19h ago Beyond Prediction: Interval Neural Networks for Uncertainty-Aware System Identification arXiv:2605.11460v1 Announce Type: new Abstract: System identification (SysID) is critical for modeling dynamical systems from experimental data, yet traditional approaches often fail to capture nonlinear behaviors. 
While deep learning offers powerful tools for modeling such… 20 arXiv — Machine Learning research 19h ago Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning arXiv:2605.11467v1 Announce Type: new Abstract: Reasoning models post-hoc rationalize answers they have already committed to internally, producing chains of *reasoning theater*: deliberative-looking steps that contribute nothing to correctness. This wastes inference tokens,… 7 arXiv — Machine Learning research 19h ago Robust Multi-Agent Path Finding under Observation Attacks: A Principled Adversarial-Plus-Smoothing Training Recipe arXiv:2605.11469v1 Announce Type: new Abstract: Decentralized multi-agent path finding (MAPF) routes a team of agents on a shared grid, each acting from its own local view. The standard solution trains one shared neural policy with Proximal Policy Optimization (PPO), a popular… 20 arXiv — Machine Learning research 19h ago On the Approximation Complexity of Matrix Product Operator Born Machines arXiv:2605.11471v1 Announce Type: new Abstract: Matrix product operator Born machines (MPO-BMs) are tractable tensor-network models for probabilistic modeling, but their efficient approximation capability remains unclear. We characterize this boundary from both negative and… 35 arXiv — Machine Learning research 19h ago Efficient Adjoint Matching for Fine-tuning Diffusion Models arXiv:2605.11480v1 Announce Type: new Abstract: Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled… 30 arXiv — Machine Learning research 19h ago Adaptive Calibration in Non-Stationary Environments arXiv:2605.11490v1 Announce Type: new Abstract: Making calibrated online predictions is a central challenge in modern AI systems. 
Much of the existing literature focuses on fully adversarial environments where outcomes may be arbitrary, leading to conservative algorithms that… 9 arXiv — Machine Learning research 19h ago Understanding and Preventing Entropy Collapse in RLVR with On-Policy Entropy Flow Optimization arXiv:2605.11491v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning ability of large language models. However, widely used RLVR algorithms, such as GRPO, often suffer from entropy… 12 arXiv — Machine Learning research 19h ago CTFusion: A CTF-based Benchmark for LLM Agent Evaluation arXiv:2605.11504v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) have enabled agentic systems for complex, multi-step tasks; cybersecurity is emerging as a prominent application. To evaluate such agents, researchers widely adopt Capture The Flag… 23 arXiv — Machine Learning research 19h ago EqOD: Symmetry-Informed Stability Selection for PDE Identification arXiv:2605.11524v1 Announce Type: new Abstract: Data-driven identification of partial differential equations (PDEs) relies on sparse regression over a candidate library of differential operators, where larger libraries inflate false positives under observation noise and smaller… 26 arXiv — NLP / Computation & Language research 19h ago Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs arXiv:2605.11128v1 Announce Type: new Abstract: Diversity is essential for language-model applications ranging from creative generation to scientific discovery, yet modern LLMs often collapse into a narrow subset of plausible outputs. 
While prior work has developed benchmarks… 11 arXiv — NLP / Computation & Language research 19h ago ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV arXiv:2605.11143v1 Announce Type: new Abstract: Reasoning benchmarks measure clinical performance on clean inputs. We evaluate the step before reasoning: retrieval over real EHR notes, where negation, temporality, and family-versus-patient attribution can flip a correct answer… 27 arXiv — NLP / Computation & Language research 19h ago Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary arXiv:2605.11153v1 Announce Type: new Abstract: We decompose an evolutionary mixture-of-LoRA system on a from-scratch ~150M-parameter widened-D substrate (D=1536, V=32000; D/V ≈ 0.048; the "widened-1536" substrate) into three factors -- a router rewrite (parallel sigmoid… 19 arXiv — NLP / Computation & Language research 19h ago The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models arXiv:2605.11167v1 Announce Type: new Abstract: Existing multi-model and tool-augmented systems communicate by generating text, serializing every exchange through the output vocabulary. Can two pretrained language models instead coordinate through a continuous, concurrent… 16 arXiv — NLP / Computation & Language research 19h ago How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation arXiv:2605.11195v1 Announce Type: new Abstract: Large language models (LLMs) trained on web-scale corpora can memorize sensitive training data, posing significant privacy risks.
Differential privacy (DP) has emerged as a principled framework that limits the influence of… 32 arXiv — NLP / Computation & Language research 19h ago Instructions shape Production of Language, not Processing arXiv:2605.11206v1 Announce Type: new Abstract: Instructions trigger a production-centered mechanism in language models. Through a cognitively inspired lens that separates language processing and production, we reveal this mechanism as an asymmetry between the two stages by… 14 arXiv — NLP / Computation & Language research 19h ago ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction arXiv:2605.11212v1 Announce Type: new Abstract: Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token cost increases rapidly,… 11 arXiv — NLP / Computation & Language research 19h ago RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German arXiv:2605.11242v1 Announce Type: new Abstract: In this paper, we present the RETUYT-INCO participation at the BEA 2026 shared task "Rubric-based Short Answer Scoring for German". Our team participated in track 1 (Unseen answers three-way), track 3 (Unseen answers two-way) and… 26 arXiv — NLP / Computation & Language research 19h ago HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model arXiv:2605.11255v1 Announce Type: new Abstract: We present Hebatron, a Hebrew-specialized open-weight large language model built on the NVIDIA Nemotron-3 sparse Mixture-of-Experts architecture.
Training employs a three-phase easy-to-hard curriculum with continuous… 11 arXiv — NLP / Computation & Language research 19h ago ReAD: Reinforcement-Guided Capability Distillation for Large Language Models arXiv:2605.11290v1 Announce Type: new Abstract: Capability distillation applies knowledge distillation to selected model capabilities, aiming to compress a large language model (LLM) into a smaller one while preserving the abilities needed for a downstream task. However, most… 27 arXiv — NLP / Computation & Language research 19h ago Predicting Psychological Well-Being from Spontaneous Speech using LLMs arXiv:2605.11303v1 Announce Type: new Abstract: We investigate the use of Large Language Models (LLMs) for zero-shot prediction of Ryff Psychological Well-Being (PWB) scores from spontaneous speech. Using a few minutes of voice recordings from 111 participants in the PsyVoiD… 7 arXiv — NLP / Computation & Language research 19h ago SOMA: Efficient Multi-turn LLM Serving via Small Language Model arXiv:2605.11317v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in multi-turn dialogue settings where preserving conversational context across turns is essential. A standard serving practice concatenates the full dialogue history at every… 33 arXiv — NLP / Computation & Language research 19h ago Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence arXiv:2605.11348v1 Announce Type: new Abstract: During disasters, extracting causal relations from social media can strengthen situational awareness by identifying factors linked to casualties, physical damage, infrastructure disruption, and cascading impacts. 
However,… 17 arXiv — NLP / Computation & Language research 19h ago An Empirical Study of Automating Agent Evaluation arXiv:2605.11378v1 Announce Type: new Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate… 5 arXiv — NLP / Computation & Language research 19h ago Deep Reasoning in General Purpose Agents via Structured Meta-Cognition arXiv:2605.11388v1 Announce Type: new Abstract: Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified… 5 arXiv — NLP / Computation & Language research 19h ago Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training arXiv:2605.11416v1 Announce Type: new Abstract: Selective layer-wise updates are essential for low-cost continued pre-training of Large Language Models (LLMs), yet determining which layers to freeze or train remains an empirical black-box problem due to the lack of interpretable… 28 arXiv — NLP / Computation & Language research 19h ago Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty arXiv:2605.11436v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. 
This leads to two… 38 arXiv — NLP / Computation & Language research 19h ago StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models arXiv:2605.11483v1 Announce Type: new Abstract: While large language models excel at factual adaptation, their ability to internalize nuanced philosophical frameworks under severe data constraints remains underexplored. We investigate this by specializing small LLMs on… 13 arXiv — NLP / Computation & Language research 19h ago Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations arXiv:2605.11502v1 Announce Type: new Abstract: Accurately and consistently indexing biomedical literature by publication type and study design is essential for supporting evidence synthesis and knowledge discovery. Prior work on automated publication type and study design… 23 arXiv — NLP / Computation & Language research 19h ago A Study on Hidden Layer Distillation for Large Language Model Pre-Training arXiv:2605.11513v1 Announce Type: new Abstract: Knowledge Distillation (KD) is a critical tool for training Large Language Models (LLMs), yet the majority of research focuses on approaches that rely solely on output logits, neglecting semantic information in the teacher's… 25 arXiv — NLP / Computation & Language research 19h ago Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation arXiv:2605.11533v1 Announce Type: new Abstract: Clinical check-up reports are multimodal documents that combine page layouts, tables, numerical biomarkers, abnormality flags, imaging findings, and domain-specific terminology. 
Such heterogeneous evidence is difficult for… 30 arXiv — NLP / Computation & Language research 19h ago Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting arXiv:2605.11538v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) has emerged as a promising approach for improving the reasoning capabilities of large language models. However, it struggles to effectively balance the tradeoff between exploration and… 23 arXiv — NLP / Computation & Language research 19h ago Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation arXiv:2605.11574v1 Announce Type: new Abstract: The literature on how large language models handle conflict between their training knowledge and a contradicting document presents a persistent empirical contradiction: some studies find models stubbornly retain their trained… 35 arXiv — NLP / Computation & Language research 19h ago BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion arXiv:2605.11577v1 Announce Type: new Abstract: Autoregressive language models generate text one token at a time, yet natural language is inherently structured in multi-token units, including phrases, n-grams, and collocations that carry meaning jointly. This one-token… 27 arXiv — NLP / Computation & Language research 19h ago Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference arXiv:2605.11581v1 Announce Type: new Abstract: When large language models (LLMs) serve real-time inference in commercial online advertising systems, end-to-end latency must be strictly bounded to the millisecond range. 
Yet every token generated during the decode phase triggers… 32 arXiv — NLP / Computation & Language research 19h ago Efficient LLM-based Advertising via Model Compression and Parallel Verification arXiv:2605.11582v1 Announce Type: new Abstract: Large language models (LLMs) have shown remarkable potential in advertising scenarios such as ad creative generation and targeted advertising. However, deploying LLMs in real-time advertising systems poses significant challenges… 19 arXiv — NLP / Computation & Language research 19h ago DiffScore: Text Evaluation Beyond Autoregressive Likelihood arXiv:2605.11601v1 Announce Type: new Abstract: Autoregressive language models are widely used for text evaluation; however, their left-to-right factorization introduces positional bias, i.e., early tokens are scored with only leftward context, conflating architectural asymmetry… 38 arXiv — NLP / Computation & Language research 19h ago PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head arXiv:2605.11608v1 Announce Type: new Abstract: Comparing post-training LLM variants, such as quantized, LoRA-adapted, and distilled models, requires a diagnostic that identifies how a variant has drifted, not only whether it has degraded. Existing similarity scores such as CKA… 31 arXiv — NLP / Computation & Language research 19h ago When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models arXiv:2605.11612v1 Announce Type: new Abstract: Backdoor vulnerabilities widely exist in the fine-tuning of large language models (LLMs). Most backdoor poisoning methods operate mainly at the token level and lack deeper semantic manipulation, which limits stealthiness.
In… 25 arXiv — NLP / Computation & Language research 19h ago OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models arXiv:2605.11629v1 Announce Type: new Abstract: Recent multimodal large language models (MLLMs) have shown strong chain-of-thought (CoT) reasoning ability on vision-language tasks, but their direct deployment in real-world systems is often limited by latency and resource… 38 arXiv — NLP / Computation & Language research 19h ago Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization arXiv:2605.11632v1 Announce Type: new Abstract: Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling… 37 arXiv — NLP / Computation & Language research 19h ago Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability arXiv:2605.11663v1 Announce Type: new Abstract: Authentic school examinations provide a high-validity test bed for evaluating multimodal large language models (MLLMs), yet benchmarks grounded in Japanese K-12 assessments remain scarce. We present a multimodal dataset constructed… 13 arXiv — NLP / Computation & Language research 19h ago Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter arXiv:2605.11685v1 Announce Type: new Abstract: Large language model (LLM) unlearning aims to remove specific data influences from a pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns.
However, recent studies reveal a critical… 17 arXiv — NLP / Computation & Language research 19h ago Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation arXiv:2605.11739v1 Announce Type: new Abstract: On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level… 6 arXiv — NLP / Computation & Language research 19h ago Training-Inference Consistent Segmented Execution for Long-Context LLMs arXiv:2605.11744v1 Announce Type: new Abstract: Transformer-based large language models face severe scalability challenges in long-context generation due to the computational and memory costs of full-context attention. Under practical computation and memory constraints, many… 7 arXiv — NLP / Computation & Language research 19h ago Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control arXiv:2605.11769v1 Announce Type: new Abstract: Air Traffic Control (ATC) is a safety-critical domain in which incorrect interpretation of instructions may lead to severe operational consequences. While large language models (LLMs) demonstrate strong general performance, their… 7 arXiv — NLP / Computation & Language research 19h ago From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction arXiv:2605.11774v1 Announce Type: new Abstract: By processing electronic health records (EHRs) as natural language sequences, large language models (LLMs) have shown potential in clinical prediction tasks such as mortality prediction and phenotyping. However, longitudinal or… 13 arXiv — NLP / Computation & Language research 19h ago Choosing features for classifying multiword expressions arXiv:2605.11779v1 Announce Type: new Abstract: Multiword expressions (MWEs) are a heterogeneous set with a glaring need for classifications. 
Designing a satisfactory classification involves choosing features. In the case of MWEs, many features are a priori available. Not all… 21 arXiv — NLP / Computation & Language research 19h ago Probabilistic Calibration Is a Trainable Capability in Language Models arXiv:2605.11845v1 Announce Type: new Abstract: Language models are increasingly used in settings where outputs must satisfy user-specified randomness constraints, yet their generation probabilities are often poorly calibrated to those targets. We study whether this capability… 17 arXiv — NLP / Computation & Language research 19h ago Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models arXiv:2605.11854v1 Announce Type: new Abstract: Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive language models, offering stronger global awareness and highly parallel generation. However, post-training DLMs with standard… 18 arXiv — NLP / Computation & Language research 19h ago Concordance Comparison as a Means of Assembling Local Grammars arXiv:2605.11862v1 Announce Type: new Abstract: Named Entity Recognition for person names is an important but non-trivial task in information extraction. This article uses a tool that compares the concordances obtained from two local grammars (LG) and highlights the differences.… 12 arXiv — NLP / Computation & Language research 19h ago Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models arXiv:2605.11887v1 Announce Type: new Abstract: Large language models have achieved remarkable capabilities across diverse tasks, yet their internal decision-making processes remain largely opaque, limiting our ability to inspect, control, and systematically improve them. 
This… 22 arXiv — NLP / Computation & Language research 19h ago YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning arXiv:2605.11906v1 Announce Type: new Abstract: Preference optimization has become an important post-training paradigm for improving the reasoning abilities of large language models. Existing methods typically rely on externally constructed preference data, using preferred and… 31 arXiv — NLP / Computation & Language research 19h ago Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging arXiv:2605.11964v1 Announce Type: new Abstract: A target-guided proactive dialogue system aims to steer conversations proactively toward pre-defined targets, such as designated keywords or specific topics. During guided conversations, dynamically modeling conversational… 37 arXiv — NLP / Computation & Language research 19h ago On Predicting the Post-training Potential of Pre-trained LLMs arXiv:2605.11978v1 Announce Type: new Abstract: The performance of Large Language Models (LLMs) on downstream tasks is fundamentally constrained by the capabilities acquired during pre-training. 
However, traditional benchmarks like MMLU often fail to reflect a base model's… 11 arXiv — NLP / Computation & Language research 19h ago Towards Visually-Guided Movie Subtitle Translation for Indic Languages arXiv:2605.11993v1 Announce Type: new Abstract: Movie subtitle translation is inherently multimodal, yet text-only systems often miss visual cues needed to convey emotion, action, and social nuance, especially for low-resource Indic languages (English to Hindi, Bengali, Telugu,… 13 arXiv — NLP / Computation & Language research 19h ago Learning Agentic Policy from Action Guidance arXiv:2605.12004v1 Announce Type: new Abstract: Agentic reinforcement learning (RL) for Large Language Models (LLMs) critically depends on the exploration capability of the base policy, as training signals emerge only within its in-capability region. For tasks where the base… 12 arXiv — NLP / Computation & Language research 19h ago SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation arXiv:2605.12022v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong performance on standard knowledge evaluation benchmarks, yet recent work shows that their knowledge capabilities remain brittle under question variants that test the same knowledge in… 26 arXiv — NLP / Computation & Language research 19h ago Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking arXiv:2605.12028v1 Announce Type: new Abstract: We describe our system for SemEval-2026 Task 8 (MTRAGEval), participating in Task A (Retrieval) across four English-language domains. 
Our approach employs a three-stage pipeline: (1) query rewriting via a LoRA-fine-tuned Qwen 2.5… 30 arXiv — NLP / Computation & Language research 19h ago SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs arXiv:2605.12039v1 Announce Type: new Abstract: Skill libraries enable large language model agents to reuse experience from past interactions, but most existing libraries store skills as isolated entries and retrieve them only by semantic similarity. This leads to two key… 11 arXiv — NLP / Computation & Language research 19h ago Is Child-Directed Language Optimized for Word Learning? A Computational Study of Verb Meaning Acquisition arXiv:2605.12047v1 Announce Type: new Abstract: Is child-directed language (CDL) optimized to support language learning, and which aspects of linguistic development does it facilitate? We investigate this question using neural language models trained on CDL versus adult-directed… 6 arXiv — NLP / Computation & Language research 19h ago Do Language Models Encode Knowledge of Linguistic Constraint Violations? arXiv:2605.12055v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong linguistic performance, yet their internal mechanisms for producing these predictions remain unclear. We investigate the hypothesis that LLMs encode representations of linguistic… 31 arXiv — NLP / Computation & Language research 19h ago Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward arXiv:2605.12096v1 Announce Type: new Abstract: Sign languages are natural, visual-gestural languages used by Deaf communities worldwide. 
Over 300 distinct sign languages remain severely low-resource due to limited documentation, sparse datasets, and insufficient computational… 27 arXiv — NLP / Computation & Language research 19h ago Metaphor Is Not All Attention Needs arXiv:2605.12128v1 Announce Type: new Abstract: Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post-training aims to make models robust against many jailbreak strategies,… 20 arXiv — NLP / Computation & Language research 19h ago Latent Causal Void: Explicit Missing-Context Reconstruction for Misinformation Detection arXiv:2605.12156v1 Announce Type: new Abstract: Automatic misinformation detection performs well when deception is visible in what an article explicitly states. However, some misinformation articles remain locally coherent and only become misleading once compared with… 27 arXiv — NLP / Computation & Language research 19h ago Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach arXiv:2605.12177v1 Announce Type: new Abstract: [Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from… 6 arXiv — NLP / Computation & Language research 19h ago Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding arXiv:2605.12185v1 Announce Type: new Abstract: Large language models accumulate extensive parametric knowledge through pre-training. However, knowledge conflicts occur when outdated or incorrect parametric knowledge conflicts with external knowledge in the context. 
Existing… 27 arXiv — NLP / Computation & Language research 19h ago Mechanistic Interpretability of ASR models using Sparse Autoencoders arXiv:2605.12225v1 Announce Type: new Abstract: Understanding the internal workings of deep Transformer-based NLP models is more crucial than ever as these models see widespread use in various domains that affect the public at large, such as industry, academia, finance,… 24 arXiv — NLP / Computation & Language research 19h ago Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models arXiv:2605.12227v1 Announce Type: new Abstract: Adapting large language models (LLMs) to long-context tasks requires post-training methods that remain accurate and coherent over thousands of tokens. Existing approaches are limited in several ways: 1) off-policy methods such as… 12 arXiv — NLP / Computation & Language research 19h ago Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs arXiv:2605.12242v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) transcripts often contain disfluencies, such as fillers, repetitions, and false starts, which reduce readability and hinder downstream applications like chatbots and voice assistants. If left… 5 arXiv — NLP / Computation & Language research 19h ago PreScam: A Benchmark for Predicting Scam Progression from Early Conversations arXiv:2605.12243v1 Announce Type: new Abstract: Conversational scams, such as romance and investment scams, are emerging as a major form of online fraud.
Unlike one-shot scam lures such as fake lottery or unpaid toll messages, they unfold through multi-turn conversations in… 33 arXiv — NLP / Computation & Language research 19h ago PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents arXiv:2605.12260v1 Announce Type: new Abstract: Long-horizon language agents accumulate conversation history far faster than any fixed context window can hold, making memory management critical to both answer accuracy and serving cost. Existing approaches either expand the… 8 arXiv — NLP / Computation & Language research 19h ago What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty arXiv:2605.12281v1 Announce Type: new Abstract: What makes a word difficult to learn, and how does the difficulty depend on the learner's native language? We computationally model vocabulary difficulty for English learners whose first language is Spanish, German, or Chinese with… 32 arXiv — NLP / Computation & Language research 19h ago TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching arXiv:2605.12288v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions.… 12 arXiv — NLP / Computation & Language research 19h ago GKnow: Measuring the Entanglement of Gender Bias and Factual Gender arXiv:2605.12299v1 Announce Type: new Abstract: Recent works have analyzed the impact of individual components of neural networks on gendered predictions, often with a focus on mitigating gender bias. 
However, mechanistic interpretations of gender tend to (i) focus on a very… 23 arXiv — NLP / Computation & Language research 19h ago Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering arXiv:2605.12313v1 Announce Type: new Abstract: Multi-hop question answering (QA) remains a significant challenge in the biomedical domain, requiring systems to integrate information across multiple sources to answer complex questions. To address this problem, the BioCreative IX… 18 arXiv — NLP / Computation & Language research 19h ago A categorical error sensitivity index (ISEC): A preventive ordinal decision-support measure for irrecoverable errors in manual data entry systems arXiv:2605.12328v1 Announce Type: new Abstract: Data entry systems remain structurally vulnerable to categorical misclassifications, particularly in small and medium sized enterprises (SMEs). When nominal categories exhibit semantic or morphological proximity, human machine… 29 arXiv — NLP / Computation & Language research 19h ago Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation arXiv:2605.12345v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) techniques offer task-specific fine-tuning at a fraction of the cost of full fine-tuning, but require separate fine-tuning for every new task (combination). In this paper, we explore three… 25 arXiv — NLP / Computation & Language research 19h ago MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering arXiv:2605.12361v1 Announce Type: new Abstract: Evaluating large language models (LLMs) in the biomedical domain requires benchmarks that can distinguish reasoning from pattern matching and remain discriminative as model capabilities improve. 
Existing biomedical question… 6 arXiv — NLP / Computation & Language research 19h ago Context Convergence Improves Answering Inferential Questions arXiv:2605.12370v1 Announce Type: new Abstract: While Large Language Models (LLMs) are widely used in open-domain Question Answering (QA), their ability to handle inferential questions, where answers must be derived rather than directly retrieved, remains underexplored. This… 21 arXiv — NLP / Computation & Language research 19h ago Pretraining Exposure Explains Popularity Judgments in Large Language Models arXiv:2605.12382v1 Announce Type: new Abstract: Large language models (LLMs) exhibit systematic preferences for well-known entities, a phenomenon often attributed to popularity bias. However, the extent to which these preferences reflect real-world popularity versus statistical… 19 arXiv — NLP / Computation & Language research 19h ago Scalable Token-Level Hallucination Detection in Large Language Models arXiv:2605.12384v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but they still frequently produce hallucinations.
These hallucinations are difficult to detect in reasoning-intensive tasks, where the content appears coherent… 35 arXiv — NLP / Computation & Language research 19h ago A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles arXiv:2605.12395v1 Announce Type: new Abstract: Background: Many different approaches to controlled text generation (CTG) have been proposed over recent years, but it is difficult to get a clear picture of which approach performs best, because different datasets and evaluation… 23 arXiv — NLP / Computation & Language research 19h ago Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring arXiv:2605.12398v1 Announce Type: new Abstract: Estimating question difficulty is a critical component in evaluating and improving large language models (LLMs) for question answering (QA). Existing approaches often rely on readability formulas, retrieval-based signals, or… 8 arXiv — NLP / Computation & Language research 19h ago Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space arXiv:2605.12412v1 Announce Type: new Abstract: Large Language Models (LLMs) update their behavior in context, which can be viewed as a form of Bayesian inference. However, the structure of the latent hypothesis space over which this inference operates remains unclear. In this… 9 arXiv — NLP / Computation & Language research 19h ago ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging arXiv:2605.12419v1 Announce Type: new Abstract: Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. 
This work investigates… 24 arXiv — NLP / Computation & Language research 19h ago Predicting Disagreement with Human Raters in LLM-as-a-Judge Difficulty Assessment without Using Generation-Time Probability Signals arXiv:2605.12422v1 Announce Type: new Abstract: Automatic generation of educational materials using large language models (LLMs) is becoming increasingly common, but assigning difficulty levels to such materials still requires substantial human effort. LLM-as-a-Judge has… 16 arXiv — NLP / Computation & Language research 19h ago Geometric Factual Recall in Transformers arXiv:2605.12426v1 Announce Type: new Abstract: How do transformer language models memorize factual associations? A common view casts internal weight matrices as associative memories over pairs of embeddings, requiring parameter counts that scale linearly with the number of… 5 arXiv — NLP / Computation & Language research 19h ago A Causal Language Modeling Detour Improves Encoder Continued Pretraining arXiv:2605.12438v1 Announce Type: new Abstract: When adapting an encoder to a new domain, the standard approach is to continue training with Masked Language Modeling (MLM). We show that temporarily switching to Causal Language Modeling (CLM) followed by a short MLM decay… 38 arXiv — NLP / Computation & Language research 19h ago The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events arXiv:2605.12452v1 Announce Type: new Abstract: Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. 
Existing AI-text detection often focuses on sentence-level cues such as… 18 arXiv — NLP / Computation & Language research 19h ago Task-Adaptive Embedding Refinement via Test-time LLM Guidance arXiv:2605.12487v1 Announce Type: new Abstract: We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of… 36 arXiv — NLP / Computation & Language research 19h ago LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues arXiv:2605.12493v1 Announce Type: new Abstract: Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for… 17 arXiv — NLP / Computation & Language research 19h ago AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents arXiv:2605.11026v1 Announce Type: cross Abstract: Defenses against indirect prompt injection (IPI) in tool-using LLM agents share two structural weaknesses. First, they all attempt to prevent attacks rather than detect the compromises that slip through. Second, they have only… 21 arXiv — NLP / Computation & Language research 19h ago On Problems of Implicit Context Compression for Software Engineering Agents arXiv:2605.11051v1 Announce Type: cross Abstract: LLM-based Software Engineering agents face a critical bottleneck: context length limitations cause failures on complex, long-horizon tasks. One promising solution is to encode context as continuous embeddings rather than discrete… 27 arXiv — NLP / Computation & Language research 19h ago Unlocking LLM Creativity in Science through Analogical Reasoning arXiv:2605.11258v1 Announce Type: cross Abstract: Autonomous science promises to augment scientific discovery, particularly in complex fields like biomedicine. 
However, this requires AI systems that can consistently generate novel and diverse solutions to open-ended problems. We… 22 arXiv — NLP / Computation & Language research 19h ago LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer? arXiv:2605.11301v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have heterogeneous strengths across OCR, chart understanding, spatial reasoning, visual question answering, cost, and latency. Effective MLLM routing therefore requires more than… 24 arXiv — NLP / Computation & Language research 19h ago VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference arXiv:2605.11334v1 Announce Type: cross Abstract: LLM-as-Judge systems are widely deployed for automated evaluation, yet practitioners lack reliable methods to know when a judge's verdict should be trusted. Token log-probabilities, the standard post-hoc confidence signal, are… 19 arXiv — NLP / Computation & Language research 19h ago Much of Geospatial Web Search Is Beyond Traditional GIS arXiv:2605.11336v1 Announce Type: cross Abstract: Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We… 13 arXiv — NLP / Computation & Language research 19h ago PresentAgent-2: Towards Generalist Multimodal Presentation Agents arXiv:2605.11363v1 Announce Type: cross Abstract: Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. 
We introduce PresentAgent-2, an agentic framework… 30 arXiv — NLP / Computation & Language research 19h ago Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models arXiv:2605.11374v1 Announce Type: cross Abstract: Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Most modern embedding checkpoints are distilled from large LLM backbones and inherit their representation… 21 arXiv — NLP / Computation & Language research 19h ago AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment arXiv:2605.11398v1 Announce Type: cross Abstract: We introduce AcuityBench, a benchmark for evaluating whether language models identify the appropriate urgency of care from user medical presentations. Existing health benchmarks emphasize medical question answering, broad health… 36 arXiv — NLP / Computation & Language research 19h ago fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive kl and gaussian curriculum arXiv:2605.11403v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become the standard paradigm for LLM mathematical reasoning, with Group Relative Policy Optimization (GRPO) serving as the dominant algorithm. We identify two overlooked… 38 arXiv — NLP / Computation & Language research 19h ago MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification arXiv:2605.11408v1 Announce Type: cross Abstract: Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at… 5 arXiv — NLP / Computation & Language research 19h ago Can a Single Message Paralyze the AI Infrastructure? 
The Rise of AbO-DDoS Attacks through Targeted Mobius Injection arXiv:2605.11442v1 Announce Type: cross Abstract: Large Language Model (LLM) agents have emerged as key intermediaries, orchestrating complex interactions between human users and a wide range of digital services and LLM infrastructures. While prior research has extensively… 20 arXiv — NLP / Computation & Language research 19h ago Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning arXiv:2605.11458v1 Announce Type: cross Abstract: On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts while conditioning on the reference solution. A design choice shared by nearly all such… 28 arXiv — NLP / Computation & Language research 19h ago AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive arXiv:2605.11518v1 Announce Type: cross Abstract: Effectively configuring scalable large language model (LLM) experiments, spanning architecture design, hyperparameter tuning, and beyond, is crucial for advancing LLM research, as poor configuration choices can waste substantial… 13 arXiv — NLP / Computation & Language research 19h ago Controllable User Simulation arXiv:2605.11519v1 Announce Type: cross Abstract: Using offline datasets to evaluate conversational agents often fails to cover rare scenarios or to support testing new policies. This has motivated the use of controllable user simulators for targeted, counterfactual evaluation,… 20 arXiv — Machine Learning research 1d ago Reinforcement learning for inverse structural design and rapid laser cutting of kirigami prototypes arXiv:2605.08098v1 Announce Type: new Abstract: Kirigami is an increasingly useful fabrication method to produce shape-programmable metamaterial structures. 
However, inverse design remains difficult because deployment… 12 arXiv — Machine Learning research 1d ago Path-Based Gradient Boosting for Graph-Level Prediction arXiv:2605.08102v1 Announce Type: new Abstract: We propose PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path-based features directly from the input… 20 arXiv — Machine Learning research 1d ago Distributional Reinforcement Learning via the Cramér Distance arXiv:2605.08104v1 Announce Type: new Abstract: This paper explores the application of the Soft Actor-Critic (SAC) algorithm within a Distributional Reinforcement Learning setting and introduces an implementation of… 15 arXiv — Machine Learning research 1d ago Geometry-free prediction of inertial lift forces in microfluidic devices using deep learning arXiv:2605.08109v1 Announce Type: new Abstract: Inertial microfluidic devices (IMDs) offer low-cost, high-throughput alternative techniques for many traditional particle- (or cell-) manipulation tasks, but simulating… 19 arXiv — Machine Learning research 1d ago BaLoRA: Bayesian Low-Rank Adaptation of Large Scale Models arXiv:2605.08110v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the standard for fine-tuning large pre-trained models at reduced computational cost. However, its low-rank point-estimate updates… 6 arXiv — Machine Learning research 1d ago TTCD: Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data arXiv:2605.08111v1 Announce Type: new Abstract: The widespread availability of complex time series data in various domains such as environmental science, epidemiology, and economics demands robust causal discovery… 35 arXiv — Machine Learning research 1d ago Do Foundation Model Embeddings Improve Cross-Country Crop Yield Generalisation?
A Leave-One-Country-Out Evaluation in Sub-Saharan Africa arXiv:2605.08113v1 Announce Type: new Abstract: Accurate predictions of smallholder maize yields across national boundaries are critical for food security planning in sub-Saharan Africa, yet most published benchmarks… 17 arXiv — Machine Learning research 1d ago Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant arXiv:2605.08114v1 Announce Type: new Abstract: We analyse three KV cache quantization schemes under a fair bit budget: KV (scalar MSE baseline), KQV (WHT + MSE on K; WHT + MSE + QJL on V), and… 27 arXiv — Machine Learning research 1d ago The Safety-Aware Denoiser for Text Diffusion Models arXiv:2605.08116v1 Announce Type: new Abstract: Recent work on text diffusion models offers a promising alternative to autoregressive generation, but controlling their safety remains underexplored. Existing safety… 9 arXiv — Machine Learning research 1d ago Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking arXiv:2605.08119v1 Announce Type: new Abstract: Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix $B = (\widetilde{F}^\top \widetilde{F} + \eta I)^{-1}$ during the interactive feature-learning stage of… 31 arXiv — Machine Learning research 1d ago Block-Wise Differentiable Sinkhorn Attention: Tail-Refinement Gradients with a Gap-Aware Dustbin Bridge arXiv:2605.08123v1 Announce Type: new Abstract: We study long-context balanced entropic optimal transport (OT) attention on TPU hardware through a stopped-base, fixed-depth tail-refinement surrogate.
After a stopped… 32 arXiv — Machine Learning research 1d ago Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models arXiv:2605.08128v1 Announce Type: new Abstract: Gene Regulatory Network (GRN) inference is essential for understanding complex cellular mechanisms, rendered tractable through single-cell transcriptomic data. With the… 19 arXiv — Machine Learning research 1d ago Towards Customized Multimodal Role-Play arXiv:2605.08129v1 Announce Type: new Abstract: Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual… 26 arXiv — Machine Learning research 1d ago Additive Atomic Forests for Symbolic Function and Antiderivative Discovery arXiv:2605.08130v1 Announce Type: new Abstract: We present a framework for the simultaneous symbolic recovery of a function and its antiderivative from data. The framework rests on three ideas. First, a derivative… 27 arXiv — Machine Learning research 1d ago Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization arXiv:2605.08131v1 Announce Type: new Abstract: Inverse reinforcement learning (IRL) learns a reward function and a corresponding policy that best fit the demonstration data of an expert. 
However, in the current IRL… 18 arXiv — Machine Learning research 1d ago DARE: Diffusion Language Model Activation Reuse for Efficient Inference arXiv:2605.08134v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to auto-regressive (AR) models, offering greater expressive capacity and potential for… 36 arXiv — Machine Learning research 1d ago Dendritic Neural Networks with Equilibrium Propagation arXiv:2605.08135v1 Announce Type: new Abstract: Equilibrium propagation (EP) is a biologically plausible alternative to backpropagation (BP), but its effectiveness can degrade in deeper and more challenging learning… 26 arXiv — Machine Learning research 1d ago Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI arXiv:2605.08137v1 Announce Type: new Abstract: Weight pruning is widely advocated for deploying Large Language Models on resource-constrained IoT and edge devices, yet its impact on model fairness remains poorly… 6 arXiv — Machine Learning research 1d ago DataArc-SynData-Toolkit: A Unified Closed-Loop Framework for Multi-Path, Multimodal, and Multilingual Data Synthesis arXiv:2605.08138v1 Announce Type: new Abstract: Synthetic data has emerged as a crucial solution to the data scarcity bottleneck in large language models (LLMs), particularly for specialized domains and low-resource… 10 arXiv — Machine Learning research 1d ago Reasoning emerges from constrained inference manifolds in large language models arXiv:2605.08142v1 Announce Type: new Abstract: Reasoning in large language models is predominantly evaluated through labeled benchmarks, conflating task performance with the quality of internal inference. 
Here we study… 15 NVIDIA Developer Blog official-blog 20d ago Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor’s sparsity from its memory layout for greater... 14 Hugging Face official-blog 6mo ago Voice Cloning with Consent Published October 28, 2025 Margaret Mitchell, Lucie-Aimée Kaffee In this blog post, we introduce the idea of a 'voice consent gate' to support voice cloning with consent. We provide an example… 24 Google DeepMind official-blog 6mo ago VaultGemma: The world's most capable differentially private LLM We introduce VaultGemma, the most capable model trained from scratch with differential privacy. 12 Lil'Log (Lilian Weng) research 106mo ago From GAN to WGAN [Updated on 2018-09-30: thanks to Yoonju, we have this post translated in Korean!] [Updated on 2019-04-18: this post is also available on arXiv.] Generative adversarial network (GAN) has shown great results in many generative tasks to replicate the real-world rich content such… 4