Hugging Face Daily Papers

500 articles archived

Hugging Face Daily Papers research 17d ago

Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior

Abstract Psychometric assessments of LLM behavior reveal that specific behavioral frameworks like Theory of Planned Behavior show better coherence with actual responses than broad personality traits, particularly within shared conversations. Generated by…

6
Hugging Face Daily Papers research 17d ago

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

Abstract Heterogeneous multi-agent systems can effectively transfer knowledge through aligned KV-cache communication, achieving better performance than text-based methods with reduced computational costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-agent systems…

21
Hugging Face Daily Papers research 17d ago

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Abstract TRACE is a skill-layer pipeline that mines user corrections to create runtime checks, significantly reducing preference violations in interactive LLM agents. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive LLM agents are becoming part of daily work, but they do…

30
Hugging Face Daily Papers research 17d ago

WebChallenger: A Reliable and Efficient Generalist Web Agent

Abstract WebChallenger presents a web agent framework that improves autonomous navigation through structured page representation and cognitive-inspired mechanisms, achieving high performance with open-weight models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Autonomous web…

15
Hugging Face Daily Papers research 17d ago

The Cold-Start Safety Gap in LLM Agents

Abstract Tool-calling language model agents exhibit improved safety after initial interactions, with a systematic benchmark demonstrating enhanced security through prior task completion. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Are tool-calling LLM agents equally safe…

37
Hugging Face Daily Papers research 17d ago

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

Abstract Compute-aware evaluation framework using FLOPs and risk-compute curves reveals non-monotonic effects of alignment training and varying attack costs across different harm categories. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Adversarial robustness evaluations of large…

6
Hugging Face Daily Papers research 17d ago

ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs

Abstract Parametric tool retrieval models show reduced performance and understanding when evaluated with realistic ambiguous queries compared to standard benchmarks, revealing a dissociation between knowledge retrieval and true tool comprehension. Generated by…

27
Hugging Face Daily Papers research 17d ago

A Stationary (and Therefore Compatible) Representation is All You Need

Abstract Stationary representations learned through d-Simplex fixed classifiers ensure model compatibility during sequential fine-tuning and updates, enabling continuous retrieval services without reprocessing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning compatible…

25
Hugging Face Daily Papers research 18d ago

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Abstract WEAVER is a multi-view world model architecture that achieves high fidelity, consistency, and efficiency in robotic manipulation tasks through flow-matching loss and demonstrates superior performance in policy evaluation, improvement, and test-time planning. Generated…

27
Hugging Face Daily Papers research 18d ago

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Abstract MaxProof is a test-time scaling framework that enhances mathematical proof generation by combining multiple proof-oriented capabilities and using population-level search with tournament selection to achieve competitive performance on high-level mathematical…

25
Hugging Face Daily Papers research 18d ago

Surflo: Consistent 3D Surface Flow Model with Global State

Abstract Surflo compresses unposed RGB views into latent tokens and decodes 3D surface points through flow matching, enabling flexible resolution output and efficient processing compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Geometry is invariant to…

35
Hugging Face Daily Papers research 18d ago

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Abstract ArogyaBodha dataset and ArogyaSutra framework enhance multilingual medical reasoning in low-resource settings through diverse data integration and actor-critic multi-agent reasoning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal Large Language Models (MLLMs)…

30
Hugging Face Daily Papers research 18d ago

Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback

Abstract Structured Defect Grounding (SDG) addresses limitations in text-to-image model diagnosis by modeling defects as structured sets and using vision-language models for detection and reward-based alignment. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite generating…

22
Hugging Face Daily Papers research 18d ago

Revisiting Articulated Parts Perception in Robot Manipulation

Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by…

27
Hugging Face Daily Papers research 18d ago

From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion

Abstract A multimodal image fusion approach uses a 1D token interface from a pretrained image tokenizer to enhance global appearance coherence while preserving local details through selective token editing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal image fusion…

33
Hugging Face Daily Papers research 18d ago

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

Abstract HYDRA-X presents a unified multimodal model that integrates image and video tokenization within a single Vision Transformer, addressing spatiotemporal reconstruction and semantic awareness through causal temporal attention and hierarchical compression. Generated by…

32
Hugging Face Daily Papers research 18d ago

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Abstract VIA-SD introduces a multi-tier speculative decoding framework that uses intra-model routing to reduce verification costs by employing slim submodels for medium-confidence token validation, achieving significant speedups over traditional approaches. Generated by…

32
Hugging Face Daily Papers research 18d ago

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

Abstract TreeSeeker is an inference-time framework that uses tree-structured search with branch-and-return control to manage exploration and exploitation in deep search tasks, improving performance through systematic trial-and-error decision making. Generated by…

23
Hugging Face Daily Papers research 18d ago

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

Abstract Flash-GMM introduces an efficient fused Triton kernel for Gaussian Mixture Models that achieves significant speedup and enables processing much larger datasets on a single GPU. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present Flash-GMM, a fused Triton kernel for…

18
Hugging Face Daily Papers research 18d ago

Leveraging Morphology for Historical Script Metrological Analysis

Abstract A transformer-based architecture with prototype learning enables scalable paleographic measurements from historical documents using only line-level transcriptions, demonstrating its effectiveness on a 160-page codex with minimal training data requirements. Generated by…

37
Hugging Face Daily Papers research 18d ago

PianoKontext: Expressive Performance Rendering from Deadpan Context

Abstract PianoKontext generates variable-length piano performances by aligning MIDI scores with audio in latent space using DTW and DiT blocks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive performance rendering (EPR) aims to generate realistic performances constrained…

12
Hugging Face Daily Papers research 18d ago

IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

Abstract Representation autoencoders using deep learning frameworks can improve image reconstruction quality by combining shallow and deep visual feature representations for better semantic richness and visual fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Built on…

31
Hugging Face Daily Papers research 18d ago

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

Abstract A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

33
Hugging Face Daily Papers research 18d ago

MiniMax Sparse Attention

Abstract MiniMax Sparse Attention enables efficient processing of ultra-long contexts in large language models through blockwise sparsity and optimized GPU execution, achieving significant speedups while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

20
Hugging Face Daily Papers research 18d ago

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Abstract VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with 2D reprojection loss and 3D motion regularizers, achieving near-3D supervised performance without requiring 3D ground truth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce…

5
Hugging Face Daily Papers research 18d ago

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by…

18
Hugging Face Daily Papers research 18d ago

Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning

Abstract A multi-agent framework with shared MLLM policy and role-specific training methods improves visual reasoning by reducing hallucinations and enabling efficient parallel processing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Visual reasoning requires integrating…

6
Hugging Face Daily Papers research 18d ago

SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling

Abstract Sign-Gated On-Policy Distillation improves upon standard on-policy distillation by incorporating a binary verifier to filter teacher signals, resulting in better performance on mathematical reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy…

38
Hugging Face Daily Papers research 18d ago

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Abstract Robust-U1 enhances multimodal large language models' robustness against visual corruptions through self-recovery capabilities that improve both visual quality and reasoning performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal Large Language Models…

4
Hugging Face Daily Papers research 18d ago

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Abstract EvoBrowseComp is an evolving benchmark with 800 contamination-free questions synthesized through a three-agent framework that ensures temporal freshness and prevents parametric memorization in search agent evaluation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Search…

26
Hugging Face Daily Papers research 18d ago

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Abstract MaskAlign, a token-subset representation alignment method, improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment behavior under perturbations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation…

12
Hugging Face Daily Papers research 18d ago

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Abstract EvoArena benchmark and EvoMem memory paradigm address the challenge of dynamic environments in LLM agents by modeling progressive updates and structured memory evolution, showing improved performance on evolving tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…

5
Hugging Face Daily Papers research 18d ago

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

Abstract A Gymnasium-compatible multi-drone simulation environment built on MuJoCo physics engine that supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic…

35
Hugging Face Daily Papers research 18d ago

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Abstract A switchable latent reasoning framework uses explicit boundary tokens to enable trainable and interpretable latent reasoning through recurrent hidden states. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Latent chain-of-thought compresses reasoning by replacing visible…

24
Hugging Face Daily Papers research 18d ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by…

20
Hugging Face Daily Papers research 18d ago

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Abstract WeaveBench presents a comprehensive benchmark for evaluating computer-use agents across multiple interfaces, revealing significant challenges in long-horizon task orchestration and highlighting the limitations of traditional performance assessment methods. Generated by…

38
Hugging Face Daily Papers research 18d ago

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Abstract Environment engineering enhances autonomous scientific discovery by designing structured agent environments that optimize behaviors like exploration and collaboration while mitigating issues such as reward hacking and human oversight friction, as demonstrated by the…

35
Hugging Face Daily Papers research 18d ago

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Abstract InterleaveThinker enables interleaved generation capabilities for image generators through a multi-agent pipeline with planner and critic agents, achieving performance comparable to state-of-the-art models while enhancing reasoning benchmarks. Generated by…

36
Hugging Face Daily Papers research 18d ago

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Abstract A framework for creating shortcut-resistant training data for deep search agents by identifying and mitigating four shortcut risks in data synthesis processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training deep search agents requires verifiable questions whose…

11
Hugging Face Daily Papers research 18d ago

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

Abstract N-GRPO, a novel exploration strategy within GRPO framework, enhances mathematical reasoning in large language models through semantic neighbor mixing that maintains semantic consistency while injecting diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The success…

27
Hugging Face Daily Papers research 18d ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Abstract SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks. Generated by…

36
Hugging Face Daily Papers research 18d ago

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

Abstract MoVerse generates real-time interactive video from single images by creating 360° panoramas and 3D Gaussian scaffolds, enabling efficient rendering through diffusion-based techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present MoVerse, a real-time video…

22
Hugging Face Daily Papers research 18d ago

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

Abstract HarnessBridge, a learnable harness controller, parameterizes agent-environment interfaces through bidirectional projections, achieving performance comparable to specialized harnesses with reduced computational overhead. Generated by…

21
Hugging Face Daily Papers research 18d ago

Can Generalist Agents Automate Data Curation?

Abstract Automated data curation using generalist coding agents shows promise but requires structured scaffolding to achieve superior performance compared to traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Curating training data is among the most consequential…

33
Hugging Face Daily Papers research 18d ago

Building Social World Models with Large Language Models

Abstract Social World Model framework captures evolution of social beliefs in response to events through temporal pattern mining and evidence lower bound optimization without explicit human annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Understanding and predicting…

33
Hugging Face Daily Papers research 18d ago

Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs

Abstract ModSleuth is an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts and resolving inconsistencies in documentation and artifact identities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern LLM…

6
Hugging Face Daily Papers research 18d ago

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

Abstract ReVision improves computer-use agent efficiency by removing redundant visual patches from consecutive screenshots while preserving spatial structure, reducing token usage by 46% and improving success rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Computer-use…

10
Hugging Face Daily Papers research 18d ago

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

Abstract SparDA is a decoupled sparse attention architecture that improves long-context LLM inference by reducing KV cache bottlenecks and attention complexity through a Forecast projection for lookahead selection. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse attention…

23
Hugging Face Daily Papers research 18d ago

APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations

Abstract APEX, a network-native transformer model, demonstrates superior forecasting performance for wireless network telemetry compared to existing foundation models and traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generic time-series foundation models transfer…

38
Hugging Face Daily Papers research 18d ago

Towards Diverse Scientific Hypothesis Search with Large Language Models

Abstract An evolutionary framework for hypothesis generation improves diversity and quality through multi-temperature sampling and information exchange across search levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models (LLMs) are on the rise for…

14
