Hugging Face Daily Papers
Hugging Face Daily Papers research 17d ago
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
Abstract Psychometric assessments of LLM behavior reveal that specific behavioral frameworks like Theory of Planned Behavior show better coherence with actual responses than broad personality traits, particularly within shared conversations. Generated by…
6 -
Hugging Face Daily Papers research 17d ago
See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
Abstract Heterogeneous multi-agent systems can effectively transfer knowledge through aligned KV-cache communication, achieving better performance than text-based methods with reduced computational costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-agent systems…
21 -
Hugging Face Daily Papers research 17d ago
Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
Abstract TRACE is a skill-layer pipeline that mines user corrections to create runtime checks, significantly reducing preference violations in interactive LLM agents. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive LLM agents are becoming part of daily work, but they do…
30 -
Hugging Face Daily Papers research 17d ago
WebChallenger: A Reliable and Efficient Generalist Web Agent
Abstract WebChallenger presents a web agent framework that improves autonomous navigation through structured page representation and cognitive-inspired mechanisms, achieving high performance with open-weight models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Autonomous web…
15 -
Hugging Face Daily Papers research 17d ago
The Cold-Start Safety Gap in LLM Agents
Abstract Tool-calling language model agents exhibit improved safety after initial interactions, with a systematic benchmark demonstrating enhanced security through prior task completion. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Are tool-calling LLM agents equally safe…
37 -
Hugging Face Daily Papers research 17d ago
Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models
Abstract A compute-aware evaluation framework using FLOPs and risk-compute curves reveals non-monotonic effects of alignment training and varying attack costs across different harm categories. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Adversarial robustness evaluations of large…
6 -
Hugging Face Daily Papers research 17d ago
ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs
Abstract Parametric tool retrieval models show reduced performance and understanding when evaluated with realistic ambiguous queries compared to standard benchmarks, revealing a dissociation between knowledge retrieval and true tool comprehension. Generated by…
27 -
Hugging Face Daily Papers research 17d ago
A Stationary (and Therefore Compatible) Representation is All You Need
Abstract Stationary representations learned through d-Simplex fixed classifiers ensure model compatibility during sequential fine-tuning and updates, enabling continuous retrieval services without reprocessing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning compatible…
25 -
Hugging Face Daily Papers research 18d ago
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation
Abstract WEAVER is a multi-view world model architecture that achieves high fidelity, consistency, and efficiency in robotic manipulation tasks through flow-matching loss and demonstrates superior performance in policy evaluation, improvement, and test-time planning. Generated…
27 -
Hugging Face Daily Papers research 18d ago
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
Abstract MaxProof is a test-time scaling framework that enhances mathematical proof generation by combining multiple proof-oriented capabilities and using population-level search with tournament selection to achieve competitive performance on high-level mathematical…
25 -
Hugging Face Daily Papers research 18d ago
Surflo: Consistent 3D Surface Flow Model with Global State
Abstract Surflo compresses unposed RGB views into latent tokens and decodes 3D surface points through flow matching, enabling flexible resolution output and efficient processing compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Geometry is invariant to…
35 -
Hugging Face Daily Papers research 18d ago
ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Abstract ArogyaBodha dataset and ArogyaSutra framework enhance multilingual medical reasoning in low-resource settings through diverse data integration and actor-critic multi-agent reasoning. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal Large Language Models (MLLMs)…
30 -
Hugging Face Daily Papers research 18d ago
Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback
Abstract Structured Defect Grounding (SDG) addresses limitations in text-to-image model diagnosis by modeling defects as structured sets and using vision-language models for detection and reward-based alignment. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite generating…
22 -
Hugging Face Daily Papers research 18d ago
Revisiting Articulated Parts Perception in Robot Manipulation
Abstract A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning. Generated by…
27 -
Hugging Face Daily Papers research 18d ago
From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion
Abstract A multimodal image fusion approach uses a 1D token interface from a pretrained image tokenizer to enhance global appearance coherence while preserving local details through selective token editing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal image fusion…
33 -
Hugging Face Daily Papers research 18d ago
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
Abstract HYDRA-X presents a unified multimodal model that integrates image and video tokenization within a single Vision Transformer, addressing spatiotemporal reconstruction and semantic awareness through causal temporal attention and hierarchical compression. Generated by…
32 -
Hugging Face Daily Papers research 18d ago
VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
Abstract VIA-SD introduces a multi-tier speculative decoding framework that uses intra-model routing to reduce verification costs by employing slim submodels for medium-confidence token validation, achieving significant speedups over traditional approaches. Generated by…
32 -
Hugging Face Daily Papers research 18d ago
TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search
Abstract TreeSeeker is an inference-time framework that uses tree-structured search with branch-and-return control to manage exploration and exploitation in deep search tasks, improving performance through systematic trial-and-error decision making. Generated by…
23 -
Hugging Face Daily Papers research 18d ago
Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering
Abstract Flash-GMM introduces an efficient fused Triton kernel for Gaussian Mixture Models that achieves significant speedup and enables processing much larger datasets on a single GPU. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present Flash-GMM, a fused Triton kernel for…
18 -
Hugging Face Daily Papers research 18d ago
Leveraging Morphology for Historical Script Metrological Analysis
Abstract A transformer-based architecture with prototype learning enables scalable paleographic measurements from historical documents using only line-level transcriptions, demonstrating its effectiveness on a 160-page codex with minimal training data requirements. Generated by…
37 -
Hugging Face Daily Papers research 18d ago
PianoKontext: Expressive Performance Rendering from Deadpan Context
Abstract PianoKontext generates variable-length piano performances by aligning MIDI scores with audio in latent space using DTW and DiT blocks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive performance rendering (EPR) aims to generate realistic performances constrained…
12 -
Hugging Face Daily Papers research 18d ago
IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
Abstract Representation autoencoders using deep learning frameworks can improve image reconstruction quality by combining shallow and deep visual feature representations for better semantic richness and visual fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Built on…
31 -
Hugging Face Daily Papers research 18d ago
High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
Abstract A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
33 -
Hugging Face Daily Papers research 18d ago
MiniMax Sparse Attention
Abstract MiniMax Sparse Attention enables efficient processing of ultra-long contexts in large language models through blockwise sparsity and optimized GPU execution, achieving significant speedups while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
20 -
Hugging Face Daily Papers research 18d ago
VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
Abstract VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with 2D reprojection loss and 3D motion regularizers, achieving near-3D supervised performance without requiring 3D ground truth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce…
5 -
Hugging Face Daily Papers research 18d ago
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Abstract LabVLA, a vision-language-action model trained with a two-stage approach combining action token pretraining and flow matching, demonstrates superior performance on laboratory automation tasks through simulated data generation and robot-specific learning. Generated by…
18 -
Hugging Face Daily Papers research 18d ago
Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning
Abstract A multi-agent framework with shared MLLM policy and role-specific training methods improves visual reasoning by reducing hallucinations and enabling efficient parallel processing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Visual reasoning requires integrating…
6 -
Hugging Face Daily Papers research 18d ago
SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling
Abstract Sign-Gated On-Policy Distillation improves upon standard on-policy distillation by incorporating a binary verifier to filter teacher signals, resulting in better performance on mathematical reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy…
38 -
Hugging Face Daily Papers research 18d ago
Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
Abstract Robust-U1 enhances multimodal large language models' robustness against visual corruptions through self-recovery capabilities that improve both visual quality and reasoning performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal Large Language Models…
4 -
Hugging Face Daily Papers research 18d ago
EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge
Abstract EvoBrowseComp is an evolving benchmark with 800 contamination-free questions synthesized through a three-agent framework that ensures temporal freshness and prevents parametric memorization in search agent evaluation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Search…
26 -
Hugging Face Daily Papers research 18d ago
MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training
Abstract A token-subset representation alignment method called MaskAlign improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment behavior under perturbations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Representation…
12 -
Hugging Face Daily Papers research 18d ago
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Abstract EvoArena benchmark and EvoMem memory paradigm address the challenge of dynamic environments in LLM agents by modeling progressive updates and structured memory evolution, showing improved performance on evolving tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…
5 -
Hugging Face Daily Papers research 18d ago
MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
Abstract A Gymnasium-compatible multi-drone simulation environment built on the MuJoCo physics engine supports flexible physics models, action interfaces, and observation spaces for reinforcement learning applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic…
35 -
Hugging Face Daily Papers research 18d ago
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning
Abstract A switchable latent reasoning framework uses explicit boundary tokens to enable trainable and interpretable latent reasoning through recurrent hidden states. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Latent chain-of-thought compresses reasoning by replacing visible…
24 -
Hugging Face Daily Papers research 18d ago
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by…
20 -
Hugging Face Daily Papers research 18d ago
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
Abstract WeaveBench presents a comprehensive benchmark for evaluating computer-use agents across multiple interfaces, revealing significant challenges in long-horizon task orchestration and highlighting the limitations of traditional performance assessment methods. Generated by…
38 -
Hugging Face Daily Papers research 18d ago
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
Abstract Environment engineering enhances autonomous scientific discovery by designing structured agent environments that optimize behaviors like exploration and collaboration while mitigating issues such as reward hacking and human oversight friction, as demonstrated by the…
35 -
Hugging Face Daily Papers research 18d ago
InterleaveThinker: Reinforcing Agentic Interleaved Generation
Abstract InterleaveThinker enables interleaved generation capabilities for image generators through a multi-agent pipeline with planner and critic agents, achieving performance comparable to state-of-the-art models while enhancing reasoning benchmarks. Generated by…
36 -
Hugging Face Daily Papers research 18d ago
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
Abstract A framework creates shortcut-resistant training data for deep search agents by identifying and mitigating four shortcut risks in data synthesis processes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training deep search agents requires verifiable questions whose…
11 -
Hugging Face Daily Papers research 18d ago
N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization
Abstract N-GRPO, a novel exploration strategy within GRPO framework, enhances mathematical reasoning in large language models through semantic neighbor mixing that maintains semantic consistency while injecting diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The success…
27 -
Hugging Face Daily Papers research 18d ago
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
Abstract SpatialClaw is a training-free framework that uses code as an action interface to enable flexible, stateful spatial reasoning in vision-language models, achieving superior performance across diverse 3D/4D spatial reasoning tasks. Generated by…
36 -
Hugging Face Daily Papers research 18d ago
MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
Abstract MoVerse generates real-time interactive video from single images by creating 360° panoramas and 3D Gaussian scaffolds, enabling efficient rendering through diffusion-based techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present MoVerse, a real-time video…
22 -
Hugging Face Daily Papers research 18d ago
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Abstract HarnessBridge, a learnable harness controller, parameterizes agent-environment interfaces through bidirectional projections, achieving performance comparable to specialized harnesses with reduced computational overhead. Generated by…
21 -
Hugging Face Daily Papers research 18d ago
Can Generalist Agents Automate Data Curation?
Abstract Automated data curation using generalist coding agents shows promise but requires structured scaffolding to achieve superior performance compared to traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Curating training data is among the most consequential…
33 -
Hugging Face Daily Papers research 18d ago
Building Social World Models with Large Language Models
Abstract The Social World Model framework captures the evolution of social beliefs in response to events through temporal pattern mining and evidence lower bound optimization, without explicit human annotations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Understanding and predicting…
33 -
Hugging Face Daily Papers research 18d ago
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Abstract ModSleuth is an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts and resolving inconsistencies in documentation and artifact identities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern LLM…
6 -
Hugging Face Daily Papers research 18d ago
ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
Abstract ReVision improves computer-use agent efficiency by removing redundant visual patches from consecutive screenshots while preserving spatial structure, reducing token usage by 46% and improving success rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Computer-use…
10 -
Hugging Face Daily Papers research 18d ago
SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference
Abstract SparDA is a decoupled sparse attention architecture that improves long-context LLM inference by reducing KV cache bottlenecks and attention complexity through a Forecast projection for lookahead selection. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse attention…
23 -
Hugging Face Daily Papers research 18d ago
APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations
Abstract APEX, a network-native transformer model, demonstrates superior forecasting performance for wireless network telemetry compared to existing foundation models and traditional methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generic time-series foundation models transfer…
38 -
Hugging Face Daily Papers research 18d ago
Towards Diverse Scientific Hypothesis Search with Large Language Models
Abstract An evolutionary framework for hypothesis generation improves diversity and quality through multi-temperature sampling and information exchange across search levels. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models (LLMs) are on the rise for…
14