Hugging Face Daily Papers
-
Hugging Face Daily Papers research 21d ago
OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning
Abstract OmniCap-IF is introduced as the first comprehensive benchmark for evaluating instruction-following capabilities in omni-modal captioning, revealing significant performance disparities and a format-content tradeoff in multi-modal reasoning. Generated by…
5 -
Hugging Face Daily Papers research 21d ago
SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating
Abstract SlimSearcher is a framework that improves efficiency in deep research agents by combining Pareto-efficient trajectory filtering and adaptive reward shaping to reduce computational costs while maintaining accuracy. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep…
15 -
Hugging Face Daily Papers research 21d ago
Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
Abstract Researchers identify widespread vulnerabilities in agent benchmark verification systems and develop an automated iterative process using LLM agents to create robust verifiers that resist exploitation while maintaining legitimate task performance. Generated by…
20 -
Hugging Face Daily Papers research 21d ago
LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
Abstract LatentSkill enables efficient deployment of textual skills in agent systems by converting them into LoRA adapters stored in weight space, reducing context overhead while maintaining modularity and composability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agent systems…
18 -
Hugging Face Daily Papers research 21d ago
Chiaroscuro Attention: Spending Compute in the Dark
Abstract CHIAR-Former uses spectral entropy-based routing to dynamically select between DCT, RBF, and self-attention operators, achieving improved efficiency on large text datasets while maintaining performance through hybrid attention mechanisms. Generated by…
27 -
Hugging Face Daily Papers research 21d ago
Text-to-Image Models Need Less from Text Encoders Than You Think
Abstract Text-to-image models primarily utilize basic text representation aspects like word merging and order rather than complex contextual information encoded in full text embeddings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-to-image models rely on text prompts as…
36 -
Hugging Face Daily Papers research 21d ago
Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text
Abstract Optical reasoning uses images as a standalone reasoning medium for language and multimodal tasks, achieving higher token efficiency than traditional text-based approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Chain-of-Thought (CoT) improves the performance of…
27 -
Hugging Face Daily Papers research 21d ago
Answer Presence Drives RAG Rewriting Gains
Abstract Controlled interventions reveal that gold answer presence in rewritten contexts significantly boosts QA performance, with removal causing substantial F1 drops and injection improving results, while conventional probing methods show fragility to sentinel changes.…
35 -
Hugging Face Daily Papers research 21d ago
SwiftVR: Real-Time One-Step Generative Video Restoration
Abstract SwiftVR enables real-time video restoration on consumer GPUs through efficient attention mechanisms and lightweight autoencoding, achieving high frame rates at 4K resolution. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-time video restoration (VR) for live streams…
33 -
Hugging Face Daily Papers research 21d ago
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short
Abstract Reasoning Arena improves reinforcement learning with verifiable rewards by using trace tournaments and Bradley-Terry models to generate meaningful gradients from non-diverse reward groups, resulting in faster training and better reasoning performance. Generated by…
15 -
Hugging Face Daily Papers research 21d ago
PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment
Abstract Privileged Bayesian Self-Distillation enables fine-grained credit assignment in long-horizon tasks by converting sparse outcome rewards into calibrated turn-level signals through Bayesian evidence scoring and autoregressive decomposition. Generated by…
8 -
Hugging Face Daily Papers research 21d ago
Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory
Abstract SkeMex is a self-evolving framework that enhances medical agents through structured skill memory, improving long-term clinical reasoning by distinguishing useful experiences and governing memory retention based on contextual utility. Generated by…
32 -
Hugging Face Daily Papers research 21d ago
EMMA: Extracting Multiple physical parameters from Multimodal Data
Abstract EMMA is a physics-informed multimodal framework that directly recovers dynamical parameters from raw video, audio, and image data using a Liquid Time-Constant network and physics-constrained loss. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce EMMA, a…
33 -
Hugging Face Daily Papers research 21d ago
Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents
Abstract Research challenges the conventional wisdom in latent visual reasoning by demonstrating that cosine alignment between supervised latents and visual targets negatively correlates with model accuracy, while revealing that answers are decoded downstream from latents rather…
24 -
Hugging Face Daily Papers research 21d ago
Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data
Abstract The Self-Evaluation Elicitation (SEE) method improves model calibration for quality assessment through calibration-coupled reinforcement learning and masked distillation, demonstrating transferable quality evaluation beyond specific judge preferences. Generated by…
37 -
Hugging Face Daily Papers research 21d ago
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning
Abstract A multi-agent framework for deep research tasks that addresses planning, evidence acquisition, and report synthesis through decoupled components and dynamic optimization mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep Research (DR) has emerged as a new…
38 -
Hugging Face Daily Papers research 21d ago
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
Abstract Skill-RM presents a unified reward modeling framework that treats reward computation as a structured agentic task, enabling dynamic evidence aggregation and consistent evaluation across diverse applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reward models…
19 -
Hugging Face Daily Papers research 21d ago
PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems
Abstract A local benchmark-generation pipeline transforms live property graphs and seed queries into balanced NL-to-Cypher datasets for enterprise knowledge graphs, incorporating schema profiling, reverse-query grounding, and execution validation. Generated by…
22 -
Hugging Face Daily Papers research 21d ago
Honest Lying: Understanding Memory Confabulation in Reflexive Agents
Abstract Agents relying on self-generated reflections can store confident but incorrect task interpretations, leading to persistent errors despite environment resets, which is identified through a new metric called Reflection Repetition Rate. Generated by…
10 -
Hugging Face Daily Papers research 21d ago
AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing
Abstract AHA-WAM is an asynchronous world-action model that uses dual Diffusion Transformers to enable efficient long-horizon planning and real-time action execution in robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World-action models have emerged as a…
4 -
Hugging Face Daily Papers research 21d ago
Why Muon Outperforms Adam: A Curvature Perspective
Abstract Muon outperforms Adam in large language model training by reducing curvature penalties through lower normalized directional sharpness, particularly in middle and late training stages, with advantages amplified by data imbalance and heterogeneous curvature. Generated by…
30 -
Hugging Face Daily Papers research 21d ago
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
Abstract OmniGameArena presents a unified benchmark for evaluating vision-language model agents in diverse game settings with a reflection-based improvement protocol that tracks performance evolution and skill generalization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
18 -
Hugging Face Daily Papers research 21d ago
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory
Abstract Large language models can be equipped with formal verification frameworks using dependent-type languages to improve multi-step workflow reliability and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Equipping Large Language Models (LLMs) to execute reliable…
9 -
Hugging Face Daily Papers research 21d ago
Trajectory-Refined Distillation
Abstract On-policy distillation suffers from prefix failure where dense token-level supervision creates fragmented gradients; trajectory-refined distillation addresses this by correcting student rollouts at the trajectory level before distillation. Generated by…
37 -
Hugging Face Daily Papers research 21d ago
Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses
Abstract Bayesian-Agent presents a framework that treats reusable skills and SOPs as hypotheses for model success, using Bayesian inference to guide agent behavior and improve task performance through posterior-guided harness optimization. Generated by…
10 -
Hugging Face Daily Papers research 21d ago
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention
Abstract Lookahead Sparse Attention with Neural Memory Indexer reduces GPU memory usage for long-context LLM inference while maintaining accuracy through proactive KV cache management and decoupled training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Conventional LLMs keep the…
19 -
Hugging Face Daily Papers research 21d ago
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
Abstract AI evaluation results suffer from inconsistent reporting across platforms, prompting the development of EvalCards, an operational framework that standardizes benchmark metadata, evaluation data, and model information into a unified, interpretable record with four key…
20 -
Hugging Face Daily Papers research 21d ago
End-to-End Context Compression at Scale
Abstract Encoder-decoder compression techniques are improved through architectural search and large-scale pretraining to create Latent Context Language Models that efficiently handle long contexts with better performance and memory usage compared to traditional KV cache methods.…
25 -
Hugging Face Daily Papers research 21d ago
OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation
Abstract A simulation-data-driven framework for humanoid loco-manipulation uses 3D generative models to create realistic assets and trains hierarchical visuomotor policies on simulated data, achieving better zero-shot performance than real-robot training. Generated by…
24 -
Hugging Face Daily Papers research 21d ago
Echo-Memory: A Controlled Study of Memory in Action World Models
Abstract Controlled study of memory mechanisms in action-conditioned world models reveals that memory structure and capacity significantly impact open-domain return performance beyond simple replay fidelity measures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present…
29 -
Hugging Face Daily Papers research 21d ago
Latent Spatial Memory for Video World Models
Abstract Latent spatial memory for video world models stores 3D scene information directly in diffusion latent space, eliminating pixel-space reconstruction overhead and achieving faster generation with reduced memory usage. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video…
23 -
Hugging Face Daily Papers research 21d ago
On the Geometry of On-Policy Distillation
Abstract On-policy distillation exhibits distinct parameter space dynamics characterized by relaxed off-principal updates and subspace locking, forming a unique geometric pattern separate from supervised fine-tuning and reinforcement learning with verifiable rewards. Generated…
20 -
Hugging Face Daily Papers research 21d ago
Set-Based Transformer for Atmospheric Compensation in Standoff LWIR Hyperspectral Imaging
Abstract A lightweight deep learning framework is presented for atmospheric compensation in passive long-wave infrared hyperspectral imaging, enabling joint estimation of transmittance, atmospheric path radiance, and downwelling spectrum from multi-range radiance measurements.…
36 -
Hugging Face Daily Papers research 21d ago
Human Psychometric Questionnaires Mischaracterize LLM Behavior
Abstract Human psychometric questionnaires fail to reliably predict LLM behavior in real-world interactions, while generation-based profiling offers superior accuracy for understanding model responses to everyday user queries. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…
38 -
Hugging Face Daily Papers research 21d ago
CoVEBench: Can Video Editing Models Handle Complex Instructions?
Abstract A new benchmark called CoVEBench is introduced to evaluate compositional video editing capabilities, addressing limitations of existing models in handling complex, multi-step editing tasks while preserving spatiotemporal content. Generated by…
19 -
Hugging Face Daily Papers research 21d ago
SWE-Explore: Benchmarking How Coding Agents Explore Repositories
Abstract SWE-Explore introduces a benchmark for evaluating coding agents' repository exploration capabilities by requiring ranked lists of relevant code regions within line budgets, demonstrating that agentic exploration outperforms traditional retrieval methods. Generated by…
11 -
Hugging Face Daily Papers research 21d ago
SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
Abstract SpatialWorld presents a unified benchmark for evaluating interactive spatial understanding in multimodal agents through diverse real-world tasks with partial observability and text-based actions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial reasoning is a…
7 -
Hugging Face Daily Papers research 21d ago
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
Abstract Imaginative Perception Tokens (IPT) enhance vision-language models' spatial reasoning by providing intermediate perceptual representations that externalize what the model would perceive from alternative viewpoints, outperforming traditional text-based reasoning methods.…
22 -
Hugging Face Daily Papers research 21d ago
A Cookbook of 3D Vision: Data, Learning Paradigms, and Application
Abstract 3D vision research is organized through a taxonomy connecting geometric representations, datasets, learning frameworks, and applications across reconstruction, generation, and video modeling tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct 3D vision has rapidly…
32 -
Hugging Face Daily Papers research 21d ago
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs
Abstract UnpredictaBench evaluates large language models' capacity to sample from target distributions, revealing significant gaps in their ability to simulate unpredictable systems despite recent advances in output diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…
7 -
Hugging Face Daily Papers research 21d ago
CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning
Abstract Contrastive Reflection (CORE) improves language model reasoning by analyzing differences between successful and unsuccessful attempts to generate concise, interpretable insights that enable faster and more efficient self-improvement compared to traditional parametric…
21 -
Hugging Face Daily Papers research 21d ago
Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation
Abstract Autoregressive language models are transformed into diffusion language models through on-policy distillation that eliminates train-inference mismatch and reduces training token requirements. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We study the transformation of…
18 -
Hugging Face Daily Papers research 21d ago
Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms
Abstract A novel attack-agnostic robustness metric based on Fisher Information Matrix spectral norm is proposed, providing theoretical bounds and scalable evaluation methods for deep neural network robustness assessment. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The…
12 -
Hugging Face Daily Papers research 21d ago
Reinforcement Learning from Rich Feedback with Distributional DAgger
Abstract Forward cross-entropy objective with distributional imitation learning enables monotonic policy improvement and better performance in reasoning tasks compared to traditional reinforcement learning methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning models…
15 -
Hugging Face Daily Papers research 21d ago
SPACENUM: Revisiting Spatial Numerical Understanding in VLMs
Abstract Vision-language models struggle to genuinely understand spatial numerical concepts, relying on shallow visual cues rather than developing robust coordinate-aware representations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language Models (VLMs) are…
19 -
Hugging Face Daily Papers research 21d ago
Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity
Abstract RAT+ memory module enhances query-aware sparse inference methods by improving accuracy in long-context language models across various sparse budgets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Efficient inference is critical for long-context language models, where…
28 -
Hugging Face Daily Papers research 21d ago
Towards Retrieving Interaction Spaces for Agentic Search
Abstract RISE framework constructs bounded interaction spaces for agentic search by combining BM25 retrieval with preprocessed document indexing to enable efficient corpus exploration while maintaining high accuracy at scale. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
18 -
Hugging Face Daily Papers research 21d ago
LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models
Abstract LayerRoute is a lightweight adapter that selectively skips transformer blocks during inference based on input type, achieving compute savings while maintaining or improving model quality through gated routing and LoRA adaptation. Generated by…
19 -
Hugging Face Daily Papers research 22d ago
Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation
Abstract Interactive ASR framework integrates semantic correction and reasoning-based editing to reduce semantic errors through multi-turn refinement, validated by a new sentence-level semantic error rate metric and interactive simulation system. Generated by…
35 -
Hugging Face Daily Papers research 22d ago
GENEB: Why Genomic Models Are Hard to Compare
Abstract GENEB presents a comprehensive benchmark for evaluating genomic foundation models across diverse tasks and architectures under a unified protocol. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress in genomic foundation models is difficult to assess due to fragmented…
25