Hugging Face Daily Papers
-
Hugging Face Daily Papers research 20d ago
U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training
Abstract A novel U-shaped deep learning model with test-time training layers and dual-domain adaptation mechanisms achieves robust PET image denoising under distribution shifts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing deep learning models for Positron Emission…
32 -
Hugging Face Daily Papers research 20d ago
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution
Abstract The Role-Agent framework enables LLM agents to function as both agent and environment through bootstrapped co-evolution, improving performance via environment-aware reasoning and targeted practice. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Although Large Language Model…
33 -
Hugging Face Daily Papers research 20d ago
Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation
Abstract Research reveals that vision and text tokens in multimodal models evolve asynchronously, leading to inefficient computation; a new asymmetric routing framework reduces visual processing overhead while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
9 -
Hugging Face Daily Papers research 20d ago
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism
Abstract MemDreamer addresses long-video understanding challenges by decoupling perception and reasoning through hierarchical graph memory and agentic exploration, achieving state-of-the-art performance with reduced computational overhead. Generated by…
33 -
Hugging Face Daily Papers research 20d ago
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields
Abstract Current AI agents struggle with long-horizon professional GUI workflows, achieving low success rates due to issues with workflow consistency and domain-specific software understanding. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent years have witnessed the rapid…
15 -
Hugging Face Daily Papers research 20d ago
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Abstract Retrospective Harness Optimization (RHO) is a self-supervised method that improves AI agent performance by optimizing the agent harness using only past trajectories, through diverse task selection, parallel re-solving, and self-validation. Generated by…
8 -
Hugging Face Daily Papers research 20d ago
Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization
Abstract An autoregressive diffusion method for video-to-video lip synchronization achieves real-time performance through distillation and optimized inference schedules. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Diffusion-based lip synchronization models achieve strong visual…
29 -
Hugging Face Daily Papers research 20d ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
Abstract Flow-DPPO replaces ratio clipping with divergence proximal constraints in flow matching models, improving training stability and multi-objective optimization through exact KL divergence computation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent work has…
34 -
Hugging Face Daily Papers research 20d ago
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Abstract A multi-agent framework automates data journalism by generating evidence-grounded, multimodal news stories while maintaining transparency and verifiability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Data tells stories that shape society; the data journalist's job is…
10 -
Hugging Face Daily Papers research 20d ago
WorldOlympiad: Can Your World Model Survive a Triathlon?
Abstract WorldOlympiad presents a comprehensive benchmark for evaluating video-based world models across physical faithfulness, geometric consistency, and interaction fidelity, revealing significant gaps in current generative models' capabilities. Generated by…
13 -
Hugging Face Daily Papers research 20d ago
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key…
8 -
Hugging Face Daily Papers research 20d ago
Rethinking the Divergence Regularization in LLM RL
Abstract DRPO improves LLM reinforcement learning stability by replacing hard masks with smooth regularization that provides continuous gradient corrections beyond trust-region boundaries. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning (RL) has become a key…
29 -
Hugging Face Daily Papers research 20d ago
EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents
Abstract EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In this paper, we propose EEVEE, the first…
6 -
Hugging Face Daily Papers research 20d ago
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
Abstract A large language model trained on synthesized delegation intelligence achieves superior performance on long-horizon research tasks through task decomposition and subagent coordination. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models are increasingly…
12 -
Hugging Face Daily Papers research 20d ago
One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
Abstract Latent Memory introduces a compressed representation approach for external memory in question answering, reducing token consumption and storage requirements while maintaining competitive performance across text-only and multimodal benchmarks. Generated by…
28 -
Hugging Face Daily Papers research 20d ago
Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking
Abstract Struct-Searcher introduces a belief revision theory-based structural agentic workflow for multimodal information seeking that improves accuracy over existing vision-language models and deep research agents. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research…
17 -
Hugging Face Daily Papers research 20d ago
How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs
Abstract FlowTracer is an RL framework that uses attention-induced graphs to trace reasoning flows and assign token-level credit based on global information propagation structures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Token-level credit assignment remains a key obstacle…
26 -
Hugging Face Daily Papers research 20d ago
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models
Abstract Multi-turn reasoning models exhibit hidden alignment failures that are masked by traditional evaluation methods, revealing vulnerabilities through a trace-level diagnostic framework that identifies distinct failure modes including context-injection failures. Generated…
12 -
Hugging Face Daily Papers research 20d ago
Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating
Abstract Sycophancy fine-tuning contributes to emergent misalignment in language models, which can be reversed using Alignment Gating—a method that inserts learnable gates to identify and control unsafe responses while maintaining general capabilities. Generated by…
24 -
Hugging Face Daily Papers research 20d ago
ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations
Abstract ARM demonstrates a unified autoregressive framework for image understanding, generation, and editing through discrete semantic tokenization and reinforcement learning optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This paper introduces ARM, a discrete…
35 -
Hugging Face Daily Papers research 20d ago
Bridging the Agent-World Gap: Text World Models for LLM-based Agents
Abstract Text world models serve as transition models for LLM-based agents in interactive environments, enabling planning and efficient learning by predicting environmental changes from textual states and actions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language model…
16 -
Hugging Face Daily Papers research 20d ago
Dynamic Linear Attention
Abstract DLA addresses limitations in long-context LLMs by introducing adaptive state merging and capacity-bounded memory modeling for improved multi-state linear attention. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The scalability of Large Language Models (LLMs) to long…
25 -
Hugging Face Daily Papers research 20d ago
SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
Abstract SCAIL-2 enables end-to-end character animation by directly transferring motion from driving videos without intermediate representations, using unified task decomposition and synthetic data generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Controlled character…
6 -
Hugging Face Daily Papers research 20d ago
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
Abstract QGF is an RL algorithm that improves policies at test time by using a value gradient to guide a pre-trained flow policy, avoiding training-time instability while maintaining competitive performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive continuous…
31 -
Hugging Face Daily Papers research 20d ago
ABot-Earth 0.5: Generative 3D Earth Model
Abstract ABot-Earth 0.5 generates realistic 3D environments from satellite imagery using a 3D Gaussian Splatting representation, enabling fast synthesis and real-time visualization for Embodied AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present ABot-Earth…
22 -
Hugging Face Daily Papers research 20d ago
Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval
Abstract State-Grounded Dynamic Retrieval enables web agents to dynamically reuse skills based on current webpage state rather than fixed task-level strategies, improving automation performance across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language agents…
37 -
Hugging Face Daily Papers research 20d ago
BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts
Abstract Researchers create BenSyc, a benchmark for evaluating conversational sycophancy in Bengali contexts, revealing challenges in distinguishing empathetic support from validation and escalation in emotionally sensitive dialogues. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
14 -
Hugging Face Daily Papers research 20d ago
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
Abstract Multi-agent systems using large language models suffer from inefficient token consumption in agent-to-agent communication, which PACT addresses by structuring messages as compact action-state records that improve performance-cost trade-offs across different system…
30 -
Hugging Face Daily Papers research 20d ago
VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation
Abstract VoLoAgent enables physical orchestration by integrating vision-language models with robot capabilities for open-vocabulary long-horizon manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Open-vocabulary long-horizon manipulation requires robots to reason…
24 -
Hugging Face Daily Papers research 20d ago
Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
Abstract Trust functions enable effective weak-to-strong generalization by identifying reliable weak labels for training, achieving performance comparable to ground-truth supervision across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Weak-to-strong…
15 -
Hugging Face Daily Papers research 20d ago
Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense
Abstract The SCOUT framework dynamically allocates prompt-injection detectors by predicting detector reliability and latency, improving safety and efficiency over fixed single-detector approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Prompt-injection detectors are…
30 -
Hugging Face Daily Papers research 20d ago
Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle
Abstract Reference-free faithfulness metrics have a blind spot: they measure only precision and therefore reward abstention. In deterministic domains, a complete oracle enables measurement of both precision and recall, revealing that high-precision models often have poor fact…
34 -
Hugging Face Daily Papers research 20d ago
Phase Marginalization for Patch-Grid Instability in Vision Transformers
Abstract Phase Marginalization is a post-hoc method that addresses phase-dependent instability in Vision Transformers by evaluating structured patch-grid phases and aggregating outputs in the original image coordinate system. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision…
32 -
Hugging Face Daily Papers research 20d ago
AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents
Abstract AsyncWebRL improves vision-language web agent training through asynchronous reinforcement learning and trajectory normalization modifications, achieving faster throughput and better performance on challenging tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training…
32 -
Hugging Face Daily Papers research 20d ago
SDR: Set-Distance Rewards for Radiology Report Generation
Abstract Set-based rewards using embedding distances improve chest X-ray report generation by enabling effective post-training and test-time selection without requiring causal reasoning structures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning with…
14 -
Hugging Face Daily Papers research 20d ago
Agents' Last Exam
Abstract Agents' Last Exam (ALE) is a benchmark for evaluating AI agents on long-term, economically valuable real-world tasks across 13 industry clusters with 1K+ tasks, revealing significant gaps between benchmark performance and practical deployment. Generated by…
6 -
Hugging Face Daily Papers research 20d ago
Robotic Policy Adaptation via Weight-Space Meta-Learning
Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by…
31 -
Hugging Face Daily Papers research 20d ago
Light-WAM: Efficient World Action Models with State-Fusion Action Decoding
Abstract Light-WAM is a lightweight world action model for robot manipulation that uses a compact video backbone and downsampled latent space for efficient future-video supervision, combined with a StateFusionActionExpert for direct action prediction. Generated by…
25 -
Hugging Face Daily Papers research 20d ago
DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
Abstract A distributed Quality-Diversity search framework uses heterogeneous large language models as mutation operators to enhance evolutionary inference, demonstrating that model diversity improves performance over homogeneous parallel approaches. Generated by…
13 -
Hugging Face Daily Papers research 20d ago
SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices
Abstract SigmaScale learns auxiliary scaling matrices to improve truncated SVD-based LLM compression by adapting to individual weight structures through activation-aware transformations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present SigmaScale, a method for learning…
28 -
Hugging Face Daily Papers research 20d ago
Pruning and Distilling Mixture-of-Experts into Dense Language Models
Abstract A systematic framework converts mixture-of-experts models into dense architectures through expert scoring, selection, grouping, and knowledge distillation, achieving superior performance and efficiency compared to traditional pruning methods. Generated by…
6 -
Hugging Face Daily Papers research 20d ago
Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path
Abstract Rectified Flows retain subtle training data traces that accumulate during training and can be exploited for membership inference attacks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Understanding what generative models retain from training data remains challenging,…
12 -
Hugging Face Daily Papers research 20d ago
EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts
Abstract Variable-centered empirical graph extraction maps psychology abstracts to typed graphs with normalized variables and empirical relations, achieving improved performance through staged pipeline approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing scientific…
33 -
Hugging Face Daily Papers research 20d ago
Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning
Abstract The Skill-3D framework enables agents to learn scene-aware skills through self-evolving memory and skill libraries, improving tool utilization in 3D spatial reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This paper explores agentic 3D spatial understanding,…
22 -
Hugging Face Daily Papers research 20d ago
Liberating LLM Capabilities in Full-Duplex Speech Models
Abstract A text-first tri-channel speech interface enables real-time interaction with visible text output alongside spoken responses, demonstrating superior performance in full-duplex conversational tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speech-based large language…
21 -
Hugging Face Daily Papers research 21d ago
WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models
Abstract WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent video-based world models have made…
27 -
Hugging Face Daily Papers research 21d ago
A Geometric Account of Activation Steering through Angle-Norm Decomposition
Abstract Research challenges the assumption that hidden-state norms carry concept-relevant information in language models, demonstrating that concepts are primarily represented in angular structure while norm remains crucial for steering stability and effectiveness across…
25 -
Hugging Face Daily Papers research 21d ago
Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
Abstract Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language…
30 -
Hugging Face Daily Papers research 21d ago
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders
Abstract Research demonstrates that hallucinations in Whisper ASR can be detected and reduced using internal representations from audio encoder activations and Sparse AutoEncoder latents, achieving significant hallucination rate reduction with minimal speech transcription…
20 -
Hugging Face Daily Papers research 21d ago
CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation
Abstract CIPER is a unified cross-view geo-localization framework that simultaneously performs city-scale retrieval and precise 3-DoF pose estimation using a shared transformer encoder and two-way pose decoder. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Cross-view…
36