Hugging Face Daily Papers


Hugging Face Daily Papers research 20d ago

U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training

Abstract A novel U-shaped deep learning model with test-time training layers and dual-domain adaptation mechanisms achieves robust PET image denoising under distribution shifts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing deep learning models for Positron Emission…

32
Hugging Face Daily Papers research 20d ago

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Abstract Role-Agent framework enables LLM agents to function as both agent and environment through bootstrapped co-evolution, improving performance via environment-aware reasoning and targeted practice. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Although Large Language Model…

33
Hugging Face Daily Papers research 20d ago

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

Abstract Research reveals that vision and text tokens in multimodal models evolve asynchronously, leading to inefficient computation; a new asymmetric routing framework reduces visual processing overhead while maintaining performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

9
Hugging Face Daily Papers research 20d ago

MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism

Abstract MemDreamer addresses long-video understanding challenges by decoupling perception and reasoning through hierarchical graph memory and agentic exploration, achieving state-of-the-art performance with reduced computational overhead. Generated by…

33
Hugging Face Daily Papers research 20d ago

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Abstract Current AI agents struggle with long-horizon professional GUI workflows, achieving low success rates due to issues with workflow consistency and domain-specific software understanding. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent years have witnessed the rapid…

15
Hugging Face Daily Papers research 20d ago

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Abstract Retrospective Harness Optimization (RHO) is a self-supervised method that improves AI agent performance by optimizing the agent harness using only past trajectories, through diverse task selection, parallel re-solving, and self-validation. Generated by…

8
Hugging Face Daily Papers research 20d ago

Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization

Abstract Autoregressive diffusion method for video-to-video lip synchronization achieves real-time performance through distillation and optimized inference schedules. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Diffusion-based lip synchronization models achieve strong visual…

29
Hugging Face Daily Papers research 20d ago

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Abstract Flow-DPPO replaces ratio clipping with divergence proximal constraints in flow matching models, improving training stability and multi-objective optimization through exact KL divergence computation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent work has…

34
Hugging Face Daily Papers research 20d ago

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Abstract A multi-agent framework automates data journalism by generating evidence-grounded, multimodal news stories while maintaining transparency and verifiability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Data tells stories that shape society; the data journalist's job is…

10
Hugging Face Daily Papers research 20d ago

WorldOlympiad: Can Your World Model Survive a Triathlon?

Abstract WorldOlympiad presents a comprehensive benchmark for evaluating video-based world models across physical faithfulness, geometric consistency, and interaction fidelity, revealing significant gaps in current generative models' capabilities. Generated by…

13
Hugging Face Daily Papers research 20d ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Abstract Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key…

8
Hugging Face Daily Papers research 20d ago

Rethinking the Divergence Regularization in LLM RL

Abstract DRPO improves LLM reinforcement learning stability by replacing hard masks with smooth regularization that provides continuous gradient corrections beyond trust-region boundaries. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning (RL) has become a key…

29
Hugging Face Daily Papers research 20d ago

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

Abstract EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In this paper, we propose EEVEE, the first…

6
Hugging Face Daily Papers research 20d ago

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Abstract A large language model trained on synthesized delegation intelligence achieves superior performance on long-horizon research tasks through task decomposition and subagent coordination. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models are increasingly…

12
Hugging Face Daily Papers research 20d ago

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

Abstract Latent Memory introduces a compressed representation approach for external memory in question answering, reducing token consumption and storage requirements while maintaining competitive performance across text-only and multimodal benchmarks. Generated by…

28
Hugging Face Daily Papers research 20d ago

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Abstract Struct-Searcher introduces a belief revision theory-based structural agentic workflow for multimodal information seeking that improves accuracy over existing vision-language models and deep research agents. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research…

17
Hugging Face Daily Papers research 20d ago

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

Abstract FlowTracer is an RL framework that uses attention-induced graphs to trace reasoning flows and assign token-level credit based on global information propagation structures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Token-level credit assignment remains a key obstacle…

26
Hugging Face Daily Papers research 20d ago

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

Abstract Multi-turn reasoning models exhibit hidden alignment failures that are masked by traditional evaluation methods, revealing vulnerabilities through a trace-level diagnostic framework that identifies distinct failure modes including context-injection failures. Generated…

12
Hugging Face Daily Papers research 20d ago

Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating

Abstract Sycophancy fine-tuning contributes to emergent misalignment in language models, which can be reversed using Alignment Gating—a method that inserts learnable gates to identify and control unsafe responses while maintaining general capabilities. Generated by…

24
Hugging Face Daily Papers research 20d ago

ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations

Abstract ARM demonstrates a unified autoregressive framework for image understanding, generation, and editing through discrete semantic tokenization and reinforcement learning optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This paper introduces ARM, a discrete…

35
Hugging Face Daily Papers research 20d ago

Bridging the Agent-World Gap: Text World Models for LLM-based Agents

Abstract Text world models serve as transition models for LLM-based agents in interactive environments, enabling planning and efficient learning by predicting environmental changes from textual states and actions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language model…

16
Hugging Face Daily Papers research 20d ago

Dynamic Linear Attention

Abstract DLA addresses limitations in long-context LLMs by introducing adaptive state merging and capacity-bounded memory modeling for improved multi-state linear attention. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The scalability of Large Language Models (LLMs) to long…

25
Hugging Face Daily Papers research 20d ago

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

Abstract SCAIL-2 enables end-to-end character animation by directly transferring motion from driving videos without intermediate representations, using unified task decomposition and synthetic data generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Controlled character…

6
Hugging Face Daily Papers research 20d ago

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

Abstract QGF is an RL algorithm that improves policies at test time by using a value gradient to guide a pre-trained flow policy, avoiding training-time instability while maintaining competitive performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Expressive continuous…

31
Hugging Face Daily Papers research 20d ago

ABot-Earth 0.5: Generative 3D Earth Model

Abstract ABot-Earth 0.5 generates realistic 3D environments from satellite imagery using 3D Gaussian Splatting representation, enabling fast synthesis and real-time visualization for Embodied AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present ABot-Earth…

22
Hugging Face Daily Papers research 20d ago

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

Abstract State-Grounded Dynamic Retrieval enables web agents to dynamically reuse skills based on current webpage state rather than fixed task-level strategies, improving automation performance across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language agents…

37
Hugging Face Daily Papers research 20d ago

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

Abstract Researchers create BenSyc, a benchmark for evaluating conversational sycophancy in Bengali contexts, revealing challenges in distinguishing empathetic support from validation and escalation in emotionally sensitive dialogues. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

14
Hugging Face Daily Papers research 20d ago

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Abstract Multi-agent systems using large language models suffer from inefficient token consumption in agent-to-agent communication, which PACT addresses by structuring messages as compact action-state records that improve performance-cost trade-offs across different system…

30
Hugging Face Daily Papers research 20d ago

VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation

Abstract VoLoAgent enables physical orchestration by integrating vision-language models with robot capabilities for open-vocabulary long-horizon manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Open-vocabulary long-horizon manipulation requires robots to reason…

24
Hugging Face Daily Papers research 20d ago

Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

Abstract Trust functions enable effective weak-to-strong generalization by identifying reliable weak labels for training, achieving performance comparable to ground-truth supervision across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Weak-to-strong…

15
Hugging Face Daily Papers research 20d ago

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

Abstract SCOUT framework dynamically allocates prompt-injection detection by predicting detector reliability and latency, improving safety and efficiency over fixed single-detector approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Prompt-injection detectors are…

30
Hugging Face Daily Papers research 20d ago

Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle

Abstract Reference-free faithfulness metrics suffer from a blind spot measuring only precision, leading to rewards for abstention; completeness in deterministic domains enables measurement of both precision and recall, revealing that high-precision models often have poor fact…

34
Hugging Face Daily Papers research 20d ago

Phase Marginalization for Patch-Grid Instability in Vision Transformers

Abstract Phase Marginalization is a post-hoc method that addresses phase-dependent instability in Vision Transformers by evaluating structured patch-grid phases and aggregating outputs in the original image coordinate system. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision…

32
Hugging Face Daily Papers research 20d ago

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

Abstract AsyncWebRL improves vision-language web agent training through asynchronous reinforcement learning and trajectory normalization modifications, achieving faster throughput and better performance on challenging tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training…

32
Hugging Face Daily Papers research 20d ago

SDR: Set-Distance Rewards for Radiology Report Generation

Abstract Set-based rewards using embedding distances improve chest X-ray report generation by enabling effective post-training and test-time selection without requiring causal reasoning structures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement learning with…

14
Hugging Face Daily Papers research 20d ago

Agents' Last Exam

Abstract Agents' Last Exam (ALE) is a benchmark for evaluating AI agents on long-term, economically valuable real-world tasks across 13 industry clusters with 1K+ tasks, revealing significant gaps between benchmark performance and practical deployment. Generated by…

6
Hugging Face Daily Papers research 20d ago

Robotic Policy Adaptation via Weight-Space Meta-Learning

Abstract WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning. Generated by…

31
Hugging Face Daily Papers research 20d ago

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Abstract Light-WAM is a lightweight world action model for robot manipulation that uses a compact video backbone and downsampled latent space for efficient future-video supervision, combined with a StateFusionActionExpert for direct action prediction. Generated by…

25
Hugging Face Daily Papers research 20d ago

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

Abstract A distributed Quality-Diversity search framework uses heterogeneous large language models as mutation operators to enhance evolutionary inference, demonstrating that model diversity improves performance over homogeneous parallel approaches. Generated by…

13
Hugging Face Daily Papers research 20d ago

SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

Abstract SigmaScale learns auxiliary scaling matrices to improve truncated SVD-based LLM compression by adapting to individual weight structures through activation-aware transformations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present SigmaScale, a method for learning…

28
Hugging Face Daily Papers research 20d ago

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Abstract A systematic framework converts mixture-of-experts models into dense architectures through expert scoring, selection, grouping, and knowledge distillation, achieving superior performance and efficiency compared to traditional pruning methods. Generated by…

6
Hugging Face Daily Papers research 20d ago

Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path

Abstract Rectified Flows retain subtle training data traces that accumulate during training and can be exploited for membership inference attacks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Understanding what generative models retain from training data remains challenging,…

12
Hugging Face Daily Papers research 20d ago

EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts

Abstract Variable-centered empirical graph extraction maps psychology abstracts to typed graphs with normalized variables and empirical relations, achieving improved performance through staged pipeline approaches. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing scientific…

33
Hugging Face Daily Papers research 20d ago

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Abstract Skill-3D framework enables agents to learn scene-aware skills through self-evolving memory and skill libraries, improving tool utilization in 3D spatial reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This paper explores agentic 3D spatial understanding,…

22
Hugging Face Daily Papers research 20d ago

Liberating LLM Capabilities in Full-Duplex Speech Models

Abstract A text-first tri-channel speech interface enables real-time interaction with visible text output alongside spoken responses, demonstrating superior performance in full-duplex conversational tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speech-based large language…

21
Hugging Face Daily Papers research 21d ago

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

Abstract WorldCraft extends interactive video world models to enable object-level trajectory control while maintaining camera navigation capabilities through specialized control pipelines. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent video-based world models have made…

27
Hugging Face Daily Papers research 21d ago

A Geometric Account of Activation Steering through Angle-Norm Decomposition

Abstract Research challenges the assumption that hidden-state norms carry concept-relevant information in language models, demonstrating that concepts are primarily represented in angular structure while norm remains crucial for steering stability and effectiveness across…

25
Hugging Face Daily Papers research 21d ago

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

Abstract Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language…

30
Hugging Face Daily Papers research 21d ago

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Abstract Research demonstrates that hallucinations in Whisper ASR can be detected and reduced using internal representations from audio encoder activations and Sparse AutoEncoder latents, achieving significant hallucination rate reduction with minimal speech transcription…

20
Hugging Face Daily Papers research 21d ago

CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation

Abstract CIPER is a unified cross-view geo-localization framework that simultaneously performs city-scale retrieval and precise 3-DoF pose estimation using a shared transformer encoder and two-way pose decoder. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Cross-view…

36
