Hugging Face Daily Papers


Hugging Face Daily Papers research 2d ago

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

Abstract A computational origami system generates crease patterns from natural language using AI-driven optimization and aesthetic evaluation, enabling human-AI collaboration in mathematically constrained design. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] While generative AI…

11
Hugging Face Daily Papers research 2d ago

Fast LeWorldModel

Abstract Fast-LeWM accelerates visual planning by replacing autoregressive rollout with parallel action-prefix prediction, reducing computational costs and latency accumulation during long-horizon predictions. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Joint-Embedding…

20
Hugging Face Daily Papers research 3d ago

ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation

Abstract ABACUS is a unified vision-language model that performs object counting and related tasks through innovative spatial grounding, boundary-aware counting policies, and self-critical learning strategies. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] ABACUS is a unified…

16
Hugging Face Daily Papers research 3d ago

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Abstract Reinforcement learning post-training enables effective step-level scoring for language models without training a dedicated reward model, by deriving an implicit advantage function called progress advantage. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Process…

6
Hugging Face Daily Papers research 3d ago

Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

Abstract A unified agentic framework called Qwen-Image-Agent is proposed to address the context gap in text-to-image generation by progressively constructing complete generation context through planning, reasoning, searching, and memory mechanisms. Generated by…

22
Hugging Face Daily Papers research 3d ago

Information-Aware KV Cache Compression for Long Reasoning

Abstract InfoKV is an entropy-aware KV cache compression framework that enhances long-context reasoning in LLMs by incorporating information-theoretic signals alongside attention weights. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Reasoning capability has advanced rapidly in…

10
Hugging Face Daily Papers research 3d ago

EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting

Abstract EO-WM is a video diffusion transformer for multispectral Earth Observation forecasting that incorporates physically informed conditioning frameworks to better capture weather-driven uncertainties in land-surface dynamics. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct]…

10
Hugging Face Daily Papers research 3d ago

LISA: Likelihood Score Alignment for Visual-condition Controllable Generation

Abstract Score-based generative modeling reveals that side networks contribute likelihood scores to conditional control, leading to improved training efficiency through likelihood score alignment regularization. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] The prevalent…

36
Hugging Face Daily Papers research 3d ago

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

Abstract A web-based benchmark evaluates agent generalization across challenging scenarios, revealing significant gaps between current agentic systems and human performance in temporal perception, graphical understanding, and 3D reasoning. Generated by…

10
Hugging Face Daily Papers research 3d ago

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Abstract Multi-model systems face fundamental accuracy limits determined by the rate at which all models fail simultaneously, regardless of their individual correlations or ensemble strategies. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Multi-model LLM systems such as routing,…

11
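The co-failure ceiling in "When Does Combining Language Models Help?" follows from a counting argument: a router or voting scheme that always returns one of its members' answers can only be right on questions where at least one member is right. A minimal sketch of that bound, assuming a hypothetical per-question correctness matrix; the function names and toy data are illustrative and not taken from the paper, and the sketch covers selection-style combination only.

```python
from typing import Sequence

# correct[m][q] is True when model m answers question q correctly
# (hypothetical data layout, chosen for illustration).


def co_failure_rate(correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions that every model gets wrong."""
    n_questions = len(correct[0])
    all_fail = sum(
        1 for q in range(n_questions) if not any(model[q] for model in correct)
    )
    return all_fail / n_questions


def accuracy_ceiling(correct: Sequence[Sequence[bool]]) -> float:
    """Upper bound for any method that returns one member's answer
    (routing, voting): it can only be right when at least one model is right."""
    return 1.0 - co_failure_rate(correct)


# Toy data: 3 models x 5 questions (made-up values, for illustration only).
toy = [
    [True, False, True, False, False],
    [True, True, False, False, False],
    [False, True, True, False, True],
]
print(co_failure_rate(toy))   # 0.2 (only question index 3 is missed by all models)
print(accuracy_ceiling(toy))  # 0.8
```

In the toy matrix one question in five is missed by every model, so no selection-based combination of these three models can exceed 80% accuracy on it, however the answers are routed or voted.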
Hugging Face Daily Papers research 3d ago

PhysiFormer: Learning to Simulate Mechanics in World Space

Abstract PhysiFormer uses coordinate-space diffusion to generate physically-plausible 3D object motions without explicit inductive biases, enabling efficient multi-object reasoning and generalization to complex materials and geometries. Generated by…

30
Hugging Face Daily Papers research 3d ago

CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

Abstract CoffeeBench evaluates LLM agents in a multi-agent economic simulation where firms interact over 90 days to maximize profits, revealing differences in communication patterns and performance among various models. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] As LLM agents…

4
Hugging Face Daily Papers research 4d ago

Discretizing Reward Models

Abstract Reward models in reinforcement learning suffer from oversensitivity issues where they assign different scores to equally good responses, leading to poor policy learning, but this can be mitigated through discretization techniques that maintain discriminative ability…

16
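The discretization idea behind "Discretizing Reward Models" can be illustrated with plain binning: nearby scores for equally good responses collapse onto one level, while widely separated scores stay distinguishable. This is a hypothetical sketch with hand-picked bin edges, not the scheme the paper uses.

```python
import bisect
from typing import List, Sequence


def discretize(scores: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each continuous reward to a level in 0..len(edges).

    `edges` must be sorted ascending; a score equal to an edge falls into the
    higher level (bisect_right semantics). The edges are illustrative assumptions.
    """
    return [bisect.bisect_right(edges, s) for s in scores]


# Two nearly identical scores (e.g. for two equally good answers) share a level.
raw = [0.71, 0.74, 0.20, 0.95]
print(discretize(raw, edges=[0.5, 0.9]))  # [1, 1, 0, 2]
```

Fewer, wider bins presumably suppress more scoring noise but also discard more ranking information, which is the balance the abstract points to when it mentions maintaining discriminative ability.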
Hugging Face Daily Papers research 4d ago

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Abstract JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Speculative decoding (SD)…

17
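For readers new to speculative decoding, the core draft-then-verify step looks roughly like this in its simplest greedy form. It is a generic sketch, not JetSpec's parallel tree drafting or causal conditioning; `target_next` is a hypothetical stand-in for the large model, and real systems score all draft positions in one parallel forward pass rather than in the sequential loop used here for readability.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Greedy verification: keep the longest draft prefix the target model agrees
    with, then append one token chosen by the target (a correction or a bonus)."""
    ctx = list(prefix)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # target's token replaces the first mismatch
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # every draft token accepted: one bonus token
    return accepted


# Hypothetical target model: always predicts "previous token + 1".
def toy_target(ctx: List[int]) -> int:
    return ctx[-1] + 1


print(verify_draft([1, 2], [3, 4, 9, 10], toy_target))  # [3, 4, 5]
print(verify_draft([1], [2, 3], toy_target))            # [2, 3, 4]
```

Each verification round emits at least one token and at most len(draft) + 1, so the speed-up hinges on how often draft tokens are accepted, the acceptance rate the abstract refers to.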
Hugging Face Daily Papers research 4d ago

How Post-Training Shapes Biological Reasoning Models

Abstract Post-training stages in biological reasoning models differently affect generalization, with continued pre-training aligning models with biological language, supervised fine-tuning improving in-domain performance but reducing out-of-domain generalization, and…

8
Hugging Face Daily Papers research 4d ago

Hallucination in World Models is Predictable and Preventable

Abstract World models exhibit hallucinations in low-data regions of state-action space, which can be detected and mitigated using data-centric signals and coverage-aware sampling techniques. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Modern generative world models render…

25
Hugging Face Daily Papers research 4d ago

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

Abstract Verification challenges in AI agents arise from the difficulty of aligning proxy signals with human intent, requiring adaptive verification systems that evolve alongside generative capabilities. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] A classical intuition holds…

26
Hugging Face Daily Papers research 4d ago

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Abstract Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a…

7
Hugging Face Daily Papers research 4d ago

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Abstract Research investigates how different supervisory signals and training strategies improve the stability and performance of large language models in tool-use tasks, addressing issues like catastrophic collapse and format sensitivity through interleaved supervised…

21
Hugging Face Daily Papers research 4d ago

OpenBioRQ: Unsolved Biomedical Research Questions for Agents

Abstract A new biomedical benchmark evaluates agentic models' ability to verify sources and avoid false citations by testing unsolved research questions with no answer keys, revealing significant failures in retrieval-grounded reasoning and tool usage. Generated by…

9
Hugging Face Daily Papers research 4d ago

In-Context World Modeling for Robotic Control

Abstract ICWM enables robot policies to infer system variables from self-generated interactions, allowing adaptation to novel configurations without parameter updates by treating system identification as an in-context adaptation problem. Generated by…

8
Hugging Face Daily Papers research 4d ago

Confidence-Aware Tool Orchestration for Robust Video Understanding

Abstract Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework that improves accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning. Generated by…

17
Hugging Face Daily Papers research 4d ago

ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

Abstract ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] A unified representation…

26
Hugging Face Daily Papers research 4d ago

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Abstract An on-policy skill distillation framework extracts dense hindsight supervision from completed trajectories to improve language agent training efficiency and performance. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Outcome-based reinforcement learning provides a stable…

20
Hugging Face Daily Papers research 4d ago

DanceOPD: On-Policy Generative Field Distillation

Abstract A novel on-policy generative field distillation framework called DanceOPD is proposed to unify text-to-image generation, local editing, and global editing capabilities in flow-matching models through capability-specific routing and velocity-based training. Generated by…

10
Hugging Face Daily Papers research 4d ago

Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation

Abstract A vision-language model-based hierarchical question graph framework evaluates video generation models' adherence to physical laws with granular violation detection and human correlation validation. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Video generation models are…

23
Hugging Face Daily Papers research 4d ago

Do Thinking Tokens Help with Safety?

Abstract Research reveals that reasoning models' safety outcomes are predictable from early hidden representations, with deliberation appearing but not substantially influencing final responses, and current safety interventions inadvertently suppress genuine deliberation…

25
Hugging Face Daily Papers research 4d ago

Forecasting Future Behavior as a Learning Task

Abstract Behavior Forecasters are trained to predict large reasoning model outputs from single trajectories, outperforming large language models while requiring significantly less computational cost. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Trust in an AI system is often…

24
Hugging Face Daily Papers research 4d ago

Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents

Abstract Standard LLM agents rely on plan content remaining in context rather than maintaining it as persistent state, with evidence shown through replay pairing diagnostics and compression stress tests. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Long-horizon agents depend on…

27
Hugging Face Daily Papers research 4d ago

Speaker Identity in Non-Verbal Vocalizations: Conditional Distillation and Mixture of Experts Approach

Abstract A novel speaker verification framework combines frozen self-supervised features with ECAPA-TDNN and MoE modules to improve identity verification across both speech and non-verbal vocalizations while maintaining speech performance. Generated by…

30
Hugging Face Daily Papers research 4d ago

Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

Abstract Lite Any Stereo V2 (LAS2) presents an efficient stereo matching approach that achieves state-of-the-art accuracy with significantly reduced latency through optimized architecture and training strategies. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Recent advances in…

9
Hugging Face Daily Papers research 4d ago

PrivacyAlign: Contextual Privacy Alignment for LLM Agents

Abstract Researchers develop a human-centered approach to align AI agents with privacy norms by creating a comprehensive dataset of privacy judgments and using annotation-conditioned reward modeling to improve agent behavior. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] AI…

7
Hugging Face Daily Papers research 4d ago

What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics

Abstract Jailbreak attacks expose vulnerabilities in aligned large language models, revealing that harmful intent is encoded in structured intermediate uncertainty dynamics rather than output representations. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Jailbreak attacks reveal…

23
Hugging Face Daily Papers research 4d ago

Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation

Abstract DO-ALL is a test-time adaptation framework that uses dataset distillation to create synthetic anchors for stable long-term model performance without retaining source data. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Continual Test-Time Adaptation (CTTA) aims to…

20
Hugging Face Daily Papers research 4d ago

ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] On-policy…

25
Hugging Face Daily Papers research 4d ago

Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

Abstract Tool Suppression occurs when JSON Schema constraints and tool calling are jointly enabled, preventing open-weight models from invoking tools despite maintaining schema compliance, with the issue stemming from grammar-based token masking that makes tool-call tokens…

5
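The suppression mechanism named in "Constraint Tax in Open-Weight LLMs" can be seen in a toy model of grammar-constrained decoding: tokens the active grammar disallows are masked out before the softmax, so a tool-call opener that a JSON grammar does not permit gets zero probability regardless of its logit. The vocabulary, the `<tool_call>` token, and the logit values below are made-up assumptions for illustration, not any specific model's tokenizer or any inference engine's API.

```python
import math
from typing import Dict, Set


def masked_softmax(logits: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Softmax restricted to grammar-allowed tokens; disallowed tokens get probability 0."""
    kept = {t: v for t, v in logits.items() if t in allowed}
    if not kept:
        raise ValueError("the grammar allows none of the candidate tokens")
    m = max(kept.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in kept.values())
    return {t: (math.exp(logits[t] - m) / z if t in allowed else 0.0) for t in logits}


# Made-up logits: the model "prefers" to call a tool, but the JSON grammar only
# permits tokens that can start a JSON value at this position.
logits = {"<tool_call>": 3.0, "{": 1.0, "[": 0.5}
probs = masked_softmax(logits, allowed={"{", "["})
print(probs["<tool_call>"])  # 0.0
```

If the tool-call token is added to the allowed set, the same logits give it most of the probability mass; the contrast between the two settings mirrors the behavior the abstract describes.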
Hugging Face Daily Papers research 4d ago

Autodata: An agentic data scientist to create high quality synthetic data

Abstract Autodata enables AI agents to function as data scientists who create high-quality training data through meta-optimization, demonstrating improved performance across multiple task domains. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] We introduce Autodata, a general…

30
Hugging Face Daily Papers research 4d ago

Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models

Abstract Autoregressive video diffusion extends diffusion distillation frameworks to real-time streaming generation through causal training paradigms, achieving state-of-the-art performance with fast convergence and interactive world modeling capabilities. Generated by…

4
Hugging Face Daily Papers research 4d ago

Improved Large Language Diffusion Models

Abstract Masked diffusion language models with fully bidirectional attention outperform autoregressive counterparts on various benchmarks while maintaining competitiveness with established models. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Modern large language models are…

18
Hugging Face Daily Papers research 4d ago

MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation

Abstract A novel-view video synthesis method that enhances motion-aware diffusion models through multi-view point tracking supervision to improve geometric consistency and motion fidelity. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Synthesizing a novel-view video from a…

37
Hugging Face Daily Papers research 4d ago

ShutterMuse: Capture-Time Photography Guidance with MLLMs

Abstract Researchers developed a new benchmark and dataset for photography assistance, along with a unified multimodal model that provides both composition guidance and pose recommendations during image capture. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] Real-world photography…

12
Hugging Face Daily Papers research 5d ago

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

Abstract The book provides a comprehensive guide to building autonomous AI systems, covering foundational elements like transformer architecture and training methods, along with advanced topics such as reinforcement learning, agent architectures, and production deployment.…

5
Hugging Face Daily Papers research 5d ago

RL-Index: Reinforcement Learning for Retrieval Index Reasoning

Abstract RL-Index introduces an agentic indexing framework that shifts reasoning from query time to the indexing stage by using LLM-generated rationales and reinforcement learning to improve retrieval effectiveness and reduce latency. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct]…

25
Hugging Face Daily Papers research 5d ago

CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression

Abstract Two-channel evaluation shows that output compression reduces costs, while input compression increases costs and degrades accuracy across models and datasets. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] "Talk short. Drop grammar. Save token." This caveman style is widely…

28
Hugging Face Daily Papers research 5d ago

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

Abstract LLM agents frequently select higher-privilege tools unnecessarily, and while safety alignment doesn't ensure least-privilege choices, a post-training defense can reduce excessive privilege use without sacrificing performance. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct]…

26
Hugging Face Daily Papers research 5d ago

UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating

Abstract UnityShots is a memory-driven audio-video generation system that maintains consistent subject appearance and audio across video cuts using fixed-size long-term and short-term memory slots with boundary-conditioned gates and discrete cut-type priors. Generated by…

7
Hugging Face Daily Papers research 5d ago

V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

Abstract A novel label-free framework for visual reasoning called V-Zero is presented, which uses contrastive evidence gating to improve fine-grained visual reasoning without requiring annotated answer labels, achieving faster training than traditional methods. Generated by…

12
Hugging Face Daily Papers research 5d ago

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

Abstract EBench is a comprehensive simulation benchmark for evaluating generalist mobile manipulation policies across diverse tasks and dimensions, revealing distinct capability profiles and generalization patterns among state-of-the-art models. Generated by…

18
Hugging Face Daily Papers research 5d ago

TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy

Abstract A camera-controllable video virtual try-on framework uses a 4D proxy with explicit human-environment decoupling and DiT-based video generation for omnidirectional viewing. [Generated by Qwen/Qwen2.5-Coder-32B-Instruct] While Video Virtual Try-on (VVT) has achieved…

4
Hugging Face Daily Papers research 5d ago

Are We Ready For An Agent-Native Memory System?

Abstract Large language model agents' memory systems have evolved into complex data management frameworks requiring systematic evaluation across multiple modules and workloads to understand their performance characteristics and trade-offs. Generated by…

7
