Hugging Face Daily Papers
-
Hugging Face Daily Papers research 2d ago
COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami
Abstract A computational origami system generates crease patterns from natural language using AI-driven optimization and aesthetic evaluation, enabling human-AI collaboration in mathematically constrained design. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While generative AI…
11 -
Hugging Face Daily Papers research 2d ago
Fast LeWorldModel
Abstract Fast-LeWM accelerates visual planning by replacing autoregressive rollout with parallel action-prefix prediction, reducing computational costs and latency accumulation during long-horizon predictions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Joint-Embedding…
20 -
Hugging Face Daily Papers research 3d ago
ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation
Abstract ABACUS is a unified vision-language model that performs object counting and related tasks through innovative spatial grounding, boundary-aware counting policies, and self-critical learning strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct ABACUS is a unified…
16 -
Hugging Face Daily Papers research 3d ago
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents
Abstract Reinforcement learning post-training enables effective step-level scoring for language models without requiring dedicated reward model training by deriving an implicit advantage function called progress advantage. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Process…
6 -
Hugging Face Daily Papers research 3d ago
Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation
Abstract A unified agentic framework called Qwen-Image-Agent is proposed to address the context gap in text-to-image generation by progressively constructing complete generation context through planning, reasoning, searching, and memory mechanisms. Generated by…
22 -
Hugging Face Daily Papers research 3d ago
Information-Aware KV Cache Compression for Long Reasoning
Abstract InfoKV is an entropy-aware KV cache compression framework that enhances long-context reasoning in LLMs by incorporating information-theoretic signals alongside attention weights. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reasoning capability has advanced rapidly in…
10 -
Hugging Face Daily Papers research 3d ago
EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting
Abstract EO-WM is a video diffusion transformer for multispectral Earth Observation forecasting that incorporates physically informed conditioning frameworks to better capture weather-driven uncertainties in land-surface dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
10 -
Hugging Face Daily Papers research 3d ago
LISA: Likelihood Score Alignment for Visual-condition Controllable Generation
Abstract Score-based generative modeling reveals that side networks contribute likelihood scores to conditional control, leading to improved training efficiency through likelihood score alignment regularization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The prevalent…
36 -
Hugging Face Daily Papers research 3d ago
Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments
Abstract A web-based benchmark evaluates agent generalization across challenging scenarios, revealing significant gaps between current agentic systems and human performance in temporal perception, graphical understanding, and 3D reasoning. Generated by…
10 -
Hugging Face Daily Papers research 3d ago
PhysiFormer: Learning to Simulate Mechanics in World Space
Abstract PhysiFormer uses coordinate-space diffusion to generate physically-plausible 3D object motions without explicit inductive biases, enabling efficient multi-object reasoning and generalization to complex materials and geometries. Generated by…
30 -
Hugging Face Daily Papers research 3d ago
CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies
Abstract CoffeeBench evaluates LLM agents in a multi-agent economic simulation where firms interact over 90 days to maximize profits, revealing differences in communication patterns and performance among various models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents…
4 -
Hugging Face Daily Papers research 3d ago
Discretizing Reward Models
Abstract Reward models in reinforcement learning suffer from oversensitivity issues where they assign different scores to equally good responses, leading to poor policy learning, but this can be mitigated through discretization techniques that maintain discriminative ability…
16 -
Hugging Face Daily Papers research 4d ago
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
Abstract JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Speculative decoding (SD)…
17 -
Hugging Face Daily Papers research 4d ago
How Post-Training Shapes Biological Reasoning Models
Abstract Post-training stages in biological reasoning models differently affect generalization, with continued pre-training aligning models with biological language, supervised fine-tuning improving in-domain performance but reducing out-of-domain generalization, and…
8 -
Hugging Face Daily Papers research 4d ago
Hallucination in World Models is Predictable and Preventable
Abstract World models exhibit hallucinations in low-data regions of state-action space, which can be detected and mitigated using data-centric signals and coverage-aware sampling techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern generative world models render…
25 -
Hugging Face Daily Papers research 4d ago
The Verification Horizon: No Silver Bullet for Coding Agent Rewards
Abstract Verification challenges in AI agents arise from the difficulty of aligning proxy signals with human intent, requiring adaptive verification systems that evolve alongside generative capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A classical intuition holds…
26 -
Hugging Face Daily Papers research 4d ago
GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
Abstract Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a…
7 -
Hugging Face Daily Papers research 4d ago
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Abstract Research investigates how different supervisory signals and training strategies improve the stability and performance of large language models in tool-use tasks, addressing issues like catastrophic collapse and format sensitivity through interleaved supervised…
21 -
Hugging Face Daily Papers research 4d ago
OpenBioRQ: Unsolved Biomedical Research Questions for Agents
Abstract A new biomedical benchmark evaluates agentic models' ability to verify sources and avoid false citations by testing unsolved research questions with no answer keys, revealing significant failures in retrieval-grounded reasoning and tool usage. Generated by…
9 -
Hugging Face Daily Papers research 4d ago
In-Context World Modeling for Robotic Control
Abstract ICWM enables robot policies to infer system variables from self-generated interactions, allowing adaptation to novel configurations without parameter updates by treating system identification as an in-context adaptation problem. Generated by…
8 -
Hugging Face Daily Papers research 4d ago
Confidence-Aware Tool Orchestration for Robust Video Understanding
Abstract Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework that improves accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning. Generated by…
17 -
Hugging Face Daily Papers research 4d ago
ViQ: Text-Aligned Visual Quantized Representations at Any Resolution
Abstract ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A unified representation…
26 -
Hugging Face Daily Papers research 4d ago
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
Abstract On-policy skill distillation framework extracts dense hindsight supervision from completed trajectories to improve language agent training efficiency and performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Outcome-based reinforcement learning provides a stable…
20 -
Hugging Face Daily Papers research 4d ago
DanceOPD: On-Policy Generative Field Distillation
Abstract A novel on-policy generative field distillation framework called DanceOPD is proposed to unify text-to-image generation, local editing, and global editing capabilities in flow-matching models through capability-specific routing and velocity-based training. Generated by…
10 -
Hugging Face Daily Papers research 4d ago
Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation
Abstract A vision-language model-based hierarchical question graph framework evaluates video generation models' adherence to physical laws with granular violation detection and human correlation validation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generation models are…
23 -
Hugging Face Daily Papers research 4d ago
Do Thinking Tokens Help with Safety?
Abstract Research reveals that reasoning models' safety outcomes are predictable from early hidden representations, with deliberation appearing but not substantially influencing final responses, and current safety interventions inadvertently suppress genuine deliberation…
25 -
Hugging Face Daily Papers research 4d ago
Forecasting Future Behavior as a Learning Task
Abstract Behavior Forecasters are trained to predict large reasoning model outputs from single trajectories, outperforming large language models while requiring significantly less computational cost. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Trust in an AI system is often…
24 -
Hugging Face Daily Papers research 4d ago
Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents
Abstract Standard LLM agents rely on plan content remaining in context rather than maintaining it as persistent state, with evidence shown through replay pairing diagnostics and compression stress tests. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-horizon agents depend on…
27 -
Hugging Face Daily Papers research 4d ago
Speaker Identity in Non-Verbal Vocalizations: Conditional Distillation and Mixture of Experts Approach
Abstract A novel speaker verification framework combines frozen self-supervised features with ECAPA-TDNN and MoE modules to improve identity verification across both speech and non-verbal vocalizations while maintaining speech performance. Generated by…
30 -
Hugging Face Daily Papers research 4d ago
Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching
Abstract Lite Any Stereo V2 (LAS2) presents an efficient stereo matching approach that achieves state-of-the-art accuracy with significantly reduced latency through optimized architecture and training strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Recent advances in…
9 -
Hugging Face Daily Papers research 4d ago
PrivacyAlign: Contextual Privacy Alignment for LLM Agents
Abstract Researchers develop a human-centered approach to align AI agents with privacy norms by creating a comprehensive dataset of privacy judgments and using annotation-conditioned reward modeling to improve agent behavior. Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI…
7 -
Hugging Face Daily Papers research 4d ago
What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics
Abstract Jailbreak attacks expose vulnerabilities in aligned large language models, revealing that harmful intent is encoded in structured intermediate uncertainty dynamics rather than output representations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Jailbreak attacks reveal…
23 -
Hugging Face Daily Papers research 4d ago
Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation
Abstract DO-ALL is a test-time adaptation framework that uses dataset distillation to create synthetic anchors for stable long-term model performance without retaining source data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Continual Test-Time Adaptation (CTTA) aims to…
20 -
Hugging Face Daily Papers research 4d ago
ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation
Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy…
25 -
Hugging Face Daily Papers research 4d ago
Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints
Abstract Tool Suppression occurs when JSON Schema constraints and tool calling are jointly enabled, preventing open-weight models from invoking tools despite maintaining schema compliance, with the issue stemming from grammar-based token masking that makes tool-call tokens…
5 -
Hugging Face Daily Papers research 4d ago
Autodata: An agentic data scientist to create high quality synthetic data
Abstract Autodata enables AI agents to function as data scientists who create high-quality training data through meta-optimization, demonstrating improved performance across multiple task domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce Autodata, a general…
30 -
Hugging Face Daily Papers research 4d ago
Improved Large Language Diffusion Models
Abstract Masked diffusion language models with fully bidirectional attention outperform autoregressive counterparts on various benchmarks while maintaining competitiveness with established models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern large language models are…
18 -
Hugging Face Daily Papers research 4d ago
MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation
Abstract A novel-view video synthesis method that enhances motion-aware diffusion models through multi-view point tracking supervision to improve geometric consistency and motion fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Synthesizing a novel-view video from a…
37 -
Hugging Face Daily Papers research 4d ago
ShutterMuse: Capture-Time Photography Guidance with MLLMs
Abstract Researchers developed a new benchmark and dataset for photography assistance, along with a unified multimodal model that provides both composition guidance and pose recommendations during image capture. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-world photography…
12 -
Hugging Face Daily Papers research 4d ago
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
Abstract The book provides a comprehensive guide to building autonomous AI systems, covering foundational elements like transformer architecture and training methods, along with advanced topics such as reinforcement learning, agent architectures, and production deployment.…
5 -
Hugging Face Daily Papers research 4d ago
RL-Index: Reinforcement Learning for Retrieval Index Reasoning
Abstract RL-Index introduces an agentic indexing framework that shifts reasoning from query time to indexing stage by using LLM-generated rationales and reinforcement learning to improve retrieval effectiveness and reduce latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
25 -
Hugging Face Daily Papers research 4d ago
CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression
Abstract Two-channel evaluation shows output compression reduces costs while input compression increases costs and degrades accuracy across models and datasets. Generated by Qwen/Qwen2.5-Coder-32B-Instruct "Talk short. Drop grammar. Save token." This caveman style is widely…
28 -
Hugging Face Daily Papers research 5d ago
When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents
Abstract LLM agents frequently select higher-privilege tools unnecessarily, and while safety alignment doesn't ensure least-privilege choices, a post-training defense can reduce excessive privilege use without sacrificing performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
26 -
Hugging Face Daily Papers research 5d ago
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
Abstract UnityShots is a memory-driven audio-video generation system that maintains consistent subject appearance and audio across video cuts using fixed-size long-term and short-term memory slots with boundary-conditioned gates and discrete cut-type priors. Generated by…
7 -
Hugging Face Daily Papers research 5d ago
V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning
Abstract A novel label-free framework for visual reasoning called V-Zero is presented, which uses contrastive evidence gating to improve fine-grained visual reasoning without requiring annotated answer labels, achieving faster training than traditional methods. Generated by…
12 -
Hugging Face Daily Papers research 5d ago
EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
Abstract EBench is a comprehensive simulation benchmark for evaluating generalist mobile manipulation policies across diverse tasks and dimensions, revealing distinct capability profiles and generalization patterns among state-of-the-art models. Generated by…
18 -
Hugging Face Daily Papers research 5d ago
TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy
Abstract Camera-controllable video virtual try-on framework uses a 4D proxy with explicit human-environment decoupling and DiT-based video generation for omnidirectional viewing. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While Video Virtual Try-on (VVT) has achieved…
4 -
Hugging Face Daily Papers research 5d ago
Are We Ready For An Agent-Native Memory System?
Abstract Large language model agents' memory systems have evolved into complex data management frameworks requiring systematic evaluation across multiple modules and workloads to understand their performance characteristics and trade-offs. Generated by…
7