Hugging Face Daily Papers
-
Hugging Face Daily Papers research 13d ago
A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization
Abstract Training instability in reinforcement learning with verifiable rewards is analyzed through token-level gradient dynamics, leading to a stable policy optimization method that updates only on positive-advantage completions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
20 -
Hugging Face Daily Papers research 13d ago
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
Abstract End-to-end game generation presents significant challenges for coding agents, requiring them to create complete playable games from natural language descriptions while meeting specific evaluation criteria for engine grounding, artifact completeness, and interactive…
31 -
Hugging Face Daily Papers research 13d ago
Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification
Abstract UniAR presents a unified autoregressive framework that uses a single discrete visual tokenizer to bridge visual understanding and generation, achieving state-of-the-art results in image generation and editing through multi-level feature fusion, bitwise quantization, and…
19 -
Hugging Face Daily Papers research 13d ago
Looped World Models
Abstract Looped World Models introduce iterative latent state refinement through shared transformer blocks, achieving 100x parameter efficiency while adapting computational depth to prediction complexity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current world models face a…
14 -
Hugging Face Daily Papers research 13d ago
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs
Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
29 -
Hugging Face Daily Papers research 13d ago
LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching
Abstract LectūraAgents is a multi-agent framework that enables personalized learning through adaptive embodied teaching by mimicking professor-student interactions and generating coordinated teaching actions aligned with learner profiles. Generated by…
9 -
Hugging Face Daily Papers research 13d ago
Aligning Quantum Operators with Large Language Models
Abstract Large language models can be adapted to understand quantum operators by mapping unitary matrices into their latent space, enabling quantum circuit synthesis and language-conditioned gate constraint specification. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Can Large…
19 -
Hugging Face Daily Papers research 13d ago
Attacks on Machine-Text Detectors Retain Stylistic Fingerprints
Abstract Machine-text detection remains challenging despite evasion techniques, but stylistic features can provide robust defense when analyzed across multiple documents rather than individual instances. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite considerable progress…
17 -
Hugging Face Daily Papers research 13d ago
You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences
Abstract Temporal Difference in Vision (TDV) presents a novel self-supervised learning approach for video data that eliminates traditional inductive biases by leveraging causal relationships between past and future frames. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress in…
30 -
Hugging Face Daily Papers research 13d ago
Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks
Abstract Track2View generates novel camera viewpoints from videos by using 3D point tracks to establish explicit spatiotemporal correspondences, achieving superior visual quality and camera accuracy compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
9 -
Hugging Face Daily Papers research 13d ago
ExpRL: Exploratory RL for LLM Mid-Training
Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement…
23 -
Hugging Face Daily Papers research 13d ago
Human Universal Grasping
Abstract A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans can grasp objects effortlessly, whereas multi-fingered robots…
25 -
Hugging Face Daily Papers research 13d ago
EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video
Abstract EgoPhys enables deformable digital twin generation from egocentric RGB video by using generalizable priors and compact codebooks to predict dense spring stiffness fields without per-spring optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans naturally…
33 -
Hugging Face Daily Papers research 13d ago
Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders
Abstract Sparse autoencoders exhibit feature stability patterns where stable features carry most predictive signal while unstable features reflect reproducible low-dimensional structure despite individual non-reproducibility. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse…
13 -
Hugging Face Daily Papers research 13d ago
LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies
Abstract LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language-Action models (VLAs)…
33 -
Hugging Face Daily Papers research 13d ago
MVEB: Massive Video Embedding Benchmark
Abstract A large-scale video embedding benchmark evaluates diverse models across multiple video understanding tasks, revealing that different model architectures excel in specific domains and demonstrating the nuanced impact of audio on performance based on dataset…
7 -
Hugging Face Daily Papers research 13d ago
Artificial Intelligence Index Report 2026
Abstract Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's…
32 -
Hugging Face Daily Papers research 13d ago
Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation
Abstract Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and an embodied world knowledge corpus. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…
5 -
Hugging Face Daily Papers research 13d ago
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale
Abstract Ling-2.6 and Ring-2.6 models are presented as scalable solutions for agentic intelligence, featuring architectural upgrades and specialized training methods to balance fast response times with advanced reasoning capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
34 -
Hugging Face Daily Papers research 13d ago
The Ghosts of Polymarket: When Off-Chain Matches Meet On-Chain Reverts
Abstract Polymarket has emerged as a prominent prediction market platform and one of the fastest-growing applications in DeFi. To achieve low-latency trading, it adopts a hybrid architecture that matches orders off-chain but settles them on-chain for final execution. This design…
18 -
Hugging Face Daily Papers research 13d ago
Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought…
18 -
Hugging Face Daily Papers research 14d ago
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving…
9 -
Hugging Face Daily Papers research 14d ago
Memento: Reconstruct to Remember for Consistent Long Video Generation
Abstract Memento is a subject-reconstruction-guided framework that improves long-form video generation by preserving recurring subjects through memory-based reconstruction and dual-query mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-form video generation requires…
17 -
Hugging Face Daily Papers research 14d ago
SP^3: Spherical Priors for Plug-and-Play Restoration
Abstract SP³ uses spherical encoders as generative priors to accelerate maximum a posteriori image restoration, enabling fast convergence and high-quality results through structured latent space projections. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In this paper, we…
17 -
Hugging Face Daily Papers research 14d ago
GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization
Abstract Multi-dimensional reward optimization in large language models is enhanced through a conflict-aware filtering mechanism that prevents signal cancellation and accelerates reinforcement learning efficiency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLMs advance,…
12 -
Hugging Face Daily Papers research 14d ago
MMDiff: Extending Diffusion Transformers for Multi-Modal Generation
Abstract MMDiff transforms frozen diffusion transformers into multi-modal generative systems that produce images and perceptual modalities using lightweight decoders, achieving improved semantic segmentation through multi-timestep feature fusion and spatial aggregation.…
32 -
Hugging Face Daily Papers research 14d ago
Selective Control under Noisy Perception: Governance Failures Hidden by Aggregate Metrics in Modular Networks
Abstract Content moderation systems can cause disproportionate harm to bridge users connecting separate communities, even when overall accuracy metrics appear satisfactory, with governance loss increasing significantly under false-positive-heavy conditions. Generated by…
11 -
Hugging Face Daily Papers research 14d ago
Who Flips? Self- and Cross-Model Counterarguments Reveal Answer Instability in LLMs
Abstract Answer stability in large language models is evaluated through controlled challenges that measure response consistency when correct answers face plausible counterarguments, revealing significant variation in model reliability beyond traditional accuracy metrics.…
9 -
Hugging Face Daily Papers research 14d ago
Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking
Abstract WebStep benchmark enables process-level analysis of web agents through semantic MDP tracking, revealing detailed performance differences and error localization that terminal success metrics miss. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Web agents act through long…
28 -
Hugging Face Daily Papers research 14d ago
PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory
Abstract PermaVid addresses long-term video consistency after edits by using multi-modal memory banks that separate appearance and geometric structure, enabling coherent video generation across time and viewpoints. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Consistent video…
30 -
Hugging Face Daily Papers research 14d ago
Geometric Action Model for Robot Policy Learning
Abstract A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist robot…
21 -
Hugging Face Daily Papers research 14d ago
Implicit Reasoning for Large Language Model-based Generative Recommendation
Abstract Large Language Models for generative recommendation face challenges with semantic IDs disrupting natural-language reasoning, prompting a lightweight implicit reasoning approach that outperforms explicit methods while reducing computational costs. Generated by…
16 -
Hugging Face Daily Papers research 14d ago
PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions
Abstract PhoneHarness presents a mixed-action benchmark and execution framework for evaluating phone-use agents on verifiable mobile workflows, demonstrating superior performance over existing approaches through deterministic action routing and auditable execution traces.…
13 -
Hugging Face Daily Papers research 14d ago
Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving
Abstract Multi-turn large language model serving faces memory constraints due to a growing key-value cache, but a structured approach to non-uniform compression enables significant throughput improvements through static budget allocation and optimized memory management. Generated…
14 -
Hugging Face Daily Papers research 14d ago
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence
Abstract A vision-language model operates continuously in real-time, making autonomous decisions about when to respond or delegate, enabling interactive systems that perceive and act upon environmental changes without user prompting. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
17 -
Hugging Face Daily Papers research 14d ago
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Abstract Nemotron 3 Ultra is a large-scale language model featuring a hybrid Mamba-Attention architecture with 550 billion parameters, achieving high inference throughput and extended context length through specialized training techniques. Generated by…
5 -
Hugging Face Daily Papers research 14d ago
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models
Abstract VibeThinker-3B demonstrates that compact models can achieve state-of-the-art performance on verifiable reasoning tasks through specialized training techniques, challenging conventional scaling assumptions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This technical…
16 -
Hugging Face Daily Papers research 14d ago
OneRank: Unified Transformer-Native Ranking Architecture for Multi-Task Recommendation
Abstract OneRank presents a Transformer-native multi-task learning framework that integrates feature encoding and prediction to reduce inter-task interference and improve ranking performance in recommender systems. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-task learning…
26 -
Hugging Face Daily Papers research 14d ago
BadWorld: Adversarial Attacks on World Models
Abstract BadWorld is a label-free adversarial framework that reveals structural vulnerabilities in visual world models by generating imperceptible perturbations that cause catastrophic failures in future rollouts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Visual world models…
26 -
Hugging Face Daily Papers research 14d ago
BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering
Abstract BRDFusion combines physical modeling and generative priors to achieve high-quality inverse and forward rendering of urban scenes with precise control and artifact reduction. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Inverse rendering of urban scenes from captured…
14 -
Hugging Face Daily Papers research 14d ago
Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time
Abstract Retrieval-augmented vision-language-action policies eliminate per-task fine-tuning costs by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
26 -
Hugging Face Daily Papers research 14d ago
Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models
Abstract Masked diffusion language models exhibit unique decoding dynamics where reliable trajectories show stable confidence patterns, enabling iterative ensemble methods that transfer partially denoised sequences between models based on confidence evolution. Generated by…
28 -
Hugging Face Daily Papers research 14d ago
DreamX-World 1.0: A General-Purpose Interactive World Model
Abstract DreamX-World 1.0 is an interactive text/image-to-video model that generates long-horizon content with camera control and scene persistence using specialized encoding, training techniques, and optimization methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct DreamX-World…
26 -
Hugging Face Daily Papers research 14d ago
TuneJury: An Open Metric for Improving Music Generation Preference Alignment
Abstract A novel open-source pairwise reward model for text-to-music generation that provides calibrated preference scoring and generalizes across multiple downstream applications through a frozen reward mechanism. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce…
5 -
Hugging Face Daily Papers research 14d ago
CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?
Abstract Advanced agents struggle to effectively integrate data discovery with code execution in data-intensive environments, revealing a significant gap in current agentic capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced agents are increasingly demonstrating…
6 -
Hugging Face Daily Papers research 14d ago
UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer
Abstract UniDDT addresses key challenges in unified multimodal models by leveraging a Noisy ViT encoder and LLM for semantic encoding while using separate diffusion decoders to balance visual understanding and generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
12 -
Hugging Face Daily Papers research 14d ago
FastContext: Training Efficient Repository Explorer for Coding Agents
Abstract FastContext separates repository exploration from code solving in LLM agents using specialized exploration models that reduce token consumption and improve resolution rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Language Model (LLM) coding agents have…
19 -
Hugging Face Daily Papers research 14d ago
TokenPilot: Cache-Efficient Context Management for LLM Agents
Abstract TokenPilot is a dual-granularity context management framework that reduces inference costs in long-horizon LLM sessions by stabilizing prompt prefixes and conservatively managing context segments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents are deployed…
28 -
Hugging Face Daily Papers research 14d ago
VisualClaw: A Real-Time, Personalized Agent for the Physical World
Abstract VisualClaw is a self-evolving multimodal agent that reduces deployment costs through hybrid encoding and skill evolution while improving video-QA accuracy across multiple benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision language models are serving as…
32 -
Hugging Face Daily Papers research 17d ago
On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance
Abstract Large language models exhibit limited ability to correct zero-shot errors through prompting, with model performance more strongly linked to definition-specific familiarity than text-level memorization metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Language…
5