Hugging Face Daily Papers


Hugging Face Daily Papers research 13d ago

A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization

Abstract Training instability in reinforcement learning with verifiable rewards is analyzed through token-level gradient dynamics, leading to a stable policy optimization method that updates only on positive-advantage completions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

20
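The summary says only that updates use positive-advantage completions. The sketch below shows what that restriction looks like under one assumption that the summary does not state: advantages are GRPO-style, normalized within a group of completions for the same prompt. All function names are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages (r - mean) / (std + eps) for one prompt's completions (assumed GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def winner_mask(advantages):
    """True only for completions with strictly positive advantage ("winners")."""
    return [a > 0 for a in advantages]

def masked_policy_loss(logprobs, advantages):
    """REINFORCE-style loss -A * log pi, averaged over winners only; losers contribute no gradient."""
    mask = winner_mask(advantages)
    kept = [-a * lp for lp, a, m in zip(logprobs, advantages, mask) if m]
    return sum(kept) / max(len(kept), 1)
```

With binary rewards `[1.0, 0.0, 1.0, 0.0]`, only the two rewarded completions enter the loss; a group where every reward is equal yields zero advantages and therefore no update.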
Hugging Face Daily Papers research 13d ago

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Abstract End-to-end game generation presents significant challenges for coding agents, requiring them to create complete playable games from natural language descriptions while meeting specific evaluation criteria for engine grounding, artifact completeness, and interactive…

31
Hugging Face Daily Papers research 13d ago

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Abstract UniAR presents a unified autoregressive framework that uses a single discrete visual tokenizer to bridge visual understanding and generation, achieving state-of-the-art results in image generation and editing through multi-level feature fusion, bitwise quantization, and…

19
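The summary names "bitwise quantization" without details. A common form of the idea, assumed here rather than taken from the paper, binarizes each latent dimension by its sign and packs the bits into one integer token id, so a d-dimensional latent indexes an implicit vocabulary of 2**d codes. Function names are hypothetical.

```python
def bitwise_quantize(latent):
    """Map each latent dimension to one bit by sign (x >= 0 -> 1, x < 0 -> 0)."""
    return [1 if x >= 0 else 0 for x in latent]

def bits_to_token(bits):
    """Pack a bit vector into a single integer token id (bit i contributes 2**i)."""
    return sum(b << i for i, b in enumerate(bits))

def token_to_codeword(token, dim):
    """Recover the {-1, +1} codeword that a token id stands for."""
    return [1.0 if (token >> i) & 1 else -1.0 for i in range(dim)]
```

For example, the latent `[0.3, -1.2, 0.0, -0.1]` becomes bits `[1, 0, 1, 0]`, token id `5`, and codeword `[1.0, -1.0, 1.0, -1.0]`.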
Hugging Face Daily Papers research 13d ago

Looped World Models

Abstract Looped World Models introduce iterative latent state refinement through shared transformer blocks, achieving 100x parameter efficiency while adapting computational depth to prediction complexity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current world models face a…

14
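The parameter saving described above comes from reusing one shared block instead of stacking distinct layers. The sketch below illustrates that loop together with an early-exit rule, which is an assumption (the summary does not say how depth adapts): refinement stops once the latent state stops changing. The toy `block` is a hypothetical stand-in for a shared transformer block.

```python
from typing import Callable, List, Tuple

Vector = List[float]

def looped_refine(state: Vector, block: Callable[[Vector], Vector],
                  max_loops: int = 16, tol: float = 1e-3) -> Tuple[Vector, int]:
    """Apply the same weight-shared block repeatedly; stop early when the update is below tol.

    Returns the refined state and the number of loops actually used.
    """
    for step in range(1, max_loops + 1):
        new_state = block(state)
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, step
    return state, max_loops

# Hypothetical toy block: move halfway toward a fixed target each loop.
target = [1.0, 1.0]
block = lambda s: [0.5 * x + 0.5 * t for x, t in zip(s, target)]
```

Starting far from the target (`[0.0, 0.0]`) uses 10 loops, while a state already at the target exits after 1, which is the sense in which compute depth tracks how much refinement an input needs.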
Hugging Face Daily Papers research 13d ago

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

29
Hugging Face Daily Papers research 13d ago

LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

Abstract LectūraAgents is a multi-agent framework that enables personalized learning through adaptive embodied teaching by mimicking professor-student interactions and generating coordinated teaching actions aligned with learner profiles. Generated by…

9
Hugging Face Daily Papers research 13d ago

Aligning Quantum Operators with Large Language Models

Abstract Large language models can be adapted to understand quantum operators by mapping unitary matrices into their latent space, enabling quantum circuit synthesis and language-conditioned gate constraint specification. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Can Large…

19
Hugging Face Daily Papers research 13d ago

Attacks on Machine-Text Detectors Retain Stylistic Fingerprints

Abstract Machine-text detection remains challenging despite evasion techniques, but stylistic features can provide robust defense when analyzed across multiple documents rather than individual instances. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite considerable progress…

17
Hugging Face Daily Papers research 13d ago

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

Abstract Temporal Difference in Vision (TDV) presents a novel self-supervised learning approach for video data that eliminates traditional inductive biases by leveraging causal relationships between past and future frames. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress in…

30
Hugging Face Daily Papers research 13d ago

Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks

Abstract Track2View generates novel camera viewpoints from videos by using 3D point tracks to establish explicit spatiotemporal correspondences, achieving superior visual quality and camera accuracy compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

9
Hugging Face Daily Papers research 13d ago

ExpRL: Exploratory RL for LLM Mid-Training

Abstract ExpRL uses human-written question-answer data as reward scaffolds to provide automated reinforcement learning priming for language models, outperforming traditional methods on math reasoning tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse reward reinforcement…

23
Hugging Face Daily Papers research 13d ago

Human Universal Grasping

Abstract A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans can grasp objects effortlessly, whereas multi-fingered robots…

25
Hugging Face Daily Papers research 13d ago

EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

Abstract EgoPhys enables deformable digital twin generation from egocentric RGB video by using generalizable priors and compact codebooks to predict dense spring stiffness fields without per-spring optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Humans naturally…

33
Hugging Face Daily Papers research 13d ago

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Abstract Sparse autoencoders exhibit feature stability patterns where stable features carry most predictive signal while unstable features reflect reproducible low-dimensional structure despite individual non-reproducibility. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse…

13
Hugging Face Daily Papers research 13d ago

LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies

Abstract LaWAM enables efficient robot control by predicting compact latent visual subgoals instead of expensive video generation, achieving high performance with reduced computational latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-Language-Action models (VLAs)…

33
Hugging Face Daily Papers research 13d ago

MVEB: Massive Video Embedding Benchmark

Abstract A large-scale video embedding benchmark evaluates diverse models across multiple video understanding tasks, revealing that different model architectures excel in specific domains and demonstrating the nuanced impact of audio on performance based on dataset…

7
Hugging Face Daily Papers research 13d ago

Artificial Intelligence Index Report 2026

Abstract Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's…

32
Hugging Face Daily Papers research 13d ago

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Abstract Qwen-RobotWorld is a language-conditioned video world model that predicts future visual trajectories across multiple robotic domains using a double-stream diffusion transformer and an embodied world knowledge corpus. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We…

5
Hugging Face Daily Papers research 13d ago

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Abstract Ling-2.6 and Ring-2.6 models are presented as scalable solutions for agentic intelligence, featuring architectural upgrades and specialized training methods to balance fast response times with advanced reasoning capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

34
Hugging Face Daily Papers research 13d ago

The Ghosts of Polymarket: When Off-Chain Matches Meet On-Chain Reverts

Abstract Polymarket has emerged as a prominent prediction market platform and one of the fastest-growing applications in DeFi. To achieve low-latency trading, it adopts a hybrid architecture that matches orders off-chain but settles them on-chain for final execution. This design…

18
Hugging Face Daily Papers research 13d ago

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Abstract Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced reasoning typically requires Chain-of-Thought…

18
Hugging Face Daily Papers research 14d ago

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Abstract Hierarchical Advantage-Weighted Behavior Cloning (HABC) addresses sparse reward challenges in robot learning by separately optimizing viability and efficiency objectives through adaptive critic heads and intervention-aware credit assignment, significantly improving…

9
Hugging Face Daily Papers research 14d ago

Memento: Reconstruct to Remember for Consistent Long Video Generation

Abstract Memento is a subject-reconstruction-guided framework that improves long-form video generation by preserving recurring subjects through memory-based reconstruction and dual-query mechanisms. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Long-form video generation requires…

17
Hugging Face Daily Papers research 14d ago

SP^3: Spherical Priors for Plug-and-Play Restoration

Abstract SP³ uses spherical encoders as generative priors to accelerate maximum a posteriori image restoration, enabling fast convergence and high-quality results through structured latent space projections. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In this paper, we…

17
Hugging Face Daily Papers research 14d ago

GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Abstract Multi-dimensional reward optimization in large language models is enhanced through a conflict-aware filtering mechanism that prevents signal cancellation and accelerates reinforcement learning efficiency. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLMs advance,…

12
Hugging Face Daily Papers research 14d ago

MMDiff: Extending Diffusion Transformers for Multi-Modal Generation

Abstract MMDiff transforms frozen diffusion transformers into multi-modal generative systems that produce images and perceptual modalities using lightweight decoders, achieving improved semantic segmentation through multi-timestep feature fusion and spatial aggregation.…

32
Hugging Face Daily Papers research 14d ago

Selective Control under Noisy Perception: Governance Failures Hidden by Aggregate Metrics in Modular Networks

Abstract Content moderation systems can cause disproportionate harm to bridge users connecting separate communities, even when overall accuracy metrics appear satisfactory, with governance loss increasing significantly under false-positive-heavy conditions. Generated by…

11
Hugging Face Daily Papers research 14d ago

Who Flips? Self- and Cross-Model Counterarguments Reveal Answer Instability in LLMs

Abstract Answer stability in large language models is evaluated through controlled challenges that measure response consistency when correct answers face plausible counterarguments, revealing significant variation in model reliability beyond traditional accuracy metrics.…

9
Hugging Face Daily Papers research 14d ago

Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking

Abstract The WebStep benchmark enables process-level analysis of web agents through semantic MDP tracking, revealing detailed performance differences and error localization that terminal success metrics miss. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Web agents act through long…

28
Hugging Face Daily Papers research 14d ago

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

Abstract PermaVid addresses long-term video consistency after edits by using multi-modal memory banks that separate appearance and geometric structure, enabling coherent video generation across time and viewpoints. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Consistent video…

30
Hugging Face Daily Papers research 14d ago

Geometric Action Model for Robot Policy Learning

Abstract A geometric action model leverages pretrained geometric foundation models to enable language-conditioned manipulation policies with improved accuracy, robustness, and efficiency in 3D physical environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Generalist robot…

21
Hugging Face Daily Papers research 14d ago

Implicit Reasoning for Large Language Model-based Generative Recommendation

Abstract Large Language Models for generative recommendation face challenges with semantic IDs disrupting natural-language reasoning, prompting a lightweight implicit reasoning approach that outperforms explicit methods while reducing computational costs. Generated by…

16
Hugging Face Daily Papers research 14d ago

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Abstract PhoneHarness presents a mixed-action benchmark and execution framework for evaluating phone-use agents on verifiable mobile workflows, demonstrating superior performance over existing approaches through deterministic action routing and auditable execution traces.…

13
Hugging Face Daily Papers research 14d ago

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Abstract Multi-turn large language model serving faces memory constraints due to growing key-value cache, but a structured approach to non-uniform compression enables significant throughput improvements through static budget allocation and optimized memory management. Generated…

14
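"Static budget allocation" for non-uniform compression can be pictured as splitting a fixed KV-cache token budget unevenly across layers ahead of time. The sketch assumes per-layer importance weights are already known (hypothetical inputs, not from the paper) and uses largest-remainder rounding so the integer budgets sum exactly to the total.

```python
def allocate_kv_budget(weights, total_tokens, min_tokens=0):
    """Split total_tokens across layers in proportion to static weights.

    Every layer first receives min_tokens; the rest is shared proportionally,
    and leftover tokens from rounding go to the largest fractional parts.
    """
    n = len(weights)
    spare = total_tokens - min_tokens * n
    if spare < 0:
        raise ValueError("total_tokens is too small for the per-layer minimum")
    s = sum(weights)
    raw = [spare * w / s for w in weights]
    budgets = [int(r) for r in raw]
    leftover = spare - sum(budgets)
    # Hand remaining tokens to layers with the largest fractional remainders.
    order = sorted(range(n), key=lambda i: raw[i] - budgets[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return [b + min_tokens for b in budgets]
```

Because the budgets are fixed before serving, memory per layer can be reserved up front; an adaptive scheme would trade that predictability for per-request tuning.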
Hugging Face Daily Papers research 14d ago

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

Abstract A vision-language model operates continuously in real-time, making autonomous decisions about when to respond or delegate, enabling interactive systems that perceive and act upon environmental changes without user prompting. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

17
Hugging Face Daily Papers research 14d ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Abstract Nemotron 3 Ultra is a large-scale language model featuring a hybrid Mamba-Attention architecture with 550 billion parameters, achieving high inference throughput and extended context length through specialized training techniques. Generated by…

5
Hugging Face Daily Papers research 14d ago

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Abstract VibeThinker-3B demonstrates that compact models can achieve state-of-the-art performance on verifiable reasoning tasks through specialized training techniques, challenging conventional scaling assumptions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct This technical…

16
Hugging Face Daily Papers research 14d ago

OneRank: Unified Transformer-Native Ranking Architecture for Multi-Task Recommendation

Abstract OneRank presents a Transformer-native multi-task learning framework that integrates feature encoding and prediction to reduce inter-task interference and improve ranking performance in recommender systems. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-task learning…

26
Hugging Face Daily Papers research 14d ago

BadWorld: Adversarial Attacks on World Models

Abstract BadWorld is a label-free adversarial framework that reveals structural vulnerabilities in visual world models by generating imperceptible perturbations that cause catastrophic failures in future rollouts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Visual world models…

26
Hugging Face Daily Papers research 14d ago

BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering

Abstract BRDFusion combines physical modeling and generative priors to achieve high-quality inverse and forward rendering of urban scenes with precise control and artifact reduction. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Inverse rendering of urban scenes from captured…

14
Hugging Face Daily Papers research 14d ago

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

Abstract Retrieval-augmented vision-language-action policies eliminate per-task fine-tuning costs by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

26
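Replacing per-task fine-tuning with indexed demonstrations implies a retrieval step at test time. A minimal sketch, assuming each demonstration is stored as an embedding vector from some pretrained encoder and ranked by cosine similarity (both assumptions; the summary does not specify them). How the retrieved demonstrations condition the policy is not shown, and the demo ids are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_demos(query, index, k=2):
    """Return ids of the k indexed demonstrations most similar to the query embedding."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [demo_id for demo_id, _ in ranked[:k]]

index = {"pick_cup": [1.0, 0.0], "open_drawer": [0.0, 1.0], "push_block": [0.7, 0.7]}
```

Adding a new task then means adding entries to `index` rather than updating model weights.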
Hugging Face Daily Papers research 14d ago

Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models

Abstract Masked diffusion language models exhibit unique decoding dynamics where reliable trajectories show stable confidence patterns, enabling iterative ensemble methods that transfer partially denoised sequences between models based on confidence evolution. Generated by…

28
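One way to read "reliable trajectories show stable confidence patterns" is as a rule for choosing which model leads the next denoising step. The sketch below is an assumption, not the paper's criterion: it tracks each model's mean token confidence per step and picks the model whose recent confidence varies least, breaking ties by higher mean confidence. Model names are hypothetical.

```python
from statistics import mean, pstdev

def pick_leader(conf_histories, window=3):
    """Choose the model whose recent per-step confidence is most stable.

    conf_histories maps model name -> list of mean token confidences, one per
    denoising step. Lower std over the last `window` steps wins; ties go to
    the higher mean.
    """
    def score(hist):
        recent = hist[-window:]
        return (pstdev(recent), -mean(recent))
    return min(conf_histories, key=lambda name: score(conf_histories[name]))
```

The leader's partially denoised sequence would then be handed to the other models, which the summary describes as transferring sequences based on confidence evolution.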
Hugging Face Daily Papers research 14d ago

DreamX-World 1.0: A General-Purpose Interactive World Model

Abstract DreamX-World 1.0 is an interactive text/image-to-video model that generates long-horizon content with camera control and scene persistence using specialized encoding, training techniques, and optimization methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct DreamX-World…

26
Hugging Face Daily Papers research 14d ago

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

Abstract TuneJury is a novel open-source pairwise reward model for text-to-music generation that provides calibrated preference scoring and generalizes across multiple downstream applications through a frozen reward mechanism. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We introduce…

5
Hugging Face Daily Papers research 14d ago

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

Abstract Advanced agents struggle to effectively integrate data discovery with code execution in data-intensive environments, revealing a significant gap in current agentic capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Advanced agents are increasingly demonstrating…

6
Hugging Face Daily Papers research 14d ago

UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer

Abstract UniDDT addresses key challenges in unified multimodal models by leveraging a Noisy ViT encoder and LLM for semantic encoding while using separate diffusion decoders to balance visual understanding and generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

12
Hugging Face Daily Papers research 14d ago

FastContext: Training Efficient Repository Explorer for Coding Agents

Abstract FastContext separates repository exploration from code solving in LLM agents using specialized exploration models that reduce token consumption and improve resolution rates. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Language Model (LLM) coding agents have…

19
Hugging Face Daily Papers research 14d ago

TokenPilot: Cache-Efficient Context Management for LLM Agents

Abstract TokenPilot is a dual-granularity context management framework that reduces inference costs in long-horizon LLM sessions by stabilizing prompt prefixes and conservatively managing context segments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As LLM agents are deployed…

28
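Stabilizing prompt prefixes matters because prefix caches generally reuse computation only for the exact leading tokens two requests share. The sketch below, with hypothetical function names and toy token lists, shows how appending a turn preserves the cacheable prefix while rewriting an early segment (for example, summarizing turn 1) invalidates everything after the edit.

```python
def shared_prefix_len(prev_tokens, next_tokens):
    """Number of leading tokens two prompts share, i.e. the part a prefix cache could reuse."""
    n = 0
    for a, b in zip(prev_tokens, next_tokens):
        if a != b:
            break
        n += 1
    return n

def cache_hit_ratio(prev_tokens, next_tokens):
    """Fraction of the new prompt covered by the reusable prefix."""
    return shared_prefix_len(prev_tokens, next_tokens) / len(next_tokens) if next_tokens else 0.0

prev = ["sys", "t1", "t2"]
appended = prev + ["t3"]                     # append-only update
edited = ["sys", "t1_summary", "t2", "t3"]   # early segment rewritten
```

Here `cache_hit_ratio(prev, appended)` is 0.75 while `cache_hit_ratio(prev, edited)` drops to 0.25, which is why a context manager would prefer to leave early segments untouched and compress conservatively.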
Hugging Face Daily Papers research 14d ago

VisualClaw: A Real-Time, Personalized Agent for the Physical World

Abstract VisualClaw is a self-evolving multimodal agent that reduces deployment costs through hybrid encoding and skill evolution while improving video-QA accuracy across multiple benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision language models are serving as…

32
Hugging Face Daily Papers research 17d ago

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Abstract Large language models exhibit limited ability to correct zero-shot errors through prompting, with model performance more strongly linked to definition-specific familiarity than text-level memorization metrics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large Language…

5
