Hugging Face Daily Papers


Hugging Face Daily Papers research 11d ago

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Abstract MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications, revealing that Claude Opus 4.6 achieves the highest task completion rate of 55.4% while struggling with multi-application tasks and…

29
Hugging Face Daily Papers research 11d ago

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

Abstract A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Predictive code…

17
Hugging Face Daily Papers research 11d ago

LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence

Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The Network Data Analytics Function…

17
Hugging Face Daily Papers research 11d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Abstract GRPO algorithms face policy entropy collapse during training, which STARE addresses through surprisal-guided token-level advantage reweighting and target-entropy regulation to maintain stable reinforcement learning for large language models. Generated by…

13
Hugging Face Daily Papers research 11d ago

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Abstract A neural morpheme-boundary model for Turkish achieves lossless tokenization and morphology-aware embeddings with improved efficiency and performance over traditional subword methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Turkish is agglutinative: meaning is…

27
Hugging Face Daily Papers research 11d ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Abstract A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.…

6
Hugging Face Daily Papers research 11d ago

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

Abstract EfficientRollout is a system-aware self-speculative decoding framework that accelerates reinforcement learning rollouts by adapting drafters to evolving policies and optimizing speculative decoding regimes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement…

36
Hugging Face Daily Papers research 11d ago

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Abstract Multicultural multi-agent systems exhibit limited value diversity despite cultural alignment, with social interaction reducing diversity and compromising collective decision-making breadth. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multicultural multi-agent systems…

28
Hugging Face Daily Papers research 11d ago

PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

Abstract PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World foundation models (WFMs) are powerful…

18
Hugging Face Daily Papers research 11d ago

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

Abstract Xcientist enables transparent and accountable AI-driven scientific research by creating persistent artifacts that track the complete research process from problem formulation to mechanism validation and revision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct AI systems…

11
Hugging Face Daily Papers research 12d ago

SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

Abstract SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs. Generated by…

31
Hugging Face Daily Papers research 12d ago

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Abstract RODS addresses sample depletion in multi-turn tool-use reinforcement learning by dynamically synthesizing new data based on reward variance to maintain informative training samples. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-turn tool-use RL is bottlenecked by…

21
Hugging Face Daily Papers research 12d ago

Native Active Perception as Reasoning for Omni-Modal Understanding

Abstract OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing. Generated by…

24
Hugging Face Daily Papers research 12d ago

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Abstract A unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial…

9
Hugging Face Daily Papers research 12d ago

SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

Abstract Sparse Autoencoders' feature-level interventions may appear successful but can be circumvented through residual-space optimization that recovers original behaviors, revealing limitations in using SAE features for complete behavioral control. Generated by…

25
Hugging Face Daily Papers research 12d ago

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

Abstract ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by…

8
Hugging Face Daily Papers research 12d ago

CEO-Bench: Can Agents Play the Long Game?

Abstract CEO-Bench evaluates language model agents' ability to manage a simulated startup over 500 days, testing their proficiency in long-term planning, noise handling, adaptability, and multi-task coordination through a Python interface. Generated by…

5
Hugging Face Daily Papers research 12d ago

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

Abstract Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Graphical user interface…

38
Hugging Face Daily Papers research 12d ago

IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products

Abstract IndustryBench-MIPU is introduced as the first large-scale benchmark for multi-image industrial product understanding, focusing on structured attribute extraction from heterogeneous product images to evaluate multimodal models' ability to recover dense technical…

24
Hugging Face Daily Papers research 12d ago

Kairos: A Native World Model Stack for Physical AI

Abstract Kairos is a native world model framework that learns from diverse experiences, maintains persistent states through hybrid temporal attention, and supports efficient deployment for physical AI applications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are…

33
Hugging Face Daily Papers research 12d ago

Learning User Simulators with Turing Rewards

Abstract A reinforcement learning approach using Turing test-based rewards trains language models to generate responses indistinguishable from human users in conversational and forum discussion settings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Learning to simulate human…

26
Hugging Face Daily Papers research 12d ago

Physics-IQ Verified

Abstract A systematic evaluation of the Physics-IQ benchmark reveals limitations in measuring physical understanding of video generative models, leading to improvements in prompt quality and sample-level scoring that enhance reliability for assessing physically accurate video…

29
Hugging Face Daily Papers research 12d ago

Guava: An Effective and Universal Harness for Embodied Manipulation

Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models trained on large-scale…

15
Hugging Face Daily Papers research 12d ago

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

Abstract A new benchmark suite called RNG-Bench is introduced to evaluate multimodal foundation models' ability to reconstruct past observations and use them for decision-making in multi-step interactions, featuring two games with controlled difficulty parameters and a memory…

23
Hugging Face Daily Papers research 12d ago

Sumi: Open Uniform Diffusion Language Model from Scratch

Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by…

15
Hugging Face Daily Papers research 12d ago

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

Abstract ActWorld extends navigation-centric interactive world models to support object interaction through a chunk-autoregressive framework with hierarchical action-aware memory and persistent memory banks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Interactive world models…

9
Hugging Face Daily Papers research 12d ago

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

Abstract A unified scientific generative language model encodes diverse scientific objects and spatial interactions as token sequences, demonstrating strong performance across multiple domains through autoregressive next-token prediction. Generated by…

16
Hugging Face Daily Papers research 12d ago

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

Abstract The SAGA framework uses multimodal large language models to provide attribute-aware supervision for vision encoders through Group Relative Policy Optimization, improving zero-shot image retrieval performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision encoders for…

21
Hugging Face Daily Papers research 12d ago

Self-Evolving Visual Questioner

Abstract A vision-language model autonomously improves its question-generation capabilities through self-evolution, enhancing both question quality and answerer performance without external supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision-language models (VLMs)…

10
Hugging Face Daily Papers research 12d ago

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Abstract Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

14
Hugging Face Daily Papers research 12d ago

The Price of Anarchy in Disaggregated Inference

Abstract Disaggregated inference architectures separate prefill and decode phases across distinct GPU pools, and a game-theoretic analysis characterizes how GPU saturation affects system performance through regime transitions and payoff structure changes, enabling an adaptive…

25
Hugging Face Daily Papers research 12d ago

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Abstract The DR-DCI framework combines retrieval with direct corpus interaction by dynamically pulling relevant documents into a local workspace, enabling scalable and efficient agentic search across large corpora. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agentic search over…

27
Hugging Face Daily Papers research 12d ago

Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning

Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multimodal large language models (MLLMs) have demonstrated…

25
Hugging Face Daily Papers research 12d ago

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

Abstract EgoCS-400K is a large-scale egocentric Counter-Strike dataset that bridges passive web videos and costly real-world embodied data by providing temporally aligned video-action-language trajectories with detailed player states and game events. Generated by…

16
Hugging Face Daily Papers research 12d ago

RepSelect: Robust LLM Unlearning via Representation Selectivity

Abstract RepSelect isolates forget-set-specific representations in LLMs by collapsing top principal components of weight gradients, achieving deeper and more robust unlearning compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Making large language models…

32
Hugging Face Daily Papers research 12d ago

RefGC-SR^2: Reference-guided Generated Content Super-Resolution and Refinement

Abstract A new reference-guided generated content super-resolution-refinement task is introduced that simultaneously recovers high-resolution details and refines generative artifacts using a frequency-aware diffusion transformer model. Generated by…

32
Hugging Face Daily Papers research 13d ago

Text-Vision Co-Instructed Image Editing

Abstract A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing…

16
Hugging Face Daily Papers research 13d ago

Learning from the Self-future: On-policy Self-distillation for dLLMs

Abstract d-OPSD introduces a novel on-policy self-distillation framework for diffusion language models by adapting self-teacher construction and supervision mechanisms to match the non-autoregressive nature of diffusion models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

29
Hugging Face Daily Papers research 13d ago

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Abstract Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Looped…

5
Hugging Face Daily Papers research 13d ago

Rethinking the Role of Efficient Attention in Hybrid Architectures

Abstract Hybrid architectures combining full attention with efficient attention modules like sliding-window attention exhibit distinct scaling behaviors and optimization trajectories, with efficient attention primarily affecting the emergence speed of long-context capabilities…

29
Hugging Face Daily Papers research 13d ago

Variable-Width Transformers

Abstract A novel transformer architecture with nonuniform width allocation across layers achieves better performance and efficiency compared to uniform designs by utilizing a parameter-free residual resizing mechanism. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scaling model…

5
Hugging Face Daily Papers research 13d ago

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

Abstract The ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models…

37
Hugging Face Daily Papers research 13d ago

MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision

Abstract MemSlides presents a hierarchical memory framework for personalized presentation agents that separates long-term user profiles, working memory for session constraints, and tool memory for reusable execution experiences to enable stable personalization and reliable local…

21
Hugging Face Daily Papers research 13d ago

MotionVLA: Vision-Language-Action Model for Humanoid Motion

Abstract A dual-stream frequency tokenizer and autoregressive model are proposed to improve humanoid motion generation by separately encoding pose and physical dynamics, achieving better diversity and consistency compared to single-codebook approaches. Generated by…

11
Hugging Face Daily Papers research 13d ago

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Abstract Spectral Forcing, a time-conditional 2D-DCT low-pass operator, improves diffusion model efficiency by explicitly separating signal from noise in pixel-space models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pixel-space diffusion models are trained on full-bandwidth…

32
Hugging Face Daily Papers research 13d ago

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

Abstract A unified Vision-Language-Action pretraining framework leverages heterogeneous data sources including human egocentric videos and robot trajectories through a reliability-aware training approach that improves performance on embodied AI tasks. Generated by…

6
Hugging Face Daily Papers research 13d ago

ProCUA-SFT Technical Report

Abstract Training computer-use agents using a large-scale synthetic dataset with automated task generation and verification achieves significantly improved performance on desktop interaction benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training computer-use agents…

4
Hugging Face Daily Papers research 13d ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Abstract Zone of Proximal Policy Optimization (ZPPO) improves knowledge distillation by using reformulated prompts that help students learn from both correct and incorrect responses, enhancing performance especially at smaller model sizes. Generated by…

32
Hugging Face Daily Papers research 13d ago

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

Abstract OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory has become a standard…

28
Hugging Face Daily Papers research 13d ago

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

Abstract Research agents face significant challenges when evidence is in a different language than the query, with performance degrading even when gold evidence is provided directly. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research agents are increasingly evaluated on…

28
