Hugging Face Daily Papers
500 articles archived
-
Hugging Face Daily Papers research 11d ago
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
Abstract MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications, revealing that Claude Opus 4.6 achieves the highest task completion rate of 55.4% while struggling with multi-application tasks and…
29 -
Hugging Face Daily Papers research 11d ago
A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets
Abstract A benchmark for predicting spreadsheet user actions is introduced, addressing challenges in edit history availability and complex action spaces through manual curation and online evaluation methodology. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Predictive code…
17 -
Hugging Face Daily Papers research 11d ago
LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence
Abstract An open-source Network Data Analytics Function compatible with Free5GC integrates a Large Language Model interface for natural language interaction and intent-based network management. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) The Network Data Analytics Function…
17 -
Hugging Face Daily Papers research 11d ago
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
Abstract GRPO algorithms face policy entropy collapse during training, which STARE addresses through surprisal-guided token-level advantage reweighting and target-entropy regulation to maintain stable reinforcement learning for large language models. Generated by…
13 -
Hugging Face Daily Papers research 11d ago
Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish
Abstract A neural morpheme-boundary model for Turkish achieves lossless tokenization and morphology-aware embeddings with improved efficiency and performance over traditional subword methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Turkish is agglutinative: meaning is…
27 -
Hugging Face Daily Papers research 11d ago
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
Abstract A framework automates environment redesign in reinforcement learning for large language models by having the policy analyze failures and suggest configuration changes, achieving superior performance over larger proprietary models and fixed-environment baselines.…
6 -
Hugging Face Daily Papers research 11d ago
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
Abstract EfficientRollout is a system-aware self-speculative decoding framework that accelerates reinforcement learning rollouts by adapting drafters to evolving policies and optimizing speculative decoding regimes. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Reinforcement…
36 -
Hugging Face Daily Papers research 11d ago
Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems
Abstract Multicultural multi-agent systems exhibit limited value diversity despite cultural alignment, with social interaction reducing diversity and compromising collective decision-making breadth. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multicultural multi-agent systems…
28 -
Hugging Face Daily Papers research 11d ago
PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation
Abstract PAIWorld enhances diffusion-transformer world models with geometric awareness and cross-view attention to improve multi-view 3D consistency for robotic manipulation tasks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World foundation models (WFMs) are powerful…
18 -
Hugging Face Daily Papers research 11d ago
Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
Abstract Xcientist enables transparent and accountable AI-driven scientific research by creating persistent artifacts that track the complete research process from problem formulation to mechanism validation and revision. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) AI systems…
11 -
Hugging Face Daily Papers research 12d ago
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks
Abstract SciOrch is a framework that uses a lightweight orchestrator model to coordinate multiple frontier LLMs for scientific reasoning, achieving superior performance through MCTS-based training and GRPO-style optimization while reducing API costs. Generated by…
31 -
Hugging Face Daily Papers research 12d ago
RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
Abstract RODS addresses sample depletion in multi-turn tool-use reinforcement learning by dynamically synthesizing new data based on reward variance to maintain informative training samples. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multi-turn tool-use RL is bottlenecked by…
21 -
Hugging Face Daily Papers research 12d ago
Native Active Perception as Reasoning for Omni-Modal Understanding
Abstract OmniAgent is a novel omni-modal agent that addresses long video understanding by using an iterative observation-thought-action cycle with active perception, achieving superior performance compared to larger models through efficient selective processing. Generated by…
24 -
Hugging Face Daily Papers research 12d ago
Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
Abstract A unified framework for spatial vision-language models combines linguistic deduction and 3D geometric reasoning through reinforcement learning, enabling robust spatial reasoning across diverse tasks and domains. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Spatial…
9 -
Hugging Face Daily Papers research 12d ago
SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
Abstract Sparse Autoencoders' feature-level interventions may appear successful but can be circumvented through residual-space optimization that recovers original behaviors, revealing limitations in using SAE features for complete behavioral control. Generated by…
25 -
Hugging Face Daily Papers research 12d ago
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
Abstract ViGOS is a visually grounded on-policy self-distillation framework for multimodal large language models that improves image-grounded behavior by using specialized teachers for different stages of reasoning and handling invalid rollouts. Generated by…
8 -
Hugging Face Daily Papers research 12d ago
CEO-Bench: Can Agents Play the Long Game?
Abstract CEO-Bench evaluates language model agents' ability to manage a simulated startup over 500 days, testing their proficiency in long-term planning, noise handling, adaptability, and multi-task coordination through a Python interface. Generated by…
5 -
Hugging Face Daily Papers research 12d ago
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
Abstract Quality-aware self-distillation improves vision-language model performance for GUI grounding by enhancing coordinate-token teacher signals through correctness-aware gating and probability scaling. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Graphical user interface…
38 -
Hugging Face Daily Papers research 12d ago
IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products
Abstract IndustryBench-MIPU is introduced as the first large-scale benchmark for multi-image industrial product understanding, focusing on structured attribute extraction from heterogeneous product images to evaluate multimodal models' ability to recover dense technical…
24 -
Hugging Face Daily Papers research 12d ago
Kairos: A Native World Model Stack for Physical AI
Abstract Kairos is a native world model framework that learns from diverse experiences, maintains persistent states through hybrid temporal attention, and supports efficient deployment for physical AI applications. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World models are…
33 -
Hugging Face Daily Papers research 12d ago
Learning User Simulators with Turing Rewards
Abstract A reinforcement learning approach using Turing test-based rewards trains language models to generate responses indistinguishable from human users in conversational and forum discussion settings. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Learning to simulate human…
26 -
Hugging Face Daily Papers research 12d ago
Physics-IQ Verified
Abstract A systematic evaluation of the Physics-IQ benchmark reveals limitations in measuring physical understanding of video generative models, leading to improvements in prompt quality and sample-level scoring that enhance reliability for assessing physically accurate video…
29 -
Hugging Face Daily Papers research 12d ago
Guava: An Effective and Universal Harness for Embodied Manipulation
Abstract A harness framework for embodied tool use combines high-level reasoning with external modules, enabling compact models to perform complex manipulation tasks with minimal training data. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Language models trained on large-scale…
15 -
Hugging Face Daily Papers research 12d ago
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
Abstract A new benchmark suite called RNG-Bench is introduced to evaluate multimodal foundation models' ability to reconstruct past observations and use them for decision-making in multi-step interactions, featuring two games with controlled difficulty parameters and a memory…
23 -
Hugging Face Daily Papers research 12d ago
Sumi: Open Uniform Diffusion Language Model from Scratch
Abstract A large-scale uniform diffusion language model pretrained from scratch demonstrates competitive performance on knowledge and reasoning tasks while highlighting differences in commonsense reasoning compared to autoregressive models. Generated by…
15 -
Hugging Face Daily Papers research 12d ago
ActWorld: From Explorable to Interactive World Model via Action-Aware Memory
Abstract ActWorld extends navigation-centric interactive world models to support object interaction through a chunk-autoregressive framework with hierarchical action-aware memory and persistent memory banks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Interactive world models…
9 -
Hugging Face Daily Papers research 12d ago
Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
Abstract A unified scientific generative language model encodes diverse scientific objects and spatial interactions as token sequences, demonstrating strong performance across multiple domains through autoregressive next-token prediction. Generated by…
16 -
Hugging Face Daily Papers research 12d ago
Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings
Abstract The SAGA framework uses multimodal large language models to provide attribute-aware supervision for vision encoders through Group Relative Policy Optimization, improving zero-shot image retrieval performance. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Vision encoders for…
21 -
Hugging Face Daily Papers research 12d ago
Self-Evolving Visual Questioner
Abstract A vision-language model autonomously improves its question-generation capabilities through self-evolution, enhancing both question quality and answerer performance without external supervision. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Vision-language models (VLMs)…
10 -
Hugging Face Daily Papers research 12d ago
Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems
Abstract Multi-agent LLM systems with shared state are analyzed through formal methods identifying concurrency anomalies and establishing a verified consistency hierarchy with mechanized proofs of soundness and completeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
14 -
Hugging Face Daily Papers research 12d ago
The Price of Anarchy in Disaggregated Inference
Abstract Disaggregated inference architectures separate prefill and decode phases across distinct GPU pools, and a game-theoretic analysis characterizes how GPU saturation affects system performance through regime transitions and payoff structure changes, enabling an adaptive…
25 -
Hugging Face Daily Papers research 12d ago
Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion
Abstract The DR-DCI framework combines retrieval with direct corpus interaction by dynamically pulling relevant documents into a local workspace, enabling scalable and efficient agentic search across large corpora. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Agentic search over…
27 -
Hugging Face Daily Papers research 12d ago
Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning
Abstract Visual-Seeker enables visual-native multimodal deep search through active visual reasoning, outperforming proprietary models on real-world web environments. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multimodal large language models (MLLMs) have demonstrated…
25 -
Hugging Face Daily Papers research 12d ago
EgoCS-400K: An Egocentric Gameplay Dataset for World Models
Abstract EgoCS-400K is a large-scale egocentric Counter-Strike dataset that bridges passive web videos and costly real-world embodied data by providing temporally aligned video-action-language trajectories with detailed player states and game events. Generated by…
16 -
Hugging Face Daily Papers research 12d ago
RepSelect: Robust LLM Unlearning via Representation Selectivity
Abstract RepSelect isolates forget-set-specific representations in LLMs by collapsing top principal components of weight gradients, achieving deeper and more robust unlearning compared to existing methods. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Making large language models…
32 -
Hugging Face Daily Papers research 12d ago
RefGC-SR^2: Reference-guided Generated Content Super-Resolution and Refinement
Abstract A new reference-guided generated content super-resolution-refinement task is introduced that simultaneously recovers high-resolution details and refines generative artifacts using a frequency-aware diffusion transformer model. Generated by…
32 -
Hugging Face Daily Papers research 13d ago
Text-Vision Co-Instructed Image Editing
Abstract A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Existing…
16 -
Hugging Face Daily Papers research 13d ago
Learning from the Self-future: On-policy Self-distillation for dLLMs
Abstract d-OPSD introduces a novel on-policy self-distillation framework for diffusion language models by adapting self-teacher construction and supervision mechanisms to match the non-autoregressive nature of diffusion models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
29 -
Hugging Face Daily Papers research 13d ago
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
Abstract Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Looped…
5 -
Hugging Face Daily Papers research 13d ago
Rethinking the Role of Efficient Attention in Hybrid Architectures
Abstract Hybrid architectures combining full attention with efficient attention modules like sliding-window attention exhibit distinct scaling behaviors and optimization trajectories, with efficient attention primarily affecting the emergence speed of long-context capabilities…
29 -
Hugging Face Daily Papers research 13d ago
Variable-Width Transformers
Abstract A novel transformer architecture with nonuniform width allocation across layers achieves better performance and efficiency compared to uniform designs by utilizing a parameter-free residual resizing mechanism. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Scaling model…
5 -
Hugging Face Daily Papers research 13d ago
ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
Abstract The ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Large language models…
37 -
Hugging Face Daily Papers research 13d ago
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision
Abstract MemSlides presents a hierarchical memory framework for personalized presentation agents that separates long-term user profiles, working memory for session constraints, and tool memory for reusable execution experiences to enable stable personalization and reliable local…
21 -
Hugging Face Daily Papers research 13d ago
MotionVLA: Vision-Language-Action Model for Humanoid Motion
Abstract A dual-stream frequency tokenizer and autoregressive model are proposed to improve humanoid motion generation by separately encoding pose and physical dynamics, achieving better diversity and consistency compared to single-codebook approaches. Generated by…
11 -
Hugging Face Daily Papers research 13d ago
Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion
Abstract Spectral Forcing, a time-conditional 2D-DCT low-pass operator, improves diffusion model efficiency by explicitly separating signal from noise in pixel-space models. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Pixel-space diffusion models are trained on full-bandwidth…
32 -
Hugging Face Daily Papers research 13d ago
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining
Abstract A unified Vision-Language-Action pretraining framework leverages heterogeneous data sources including human egocentric videos and robot trajectories through a reliability-aware training approach that improves performance on embodied AI tasks. Generated by…
6 -
Hugging Face Daily Papers research 13d ago
ProCUA-SFT Technical Report
Abstract Training computer-use agents using a large-scale synthetic dataset with automated task generation and verification achieves significantly improved performance on desktop interaction benchmarks. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Training computer-use agents…
4 -
Hugging Face Daily Papers research 13d ago
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
Abstract Zone of Proximal Policy Optimization (ZPPO) improves knowledge distillation by using reformulated prompts that help students learn from both correct and incorrect responses, enhancing performance especially at smaller model sizes. Generated by…
32 -
Hugging Face Daily Papers research 13d ago
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation
Abstract OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Memory has become a standard…
28 -
Hugging Face Daily Papers research 13d ago
Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
Abstract Research agents face significant challenges when evidence is in a different language than the query, with performance degrading even when gold evidence is provided directly. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Deep research agents are increasingly evaluated on…
28