Hugging Face Daily Papers
65 articles archived
-
Hugging Face Daily Papers research 1h ago
Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
Abstract INSET is a unified multimodal model that embeds images as native vocabulary within textual instructions, enabling better handling of complex interleaved inputs through transformer-based contextual locality and supporting both image generation and editing tasks.…
34 -
Hugging Face Daily Papers research 1h ago
Reward Hacking in Rubric-Based Reinforcement Learning
Abstract Research examines reward hacking in rubric-based reinforcement learning, identifying verifier failure and rubric-design limitations as key sources of divergence between training and evaluation metrics. AI-generated summary Reinforcement learning with verifiable rewards…
31 -
Hugging Face Daily Papers research 1h ago
VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors
Abstract VidSplat is a training-free generative reconstruction framework that uses video diffusion priors to synthesize novel views and recover complete 3D scenes from sparse inputs through adaptive denoising and iterative refinement. AI-generated summary Gaussian Splatting has…
29 -
Hugging Face Daily Papers research 1h ago
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
Abstract Training efficiency is improved by strategically allocating scarce labeled data through staged reinforcement learning and dense supervision, using sparse rewards for teacher model discovery and dense rewards for student model compression. AI-generated summary In…
35 -
Hugging Face Daily Papers research 1h ago
On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
Abstract FATE is an on-policy framework that uses failure trajectories to improve agent safety and performance through self-evolution and Pareto-aware optimization. AI-generated summary Tool-using LLM agents fail through trajectories rather than only final responses, as they may…
10 -
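The FATE summary mentions Pareto-aware optimization over agent safety and task performance. As a minimal sketch, assuming each candidate trajectory has already been scored with a hypothetical (safety, performance) pair where higher is better, a Pareto filter keeps only the candidates that no other candidate matches or beats on both axes. This is the generic Pareto-dominance check, not FATE's actual training objective.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Keep the (hypothetical) (safety, performance) scores that no other score dominates."""
    # dominates(s, s) is always False, so comparing a score with itself is harmless.
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


front = pareto_front([(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4)])
```

Here (0.4, 0.4) is dropped because (0.5, 0.5) is better on both axes, while the other three represent different safety/performance trade-offs and all survive.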
Hugging Face Daily Papers research 3h ago
LLM Agents Already Know When to Call Tools -- Even Without Reasoning
Abstract When2Tool benchmark identifies conditions under which tool calls are necessary for LLM agents, revealing that models can predict tool necessity from hidden states but fail to act on this knowledge, leading to the development of the Probe&Prefill method that reduces…
15 -
Hugging Face Daily Papers research 3h ago
Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts
Abstract A local distribution-aware detection framework that amplifies micro-scale statistical irregularities to identify AI-generated images with improved accuracy. AI-generated summary Recent generative models can produce images that appear highly realistic, raising challenges…
26 -
Hugging Face Daily Papers research 4h ago
Solve the Loop: Attractor Models for Language and Reasoning
Abstract Attractor Models enable efficient iterative refinement through fixed-point solving with implicit differentiation, achieving superior language modeling and reasoning performance with reduced computational costs compared to traditional transformers. AI-generated summary…
5 -
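Attractor Models are summarized as solving for a fixed point and training through it with implicit differentiation. The sketch below shows both ideas on a hypothetical scalar layer f(z; x) = 0.5*cos(z) + x, which is contractive in z, so plain iteration converges. The gradient of the fixed point z* with respect to x then comes from the implicit function theorem instead of backpropagating through every iteration. The layer and solver are illustrative assumptions, not the paper's architecture.

```python
import math


def solve_fixed_point(f, z0, tol=1e-12, max_iter=1000):
    """Iterate z <- f(z) until successive iterates differ by less than tol."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def layer(z, x):
    """Hypothetical layer; |df/dz| = |0.5*sin(z)| <= 0.5, so it is a contraction in z."""
    return 0.5 * math.cos(z) + x


def attractor(x):
    """Forward pass: the fixed point z* satisfying z* = layer(z*, x)."""
    return solve_fixed_point(lambda z: layer(z, x), z0=0.0)


def attractor_grad(x):
    """dz*/dx by the implicit function theorem: (df/dx) / (1 - df/dz), evaluated at z*."""
    z_star = attractor(x)
    df_dz = -0.5 * math.sin(z_star)
    df_dx = 1.0
    return df_dx / (1.0 - df_dz)
```

Because only z* is needed, the backward pass here costs a single division, however many iterations the forward solve took.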
Hugging Face Daily Papers research 4h ago
A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models
Abstract Massive activation emergence in LLMs occurs consistently across model families at a specific layer, where RMSNorm and FFN parameters jointly contribute, leading to reduced hidden representation diversity that can be mitigated through a proposed method improving…
36 -
Hugging Face Daily Papers research 5h ago
Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
Abstract Urban-ImageNet presents a large-scale multi-modal dataset and evaluation benchmark for urban space perception from social media imagery, organized under a hierarchical taxonomy for scene classification, cross-modal retrieval, and instance segmentation tasks.…
36 -
Hugging Face Daily Papers research 5h ago
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Abstract A fast-slow learning framework for large language models combines fixed parameters with optimized context to achieve better sample efficiency, reduced catastrophic forgetting, and improved adaptability in continual learning scenarios. AI-generated summary Large language…
22 -
Hugging Face Daily Papers research 6h ago
Efficient Pre-Training with Token Superposition
Abstract Token-Superposition Training (TST) improves pre-training efficiency by combining contiguous tokens into bags during a superposition phase with multi-hot cross-entropy objective, achieving faster training times without architectural changes. AI-generated summary…
30 -
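The Token-Superposition Training summary describes grouping contiguous tokens into bags and training with a multi-hot cross-entropy objective. Below is a minimal sketch under three assumptions the summary does not confirm: bags are non-overlapping and of size k, a bag's input is the average of its token embeddings, and the target is a uniform distribution over the distinct token ids in the bag.

```python
import math


def make_bags(tokens, k):
    """Split a token sequence into contiguous, non-overlapping bags of k tokens (last may be shorter)."""
    return [tokens[i:i + k] for i in range(0, len(tokens), k)]


def superpose(bag, embeddings):
    """Assumed input side: average the embedding vectors of the tokens in a bag."""
    dim = len(embeddings[bag[0]])
    return [sum(embeddings[t][d] for t in bag) / len(bag) for d in range(dim)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def multi_hot_cross_entropy(logits, bag):
    """Cross-entropy against a uniform target over the distinct token ids in the bag."""
    ids = sorted(set(bag))
    logp = log_softmax(logits)
    return -sum(logp[i] for i in ids) / len(ids)
```

With this choice of target, the loss is minimized when the model spreads probability evenly over the bag's tokens, so a 2-token bag with a perfect prediction still costs log 2 rather than 0.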
Hugging Face Daily Papers research 6h ago
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty
Abstract Agent-BRACE decomposes LLM agents into belief state and policy models, using structured textual claims with certainty labels to handle partial observability and long-term dependencies in complex environments. AI-generated summary Large language models (LLMs) are…
28 -
Hugging Face Daily Papers research 7h ago
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
Abstract ORBIT addresses catastrophic forgetting in large language model fine-tuning for generative retrieval by tracking parameter distances and employing weight averaging to maintain model performance. AI-generated summary Despite the rapid advancements in large language model…
7 -
Hugging Face Daily Papers research 9h ago
UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
Abstract Unified multimodal models can improve performance by adaptively selecting coordination paths rather than using fixed patterns, enabling diverse reasoning strategies for different inputs. AI-generated summary Unified multimodal models (UMMs) aim to integrate…
19 -
Hugging Face Daily Papers research 9h ago
Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
Abstract Reinforcement learning improves large language model recall of parametric knowledge by redistributing probability mass toward correct answers, with gains driven primarily by reinforcing rare but learnable examples. AI-generated summary Reinforcement learning (RL) has…
14 -
Hugging Face Daily Papers research 10h ago
Relit-LiVE: Relight Video by Jointly Learning Environment Video
Abstract A novel video relighting framework called Relit-LiVE is presented that produces physically consistent results without requiring camera pose information by incorporating raw reference images and using environment video prediction for joint relighting and environment map…
16 -
Hugging Face Daily Papers research 10h ago
Reliable Chain-of-Thought via Prefix Consistency
Abstract Prefix consistency uses answer reproduction rates under trace regeneration to weight candidate responses, achieving high accuracy with significantly fewer tokens than standard majority voting. AI-generated summary Large Language Models often improve accuracy on…
29 -
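Prefix consistency is summarized as weighting candidate answers by how often regenerating a reasoning trace reproduces them, instead of counting every answer equally as standard majority voting does. The sketch below contrasts the two aggregation rules, assuming each candidate already carries a reproduction rate in [0, 1]. How the paper truncates and regenerates traces is not shown, and the numbers are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Standard self-consistency: the most frequent final answer wins."""
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(candidates):
    """candidates: (answer, reproduction_rate) pairs; each vote counts with its rate as weight."""
    totals = defaultdict(float)
    for answer, rate in candidates:
        totals[answer] += rate
    return max(totals, key=totals.get)
```

For example, with candidates ("A", 0.2), ("A", 0.3), and ("B", 0.9), majority voting picks "A", while the weighted rule picks "B" because its single answer is reproduced far more reliably.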
Hugging Face Daily Papers research 10h ago
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
Abstract POISE enables stable and efficient policy optimization for large reasoning models by estimating baselines using internal model signals, reducing computational overhead while maintaining performance comparable to existing methods. AI-generated summary Reinforcement…
37 -
Hugging Face Daily Papers research 12h ago
IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs
Abstract IndustryBench evaluates industrial procurement question answering systems in Chinese against national standards, revealing significant gaps in safety compliance and highlighting the need for safety-aware assessment beyond standard accuracy metrics. AI-generated summary…
5 -
Hugging Face Daily Papers research 12h ago
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
Abstract Language models can be enhanced by transitioning from sequential message-based instruction-tuning to parallel stream processing, enabling simultaneous reading and generation across multiple concurrent data flows. AI-generated summary The continued improvements in…
6 -
Hugging Face Daily Papers research 13h ago
Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation
Abstract Pion is a spectrum-preserving optimizer for large language model training that uses orthogonal equivalence transformations to maintain singular values during weight updates, offering stable performance comparable to standard optimizers. AI-generated summary We introduce…
34 -
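Pion's summary rests on a standard linear-algebra fact: multiplying a weight matrix on the left and right by orthogonal matrices, W' = U W V, leaves its singular values unchanged. The sketch below checks this numerically for a 2x2 matrix using rotation matrices and the closed-form singular values of a 2x2 matrix (the square roots of the eigenvalues of W^T W). It illustrates the invariance only and is not Pion's update rule.

```python
import math


def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rotation(theta):
    """2x2 rotation matrix, an orthogonal matrix with determinant 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def singular_values2(A):
    """Singular values of a 2x2 matrix from trace(A^T A) and det(A^T A) = det(A)**2."""
    (a, b), (c, d) = A
    frob_sq = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(frob_sq ** 2 - 4 * det ** 2, 0.0))
    return (math.sqrt((frob_sq + disc) / 2), math.sqrt(max((frob_sq - disc) / 2, 0.0)))


def orthogonal_equivalence(W, theta_left, theta_right):
    """W' = U W V with U and V rotations."""
    return matmul2(matmul2(rotation(theta_left), W), rotation(theta_right))
```

Applying any pair of rotations to W = [[3, 1], [0.5, 2]] returns the same two singular values up to floating-point error, which is the property a spectrum-preserving update relies on.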
Hugging Face Daily Papers research 13h ago
Do not copy and paste! Rewriting strategies for code retrieval
Abstract Research investigates how different text rewriting strategies impact code retrieval performance, identifying that full natural language rewriting provides the greatest improvements while proposing entropy-based diagnostics to determine when such costly rewrites are…
15 -
Hugging Face Daily Papers research 13h ago
Large Language Models over Networks: Collaborative Intelligence under Resource Constraints
Abstract Collaborative intelligence enables multiple distributed LLMs to work together across devices and clouds to provide high-quality responses under diverse resource constraints. AI-generated summary Large language models (LLMs) are transforming society, powering…
9 -
Hugging Face Daily Papers research 13h ago
Debiased Model-based Representations for Sample-efficient Continuous Control
Abstract DR.Q algorithm improves model-based representations for Q-learning by maximizing mutual information and using faded prioritized experience replay to reduce bias and overfitting in representation learning. AI-generated summary Model-based representations recently stand…
20 -
Hugging Face Daily Papers research 13h ago
WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting
Abstract WildRelight dataset addresses the gap between synthetic and real-world single-image relighting by providing high-resolution outdoor scenes with aligned natural illumination, enabling physics-guided domain adaptation through diffusion posterior sampling and test-time…
21 -
Hugging Face Daily Papers research 13h ago
PAAC: Privacy-Aware Agentic Device-Cloud Collaboration
Abstract PAAC is a privacy-aware agentic framework that aligns planner-executor decomposition with device-cloud boundaries, using typed placeholder tokens and deterministic registries to enhance privacy while maintaining accuracy in distributed language model agents.…
23 -
Hugging Face Daily Papers research 14h ago
FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation
Abstract FaithfulFaces is a pose-faithful facial identity preservation framework that improves identity consistency in text-to-video generation through pose-shared alignment and explicit Euler angle embeddings. AI-generated summary Identity-preserving text-to-video generation…
38 -
Hugging Face Daily Papers research 14h ago
Implicit Preference Alignment for Human Image Animation
Abstract Implicit Preference Alignment (IPA) addresses hand motion generation challenges through data-efficient post-training that eliminates the need for paired preference data while using hand-aware local optimization for improved quality. AI-generated summary Human image…
37 -
Hugging Face Daily Papers research 15h ago
One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
Abstract Multi-turn dialogue safety monitoring system detects harmful intent accumulation through turn-level analysis and evaluates performance on a new benchmark dataset. AI-generated summary Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed…
18 -
Hugging Face Daily Papers research 15h ago
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
Abstract Asynchronous reinforcement learning in large language models faces challenges with PPO-style corrections due to delayed updates and missing historical logits, which are addressed through exact and approximate correction methods including snapshot tracking and revised…
9 -
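PPO-style off-policy correction weights each token's advantage by the ratio between the current policy and the behavior policy that generated the token, which requires the behavior policy's log-probability (the "old logits" in the title). A minimal sketch of the standard per-token clipped term, assuming both log-probabilities are available; the paper's exact and approximate repair methods for when they are missing are not reproduced here.

```python
import math


def ppo_clipped_term(logp_new, logp_old, advantage, eps=0.2):
    """Per-token PPO surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A), r = exp(logp_new - logp_old)."""
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with eps = 0.2, a ratio of 1.5 and an advantage of 1.0 contribute 1.2, while with an advantage of -1.0 the unclipped -1.5 is kept, since taking the minimum is pessimistic in both directions.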
Hugging Face Daily Papers research 16h ago
GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction
Abstract A unified model for joint named entity recognition and relation extraction that uses a shared transformer encoder to simultaneously identify entities and extract relations with zero-shot capabilities. AI-generated summary Joint named entity recognition (NER) and…
27 -
Hugging Face Daily Papers research 16h ago
A Causal Language Modeling Detour Improves Encoder Continued Pretraining
Abstract Switching from Masked Language Modeling to Causal Language Modeling during encoder adaptation improves downstream performance on biomedical texts through dense supervision effects in lower transformer layers. AI-generated summary When adapting an encoder to a new…
25 -
Hugging Face Daily Papers research 16h ago
World Action Models: The Next Frontier in Embodied AI
Abstract World Action Models unify predictive state modeling with action generation for embodied policy learning, forming a cohesive framework for understanding environment dynamics and action prediction. AI-generated summary Vision-Language-Action (VLA) models have achieved…
15 -
Hugging Face Daily Papers research 17h ago
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
Abstract A visual-native agent harness with image bank reference protocol enables reusable intermediate visual evidence and closed-loop data generation that improves multimodal deep search performance across multiple benchmarks. AI-generated summary Multimodal deep search…
33 -
Hugging Face Daily Papers research 17h ago
L2P: Unlocking Latent Potential for Pixel Generation
Abstract Latent-to-Pixel transfer paradigm efficiently leverages pre-trained latent diffusion models to create pixel-space models with minimal training overhead and high-resolution generation capabilities. AI-generated summary Pixel diffusion models have recently regained…
14 -
Hugging Face Daily Papers research 17h ago
From Web to Pixels: Bringing Agentic Search into Visual Perception
Abstract Researchers introduce WebEye, a benchmark for object localization requiring external knowledge resolution, and Pixel-Searcher, an agent-based approach that connects hidden target identities to visual annotations through search and reasoning. AI-generated summary Visual…
22 -
Hugging Face Daily Papers research 17h ago
SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
Abstract SeePhys Pro benchmark reveals that current multimodal models struggle with representation-invariant reasoning when information shifts from text to visual formats, and demonstrates that blind training can improve performance through residual textual cues. AI-generated…
36 -
Hugging Face Daily Papers research 17h ago
MEME: Multi-entity & Evolving Memory Evaluation
Abstract MEME benchmark evaluates memory systems across multiple entities and evolving conditions, revealing persistent challenges in dependency reasoning despite advanced retrieval and prompting techniques. AI-generated summary LLM-based agents increasingly operate in…
17 -
Hugging Face Daily Papers research 17h ago
Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction
Abstract Lite3R addresses efficiency challenges in transformer-based 3D reconstruction through sparse attention and low-precision quantization while maintaining geometric accuracy. AI-generated summary Transformer-based 3D reconstruction has emerged as a powerful paradigm for…
22 -
Hugging Face Daily Papers research 18h ago
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
Abstract PASA is a robust watermarking algorithm for large language models that operates at the semantic level using latent embedding spaces and shared randomness for secure text detection. AI-generated summary Watermarking for large language models (LLMs) is a promising…
16 -
Hugging Face Daily Papers research 18h ago
FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning
Abstract Training framework FocuSFT improves long-context language model performance by addressing attention allocation issues through bilevel optimization with parametric memory that focuses attention on semantically relevant content. AI-generated summary Large language models…
25 -
Hugging Face Daily Papers research 18h ago
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Abstract ToolCUA is an end-to-end agent that learns optimal GUI-tool path selection through staged training, achieving superior performance in hybrid action space environments. AI-generated summary Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click…
38 -
Hugging Face Daily Papers research 18h ago
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
Abstract AlphaGRPO enhances multimodal generation by applying Group Relative Policy Optimization to AR-Diffusion Unified Multimodal Models through self-reflective refinement and decompositional verifiable reward mechanisms. AI-generated summary In this paper, we propose…
26 -
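AlphaGRPO builds on Group Relative Policy Optimization. GRPO's core step replaces a learned critic with group statistics: several outputs are sampled for the same prompt, and each output's reward is normalized by the group mean and standard deviation. A minimal sketch, assuming the population standard deviation and a small epsilon (implementations differ on both). AlphaGRPO's decompositional verifiable reward itself is not modeled.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages within one prompt's group: (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

When every sample in a group receives the same reward, all advantages are zero, so that group contributes no gradient signal.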
Hugging Face Daily Papers research 18h ago
AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation
Abstract LoRA optimizers are analyzed through a unified framework based on surrogate matrices and preconditioners, with AdaPreLoRA proposing a novel approach using Adafactor diagonal Kronecker preconditioning to improve factor-space updates while maintaining low memory usage.…
38 -
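AdaPreLoRA is summarized as using Adafactor-based preconditioning. Adafactor saves memory by storing only the row and column sums of the squared gradient and reconstructing the full second-moment estimate as their normalized outer product. The sketch below computes that factored estimate for a single step, with no exponential moving average and plain Python lists. How AdaPreLoRA applies it to the LoRA factors is not shown.

```python
def factored_second_moment(grad, eps=1e-30):
    """Adafactor-style estimate V[i][j] ~= R[i] * C[j] / sum(R) from row/column sums of grad**2."""
    sq = [[g * g + eps for g in row] for row in grad]  # eps keeps the estimate strictly positive
    row_sums = [sum(row) for row in sq]                # R: one statistic per row
    col_sums = [sum(col) for col in zip(*sq)]          # C: one statistic per column
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]
```

For an m x n matrix this keeps m + n statistics instead of m * n. The reconstruction is exact whenever the squared gradient is a rank-one outer product, as with the gradient [[1, 3], [2, 6]], whose squares [[1, 9], [4, 36]] are recovered exactly.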
Hugging Face Daily Papers research 18h ago
World Model for Robot Learning: A Comprehensive Survey
Abstract World models as predictive representations of environmental dynamics have become essential for robot learning, supporting policy learning, planning, and simulation across various embodied applications. AI-generated summary World models, which are predictive…
12 -
Hugging Face Daily Papers research 18h ago
Geometric Factual Recall in Transformers
Abstract Transformer language models use geometric memorization where embeddings encode linear superpositions of attributes and MLPs act as relation-conditioned selectors rather than associative key-value mappings. AI-generated summary How do transformer language models memorize…
6 -
Hugging Face Daily Papers research 18h ago
Continual Harness: Online Adaptation for Self-Improving Foundation Agents
Abstract A self-improving AI system for embodied agents autonomously refines its own prompts, skills, and memory through continuous learning without environment resets, achieving human-level performance in complex video games. AI-generated summary Coding harnesses such as Claude…
15 -
Hugging Face Daily Papers research 19h ago
Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
Abstract Autonomous agents exhibit distinct value systems from underlying language models, requiring new benchmarking approaches to assess alignment across diverse execution environments. AI-generated summary Autonomous agents have rapidly matured as task executors and seen…
28 -
Hugging Face Daily Papers research 19h ago
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments
Abstract MCP-Cosmos integrates generative World Models into the Model Context Protocol ecosystem to enhance agent planning and execution through predictive simulation in latent space. AI-generated summary The Model Context Protocol (MCP) has unified the interface between Large…
6 -
Hugging Face Daily Papers research 19h ago
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
Abstract A new benchmark called LongMemEval-V2 is introduced to evaluate memory systems' ability to help agents acquire environment-specific experience in web environments, featuring a suite of memory methods including AgentRunbook-R and AgentRunbook-C that demonstrate varying…
23 -
Hugging Face Daily Papers research 19h ago
δ-mem: Efficient Online Memory for Large Language Models
Abstract A lightweight memory mechanism called δ-mem enhances large language models by augmenting a frozen attention backbone with a compact associative memory state that provides low-rank corrections to attention computations. AI-generated summary Large language models…
12 -
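The δ-mem summary describes a compact associative memory state that adds low-rank corrections on top of a frozen attention backbone. The name suggests a delta-rule update, but the summary does not confirm this. Under that assumption, the sketch below shows the classic delta-rule associative memory. Each write is a rank-one update S <- S + beta * (v - S k) k^T, and reading with a query q returns S q. The class and its parameters are hypothetical.

```python
def matvec(S, x):
    """Multiply matrix S (list of rows) by vector x."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]


class DeltaMemory:
    """Hypothetical delta-rule memory: the state S maps d_k-dim keys to d_v-dim values."""

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_k for _ in range(d_v)]

    def write(self, k, v, beta=1.0):
        # Error between the target value and what the memory currently returns for key k.
        err = [vi - pi for vi, pi in zip(v, matvec(self.S, k))]
        # Rank-one correction: S += beta * err k^T
        for i, e in enumerate(err):
            for j, kj in enumerate(k):
                self.S[i][j] += beta * e * kj

    def read(self, q):
        return matvec(self.S, q)
```

With orthonormal keys and beta = 1, rewriting an existing key replaces its stored value instead of adding to it. This replacement behavior distinguishes the delta rule from a purely additive (Hebbian) memory.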
Hugging Face Daily Papers research 19h ago
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics
Abstract Enterprise discovery agents that read system configuration at runtime outperform traditional world models in configurable environments where dynamics change over time. AI-generated summary World models enable agents to anticipate the effects of their actions by…
37 -
Hugging Face Daily Papers research 19h ago
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
Abstract Unified vision-language models treat understanding and generation as integrated processes rather than separate tasks, demonstrating strong performance across multiple multimodal capabilities including image synthesis and action reasoning. AI-generated summary Recent…
37 -
Hugging Face Daily Papers research 19h ago
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
Abstract DRoRAE enhances visual representation by fusing multi-layer features from pretrained vision encoders through adaptive routing and incremental correction, improving reconstruction and generation quality. AI-generated summary Representation autoencoders that reuse frozen…
6 -
Hugging Face Daily Papers research 20h ago
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
Abstract On-policy distillation and self-distillation methods for large language models exhibit varying effectiveness depending on teacher choice, loss formulation, and instance-specific privileged information availability, with identified failure mechanisms including…
32 -
Hugging Face Daily Papers research 20h ago
CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
Abstract CausalCine enables interactive, multi-shot video generation by addressing limitations of autoregressive models through causal modeling, dynamic memory routing, and real-time distillation techniques. AI-generated summary Autoregressive video generation aims at real-time,…
38 -
Hugging Face Daily Papers research 20h ago
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
Abstract MemPrivacy enables privacy-preserving personalized memory in edge-cloud environments by using type-aware placeholders to protect sensitive data while maintaining semantic integrity for effective memory operations. AI-generated summary As LLM-powered agents are…
30 -
Hugging Face Daily Papers research 20h ago
LychSim: A Controllable and Interactive Simulation Framework for Vision Research
Abstract A simulation framework called LychSim is introduced, featuring a Python API, procedural data pipeline, and MCP integration to enable controllable and interactive environments for vision system development and evaluation. AI-generated summary While self-supervised…
23 -
Hugging Face Daily Papers research 20h ago
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive
Abstract An agentic framework called AutoLLMResearch automates high-cost large language model experiment configurations by learning from multi-fidelity experimental environments and enabling efficient configuration identification through cross-fidelity extrapolation.…
5 -
Hugging Face Daily Papers research 20h ago
MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics
Abstract MoCam addresses the challenge of generative novel view synthesis by dynamically coordinating geometric and appearance priors through structured denoising dynamics within a diffusion framework. AI-generated summary Generative novel view synthesis faces a fundamental…
13 -
Hugging Face Daily Papers research 20h ago
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
Abstract Deep research agents trained using RubricEM framework demonstrate superior performance on long-form research tasks through rubric-guided reinforcement learning with stage-aware planning and reflection-based meta-policy evolution. AI-generated summary Training deep…
16 -
Hugging Face Daily Papers research 20h ago
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
Abstract LoopUS is a post-training framework that transforms pretrained LLMs into looped architectures for improved reasoning performance through latent-refinement and adaptive early exiting mechanisms. AI-generated summary Looped computation shows promise in improving the…
31 -
Hugging Face Daily Papers research 20h ago
Teaching Language Models to Think in Code
Abstract ThinC framework enables mathematical problem solving where code serves as the primary reasoning mechanism instead of a verification tool, demonstrating superior performance on math benchmarks. AI-generated summary Tool-integrated reasoning (TIR) has emerged as a…
6 -
Hugging Face Daily Papers research 1d ago
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
Abstract Test-time co-evolution framework for multi-agent systems that jointly adapts agent capabilities and communication topology at different time scales to achieve task-conditioned stability and improved performance. AI-generated summary Multi-agent systems (MAS) have…
16