Hugging Face Daily Papers
-
Hugging Face Daily Papers research 8d ago
Distilling Examples into Task Instructions: Enhanced In-Context Learning for Real-World B2B Conversations
Abstract A novel approach for B2B conversation classification that reduces token usage by 99% while improving performance and maintaining robustness as context length increases. Generated by Qwen/Qwen2.5-Coder-32B-Instruct In-context learning (ICL) is the standard method for…
8 -
Hugging Face Daily Papers research 8d ago
GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
Abstract Current memory agents lack reliable shared institutional deployment due to challenges in balancing utility, access control, and forgetting across multiple principals with diverse authorization contexts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory benchmarks for…
5 -
Hugging Face Daily Papers research 8d ago
BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation
Abstract A 3D brain MRI generative model uses a masked-autoencoder tokenizer to create clinically informative embeddings that support both medical task performance and controlled image generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Three-dimensional (3D) brain MRI is…
6 -
Hugging Face Daily Papers research 8d ago
WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents
Abstract WorldLines benchmark evaluates long-term memory in embodied agents through household scenarios, while ObsMem framework addresses challenges in partial observability and memory translation for decision-making. Generated by Qwen/Qwen2.5-Coder-32B-Instruct To assist humans…
19 -
Hugging Face Daily Papers research 10d ago
LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
Abstract LedgerAgent is a method for customer service agents that maintains task states in a separate ledger to improve policy adherence and state management during tool calling. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Policy-adherent tool-calling agents in customer-service…
36 -
Hugging Face Daily Papers research 10d ago
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models
Abstract PerceptionDLM enables efficient parallel region perception in multimodal diffusion language models through structured attention masking and efficient prompting, achieving faster inference without sacrificing caption quality. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
12 -
Hugging Face Daily Papers research 10d ago
Context-Aware RL for Agentic and Multimodal LLMs
Abstract ContextRL enhances long-horizon reasoning and multimodal performance through reinforcement learning that rewards context selection for supporting query-answer pairs, achieving improvements over standard methods on diverse benchmarks. Generated by…
21 -
Hugging Face Daily Papers research 10d ago
Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
Abstract A comprehensive corpus and access layer for U.S. local ordinance codes has been developed to enable machine-readable legal AI research, addressing the lack of authoritative legal text at scale for local regulations. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress…
4 -
Hugging Face Daily Papers research 10d ago
ReSyn: A Generalized Recursive Regular Expression Synthesis Framework
Abstract A divide-and-conquer framework named ReSyn enhances regex synthesis accuracy by decomposing complex problems, combined with a parameter-efficient synthesizer called Set2Regex that handles example permutation invariance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
27 -
Hugging Face Daily Papers research 10d ago
LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI
Abstract LegalHalluLens audits AI systems in legal workflows by identifying specific error patterns and directional biases in hallucinations across different claim types, enabling more reliable deployment through targeted diagnostic and mitigation approaches. Generated by…
31 -
Hugging Face Daily Papers research 10d ago
Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why
Abstract ACIE, an agentic RAG system deployed in a clinical setting, demonstrates high accuracy in extracting medical information from complex patient contexts, achieving 96.5% acceptance rate by nuclear-medicine physicians across 7,326 judgments. Generated by…
5 -
Hugging Face Daily Papers research 10d ago
The Data Manifold under the Microscope
Abstract A benchmarking framework is introduced to study data-manifold geometry by extending dSprites and COIL-20 datasets with additional transformation dimensions and dense sampling, enabling accurate estimation of curvature, reach, and volume for theoretical analysis and…
36 -
Hugging Face Daily Papers research 10d ago
The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation
Abstract Analysis of FID variance across different training and sampling seeds reveals significant reproducibility issues in image generation evaluation, with retraining causing larger fluctuations than resampling, and recommends updated evaluation protocols with error bars and…
21 -
Hugging Face Daily Papers research 10d ago
Duration Aware Scheduling for ASR Serving Under Workload Drift
Abstract Duration-aware scheduling policies improve ASR serving latency by leveraging audio length as a predictor for processing time, with SJF and HRRN algorithms showing significant median latency reductions while maintaining throughput. Generated by…
26 -
Hugging Face Daily Papers research 10d ago
Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe
Abstract Uniform 4-bit training with RHT-based quantization outperforms E2M1-based methods by eliminating shrinkage bias and improving training stability across large language model architectures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct FP4 training promises substantial…
31 -
Hugging Face Daily Papers research 10d ago
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
Abstract Multi-LCB addresses the limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination controls and evaluation protocols. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
33 -
Hugging Face Daily Papers research 10d ago
No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages
Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced…
27 -
Hugging Face Daily Papers research 10d ago
JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines
Abstract Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current AI-driven game development has made substantial…
25 -
Hugging Face Daily Papers research 10d ago
ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?
Abstract ImageWAM demonstrates that pretrained image editing models can effectively replace video generation in world action models for robot control, achieving better performance with reduced computational costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World Action Models…
25 -
Hugging Face Daily Papers research 10d ago
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Abstract ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Achieving dexterous robotic…
27 -
Hugging Face Daily Papers research 10d ago
Holo-World: Unified Camera, Object and Weather Control for Video World Model
Abstract A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video…
22 -
Hugging Face Daily Papers research 10d ago
Current World Models Lack a Persistent State Core
Abstract Current world models fail to maintain consistent world states when unobserved, indicating a need for design changes that prioritize physical state stability over appearance fidelity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct World models are increasingly regarded as…
18 -
Hugging Face Daily Papers research 10d ago
Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation
Abstract Hybrid linear attention models can be improved through a novel initialization technique that enhances conversion from pretrained Transformers by leveraging teacher attention statistics and alignment steps. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Hybrid linear…
6 -
Hugging Face Daily Papers research 11d ago
FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows
Abstract FlowBender is a closed-loop framework that addresses constraint satisfaction in diffusion and flow models by training networks to correct alignment errors using inference-time feedback, outperforming traditional supervised and guidance-based approaches across multiple…
11 -
Hugging Face Daily Papers research 11d ago
DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects
Abstract DragMesh-2 enables dexterous hand-object interaction through contact-driven manipulation, with PICA enhancing robustness under varying contact loads without tactile feedback. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Dexterous interaction with articulated objects is…
19 -
Hugging Face Daily Papers research 11d ago
HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining
Abstract Egocentric human video can effectively replace teleoperated robot trajectories for embodied model pretraining, achieving better performance with reduced data collection costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Embodied foundation models are expected to…
22 -
Hugging Face Daily Papers research 11d ago
DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
Abstract A large-scale real-world dataset called DF3DV-1K is introduced to address the lack of clean and cluttered image sets for distractor-free radiance field research, containing 1,048 scenes with 89,924 images across 128 distractor types and 161 scene themes, along with a…
5 -
Hugging Face Daily Papers research 11d ago
Playful Agentic Robot Learning
Abstract Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current agentic robot systems can write…
4 -
Hugging Face Daily Papers research 11d ago
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Abstract S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Real-world spatial…
28 -
Hugging Face Daily Papers research 11d ago
JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
Abstract A fast, training-free framework generates text-driven 3D visual illusions by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis for seamless geometric fusion and semantic coherence. Generated by…
26 -
Hugging Face Daily Papers research 11d ago
Understanding the Behaviors of Environment-aware Information Retrieval
Abstract Large language models can be trained via reinforcement learning to adapt query formulation strategies for different retrievers, with distinct optimal query styles and improved performance through retriever-specific guidance and model scaling. Generated by…
16 -
Hugging Face Daily Papers research 11d ago
FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines
Abstract FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multi-step LLM pipelines fail through interactions among…
38 -
Hugging Face Daily Papers research 11d ago
Selective Synergistic Learning for Video Object-Centric Learning
Abstract Selective Synergistic Learning (SSync) addresses limitations in video object-centric learning by selectively distilling reliable cues through pseudo-labeling and transitive merging to improve object decomposition quality and robustness. Generated by…
30 -
Hugging Face Daily Papers research 11d ago
Adaptive Volumetric Mechanical Property Fields Invariant to Resolution
Abstract AdaVoMP predicts dense spatially-varying mechanical properties for 3D objects using a sparse adaptive voxel structure and transformer encoder-decoder model, enabling realistic deformable simulations with improved accuracy and efficiency. Generated by…
33 -
Hugging Face Daily Papers research 11d ago
FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
Abstract FreeStyle is a scalable dual-reference generation framework that uses community LoRA mining to create large-scale style-content triplets while addressing content leakage through disentanglement mechanisms and a comprehensive benchmark. Generated by…
16 -
Hugging Face Daily Papers research 11d ago
Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
Abstract Aggregate-score leaderboards in agent benchmarks fail to capture deployment-relevant dimensions and show rank instability, necessitating new evaluation frameworks based on predictive validity and out-of-distribution criteria. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…
27 -
Hugging Face Daily Papers research 11d ago
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
Abstract A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct While…
35 -
Hugging Face Daily Papers research 11d ago
Thinking with Visual Grounding
Abstract Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques. Generated by…
34 -
Hugging Face Daily Papers research 11d ago
LooseControlVideo: Directorial Video Control using Spatial Blocking
Abstract LooseControlVideo enables intuitive 3D spatial control in text-to-video generation using sparse oriented 3D boxes as proxies, achieving superior trajectory accuracy and occlusion handling compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Precise…
10 -
Hugging Face Daily Papers research 11d ago
REVES: REvision and VErification–Augmented Training for Test-Time Scaling
Abstract A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems. Generated by…
23 -
Hugging Face Daily Papers research 11d ago
Re-Centering Humans in LLM Personalization
Abstract Human-centered evaluation reveals significant gaps between synthetic and real-world LLM personalization performance, with models struggling to extract user attributes and generate truly personalized responses that match human quality judgments. Generated by…
30 -
Hugging Face Daily Papers research 11d ago
Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities
Abstract RL4IL enables robust robotic manipulation under sensor dropout by using reinforcement learning to retrieve relevant demonstrations and cross-attention fusion to impute missing modalities without retraining. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Robotic systems…
23 -
Hugging Face Daily Papers research 11d ago
When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?
Abstract Offline reinforcement learning with trajectory-level outcome supervision presents statistical challenges that can be addressed through pessimistic actor-critic methods, though fundamental barriers exist for certain generalized outcome-based problems. Generated by…
35 -
Hugging Face Daily Papers research 11d ago
HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing
Abstract A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss.…
27 -
Hugging Face Daily Papers research 11d ago
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
Abstract Discriminator-Guided Reinforcement Learning (DRL) addresses alignment issues in score- and flow-matching models by using a pretrained representation space discriminator as an optimal reward signal, improving both visual fidelity and semantic quality without human…
4 -
Hugging Face Daily Papers research 11d ago
MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
Abstract MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As an increasing…
21 -
Hugging Face Daily Papers research 11d ago
MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
Abstract A 3D point motion forecasting model predicts object trajectories from visual history and language goals, demonstrating superior performance on benchmarks and transferring effectively to robot manipulation and video generation tasks. Generated by…
4 -
Hugging Face Daily Papers research 11d ago
ViT-Up: Faithful Feature Upsampling for Vision Transformers
Abstract ViT-Up is a feature upsampling framework for Vision Transformers that uses layer-wise query construction from hidden states to improve dense prediction tasks, outperforming existing image-guided methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Vision Transformers…
27 -
Hugging Face Daily Papers research 11d ago
Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns
Abstract The standard basis of transformer hidden states serves as a training-free, architecture-general feature representation where individual dimensions encode semantic content through signs and confidence through magnitudes, functioning as independent binary registers…
10 -
Hugging Face Daily Papers research 11d ago
iOSWorld: A Benchmark for Personally Intelligent Phone Agents
Abstract iOSWorld is introduced as the first interactive native iOS simulator benchmark featuring persistent user identity across multiple apps to evaluate personalized mobile agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A useful phone agent needs to be…
6