Hugging Face Daily Papers

Hugging Face Daily Papers research 8d ago

Distilling Examples into Task Instructions: Enhanced In-Context Learning for Real-World B2B Conversations

Abstract: A novel approach for B2B conversation classification that reduces token usage by 99% while improving performance and maintaining robustness as context length increases. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) In-context learning (ICL) is the standard method for…

Hugging Face Daily Papers research 8d ago

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Abstract: Current memory agents lack reliable shared institutional deployment due to challenges in balancing utility, access control, and forgetting across multiple principals with diverse authorization contexts. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Memory benchmarks for…

Hugging Face Daily Papers research 8d ago

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Abstract: A 3D brain MRI generative model uses a masked-autoencoder tokenizer to create clinically informative embeddings that support both medical task performance and controlled image generation. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Three-dimensional (3D) brain MRI is…

Hugging Face Daily Papers research 8d ago

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

Abstract: The WorldLines benchmark evaluates long-term memory in embodied agents through household scenarios, while the ObsMem framework addresses challenges in partial observability and memory translation for decision-making. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) To assist humans…

Hugging Face Daily Papers research 10d ago

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

Abstract: LedgerAgent is a method for customer-service agents that maintains task states in a separate ledger to improve policy adherence and state management during tool calling. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Policy-adherent tool-calling agents in customer-service…

Hugging Face Daily Papers research 10d ago

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Abstract: PerceptionDLM enables efficient parallel region perception in multimodal diffusion language models through structured attention masking and efficient prompting, achieving faster inference without sacrificing caption quality. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

Hugging Face Daily Papers research 10d ago

Context-Aware RL for Agentic and Multimodal LLMs

Abstract: ContextRL enhances long-horizon reasoning and multimodal performance through reinforcement learning that rewards context selection for supporting query-answer pairs, achieving improvements over standard methods on diverse benchmarks. Generated by…

Hugging Face Daily Papers research 10d ago

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Abstract: A comprehensive corpus and access layer for U.S. local ordinance codes has been developed to enable machine-readable legal AI research, addressing the lack of authoritative legal text at scale for local regulations. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Progress…

Hugging Face Daily Papers research 10d ago

ReSyn: A Generalized Recursive Regular Expression Synthesis Framework

Abstract: A divide-and-conquer framework named ReSyn improves regex synthesis accuracy by decomposing complex problems, and pairs this with Set2Regex, a parameter-efficient synthesizer that is invariant to the order of the input examples. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

Hugging Face Daily Papers research 10d ago

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

Abstract: LegalHalluLens audits AI systems in legal workflows by identifying specific error patterns and directional biases in hallucinations across different claim types, enabling more reliable deployment through targeted diagnostic and mitigation approaches. Generated by…

Hugging Face Daily Papers research 10d ago

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

Abstract: ACIE, an agentic RAG system deployed in a clinical setting, demonstrates high accuracy in extracting medical information from complex patient contexts, achieving a 96.5% acceptance rate by nuclear-medicine physicians across 7,326 judgments. Generated by…

Hugging Face Daily Papers research 10d ago

The Data Manifold under the Microscope

Abstract: A benchmarking framework is introduced to study data-manifold geometry by extending the dSprites and COIL-20 datasets with additional transformation dimensions and dense sampling, enabling accurate estimation of curvature, reach, and volume for theoretical analysis and…

Hugging Face Daily Papers research 10d ago

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

Abstract: Analysis of FID variance across different training and sampling seeds reveals significant reproducibility issues in image generation evaluation, with retraining causing larger fluctuations than resampling, and recommends updated evaluation protocols with error bars and…

Hugging Face Daily Papers research 10d ago

Duration Aware Scheduling for ASR Serving Under Workload Drift

Abstract: Duration-aware scheduling policies improve ASR serving latency by using audio length as a predictor of processing time, with the SJF (shortest job first) and HRRN (highest response ratio next) algorithms showing significant median latency reductions while maintaining throughput. Generated by…

Hugging Face Daily Papers research 10d ago

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Abstract: Uniform 4-bit training with RHT-based quantization outperforms E2M1-based methods by eliminating shrinkage bias and improving training stability across large language model architectures. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) FP4 training promises substantial…

Hugging Face Daily Papers research 10d ago

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Abstract: Multi-LCB addresses a limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination controls and evaluation protocols. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

Hugging Face Daily Papers research 10d ago

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Abstract: This work addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight-difference transfer to create specialized instruction-following models at reduced…

Hugging Face Daily Papers research 10d ago

JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Abstract: Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Current AI-driven game development has made substantial…

Hugging Face Daily Papers research 10d ago

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

Abstract: ImageWAM demonstrates that pretrained image editing models can effectively replace video generation in world action models for robot control, achieving better performance with reduced computational costs. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World Action Models…

Hugging Face Daily Papers research 10d ago

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Abstract: The ENPIRE framework enables autonomous robotics research through a closed-loop system that automates policy improvement via environment feedback, policy refinement, and evolutionary code optimization. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Achieving dexterous robotic…

Hugging Face Daily Papers research 10d ago

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Abstract: A unified controllable video world model generates videos from a single image while preserving scene structure and transferring to target weather states through specialized parameterization and conditioning techniques. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Video…

Hugging Face Daily Papers research 10d ago

Current World Models Lack a Persistent State Core

Abstract: Current world models fail to maintain consistent world states when unobserved, indicating a need for design changes that prioritize physical state stability over appearance fidelity. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) World models are increasingly regarded as…

Hugging Face Daily Papers research 10d ago

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Abstract: Hybrid linear attention models can be improved through a novel initialization technique that enhances conversion from pretrained Transformers by leveraging teacher attention statistics and alignment steps. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Hybrid linear…

Hugging Face Daily Papers research 11d ago

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Abstract: FlowBender is a closed-loop framework that addresses constraint satisfaction in diffusion and flow models by training networks to correct alignment errors using inference-time feedback, outperforming traditional supervised and guidance-based approaches across multiple…

Hugging Face Daily Papers research 11d ago

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects

Abstract: DragMesh-2 enables dexterous hand-object interaction through contact-driven manipulation, with PICA enhancing robustness under varying contact loads without tactile feedback. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Dexterous interaction with articulated objects is…

Hugging Face Daily Papers research 11d ago

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Abstract: Egocentric human video can effectively replace teleoperated robot trajectories for embodied model pretraining, achieving better performance with reduced data collection costs. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Embodied foundation models are expected to…

Hugging Face Daily Papers research 11d ago

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Abstract: A large-scale real-world dataset called DF3DV-1K is introduced to address the lack of clean and cluttered image sets for distractor-free radiance field research, containing 1,048 scenes with 89,924 images across 128 distractor types and 161 scene themes, along with a…

Hugging Face Daily Papers research 11d ago

Playful Agentic Robot Learning

Abstract: Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Current agentic robot systems can write…

Hugging Face Daily Papers research 11d ago

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Abstract: S-Agent is a spatial reasoning framework that enhances visual language models with temporal memory and hierarchical spatial tools to enable continuous 3D world understanding from multi-view imagery. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Real-world spatial…

Hugging Face Daily Papers research 11d ago

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Abstract: A fast, training-free framework generates text-driven 3D visual illusions by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis for seamless geometric fusion and semantic coherence. Generated by…

Hugging Face Daily Papers research 11d ago

Understanding the Behaviors of Environment-aware Information Retrieval

Abstract: Large language models can be trained via reinforcement learning to adapt query formulation strategies for different retrievers, with distinct optimal query styles and improved performance through retriever-specific guidance and model scaling. Generated by…

Hugging Face Daily Papers research 11d ago

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

Abstract: FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Multi-step LLM pipelines fail through interactions among…

Hugging Face Daily Papers research 11d ago

Selective Synergistic Learning for Video Object-Centric Learning

Abstract: Selective Synergistic Learning (SSync) addresses limitations in video object-centric learning by selectively distilling reliable cues through pseudo-labeling and transitive merging to improve object decomposition quality and robustness. Generated by…

Hugging Face Daily Papers research 11d ago

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

Abstract: AdaVoMP predicts dense, spatially varying mechanical properties for 3D objects using a sparse adaptive voxel structure and a transformer encoder-decoder model, enabling realistic deformable simulations with improved accuracy and efficiency. Generated by…

Hugging Face Daily Papers research 11d ago

FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining

Abstract: FreeStyle is a scalable dual-reference generation framework that uses community LoRA mining to create large-scale style-content triplets while addressing content leakage through disentanglement mechanisms and a comprehensive benchmark. Generated by…

Hugging Face Daily Papers research 11d ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Abstract: Aggregate-score leaderboards in agent benchmarks fail to capture deployment-relevant dimensions and show rank instability, necessitating new evaluation frameworks based on predictive validity and out-of-distribution criteria. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) …

Hugging Face Daily Papers research 11d ago

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Abstract: A lightweight image inpainting framework achieves high-fidelity results with significantly reduced parameters and inference time through novel local-global interaction blocks and adaptive distillation strategies. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) While…

Hugging Face Daily Papers research 11d ago

Thinking with Visual Grounding

Abstract: Visually grounded thinking integrates natural-language reasoning with explicit visual evidence grounding in vision-language models, improving reasoning accuracy through scalable synthesis and reinforcement learning techniques. Generated by…

Hugging Face Daily Papers research 11d ago

LooseControlVideo: Directorial Video Control using Spatial Blocking

Abstract: LooseControlVideo enables intuitive 3D spatial control in text-to-video generation using sparse oriented 3D boxes as proxies, achieving superior trajectory accuracy and occlusion handling compared to existing methods. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Precise…

Hugging Face Daily Papers research 11d ago

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

Abstract: A two-stage iterative framework alternates between data augmentation and policy optimization to improve LLM reasoning by leveraging intermediate correction steps, achieving superior performance on coding benchmarks and constraint satisfaction problems. Generated by…

Hugging Face Daily Papers research 11d ago

Re-Centering Humans in LLM Personalization

Abstract: Human-centered evaluation reveals significant gaps between synthetic and real-world LLM personalization performance, with models struggling to extract user attributes and generate truly personalized responses that match human quality judgments. Generated by…

Hugging Face Daily Papers research 11d ago

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Abstract: RL4IL enables robust robotic manipulation under sensor dropout by using reinforcement learning to retrieve relevant demonstrations and cross-attention fusion to impute missing modalities without retraining. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Robotic systems…

Hugging Face Daily Papers research 11d ago

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Abstract: Offline reinforcement learning with trajectory-level outcome supervision presents statistical challenges that can be addressed through pessimistic actor-critic methods, though fundamental barriers exist for certain generalized outcome-based problems. Generated by…

Hugging Face Daily Papers research 11d ago

HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

Abstract: A novel token compression framework called HiLo-Token is introduced to accelerate Diffusion Transformers in image editing tasks by adaptively allocating tokens based on spatial frequency and context importance, achieving significant speedups without quality loss. …

Hugging Face Daily Papers research 11d ago

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

Abstract: Discriminator-Guided Reinforcement Learning (DRL) addresses alignment issues in score- and flow-matching models by using a pretrained representation space discriminator as an optimal reward signal, improving both visual fidelity and semantic quality without human…

Hugging Face Daily Papers research 11d ago

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

Abstract: MaineCoon represents the first real-time audio-visual autoregressive model for social worlds, achieving high frame rates and long-horizon generation through novel training techniques and inference frameworks. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) As an increasing…

Hugging Face Daily Papers research 11d ago

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Abstract: A 3D point-motion forecasting model predicts object trajectories from visual history and language goals, demonstrating superior performance on benchmarks and transferring effectively to robot manipulation and video generation tasks. Generated by…

Hugging Face Daily Papers research 11d ago

ViT-Up: Faithful Feature Upsampling for Vision Transformers

Abstract: ViT-Up is a feature upsampling framework for Vision Transformers that uses layer-wise query construction from hidden states to improve dense prediction tasks, outperforming existing image-guided methods. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Vision Transformers…

Hugging Face Daily Papers research 11d ago

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

Abstract: The standard basis of transformer hidden states serves as a training-free, architecture-general feature representation where individual dimensions encode semantic content through signs and confidence through magnitudes, functioning as independent binary registers…

Hugging Face Daily Papers research 11d ago

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

Abstract: iOSWorld is introduced as the first interactive native iOS simulator benchmark featuring persistent user identity across multiple apps to evaluate personalized mobile agent capabilities. (Summary generated by Qwen/Qwen2.5-Coder-32B-Instruct.) A useful phone agent needs to be…

