Hugging Face Daily Papers

Hugging Face Daily Papers research 18d ago

DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models

Abstract DRIFT is a framework that adapts pretrained vision-language models for continuous decoding tasks by combining coarse prediction with iterative refinement through flow matching, improving performance across perception and planning tasks. Generated by…

12
Hugging Face Daily Papers research 18d ago

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Abstract Vision-language models can improve grounding performance under aggressive token reduction by replacing irreversible visual-token pruning with recoverable routing that allows tokens to re-enter the processing pipeline at later stages. Generated by…

16
Hugging Face Daily Papers research 18d ago

Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Abstract SKIM is an adaptive multi-resolution soft token compression framework that efficiently compresses procedural skills while maintaining task performance and enabling lightweight offline compression for frequently updated community skills. Generated by…

6
Hugging Face Daily Papers research 18d ago

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

Abstract A benchmark for agentic recommender systems is introduced that uses verifiable rewards and controlled dialogue constraints to evaluate conversational agent reliability, revealing significant performance gaps among leading models. Generated by…

6
Hugging Face Daily Papers research 18d ago

On Subquadratic Architectures: From Applications to Principles

Abstract xLSTM demonstrates superior performance in sequence modeling tasks compared to Mamba-2 and Gated DeltaNet due to enhanced state tracking and memory dynamics. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Transformers dominate modern sequence modeling, but their quadratic…

4
Hugging Face Daily Papers research 19d ago

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Abstract ART enables parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual input through gradient backpropagation, achieving performance comparable to LoRA while supporting pre-compiled computational graphs. Generated by…

8
Hugging Face Daily Papers research 19d ago

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Abstract TRACE is a rollout allocation framework that improves reward contrast in multi-turn agentic reinforcement learning by dynamically distributing resources across tree-structured rollouts based on prefix-level informativeness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

27
Hugging Face Daily Papers research 19d ago

FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

Abstract FlowLet is a conditional generative framework that synthesizes age-conditioned 3D MRIs using flow matching in an invertible 3D wavelet domain, improving brain age prediction performance for underrepresented age groups. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Brain…

18
Hugging Face Daily Papers research 19d ago

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Abstract POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while avoiding detection by LLM scanners that are overly sensitive to privileged tool operations. Generated by…

16
Hugging Face Daily Papers research 19d ago

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Abstract Grammar-constrained decoding techniques used to ensure syntactic validity in code generation can be exploited as an attack surface, leading to the development of a jailbreak method called CodeSpear and a safety alignment approach named CodeShield. Generated by…

37
Hugging Face Daily Papers research 19d ago

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

Abstract PACI enables efficient asynchronous pipeline training by controlling forward/backward weight inconsistency through local gradient accumulation, achieving higher throughput and faster training time-to-accuracy without sacrificing stability or memory usage. Generated by…

9
Hugging Face Daily Papers research 19d ago

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

Abstract A lightweight approach combining a frozen pretrained time-series foundation model with a simple regression head achieves superior RUL prediction performance compared to various baseline methods on industrial sensor data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

15
Hugging Face Daily Papers research 19d ago

Large Language Models Are Overconfident in Their Own Responses

Abstract Instruction tuning degrades calibration in large language models, with chat templates exacerbating overconfidence through ownership bias, which can be mitigated by reframing model responses as user input during confidence assessment. Generated by…

22
Hugging Face Daily Papers research 19d ago

Distilling LLM Feedback for Lean Theorem Proving

Abstract Feedback Distillation improves post-training of reasoning models by using self-distillation with token-level supervision and privileged feedback from language models, offering better diversity and complementary benefits when combined with GRPO. Generated by…

38
Hugging Face Daily Papers research 19d ago

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Abstract EvoTrainer autonomously evolves both language model policies and training harnesses through empirical feedback, demonstrating superior performance in complex reasoning and coding tasks compared to traditional handcrafted approaches. Generated by…

6
Hugging Face Daily Papers research 19d ago

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Abstract A training-free framework for spatial reasoning from egocentric videos enables revisiting conclusions through synthesized novel-view videos generated from predicted 3D geometry. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Spatial reasoning from egocentric videos…

11
Hugging Face Daily Papers research 19d ago

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Abstract Researchers propose a novel router redesign for Mixture-of-Experts models that aligns router rows with the principal singular directions of expert matrices using Manifold Power Iteration to improve model effectiveness. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Router…

33
Hugging Face Daily Papers research 19d ago

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

Abstract A new benchmark and adapter protocol called Claw-SWE-Bench enables fair comparison of diverse coding agents by standardizing evaluation conditions and revealing the importance of adapter design for effective code generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

16
Hugging Face Daily Papers research 19d ago

ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics

Abstract A new benchmark called ComBench is introduced to evaluate large language models' combinatorial reasoning abilities through Olympiad-level problems that test both proof construction and explicit mathematical constructions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

37
Hugging Face Daily Papers research 19d ago

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Abstract An AI framework called Arbor enables autonomous scientific research by combining strategic coordination, isolated hypothesis testing, and a persistent knowledge tree to iteratively improve research outcomes across multiple domains. Generated by…

18
Hugging Face Daily Papers research 19d ago

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

Abstract TRL-Bench establishes a standardized benchmark for evaluating tabular representation learning models across multiple granularities, revealing that encoder performance varies by task type and requires capability-specific assessment rather than single leaderboard…

6
Hugging Face Daily Papers research 19d ago

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Abstract A recursive automated composition framework enables scalable reinforcement learning for language models by automatically combining verifiable environments through compositional operators. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement Learning (RL) with…

11
Hugging Face Daily Papers research 19d ago

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Abstract A teacher-student framework decouples complex reasoning from efficient reward deployment in text-to-image training, achieving superior preference accuracy and optimization performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reward models are central to…

22
Hugging Face Daily Papers research 19d ago

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

Abstract Large language model agents require specialized environments for training and evaluation, which can be categorized by their engineering lifecycle stages and evolved through various paradigms including neural and symbolic approaches. Generated by…

8
Hugging Face Daily Papers research 19d ago

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Abstract Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach. Generated by…

35
Hugging Face Daily Papers research 19d ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Abstract InternVideo3 enhances long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms, demonstrating strong performance on video understanding benchmarks and video agent capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

18
Hugging Face Daily Papers research 19d ago

i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models

Abstract A comprehensive experimental study of text-to-image diffusion models reveals key design choices and training insights leading to the development of i1, a 3B-parameter model that matches leading performance while maintaining full openness. Generated by…

21
Hugging Face Daily Papers research 19d ago

World Model Self-Distillation: Training World Models to Solve General Tasks

Abstract A scalable framework combines self-distillation and reinforcement learning to transfer task-solving abilities from vision-language models to video diffusion models without requiring labeled task-video data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pretrained video…

15
Hugging Face Daily Papers research 19d ago

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Abstract Bebop addresses the efficiency bottleneck in reinforcement learning training of large language models by optimizing multi-token prediction techniques through entropy-aware sampling and novel training objectives that improve acceptance rates and inference throughput.…

28
Hugging Face Daily Papers research 19d ago

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

Abstract World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving superior performance in zero-shot out-of-distribution manipulation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

10
Hugging Face Daily Papers research 19d ago

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

Abstract Continual Instruction Tuning enables effective fine-tuning of large language models for low-resource language translation, achieving superior performance compared to standard instruction tuning and multilingual models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large…

4
Hugging Face Daily Papers research 19d ago

DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

Abstract A large-scale dataset called DeNovoSWE is introduced for training code agents to generate entire software repositories from documentation, significantly improving performance on long-horizon software engineering tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As the…

15
Hugging Face Daily Papers research 19d ago

ICA Lens: Interpreting Language Models Without Training Another Dictionary

Abstract Independent component analysis (ICA) is revived as an efficient method for discovering interpretable directions in language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance in probing tasks.…

22
Hugging Face Daily Papers research 19d ago

PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf

Abstract A human-centered writing assistant system called PaperMentor integrates expert research advice with specialized agents to provide actionable feedback during manuscript drafting, outperforming AI baselines in usability and relevance. Generated by…

38
Hugging Face Daily Papers research 19d ago

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

Abstract Behavioral safety evaluations of large language models provide incomplete insights into internal robustness, as demonstrated by the audit gap between observable outputs and latent space vulnerabilities revealed through intervention-based testing. Generated by…

38
Hugging Face Daily Papers research 19d ago

In-Context Multiple Instance Learning

Abstract Pretraining a Perceiver-style architecture on synthetic bag-structured data enables efficient, task-adaptive classification from few labeled examples in multiple instance learning scenarios. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Multiple Instance Learning (MIL)…

10
Hugging Face Daily Papers research 19d ago

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Abstract Video generative models achieve improved long-range consistency through coarse-to-fine token generation using a multi-scale autoencoder and diffusion model architecture. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Video generative models have become increasingly…

28
Hugging Face Daily Papers research 19d ago

Decentralized Multi-Agent Systems with Shared Context

Abstract The Decentralized Language Models (DeLM) framework enables scalable large language model reasoning through parallel agents that asynchronously coordinate via a shared verified context, improving performance and efficiency over centralized approaches. Generated by…

25
Hugging Face Daily Papers research 19d ago

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Abstract SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, demonstrating significant vulnerabilities in current agents with attack success rates up to 86.3%. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Agent skills occupy a privileged…

36
Hugging Face Daily Papers research 19d ago

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Abstract The CapCode framework uses randomized testing with performance caps to detect and prevent shortcut exploitation in agent evaluation, while CapReward rewards systems that adhere to intended task specifications. Generated by Qwen/Qwen2.5-Coder-32B-Instruct A growing failure…

21
Hugging Face Daily Papers research 19d ago

The Role of Feedback Alignment in Self-Distillation

Abstract Self-distillation effectiveness depends on structural alignment between feedback and solver reasoning, with step-aligned critique outperforming binary rewards and reference solutions by targeting specific reasoning failures. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

32
Hugging Face Daily Papers research 19d ago

Next Forcing: Causal World Modeling with Multi-Chunk Prediction

Abstract Next Forcing introduces a multi-chunk prediction framework that accelerates training and inference for autoregressive video generation while improving accuracy and physical law adherence. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Autoregressive video generation has…

19
Hugging Face Daily Papers research 19d ago

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

Abstract FadeMem introduces a distance-aware key-value memory consolidation mechanism that organizes historical video data into a temporal hierarchy, improving long-video generation by preserving recent context and long-range anchors under fixed cache constraints. Generated by…

36
Hugging Face Daily Papers research 20d ago

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Abstract CPPO addresses limitations in reinforcement learning with verifiable rewards by introducing position-weighted thresholds and cumulative prefix budgeting to better handle autoregressive generation challenges. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reinforcement…

12
Hugging Face Daily Papers research 20d ago

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Abstract Sparse autoencoders trained on language model representations reveal interpretable features for speech synthesis that can be manipulated to control linguistic and prosodic attributes. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Language models increasingly serve as the…

19
Hugging Face Daily Papers research 20d ago

Kwai Keye-VL-2.0 Technical Report

Abstract Kwai Keye-VL-2.0-30B-A3B is an open-source Mixture-of-Experts multimodal foundation model that enables long-video understanding and agentic intelligence through DeepSeek Sparse Attention and specialized training infrastructure. Generated by…

36
Hugging Face Daily Papers research 20d ago

IR3DE: A Linear Router for Large Language Models

Abstract A ridge regression-based routing method achieves competitive performance in selecting domain-expert LLMs for different tasks while enabling dynamic addition/removal of experts without retraining. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Foundational Large Language…

28
Hugging Face Daily Papers research 20d ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Abstract A psychologically-informed refusal framework called PsychoSafe is developed for large language models to improve harmful request handling through structured supportive communication, showing enhanced refusal quality and resource referral while maintaining performance on…

14
Hugging Face Daily Papers research 20d ago

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Abstract BrainSurgery is a tool for robust and reproducible tensor manipulation of neural network checkpoints through declarative YAML plans with built-in validation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct As deep learning models scale, managing, inspecting, and modifying…

12
Hugging Face Daily Papers research 20d ago

UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors

Abstract A universal PET image denoising framework addresses variability in dose reduction factors through domain generalization techniques and region-aware learning strategies. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Most existing deep learning-based PET image denoising…

26
