Hugging Face Daily Papers · 4 min read

D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.25893

Published on May 25 · Submitted by Yupeng Chen on May 27
Authors: Aoxi Liu, Yupeng Chen, James Oldfield, Guanzhe Hong, Junchi Yu, Baoyuan Wu, Philip Torr, Adel Bibi

AI-generated summary

Diffusion large language models generate text through a multi-step denoising process that exposes intermediate representations useful for safety monitoring. This motivates a bi-level safety monitor that dynamically routes computational resources based on hesitation detection.

Abstract

Despite the emergence of diffusion large language models (D-LLMs) as an alternative to autoregressive large language models (AR-LLMs), safety monitoring for D-LLMs remains largely unexplored. Unlike AR-LLMs, D-LLMs generate text through a multi-step denoising process, exposing intermediate hidden representations that may contain safety-relevant information unavailable in standard single-step monitoring setups. Motivated by the suitability of lightweight probes for always-on monitoring, we analyze which trajectory-level signals best indicate when such probes are likely to struggle. We find that the most informative signal is safety hesitation: intermediate hidden states repeatedly falling within a small margin of the probe's decision boundary. The number of such hesitation steps in a D-LLM's trajectory effectively predicts probe failure, providing a proxy for sample difficulty. Building on this analysis, we propose D^2-Monitor, a bi-level safety monitor for D-LLMs. D^2-Monitor adopts a lightweight probe as an always-on monitor that jointly estimates hesitation and performs base classification. When the hesitation level exceeds a threshold, a more expressive but computationally heavier probe is activated. This dynamic routing mechanism allocates monitoring resources efficiently at test time. Evaluated on 3 datasets (WildguardMix, ToxicChat, OpenAI-Moderation) across 4 D-LLMs, D^2-Monitor achieves state-of-the-art performance with a compact parameter footprint (≤ 0.85M parameters) and exhibits the best trade-off between effectiveness and efficiency among 8 baselines.
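The hesitation-aware routing rule described in the abstract can be illustrated with a short Python sketch. This is a hypothetical, minimal illustration, not the authors' implementation: the logistic-regression form of the light probe, measuring the margin in probability space around 0.5, using the final denoising step for the base decision, the names `LinearProbe`, `count_hesitation`, `monitor`, and `heavy_stub`, and the default `margin`/`threshold` values are all assumptions. The heavier probe is modeled as an arbitrary callable over the whole trajectory.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LinearProbe:
    """Hypothetical lightweight probe: logistic regression on one hidden state."""
    weights: List[float]
    bias: float

    def prob_unsafe(self, h: Vector) -> float:
        logit = sum(w * x for w, x in zip(self.weights, h)) + self.bias
        return sigmoid(logit)


def count_hesitation(probe: LinearProbe, trajectory: Sequence[Vector], margin: float) -> int:
    """Count denoising steps whose score falls within `margin` of the 0.5 boundary (assumed criterion)."""
    return sum(1 for h in trajectory if abs(probe.prob_unsafe(h) - 0.5) < margin)


def monitor(
    trajectory: Sequence[Vector],
    light: LinearProbe,
    heavy: Callable[[Sequence[Vector]], float],
    margin: float = 0.1,
    threshold: int = 2,
) -> Tuple[bool, str]:
    """Return (flagged_unsafe, which probe made the call)."""
    hesitation = count_hesitation(light, trajectory, margin)
    if hesitation > threshold:
        # Many near-boundary steps: treat as a hard sample and escalate to the heavy probe.
        return heavy(trajectory) >= 0.5, "heavy"
    # Few near-boundary steps: keep the light probe's verdict on the final step.
    return light.prob_unsafe(trajectory[-1]) >= 0.5, "light"


# --- Usage with toy 2-D "hidden states" (hypothetical data) ---
light = LinearProbe(weights=[1.0, -1.0], bias=0.0)


def heavy_stub(trajectory: Sequence[Vector]) -> float:
    """Stand-in for the heavier probe; always returns a low unsafe score."""
    return 0.2


confident = [[3.0, 0.0], [4.0, 0.0], [5.0, 0.0]]              # logits 3, 4, 5
hesitant = [[0.1, 0.0], [0.0, 0.1], [0.05, 0.0], [2.0, 0.0]]  # logits 0.1, -0.1, 0.05, 2

print(monitor(confident, light, heavy_stub))  # (True, 'light')
print(monitor(hesitant, light, heavy_stub))   # (False, 'heavy')
```

Because the light probe already scores every denoising step, counting hesitation adds only one comparison per step; the heavy probe's cost is paid only on trajectories flagged as hard. In this toy run, the hesitant trajectory has three near-boundary steps, so the heavy probe's verdict overrides what the light probe would have said on the final step.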

Community

Comment by the paper author and submitter: 26 pages, 8 figures, 11 tables