Hugging Face Daily Papers · June 1, 2026 · 3 min read

DEMON: Diffusion Engine for Musical Orchestrated Noise

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.28657


Published on May 27 · Submitted by Ryan Fosdick (Daydream) on Jun 1


Authors: Ryan Fosdick

Abstract

AI-generated summary: DEMON enables real-time diffusion model control as a musical instrument through specialized scheduling, shared state management, and optimized decoding techniques.

We present DEMON, a real-time diffusion engine that makes the denoising process playable as a live musical instrument: a control surface both broad (many parameters shaped per-frame across the output) and responsive (each control taking effect as fast as its place in the denoising loop allows). Built on ACE-Step 1.5 and StreamDiffusion's ring-buffer architecture with TensorRT acceleration, it sustains up to 12.3 decoder completions per second for 60-second music on a single consumer GPU (RTX 5090), or 11.3 generations per second at our production ring-depth of 4. At these rates denoising parameters become viable as live performance controls, but the ring buffer propagates per-request changes only at its drain rate, a floor of S denoising steps. We contribute four mechanisms. (1) Per-slot heterogeneous denoise scheduling: each ring-buffer slot owns its timestep schedule, so a moving denoise slider is tracked without wiping the in-flight queue, where the upstream global-schedule design must rebuild and discard it. (2) Shared mutable per-step state, giving any parameter consulted at every solver step next-tick effect, bypassing ring-buffer drain. (3) Per-frame source blending: a sampling-time control on the standard SDE re-noise step, giving a framewise transformation-strength axis that complements scalar denoise scheduling. (4) Windowed VAE decode exploiting receptive-field analysis for an 8.0x decode speedup. Together these separate streaming-diffusion parameters into four propagation classes, by onset and convergence latency.
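Mechanisms (1) and (2) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not DEMON's or StreamDiffusion's actual code: `ToyRing`, `Slot`, `SharedState`, `make_schedule`, the linear schedule, and the scalar `guidance` parameter are all hypothetical names invented for the example, and no model is run. Each slot keeps the schedule it was admitted with, so moving the denoise slider only affects newly admitted requests and nothing in flight is discarded, while a value read at every step reaches all in-flight slots on the next tick.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SharedState:
    """Hypothetical live control that is read at every solver step."""
    guidance: float = 1.0


@dataclass
class Slot:
    """One in-flight request; it owns its own timestep schedule."""
    request_id: int
    schedule: List[float]  # fixed when the request is admitted
    step: int = 0
    guidance_seen: List[float] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.step >= len(self.schedule)


def make_schedule(denoise: float, num_steps: int) -> List[float]:
    """Toy linear schedule starting at `denoise` (an assumption, not the paper's)."""
    return [denoise * (1 - i / num_steps) for i in range(num_steps)]


class ToyRing:
    """Toy ring buffer: every tick advances all slots and admits one request."""

    def __init__(self, depth: int, num_steps: int, shared: SharedState):
        self.slots: List[Optional[Slot]] = [None] * depth
        self.num_steps = num_steps
        self.shared = shared
        self.next_id = 0

    def tick(self, denoise: float) -> List[Slot]:
        finished = []
        for i, slot in enumerate(self.slots):
            if slot is None:
                continue
            # Shared state is consulted per step, so an edit made between
            # ticks is seen by every in-flight slot on the very next tick.
            slot.guidance_seen.append(self.shared.guidance)
            slot.step += 1
            if slot.done:
                finished.append(slot)
                self.slots[i] = None
        # Admit one new request using the current slider value. Existing
        # slots keep their own schedules, so nothing is rebuilt or dropped.
        for i, slot in enumerate(self.slots):
            if slot is None:
                schedule = make_schedule(denoise, self.num_steps)
                self.slots[i] = Slot(self.next_id, schedule)
                self.next_id += 1
                break
        return finished


if __name__ == "__main__":
    shared = SharedState()
    ring = ToyRing(depth=4, num_steps=4, shared=shared)
    for tick, denoise in enumerate([0.8, 0.8, 0.5, 0.5, 0.5, 0.5, 0.5]):
        if tick == 2:
            shared.guidance = 2.0  # live edit between ticks
        for slot in ring.tick(denoise):
            print(slot.request_id, slot.schedule[0], slot.guidance_seen)
```

In this toy, a request admitted before the slider moved still completes on its original schedule, which is the behaviour the abstract contrasts with a single global schedule that must be rebuilt. The per-step value, by contrast, changes midway through requests that are already in flight.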

Project page: https://daydreamlive.github.io/DEMON/ · GitHub: https://github.com/daydreamlive/DEMON (186 stars)

Community

ryanontheinside

Paper submitter about 4 hours ago

Diffusion as a real-time, playable musical instrument
https://music.daydream.live


Get this paper in your agent:

hf papers read 2605.28657

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

