DEMON: Diffusion Engine for Musical Orchestrated Noise
Author: Ryan Fosdick (organization: Daydream). Published 2026-05-27.
Project page: https://daydreamlive.github.io/DEMON/
Code: https://github.com/daydreamlive/DEMON
Author comment: Diffusion as a real-time, playable musical instrument: https://music.daydream.live
Abstract
AI-generated summary: DEMON enables real-time diffusion model control as a musical instrument through specialized scheduling, shared state management, and optimized decoding techniques.
We present DEMON, a real-time diffusion engine that makes the denoising process playable as a live musical instrument: a control surface both broad (many parameters shaped per-frame across the output) and responsive (each control taking effect as fast as its place in the denoising loop allows).

Built on ACE-Step 1.5 and StreamDiffusion's ring-buffer architecture with TensorRT acceleration, it sustains up to 12.3 decoder completions per second for 60-second music on a single consumer GPU (RTX 5090), or 11.3 generations per second at our production ring-depth of 4. At these rates denoising parameters become viable as live performance controls, but the ring buffer propagates per-request changes only at its drain rate, a floor of S denoising steps.

We contribute four mechanisms:
(1) Per-slot heterogeneous denoise scheduling: each ring-buffer slot owns its timestep schedule, so a moving denoise slider is tracked without wiping the in-flight queue, where the upstream global-schedule design must rebuild and discard it.
(2) Shared mutable per-step state, giving any parameter consulted at every solver step next-tick effect, bypassing ring-buffer drain.
(3) Per-frame source blending: a sampling-time control on the standard SDE re-noise step, giving a framewise transformation-strength axis that complements scalar denoise scheduling.
(4) Windowed VAE decode exploiting receptive-field analysis for an 8.0x decode speedup.

Together these separate streaming-diffusion parameters into four propagation classes, by onset and convergence latency.
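The difference between mechanisms (1) and (2) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not DEMON's implementation: there is no model or GPU here, and every name (`make_schedule`, `SharedState`, `Slot`, `Ring`, `guidance`) is hypothetical. It assumes one request is admitted per tick and that each in-flight slot advances one solver step per tick. Each slot freezes its own timestep schedule from the slider value at admission, so moving the slider never discards in-flight work, while a value held in shared state is read at every step and so affects all slots on the next tick.

```python
from dataclasses import dataclass, field
from typing import List


def make_schedule(denoise: float, steps: int) -> List[float]:
    """Hypothetical linear schedule: `steps` timesteps from `denoise` toward 0."""
    return [denoise * (steps - i) / steps for i in range(steps)]


@dataclass
class SharedState:
    """Mutable state consulted at every solver step (illustrates mechanism 2)."""
    guidance: float = 1.0  # placeholder per-step parameter, an assumption


@dataclass
class Slot:
    """One in-flight request that owns its schedule (illustrates mechanism 1)."""
    request_id: int
    schedule: List[float]
    step: int = 0
    guidance_seen: List[float] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.step >= len(self.schedule)


class Ring:
    """Toy ring buffer: admits one request per tick, advances all slots one step."""

    def __init__(self, steps: int, shared: SharedState) -> None:
        self.steps = steps
        self.shared = shared
        self.slots: List[Slot] = []
        self.next_id = 0

    def tick(self, denoise: float) -> List[Slot]:
        # The new slot's schedule is built from the slider value right now;
        # slots already in flight keep the schedules they were admitted with.
        self.slots.append(Slot(self.next_id, make_schedule(denoise, self.steps)))
        self.next_id += 1
        for slot in self.slots:
            # A real engine would run one solver step at slot.schedule[slot.step].
            # Shared state is read here, so a change reaches every slot at once.
            slot.guidance_seen.append(self.shared.guidance)
            slot.step += 1
        finished = [s for s in self.slots if s.done]
        self.slots = [s for s in self.slots if not s.done]
        return finished
```

In this sketch, with `steps=4`, changing the slider from 0.8 to 0.4 on the fourth tick leaves the three older slots on their 0.8 schedules and admits one slot at 0.4, so the slider change reaches the output only once that slot drains. A change to `shared.guidance` made before the same tick is already visible in the final step of the request that completes on it.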
arXiv: arxiv.org/abs/2605.28657