Demo: https://stephenbrade.github.io/lmdm-public/
Code: https://github.com/ZacharyNovack/live-music-diffusion-models
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
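Authors: Zachary Novack, Stephen Brade, Haven Kim, Hugo Flores García, Nithya Shikarpur, Chinmay Talegaonkar, Suwan Kim, Valerie K. Chen, Julian McAuley, Taylor Berg-Kirkpatrick, Cheng-Zhi Anna Huang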
AI-generated summary: Audio diffusion models are adapted for interactive music generation through efficient block-wise processing and novel training paradigms that enable real-time performance on consumer hardware.
Abstract
Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we investigate whether audio diffusion models, with their wide support in the open-source community but non-streaming bidirectional nature, can be repurposed efficiently into interactive models accessible on consumer hardware. By taking a critical look at the modern pipeline for block-wise outpainting diffusion, we identify critical inefficiencies during inference that result in strictly worse computational efficiency than their discrete-AR counterparts. We propose Live Music Diffusion Models (LMDMs), a simple modification of the generative diffusion process that recovers, and then outperforms, the inference complexity of the discrete Live Music Models (LMMs) through block-wise KV Caching. Unlike LMMs, LMDMs further enable stable post-training alignment through our novel ARC-Forcing paradigm, reducing error accumulation without any explicit RL or reward models. We demonstrate the application of LMDMs in a number of creative domains, including text-conditioned generation, sketch-based music synthesis, and jamming. We finally show how LMDMs can be used as a generative instrument in a real artist-AI collaboration, utilizing LMDMs as a "generative delay" to transform musicians' improvisation live for variable timbral effects while running locally on a consumer gaming laptop.
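The abstract's efficiency claim can be made concrete with a toy cost count. The sketch below is not the authors' implementation: it assumes a simplified setting in which each new block is denoised for a fixed number of steps, and it counts only query-key pairs evaluated in self-attention. The function names, the extra cache-writing pass, and the cost model itself are hypothetical choices made for illustration.

```python
def attention_pairs_recompute(num_blocks: int, block_len: int, steps: int) -> int:
    """Query-key pairs when every denoising step re-processes the full
    sequence (all earlier blocks plus the new one). Assumed cost model."""
    total = 0
    for b in range(num_blocks):
        seq_len = (b + 1) * block_len  # context so far plus the new block
        total += steps * seq_len * seq_len
    return total


def attention_pairs_cached(num_blocks: int, block_len: int, steps: int) -> int:
    """Query-key pairs when keys/values of earlier blocks are cached, so
    only the new block's queries are computed at each denoising step.
    One extra pass per block writes the finished block into the cache
    (an assumption of this sketch)."""
    total = 0
    for b in range(num_blocks):
        seq_len = (b + 1) * block_len  # cached keys plus the new block's keys
        total += steps * block_len * seq_len  # denoising steps
        total += block_len * seq_len          # cache-writing pass
    return total


if __name__ == "__main__":
    blocks, length, steps = 4, 8, 10
    print("recompute:", attention_pairs_recompute(blocks, length, steps))
    print("cached:   ", attention_pairs_cached(blocks, length, steps))
```

Under this cost model, the recompute variant grows quadratically with the context length at every denoising step, while the cached variant grows linearly in it, which is the kind of gap a block-wise KV cache is meant to close.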
arXiv: arxiv.org/abs/2605.22717