DrawMotion: Generating 3D Human Motions by Freehand Drawing
Authors: Tao Wang, Lei Jin, Zhihua Wu, Qiaozhi He, Jiaming Chu, Yu Cheng, Junliang Xing, Jian Zhao, Shuicheng Yan, Li Wang
Published: 2026-05-20 (arXiv:2605.20955)
Abstract
AI-generated summary: DrawMotion is an efficient diffusion-based framework that generates human motions from both text and hand-drawn sketches, reducing user time by approximately 46.7% while maintaining motion fidelity.
Text-to-motion generation, which translates textual descriptions into human motions, faces the challenge that users often struggle to convey their intended motions precisely through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions from both a conventional text condition and a novel hand-drawing condition, which provide semantic and spatial control over the generated motions, respectively. Specifically, we tackle the fine-grained motion generation task from three perspectives: 1) Freehand drawing condition: to accurately capture users' intended motions without requiring tedious textual input, we develop an algorithm that automatically generates hand-drawn stickman sketches across different dataset formats. 2) Multi-condition fusion: we propose a Multi-Condition Module (MCM) that is integrated into the diffusion process, enabling the model to exploit all possible condition combinations while reducing computational complexity compared to conventional approaches. 3) Training-free guidance: the MCM in DrawMotion ensures that its intermediate features lie in a continuous space, allowing classifier-guidance gradients to update the features and thereby align the generated motions with user intentions while preserving fidelity. Quantitative experiments and user studies demonstrate that the freehand drawing approach reduces user time by approximately 46.7% when generating motions aligned with their imagination. The code, demos, and relevant data are publicly available at https://github.com/InvertedForest/DrawMotion.
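The training-free guidance idea in the abstract depends on intermediate features being continuous, so that a gradient can nudge them. The toy sketch below illustrates only that general mechanism: one gradient step on a feature vector that lowers a differentiable guidance loss. It is not DrawMotion's implementation. The squared-distance loss, the `target` vector standing in for a drawing-derived signal, the `scale` parameter, and all function names are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def guidance_loss(features: Vector, target: Vector) -> float:
    """Hypothetical guidance loss: squared distance to a drawing-derived target."""
    return sum((f - t) ** 2 for f, t in zip(features, target))


def guidance_grad(features: Vector, target: Vector) -> Vector:
    """Analytic gradient of the squared-distance loss with respect to the features."""
    return [2.0 * (f - t) for f, t in zip(features, target)]


def guided_update(features: Vector, target: Vector, scale: float = 0.1) -> Vector:
    """Move the continuous features one step against the loss gradient."""
    grad = guidance_grad(features, target)
    return [f - scale * g for f, g in zip(features, grad)]


# Toy values (assumed): three-dimensional features and target.
features = [0.0, 1.0, -1.0]
target = [1.0, 1.0, 0.0]

loss_before = guidance_loss(features, target)
updated = guided_update(features, target, scale=0.25)
loss_after = guidance_loss(updated, target)

print(loss_before, loss_after)
```

In a real diffusion sampler such a step would presumably be applied at each denoising iteration, with the gradient obtained by automatic differentiation rather than a hand-written formula. If the features were discrete (for example, codebook indices), this kind of update would not be defined, which is consistent with the abstract's emphasis on a continuous feature space.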
Community
DrawMotion introduces a diffusion-based framework for generating 3D human motions from freehand drawings and text, using a Multi-Condition Module for efficient multi-condition fusion and guidance.