Hugging Face Daily Papers · 6 min read

MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.13857

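Published on Apr 8, 2026 · Submitted by Jun Liang on May 29, 2026
Authors: Dongxia Liu, Jie Ma, Xiaochen Yang, Jiancheng Zhang, Bin Xia, Zhehan Kan, Nisha Huang, Jun Liang, Wenming Yang, Jin Li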

Abstract

AI-generated summary: MoZoo generates high-fidelity animal videos from coarse meshes using diffusion models with novel attention mechanisms and a synthetic-to-real data pipeline.

The creation of cinematic-quality animal effects requires precise modeling of muscle and fur dynamics, a process that remains both labor-intensive and computationally expensive within traditional production workflows. While generative diffusion models have shown promise in diverse artistic workflows, their capacity for high-fidelity animal simulation remains largely unexploited. We present MoZoo, a generative dynamics solver that bypasses conventional refinement to synthesize high-fidelity animal videos from coarse meshes under multimodal guidance. We propose Role-Aware RoPE (RAR-RoPE), which employs role-based index remapping to synchronize motion alignment while decoupling reference information via fixed temporal offsets. Complementing this, Asymmetric Decoupled Attention partitions the latent sequence to enforce a unidirectional information flow, effectively preventing feature interference and improving computational efficiency. To address the scarcity of high-quality training data, we introduce MoZoo-Data, a synthetic-to-real pipeline that leverages a rendering engine and an inverse mapping approach to construct a large-scale dataset of paired sequences. Furthermore, we establish MoZooBench, a comprehensive benchmark with 120 mesh-video pairs. Experimental results demonstrate that MoZoo achieves high-fidelity fur simulation across diverse animal skeletons and layouts, preserving superior temporal and structural consistency.
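The abstract names two mechanisms without giving their formulas, so the following is only a rough sketch of how they might look, not the authors' implementation. It assumes three token roles (target video, mesh condition, reference), a fixed offset of 1000 for reference tokens, and a mask in which only target tokens read from the other roles. The role names, the offset value, and both function names are hypothetical.

```python
from typing import List

# Hypothetical role labels; illustrative only, not taken from the paper.
TARGET, MESH, REFERENCE = "target", "mesh", "reference"

# Assumed fixed temporal offset that moves reference tokens out of the
# index range used by the video frames.
REFERENCE_OFFSET = 1000


def temporal_indices(roles: List[str], frames: List[int]) -> List[int]:
    """Remap each token's temporal position index according to its role.

    Target and mesh tokens of the same frame keep the same index, so their
    rotary phases line up. Reference tokens are shifted by a fixed offset
    so they sit in a separate temporal range.
    """
    remapped = []
    for role, frame in zip(roles, frames):
        if role == REFERENCE:
            remapped.append(REFERENCE_OFFSET + frame)
        else:
            remapped.append(frame)
    return remapped


def asymmetric_mask(roles: List[str]) -> List[List[bool]]:
    """Build a mask where mask[q][k] is True if query q may attend to key k.

    Target tokens attend to every token. Conditioning tokens (mesh and
    reference) attend only to tokens of their own role, so information
    flows one way: from the conditions into the target.
    """
    return [
        [q_role == TARGET or q_role == k_role for k_role in roles]
        for q_role in roles
    ]


# Two target frames, two mesh frames, one reference image.
roles = [TARGET, TARGET, MESH, MESH, REFERENCE]
frames = [0, 1, 0, 1, 0]

print(temporal_indices(roles, frames))
mask = asymmetric_mask(roles)
print(mask[0])  # row for a target token
print(mask[2])  # row for a mesh token
```

Under these assumptions the conditioning tokens never depend on the target tokens, so their keys and values could in principle be computed once and reused across denoising steps, which is one plausible reading of the efficiency claim.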

Community

Jun Liang (paper author, paper submitter) · 1 day ago

MoZoo is a pioneering generative dynamics solver that revolutionizes cinematic animal production by directly synthesizing high-fidelity fur and muscle simulations from coarse meshes, effectively bypassing the labor-intensive refinement stages of traditional CG pipelines.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

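The following papers were recommended by the Semantic Scholar API:

* ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis (2026), https://huggingface.co/papers/2604.19720
* MMPhysVideo: Scaling Physical Plausibility in Video Generation via Joint Multimodal Modeling (2026), https://huggingface.co/papers/2604.02817
* SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion (2026), https://huggingface.co/papers/2605.23245
* MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation (2026), https://huggingface.co/papers/2604.19679
* AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model (2026), https://huggingface.co/papers/2604.19747
* Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models (2026), https://huggingface.co/papers/2604.10578
* Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers (2026), https://huggingface.co/papers/2604.21592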
