Hugging Face Daily Papers · May 26, 2026 · 4 min read

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.17423


Published on May 17 · Submitted by Qinghong (Kevin) Lin on May 27 · Show Lab

Authors: Yiren Song, Huilin Zhong, Kevin Qinghong Lin, Haofan Wang, Mike Zheng Shou

Abstract

AI-generated summary: Soap2Soap is a multi-agent framework for long-horizon video-to-video generation that maintains narrative structure and character identity across extended sequences through a persistent semantic backbone and visual reference anchors.

We study series-level cinematic remaking, a long-horizon video-to-video generation problem that localizes full episodes or films via stylization or actor replacement while strictly preserving narrative structure, motion choreography, and character identity across hundreds of shots. Existing video generation and editing pipelines often break down in this regime due to compounding identity drift, background mutation, and semantic erosion under large camera motions and viewpoint changes. We propose Soap2Soap, a multi-agent framework that enforces long-term language-visual consistency through a Dual-Bridge Consistency mechanism: a scene-aware JSON screenplay serving as a persistent semantic backbone, and dynamically allocated visual reference anchors at both scene and shot levels. To suppress drift before video synthesis, we introduce batch keyframe consistency, jointly generating multiple keyframes in a shared latent context via a grid-based formulation. A closed-loop verification agent further audits identity, stability, and alignment to trigger selective regeneration. Experiments on SoapBench demonstrate strong improvements over commercial video generation APIs in long-term consistency and narrative fidelity.
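The abstract describes a JSON screenplay, scene-level and shot-level reference anchors, and a verification agent that triggers selective regeneration. The following is a minimal sketch of how those pieces might fit together, not the authors' implementation: every class, field, and function name here (`Scene`, `Shot`, `anchor_ids`, `references_for`, `remake`, `max_retries`) is hypothetical, and the generator and verifier are stand-in stubs rather than real models.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Shot:
    shot_id: str
    description: str                  # semantic content of the shot
    characters: List[str]             # identities that should stay consistent
    anchor_ids: List[str] = field(default_factory=list)  # shot-level anchors


@dataclass
class Scene:
    scene_id: str
    location: str
    anchor_ids: List[str]             # scene-level anchors
    shots: List[Shot]


def references_for(scene: Scene, shot: Shot) -> List[str]:
    """Merge scene-level and shot-level anchors, scene first, without duplicates."""
    seen, refs = set(), []
    for anchor in scene.anchor_ids + shot.anchor_ids:
        if anchor not in seen:
            seen.add(anchor)
            refs.append(anchor)
    return refs


def remake(
    scenes: List[Scene],
    generate: Callable[[Shot, List[str], int], str],
    verify: Callable[[Shot, str], bool],
    max_retries: int = 2,
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Generate each shot once, then regenerate only shots that fail verification."""
    outputs: Dict[str, str] = {}
    attempts: Dict[str, int] = {}
    for scene in scenes:
        for shot in scene.shots:
            refs = references_for(scene, shot)
            clip = ""
            for attempt in range(1, max_retries + 2):
                clip = generate(shot, refs, attempt)
                attempts[shot.shot_id] = attempt
                if verify(shot, clip):
                    break  # accepted, so earlier shots are never touched again
            outputs[shot.shot_id] = clip  # last attempt is kept even if it failed
    return outputs, attempts


# Toy screenplay with one scene and two shots (all content is made up).
scene = Scene("s1", "kitchen", ["kitchen_wide"], [
    Shot("s1_001", "Ana enters", ["Ana"], ["ana_front"]),
    Shot("s1_002", "Ana and Ben argue", ["Ana", "Ben"],
         ["ana_front", "ben_side", "kitchen_wide"]),
])


def fake_generate(shot: Shot, refs: List[str], attempt: int) -> str:
    # Stand-in for a video model: returns a label instead of a clip.
    return f"{shot.shot_id}:try{attempt}:{'+'.join(refs)}"


def fake_verify(shot: Shot, clip: str) -> bool:
    # Stand-in for the verification agent: rejects the first try of shot 2.
    return not (shot.shot_id == "s1_002" and ":try1:" in clip)


outputs, attempts = remake([scene], fake_generate, fake_verify)
print(json.dumps(asdict(scene), indent=2))  # the screenplay serialized as JSON
print(attempts)
```

In this sketch the screenplay is the only state shared across shots, and a failed check reruns a single shot rather than the whole sequence, which is one plausible reading of "selective regeneration".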

arXiv: arxiv.org/abs/2605.17423 · GitHub: https://github.com/showlab/Soap2Soap (31 stars)

Community

KevinQHLin

Paper submitter 1 day ago

Meet Soap2Soap — Video-to-Video generation via multi-agent collaboration.

Transform any video into a fully stylized animated version — Pixar, Disney, LEGO, anime, clay, and more — with consistent characters, environments, and cinematic composition preserved across every shot.

KevinQHLin

Paper submitter about 9 hours ago

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration


Get this paper in your agent:

hf papers read 2605.17423

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

