Hugging Face Daily Papers · 4 min read

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv:2605.15824

Published on May 15 · Submitted by shen on May 18
Authors: Quanjian Song, Yefeng Shen, Mengting Chen, Hao Sun, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Liujuan Cao

AI-generated summary

FashionChameleon enables real-time interactive multi-garment video customization through teacher-student distillation and in-context learning techniques while maintaining motion coherence.

Abstract

Human-centric video customization, particularly at the garment level, has shown significant commercial value. However, existing approaches cannot support low-latency and interactive garment control, which is crucial for applications such as e-commerce and content creation. This paper studies how to achieve interactive multi-garment video customization while preserving motion coherence using only single-garment video data. We present FashionChameleon, a real-time and interactive framework for human-garment customization in autoregressive video generation, where users can interactively switch garments during generation. FashionChameleon consists of three key techniques: (i) Instead of training on multi-garment video data, we train a Teacher Model with In-Context Learning on a single reference-garment pair. By retaining the image-to-video training paradigm while enforcing a mismatch between the reference and garment image, the model is encouraged to implicitly preserve coherence during single-garment switching. (ii) To achieve consistency and efficiency during generation, we introduce Streaming Distillation with In-Context Learning, which fine-tunes the model with in-context teacher forcing and improves extrapolation consistency via gradient-reweighted distribution matching distillation. (iii) To extend the model for interactive multi-garment video customization, we propose Training-Free KV Cache Rescheduling, which includes garment KV refresh, historical KV withdraw, and reference KV disentangle to achieve garment switching while preserving motion coherence. Our FashionChameleon uniquely supports interactive customization and consistent long-video extrapolation, while achieving real-time generation at 23.8 FPS on a single GPU, 30-180× faster than existing baselines.
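To put the throughput figures in perspective, the short Python sketch below converts the quoted 23.8 FPS into a per-frame time budget and derives the baseline speeds implied by the 30-180× range. Only the 23.8 FPS and 30-180× numbers come from the abstract. The function names, the 238-frame clip (roughly 10 seconds if played back at the generation rate), and the reading of the speedup as a simple ratio of frame throughput are assumptions made for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption: the "30-180x faster" claim is a ratio of frames generated
# per second under comparable settings.

REPORTED_FPS = 23.8          # from the abstract
SPEEDUP_RANGE = (30, 180)    # from the abstract
CLIP_FRAMES = 238            # hypothetical clip length, chosen for illustration


def frame_latency_ms(fps: float) -> float:
    """Average time budget per generated frame, in milliseconds."""
    return 1000.0 / fps


def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Throughput of a baseline that is `speedup` times slower."""
    return fps / speedup


def generation_time_s(num_frames: int, fps: float) -> float:
    """Wall-clock seconds needed to generate `num_frames` at `fps`."""
    return num_frames / fps


if __name__ == "__main__":
    print(f"per-frame budget: {frame_latency_ms(REPORTED_FPS):.1f} ms")
    print(f"{CLIP_FRAMES} frames: {generation_time_s(CLIP_FRAMES, REPORTED_FPS):.0f} s")
    for speedup in SPEEDUP_RANGE:
        base_fps = implied_baseline_fps(REPORTED_FPS, speedup)
        base_time = generation_time_s(CLIP_FRAMES, base_fps)
        print(f"{speedup}x slower baseline: {base_fps:.3f} FPS, {base_time:.0f} s")
```

Under these assumptions, a clip that streams out in about 10 seconds at the reported rate would take roughly 5 to 30 minutes on the implied baselines, which is the gap between interactive use and offline rendering.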

Get this paper in your agent:

hf papers read 2605.15824
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
