Hugging Face Daily Papers · June 25, 2026 · 6 min read

TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Video Virtual Try-on (VVT) has made remarkable progress in synthesizing realistic garment overlays on dynamic subjects, but existing paradigms remain fundamentally constrained by a passive dependency on the source camera trajectory and cannot offer the interactive freedom needed for omnidirectional viewpoint exploration. To address this limitation, we define a new task: Camera-controllable Video Virtual Try-on (CaM-VVT). Unlike conventional VVT, CaM-VVT requires not only viewpoint-agnostic texture hallucination but also strict structural synchronization between non-rigid human dynamics and background context under arbitrary, unconstrained camera movements. To tackle these challenges, we present TryOnCrafter, the first unified DiT-based framework designed specifically for the CaM-VVT task. Departing from implicit pixel-space manipulation, we introduce a Renderable 4D Try-on Proxy that explicitly decouples the human subject from the environment. The proxy is built by distilling high-fidelity 2D try-on priors into a clothed 3DGS-based avatar, which is then animated via SMPL-X sequences and metric-aligned into a reconstructed background point cloud. It provides a robust structural foundation with superior texture density and motion integrity. Our Proxy-Anchored Video DiT uses this foundation as its primary geometric anchor, ensuring that the synthesized photorealistic videos are strictly constrained by the prescribed trajectories and by physically plausible deformations.
Benefiting from the inherent editability of the 4D proxy, TryOnCrafter facilitates diverse downstream applications, including human relocalization, "bullet time" effects, and 360-degree orbital viewing.
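The abstract names 360-degree orbital viewing as one application but does not describe how camera trajectories are specified. As a rough illustration only, and not the authors' implementation, the sketch below generates evenly spaced camera poses on a horizontal circle around a subject. The function names, the pure-Python vector helpers, and the camera convention (x right, y down, z forward) are all assumptions made for this example.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / n for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def look_at(eye, target, world_up=(0.0, 1.0, 0.0)):
    """World-to-camera rotation rows for a camera at `eye` facing `target`.

    Assumed convention (hypothetical, not from the paper):
    camera x = right, y = down, z = forward.
    """
    forward = normalize(tuple(t - e for t, e in zip(target, eye)))
    right = normalize(cross(forward, world_up))
    down = cross(forward, right)
    return (right, down, forward)

def world_to_camera(rotation, eye, point):
    """Express a world-space point in the camera frame: R @ (point - eye)."""
    rel = tuple(p - e for p, e in zip(point, eye))
    return tuple(sum(r * c for r, c in zip(row, rel)) for row in rotation)

def orbit_trajectory(target, radius, height, num_frames):
    """Evenly spaced camera poses on a horizontal circle around `target`."""
    poses = []
    for k in range(num_frames):
        theta = 2.0 * math.pi * k / num_frames
        eye = (target[0] + radius * math.cos(theta),
               target[1] + height,
               target[2] + radius * math.sin(theta))
        poses.append((eye, look_at(eye, target)))
    return poses
```

Each pose keeps the subject on the optical axis at a constant distance, which is the defining property of an orbit. A "bullet time" effect could plausibly reuse the same trajectory while holding the subject's motion frame fixed, though the abstract does not confirm how the authors do it.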

arxiv:2606.26092

Published on Jun 24

Submitted by lalala on Jun 25

Alibaba-Inc

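Authors: Hao Sun, Hao Yan, Mengting Chen, Quanjian Song, Yu Li, Juan Cao, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Sheng Tang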

Abstract

Camera-controllable video virtual try-on framework uses a 4D proxy with explicit human-environment decoupling and DiT-based video generation for omnidirectional viewing.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct
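The abstract states that the animated avatar is metric-aligned into a reconstructed background point cloud, but it does not say how the alignment is computed. The sketch below shows one generic possibility under strong simplifying assumptions: matched 3D point pairs are available (for example, avatar joints and their estimated scene positions), and only a uniform scale and a translation need to be recovered, with no rotation. The function names and this reduced problem setup are hypothetical and are not taken from the paper.

```python
def fit_scale_translation(src, dst):
    """Least-squares scale s and translation t minimizing
    sum ||s * src_i + t - dst_i||^2 over matched 3D point pairs.

    Assumption for this sketch: no rotation between the two frames.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    src_mean = tuple(sum(p[i] for p in src) / n for i in range(3))
    dst_mean = tuple(sum(q[i] for q in dst) / n for i in range(3))
    num = 0.0  # sum of dot products of centered source and target points
    den = 0.0  # sum of squared norms of centered source points
    for p, q in zip(src, dst):
        pc = tuple(p[i] - src_mean[i] for i in range(3))
        qc = tuple(q[i] - dst_mean[i] for i in range(3))
        num += sum(a * b for a, b in zip(pc, qc))
        den += sum(a * a for a in pc)
    if den == 0.0:
        raise ValueError("source points are all identical")
    scale = num / den
    translation = tuple(dst_mean[i] - scale * src_mean[i] for i in range(3))
    return scale, translation

def apply_scale_translation(points, scale, translation):
    """Map points from the source frame into the target (scene) frame."""
    return [tuple(scale * p[i] + translation[i] for i in range(3))
            for p in points]
```

A full similarity alignment would also estimate a rotation, typically with an SVD-based closed-form solution; that step is omitted here to keep the example dependency-free.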

Get this paper in your agent:

hf papers read 2606.26092

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
