Hugging Face Daily Papers · 4 min read

HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.17543

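Published on May 19, 2026 · Submitted by Jeongeun Park on Jun 1, 2026
Authors: Jeongeun Park, Janghyeok Han, Geonung Kim, Hyun-Seung Lee, Kyuha Choi, Youngseok Han, Sunghyun Cho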

Abstract

AI-generated summary: HL-OutPaint is a high-resolution video outpainting framework that uses a coarse-to-fine strategy with global coarse guidance to enable large spatial extrapolation and long sequence generation while maintaining spatio-temporal consistency.

Video outpainting generates plausible visual content beyond the original spatial extent of a video, playing a key role in adapting videos to diverse display formats. To support such use cases, it must enable large spatial extrapolation over long sequences. However, most existing methods address only one of these challenges or lack explicit mechanisms for ensuring global spatio-temporal consistency, leading to notable limitations. In this paper, we propose HL-OutPaint, a high-resolution video outpainting framework for long sequences. Our approach follows a coarse-to-fine strategy with a two-stage pipeline. We first construct Global Coarse Guidance (GCG), a low-resolution representation that captures global structure and dominant motion across the video. Unlike naive downsampling, GCG is built via a novel global-local frame swapping mechanism that couples sparse global keyframes with local temporal windows and exchanges information during sampling. This enables GCG to encode both long-term structural consistency and short-term temporal dynamics in a unified representation. Guided by this representation, HL-OutPaint then performs high-resolution outpainting to generate spatially detailed and temporally consistent content. By separating global structure modeling from fine-grained synthesis, our framework achieves stable, coherent generation for large spatial expansion and long video sequences. Extensive experiments show that HL-OutPaint outperforms existing methods in challenging scenarios involving wide spatial extrapolation and long video sequences.
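
The abstract describes the global-local frame swapping mechanism only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes evenly spaced keyframes, non-overlapping local windows, and a swap that simply exchanges the two tracks' estimates of any frame they both hold. The function names `keyframe_indices`, `local_windows`, and `swap_shared_frames` are hypothetical, and string labels stand in for the latents a real sampler would carry.

```python
from typing import Dict, List, Tuple


def keyframe_indices(num_frames: int, num_keyframes: int) -> List[int]:
    """Evenly spaced frame indices that include the first and last frame."""
    if not 2 <= num_keyframes <= num_frames:
        raise ValueError("need 2 <= num_keyframes <= num_frames")
    last = num_frames - 1
    return [(i * last) // (num_keyframes - 1) for i in range(num_keyframes)]


def local_windows(num_frames: int, window: int) -> List[Tuple[int, int]]:
    """Consecutive, non-overlapping [start, stop) windows covering the video."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [(start, min(start + window, num_frames))
            for start in range(0, num_frames, window)]


def swap_shared_frames(global_track: Dict[int, str],
                       local_tracks: List[Dict[int, str]]) -> None:
    """Swap, in place, the estimates of every frame held by both tracks.

    Assumption: this plain exchange stands in for whatever information
    exchange the paper performs during sampling.
    """
    for track in local_tracks:
        for idx in sorted(track.keys() & global_track.keys()):
            track[idx], global_track[idx] = global_track[idx], track[idx]


if __name__ == "__main__":
    # Illustrative sizes only: 17 frames, 5 keyframes, windows of 4 frames.
    keys = keyframe_indices(num_frames=17, num_keyframes=5)
    global_track = {i: f"G{i}" for i in keys}
    local_tracks = [{i: f"L{i}" for i in range(start, stop)}
                    for start, stop in local_windows(17, window=4)]
    swap_shared_frames(global_track, local_tracks)
    print(global_track)      # keyframe slots now hold the local estimates
    print(local_tracks[1])   # the shared frame now holds the global estimate
```

In this toy setup each keyframe falls in exactly one local window, so a single swap moves long-range context into the window and short-range context into the keyframe set. How often such an exchange happens during sampling is not stated in the abstract and is left open here.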

Community

Jeongeun Park (paper author, paper submitter) · Jun 1, 2026

We present HL-OutPaint, a coarse-to-fine video outpainting framework designed for high-resolution long-range videos.

HL-OutPaint introduces Global Coarse Guidance (GCG), a low-resolution representation that captures global structure and dominant motion across long video sequences through a novel global-local frame swapping mechanism. Guided by GCG, the framework generates spatially detailed and temporally consistent outpainted videos while maintaining long-range coherence.

Project page: https://koyy001.github.io/Publications/hl-outpaint
Video: https://www.youtube.com/watch?v=C-XQCRkbv5E
ArXiv: https://arxiv.org/abs/2605.17543
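
To give a feel for the sizes involved when a video is adapted to a wider display format and a low-resolution guidance pass precedes the high-resolution pass, the sketch below works through the canvas arithmetic. Everything numeric here is an illustrative assumption rather than a value from the paper: the 960x720 source, the 16:9 target, the centered placement, and the downscale factor of 4. The helper names `outpaint_canvas`, `coarse_size`, and `known_mask_row` are hypothetical.

```python
import math
from fractions import Fraction
from typing import List, Tuple


def outpaint_canvas(width: int, height: int,
                    target: Fraction) -> Tuple[int, int, int, int]:
    """Return (canvas_w, canvas_h, pad_x, pad_y) for a centered source.

    The canvas is the smallest one with the target aspect ratio that
    still contains the whole source frame.
    """
    if target >= Fraction(width, height):
        canvas_w, canvas_h = math.ceil(height * target), height
    else:
        canvas_w, canvas_h = width, math.ceil(width / target)
    return canvas_w, canvas_h, (canvas_w - width) // 2, (canvas_h - height) // 2


def coarse_size(canvas_w: int, canvas_h: int, factor: int) -> Tuple[int, int]:
    """Resolution of a low-resolution pass, assuming integer downscaling."""
    return canvas_w // factor, canvas_h // factor


def known_mask_row(width: int, canvas_w: int, pad_x: int) -> List[int]:
    """One mask row: 1 where source pixels exist, 0 where content is generated."""
    return [0] * pad_x + [1] * width + [0] * (canvas_w - pad_x - width)


if __name__ == "__main__":
    canvas_w, canvas_h, pad_x, pad_y = outpaint_canvas(960, 720, Fraction(16, 9))
    row = known_mask_row(960, canvas_w, pad_x)
    print("canvas:", canvas_w, canvas_h, "padding:", pad_x, pad_y)
    print("coarse pass:", coarse_size(canvas_w, canvas_h, factor=4))
    print("fraction to generate:", 1 - sum(row) / len(row))
```

Under these assumptions a quarter of every frame must be synthesized, which is the kind of wide extrapolation the post says a coarse global pass is meant to stabilize.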

Get this paper in your agent:

hf papers read 2605.17543
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
