Hugging Face Daily Papers · May 21, 2026 · 3 min read

OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


ICML 2026.

arxiv:2605.21343

Published on May 20, 2026 · Submitted by Henghui Ding (FudanCVL) on May 21, 2026

Authors: Ziye Li, Henghui Ding

Abstract

AI-generated summary: OcclusionFormer addresses inter-object occlusion in layout-to-image generation by modeling explicit Z-order priority with a Diffusion Transformer and volume rendering.

Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often produce entangled textures or physically inconsistent layering in the overlapped areas. To address this issue, we first construct SA-Z, a large-scale dataset enriched with explicit occlusion ordering and pixel-level annotations. Building upon our proposed dataset, we introduce OcclusionFormer, a novel occlusion-aware Diffusion Transformer framework that explicitly models Z-order priority by decoupling instances and compositing them via volume rendering. Furthermore, to ensure fine-grained spatial precision, we introduce a queried alignment loss that explicitly supervises individual instances and enhances semantic consistency. The proposed method effectively reduces ambiguity in overlapping regions, enforces correct occlusion dependencies, and preserves structural integrity, leading to substantial accuracy gains across diverse scenes.
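
To make the idea of compositing decoupled instances by Z-order concrete, here is a minimal sketch of standard front-to-back alpha compositing, the accumulation rule commonly used in volume rendering: each instance contributes its color weighted by its own opacity and by the transmittance left over from everything in front of it. The abstract does not give the paper's exact formulation, so this is an illustration under assumptions, not the authors' implementation. The function name `composite_pixel`, the `(z, color, alpha)` tuple layout, and the convention that a smaller `z` means closer to the viewer are all hypothetical.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
# Hypothetical per-pixel instance record: (z, color, alpha).
# Assumption: smaller z means closer to the viewer.
Instance = Tuple[int, Color, float]


def composite_pixel(instances: List[Instance]) -> Color:
    """Composite overlapping instances at one pixel, front to back.

    Applies C = sum_i T_i * alpha_i * c_i with T_i = prod_{j<i} (1 - alpha_j),
    where instances are visited in increasing z (nearest first).
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer instances
    for _, color, alpha in sorted(instances, key=lambda inst: inst[0]):
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])


RED: Color = (1.0, 0.0, 0.0)
BLUE: Color = (0.0, 0.0, 1.0)

# Two fully opaque instances overlap; only the Z-order decides which one shows.
red_in_front = composite_pixel([(0, RED, 1.0), (1, BLUE, 1.0)])
blue_in_front = composite_pixel([(1, RED, 1.0), (0, BLUE, 1.0)])
```

With identical boxes and colors, swapping the two `z` values flips the visible instance in the overlap. That is the ambiguity the abstract describes when no occlusion order is supplied.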

arXiv: https://arxiv.org/abs/2605.21343 · Project page: https://henghuiding.com/OcclusionFormer/ · GitHub: https://github.com/FudanCVL/OcclusionFormer


Get this paper in your agent:

hf papers read 2605.21343

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
