Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation
Authors: Jan-Niklas Dihlmann, Andreas Engelhardt, Simon Donne, Hendrik P. A. Lensch, Mark Boss
Published: 2026-06-22 (arXiv 2606.23514), Stability Labs
Project page: https://arbor.jdihlmann.com/
Code: https://github.com/Stability-AI/arbor
Abstract
Arbor enables explicit 3D spatial control in text-conditioned latent generation through constraint meshes that define occupancy, avoidance, and contact regions, maintaining object quality while improving constraint adherence.
Text- and image-conditioned 3D models now generate convincing assets, but they still offer little direct control over the space an object should occupy or avoid. In authoring, this spatial intent is often known before generation starts. A chair should fit a seating envelope, a prop should leave clearance for motion, or a part should expose a contact surface. Prompts and image views are poor carriers for such constraints, which calls for an explicit control interface.
We present Arbor, a trainable attachment for text-conditioned latent 3D generation. Arbor introduces constraint meshes as a native 3D control interface. The interface uses hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact. Unlike completion or whole-object scaffold control, these meshes are not target evidence. They are local typed requirements and can include regions where no surface should appear. Arbor keeps this signal as geometry by converting constraint meshes into tokens and learning a routed attachment inside a frozen denoiser. Each latent region can therefore receive the part of the constraint that matters for its spatial location.
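The routing idea can be illustrated with a small NumPy sketch. It is not Arbor's implementation: it assumes that latent tokens and constraint tokens each carry a 3D position, uses distance-masked cross-attention with identity projections in place of learned ones, and adds the result through a scalar `gate` (a common adapter choice, assumed here) so that `gate=0.0` leaves the frozen denoiser's features untouched. The function name `routed_attachment` and its `radius` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries get zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def routed_attachment(latent_feats, latent_pos, cons_feats, cons_pos, radius, gate):
    """Hypothetical routed cross-attention (illustration, not Arbor's code).

    latent_feats: (N, d) features of the frozen denoiser's latent tokens
    latent_pos:   (N, 3) spatial position of each latent token
    cons_feats:   (M, d) constraint-token features
    cons_pos:     (M, 3) spatial position of each constraint token
    radius:       routing radius; a latent only attends to constraints this close
    gate:         scalar residual gate; 0.0 returns latent_feats unchanged
    """
    d = latent_feats.shape[1]
    # Identity projections stand in for learned query/key/value layers.
    scores = latent_feats @ cons_feats.T / np.sqrt(d)              # (N, M)
    dist = np.linalg.norm(latent_pos[:, None, :] - cons_pos[None, :, :], axis=-1)
    local = dist <= radius                                          # routing mask
    has_any = local.any(axis=1)                                     # latents with a nearby constraint
    # Mask out distant constraints; rows with no neighbor get dummy zeros to avoid NaNs.
    scores = np.where(local, scores, -np.inf)
    scores = np.where(has_any[:, None], scores, 0.0)
    update = softmax(scores, axis=1) @ cons_feats                   # (N, d)
    update[~has_any] = 0.0                                          # no local evidence, no update
    return latent_feats + gate * update
```

Because `radius` limits attention to nearby constraint tokens, a latent far from every constraint receives no update, which mirrors the point that each latent region sees only the constraint evidence relevant to its location.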
We evaluate Arbor on automatic and artist-curated control benchmarks with hull, avoidance, and touch constraints, and compare the metric trends to a user preference study. Even without dedicated compliance losses, Arbor improves constraint obedience while preserving object quality and variation under fixed constraints.
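To make the three constraint types concrete, the sketch below scores a voxelized object against voxelized constraint regions. These are illustrative measures assumed for this example (hull coverage, avoidance intrusion, and a one-voxel contact test), not the metrics used in the paper's benchmarks; `constraint_metrics`, `dilate6`, and `frac` are hypothetical names.

```python
import numpy as np


def dilate6(mask):
    """Grow a boolean voxel mask by one voxel along each axis (6-neighborhood)."""
    out = mask.copy()
    for axis in range(mask.ndim):
        lo = [slice(None)] * mask.ndim
        hi = [slice(None)] * mask.ndim
        lo[axis], hi[axis] = slice(None, -1), slice(1, None)
        out[tuple(lo)] |= mask[tuple(hi)]   # voxel i picks up neighbor i + 1
        out[tuple(hi)] |= mask[tuple(lo)]   # voxel i picks up neighbor i - 1
    return out


def frac(num, den):
    # Fraction of `den` voxels that are also in `num`; NaN if `den` is empty.
    return float(num.sum() / den.sum()) if den.any() else float("nan")


def constraint_metrics(occ, hull, avoid, touch):
    """Hypothetical voxel-level compliance scores (not the paper's benchmark metrics).

    All inputs are boolean arrays of the same shape:
      occ   - voxels occupied by the generated object
      hull  - HULL region, where geometry should exist
      avoid - AVOID region, which should remain empty
      touch - TOUCH region, which the object should contact
    """
    return {
        "hull_coverage": frac(occ & hull, hull),       # higher is better
        "avoid_violation": frac(occ & avoid, avoid),   # lower is better
        # Contact if the object fills a touch voxel or one of its direct neighbors.
        "touch_contact": bool((occ & dilate6(touch)).any()) if touch.any() else None,
    }
```

Treating contact as "inside or directly adjacent to" the touch region tolerates the one-voxel quantization error that a strict overlap test would penalize.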
Community
Hey everyone, today we share our newest work:
Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation
Current 3D generation methods can create 3D objects from text prompts, but the process often behaves like a slot machine. You ask for an object, but you do not know whether it will satisfy the spatial requirements needed for production. For movies, games, animation, or asset design, this is a problem: an object may need to fit a fixed envelope, leave space for motion, or touch a specific surface.
Arbor addresses this by adding explicit geometry constraints to text-to-3D generation. Users provide constraint meshes that mark:
- HULL regions where geometry should exist
- AVOID regions that should remain empty
- TOUCH regions the object should contact
The method builds on the TRELLIS family. It keeps the text generator and geometry encoders frozen, turns the constraint meshes into compact geometry tokens, and routes local constraint evidence into the generator.
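As a rough illustration of compressing constraint meshes into a few typed tokens, the sketch below works on points already sampled from the constraint surfaces. The post does not spell out how Arbor computes its geometry tokens, so farthest point sampling is swapped in as a simple stand-in; the token layout `[x, y, z, one-hot type, local spread]`, the label convention (0 = HULL, 1 = AVOID, 2 = TOUCH), and the function names are all assumptions.

```python
import numpy as np

TYPES = ("HULL", "AVOID", "TOUCH")  # assumed label order: 0, 1, 2


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: indices of up to k points (k >= 1) that are spread far apart."""
    k = min(k, len(points))
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)  # distance to nearest chosen point
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def constraint_tokens(points, labels, k_per_type):
    """Hypothetical tokenizer: compress typed constraint samples into tokens
    laid out as [x, y, z, one-hot type (3), mean distance of cluster members]."""
    width = 3 + len(TYPES) + 1
    tokens = []
    for t in range(len(TYPES)):
        pts = points[labels == t]
        if len(pts) == 0:
            continue  # this constraint type is absent
        centers = pts[farthest_point_sampling(pts, k_per_type)]
        # Assign every sample to its nearest center, then summarize each cluster.
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        owner = d.argmin(axis=1)
        for c in range(len(centers)):
            member_d = d[owner == c, c]
            spread = float(member_d.mean()) if len(member_d) else 0.0
            tokens.append(np.concatenate([centers[c], np.eye(len(TYPES))[t], [spread]]))
    return np.stack(tokens) if tokens else np.zeros((0, width))
```

Sampling each type separately keeps every token single-typed, so a downstream attachment can distinguish a keep-out region from a required one even where they sit close together.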