Hugging Face Daily Papers · May 22, 2026 · 3 min read

SceneAligner: 3D-Grounded Floorplan Localization in the Wild

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv: 2605.22581

Authors: Junhyeong Cho, Ruojin Cai, Hadar Averbuch-Elor

Cornell University

Published on May 21, 2026 · Submitted by Junhyeong Cho on May 22, 2026

Code: https://github.com/Cornell-VAILab/SceneAligner

Abstract

AI-generated summary: A deep learning approach to floorplan localization that uses 3D scene reconstruction and cross-modal correspondence learning to work in real-world environments, including from very sparse inputs such as a single image.

Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans.

In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.
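To make the two geometric steps in the abstract concrete, here is a minimal NumPy sketch of (a) flattening a gravity-aligned point cloud into a 2D density map and (b) fitting a 2D similarity transform to point correspondences. It is an illustration under stated assumptions, not the authors' implementation: the function names `density_map` and `fit_similarity_2d`, the `cell` size, and the choice of axis 2 as "up" are hypothetical, and the closed-form least-squares fit (Umeyama's method) stands in for whatever estimator the paper uses. The learned cross-modal matcher that would supply the correspondences is not reproduced here.

```python
import numpy as np


def density_map(points, cell=0.1, up_axis=2):
    """Project gravity-aligned 3D points onto the ground plane as a normalized 2D histogram.

    `cell` (grid resolution in scene units) and `up_axis` are illustrative assumptions.
    """
    ground = np.delete(points, up_axis, axis=1)          # drop the height coordinate
    mins = ground.min(axis=0)
    idx = np.floor((ground - mins) / cell).astype(int)   # grid cell of each point
    grid = np.zeros(tuple(idx.max(axis=0) + 1), dtype=np.float64)
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1.0)         # count points per cell
    return grid / grid.max()                             # scale densities to [0, 1]


def fit_similarity_2d(src, dst):
    """Least-squares scale s, rotation R, translation t such that dst ~= s * R @ src + t.

    Closed-form Umeyama fit; assumes correspondences are already given and outlier-free.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                           # 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))    # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / ((xs ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(0.0, 5.0, size=(1000, 3))        # stand-in for a reconstruction
    proxy = density_map(cloud, cell=0.25)
    print("density map shape:", proxy.shape)
```

With real data, `src` would hold matched pixel locations in the density map and `dst` the corresponding locations in the floorplan. Because learned matches usually contain outliers, a robust wrapper such as RANSAC around this fit would likely be needed; that is also an assumption rather than something stated in the abstract.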

Community

jhcho99

Paper submitter about 9 hours ago

Project Page: https://Cornell-VAILab.github.io/SceneAligner

Get this paper in your agent:

hf papers read 2605.22581

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
