Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction
Authors: Hao Phung, Hadar Averbuch-Elor
Organization: Cornell University
arXiv: 2602.09016
Published: 2026-05-11
Project page: https://cornell-vailab.github.io/Raster2Seq/
Code: https://github.com/Cornell-VAILab/Raster2Seq
Abstract
AI-generated summary: Raster2Seq reconstructs floorplan vector graphics from raster images using sequence-to-sequence modeling with autoregressive decoding guided by learnable anchors for spatial attention.
Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully reproduce the structure and semantics of complex floorplans that depict large indoor spaces with many rooms and a varying number of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task in which floorplan elements (such as rooms, windows, and doors) are represented as labeled polygon sequences that jointly encode geometry and semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, guided by learnable anchors. These anchors represent spatial coordinates in image space and thereby direct the attention mechanism toward informative image regions. Because it is autoregressive, our method is flexible in its output format and can efficiently handle complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D, CubiCasa5K, and Raster2Graph, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contain diverse room structures and complex geometric variations.
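To make the idea of a labeled polygon sequence concrete, here is a minimal Python sketch of one way such a sequence could be serialized and parsed back. It is an illustration under stated assumptions, not the paper's tokenization: the 256-bin coordinate quantization, the vocabulary layout, the `SEP`/`EOS` control tokens, and the names `LabeledPolygon`, `encode`, and `decode` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical vocabulary layout (an assumption, not taken from the paper):
# coordinate bins first, then one token per semantic label, then control tokens.
NUM_BINS = 256
LABELS = ["room", "window", "door"]
LABEL_BASE = NUM_BINS
SEP = LABEL_BASE + len(LABELS)  # closes one polygon
EOS = SEP + 1                   # closes the whole floorplan


@dataclass
class LabeledPolygon:
    label: str
    corners: List[Tuple[float, float]]  # (x, y), normalized to [0, 1]


def quantize(v: float) -> int:
    """Map a normalized coordinate to an integer bin in [0, NUM_BINS - 1]."""
    return min(NUM_BINS - 1, max(0, int(round(v * (NUM_BINS - 1)))))


def encode(polygons: List[LabeledPolygon]) -> List[int]:
    """Flatten polygons into one sequence: label, x, y, x, y, ..., SEP, ..., EOS."""
    tokens: List[int] = []
    for poly in polygons:
        tokens.append(LABEL_BASE + LABELS.index(poly.label))
        for x, y in poly.corners:
            tokens.extend((quantize(x), quantize(y)))
        tokens.append(SEP)
    tokens.append(EOS)
    return tokens


def decode(tokens: List[int]) -> List[LabeledPolygon]:
    """Invert encode(); coordinates come back at bin resolution."""
    polygons: List[LabeledPolygon] = []
    label: Optional[str] = None
    coords: List[int] = []
    scale = NUM_BINS - 1
    for tok in tokens:
        if tok == EOS:
            break
        if tok == SEP:
            if label is not None:
                corners = [
                    (coords[i] / scale, coords[i + 1] / scale)
                    for i in range(0, len(coords) - 1, 2)
                ]
                polygons.append(LabeledPolygon(label, corners))
            label, coords = None, []
        elif tok >= LABEL_BASE:
            label = LABELS[tok - LABEL_BASE]
        else:
            coords.append(tok)
    return polygons


# Usage: a square room plus a thin door rectangle on its bottom wall.
plan = [
    LabeledPolygon("room", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]),
    LabeledPolygon("door", [(0.4, 0.0), (0.6, 0.0), (0.6, 0.02), (0.4, 0.02)]),
]
tokens = encode(plan)
recovered = decode(tokens)
```

Because every polygon carries its own label token and terminator, a sequence like this can hold any number of elements with any number of corners, which is the flexibility the abstract attributes to the autoregressive formulation. An autoregressive decoder would emit such tokens one at a time; that part is not sketched here.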
Community
TL;DR: We introduce Raster2Seq, an approach that transforms rasterized floorplan images to vectorized format using a labeled polygon sequence representation.

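This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
* Unified Vector Floorplan Generation via Markup Representation (https://huggingface.co/papers/2604.04859) (2026)
* GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation (https://huggingface.co/papers/2603.26661) (2026)
* FAST3DIS: Feed-forward Anchored Scene Transformer for 3D Instance Segmentation (https://huggingface.co/papers/2603.25993) (2026)
* Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images (https://huggingface.co/papers/2604.10573) (2026)
* Seen2Scene: Completing Realistic 3D Scenes with Visibility-Guided Flow (https://huggingface.co/papers/2603.28548) (2026)
* Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images (https://huggingface.co/papers/2604.19257) (2026)
* FurnSet: Exploiting Repeats for 3D Scene Reconstruction (https://huggingface.co/papers/2604.20093) (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend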