Hugging Face Daily Papers · 6 min read

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Organization: Hong Kong University of Science and Technology (Guangzhou) · Code: https://github.com/3DAgentWorld/HorizonStream
arxiv:2605.23889

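Published on May 22, 2026 · Submitted by cc (NicolasCC) on May 26, 2026
Authors: Chong Cheng, Peilin Tao, Nanjie Yao, Guanzhi Ding, Xianda Chen, Yuansen Du, Xiaoyang Guo, Wei Yin, Weiqiang Ren, Qian Zhang, Zhengqing Chen, Hao Wang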

AI-generated summary

HorizonStream addresses long-term 3D reconstruction challenges by modeling geometric propagation through an evidence influence kernel, enabling stable, scalable streaming reconstruction with constant memory and linear time complexity.

Abstract

Online 3D reconstruction requires estimating camera pose and scene geometry under strict causal and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently temporally heterogeneous, with evidence ranging from short-lived correspondences to persistent global scale. However, current architectures impose uniform and pathological influence patterns. For example, sliding windows enforce hard cutoffs, while ungated recurrence and causal attention cause cache saturation and spike-like attention sinks. To resolve this, we formalize geometric propagation as an evidence influence kernel and propose HorizonStream, a long-horizon Transformer that explicitly factorizes this kernel. For the long-range temporal factor, Geometric Linear Attention learns channel-wise decay rates to enable bounded, multi-timescale propagation of geometric evidence. For the short-range spatial factor, Geometric Local Attention with Spatiotemporal RoPE performs reliable 3D matching while suppressing attention sinks. Finally, Metric Readout Tokens recover stable scale and rigid pose directly from the persistent geometric state. Extensive experiments show that HorizonStream, trained on only 48-frame clips, generalizes stably to sequences exceeding 10,000 frames with constant memory and linear time, achieving state-of-the-art streaming 3D reconstruction performance.

Project Page: https://3dagentworld.github.io/horizonstream/
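To make "channel-wise decay rates" and "constant memory" concrete, here is a minimal sketch of a generic decayed linear-attention recurrence. It is not the authors' Geometric Linear Attention: the function name, argument layout, and the fixed (rather than learned) decay values are all assumptions made for illustration, and the block uses plain Python lists so it runs without extra dependencies.

```python
def decayed_linear_attention(queries, keys, values, decay):
    """Causal linear attention with one retention rate per key channel.

    Hypothetical illustration only, not the paper's implementation.
    queries, keys: one length-d_k vector per frame/token
    values:        one length-d_v vector per frame/token
    decay:         length-d_k list with entries in (0, 1)
    Returns (outputs, final_state).
    """
    d_k, d_v = len(decay), len(values[0])
    # The whole memory is this d_k x d_v matrix; its size never grows with the stream.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_k):
            row = state[i]
            for j in range(d_v):
                # Fade old evidence in channel i, then write the new key/value product.
                row[j] = decay[i] * row[j] + k[i] * v[j]
        # Read out o = S^T q for the current step.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs, state


# Toy stream: 10,000 identical steps, one fast-forgetting and one slow-forgetting channel.
T = 10_000
ones_k = [[1.0, 1.0]] * T
ones_v = [[1.0]] * T
outs, state = decayed_linear_attention(ones_k, ones_k, ones_v, decay=[0.5, 0.9])
print(state)     # each channel settles near 1 / (1 - decay) instead of growing with T
print(outs[-1])
```

Each step touches only the fixed-size state, so cost is linear in the number of frames and memory is independent of it. Channels with a rate close to 1 hold evidence for a long time, while channels with a small rate forget quickly, which is one way to read "multi-timescale propagation".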

Community

cc (NicolasCC) · Paper author · Paper submitter · about 17 hours ago (edited about 17 hours ago)

HorizonStream addresses long-sequence streaming 3D reconstruction under causal and bounded-memory constraints. It models reconstruction as geometric evidence propagation and introduces a long-horizon Transformer with Geometric Linear Attention, Geometric Local Attention, and Metric Readout Tokens.

Trained on 48-frame clips, HorizonStream generalizes to 10K+ frame streams with constant memory and linear time, achieving stable scale, pose, and geometry over long horizons.
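One way to see why a decayed recurrent state can stay bounded over 10K+ frames is the following worked bound. It assumes the standard decayed linear-attention form; the symbols S_t (state), a (per-channel retention rates), k_t, v_t, q_t (key, value, query), and B (a bound on the key/value products) are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
S_t &= \operatorname{diag}(a)\, S_{t-1} + k_t v_t^{\top}, \qquad o_t = S_t^{\top} q_t, \qquad a \in (0,1)^{d_k} \\
S_t &= \sum_{s \le t} \operatorname{diag}(a)^{\,t-s}\, k_s v_s^{\top} \\
\bigl| [S_t]_{ij} \bigr| &\le \sum_{s \le t} a_i^{\,t-s}\, \bigl| k_{s,i}\, v_{s,j} \bigr| \le \frac{B}{1 - a_i} \quad \text{if } \bigl| k_{s,i}\, v_{s,j} \bigr| \le B \text{ for all } s
\end{aligned}
```

Under this form, frame s influences channel i at time t with weight a_i^{t-s}, a smooth falloff rather than a hard window cutoff, and the geometric series caps the state magnitude regardless of stream length. With a_i = 1 (ungated recurrence) the bound disappears and the state can grow without limit.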

Avi (avahal) · May 26

the core move here is decoupling long-horizon memory from short-range matching by factorizing the evidence influence kernel into two specialized attention modules. the idea of learning channel-wise retention rates in Geometric Linear Attention to bound memory while keeping multi-timescale propagation feels like the right fix for drift on long runs. i'd love to see an ablation where the decay profiles are replaced with fixed priors to test whether per-channel rates really drive stability or if the gain mainly comes from the gating in the short window. btw, the arxivlens breakdown helped me parse the method details and clarified how the two blocks interact, see https://arxivlens.com/PaperView/Details/horizonstream-long-horizon-attention-for-streaming-3d-reconstruction-8916-eddb76b2
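For readers wondering what a "fixed prior" over decay profiles could look like in the ablation suggested above, here is one hypothetical choice: retention rates whose half-lives are log-spaced between a short and a long horizon. The function names, the log spacing, and the 2 to 2000 frame range are assumptions for illustration; none of them comes from the paper or its code.

```python
def fixed_decay_profile(num_channels, min_half_life, max_half_life):
    """Per-channel retention rates with log-spaced half-lives (measured in frames).

    Hypothetical fixed prior, not taken from the paper.
    """
    rates = []
    for i in range(num_channels):
        frac = i / (num_channels - 1) if num_channels > 1 else 0.0
        half_life = min_half_life * (max_half_life / min_half_life) ** frac
        # Solve rate ** half_life == 0.5 for rate.
        rates.append(0.5 ** (1.0 / half_life))
    return rates


def influence(rate, lag):
    """Weight that evidence from `lag` frames ago still carries in a channel with this rate."""
    return rate ** lag


rates = fixed_decay_profile(4, min_half_life=2.0, max_half_life=2000.0)
for r in rates:
    # Compare how much each channel remembers 48 frames back versus 1000 frames back.
    print(round(r, 6), influence(r, 48), influence(r, 1000))
```

Freezing rates like these while keeping the rest of a model unchanged would be one way to separate the effect of learned per-channel rates from the effect of simply having several timescales.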

Get this paper in your agent:

hf papers read 2605.23889
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1
