Paper: https://arxiv.org/abs/2606.02482
Code: https://github.com/PeiwenSun2000/X-Stream
Data: https://huggingface.co/datasets/spw2000/X-stream
Project page: https://peiwensun2000.github.io/xstream/
Published: 2026-06-01
X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
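Authors: Peiwen Sun, Xudong Lu, Huadai Liu, Yang Bo, Dongming Wu, Huankang Guan, Minghong Cai, Jinpeng Chen, Xintong Guo, Shuhan Li, Rui Liu, Xiangyu Yue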
AI-generated summary: X-Stream introduces the first benchmark for multi-stream streaming understanding, revealing significant limitations of current MLLMs in handling concurrent streams.
Abstract
While streaming video understanding has made significant strides, real-world applications such as live sports broadcasting, autonomous driving, and multi-screen collaboration inherently demand continuous, multi-stream interaction. However, existing benchmarks are confined to single-stream paradigms, leaving a critical gap in evaluating online, cross-stream reasoning. To bridge this gap, we introduce X-Stream, the first benchmark dedicated to multi-stream streaming understanding. Comprising 4,220 rigorously curated QA pairs across 932 videos, X-Stream evaluates 11 subtasks across multi-window, multi-view, and multi-device scenarios. Crucially, the dataset is constructed with a novel dual-verification pipeline that prevents over-reliance on a single stream. Furthermore, we are the first to conceptualize multi-modal large language models (MLLMs) as naive multiplexers, systematically evaluating their performance through the lens of Signal Multiplexing Theory. Our extensive online inference experiments show that state-of-the-art MLLMs struggle significantly with concurrent streams, scoring only about 50% and exhibiting poor proactive ability. Ultimately, X-Stream exposes the trade-offs in current multiplexing schemes, providing both a practical evaluation protocol and empirical guidance for next-generation multi-stream agents.
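To make the multiplexer framing concrete, here is a minimal sketch of one generic scheme, time-division multiplexing, in which frames from several concurrent streams are merged into a single timestamp-ordered sequence before being passed to a model. This is an illustration of the general concept only, not the paper's pipeline: the `Frame` type, the stream names, and the `time_division_multiplex` helper are all hypothetical, and the schemes X-Stream actually evaluates may differ.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Frame(NamedTuple):
    """Hypothetical stand-in for one video frame from one stream."""
    timestamp: float  # seconds since the start of the session
    stream_id: str    # illustrative label, e.g. "front_cam"
    payload: str      # placeholder for the actual image data


def time_division_multiplex(*streams: Iterable[Frame]) -> Iterator[Frame]:
    """Merge per-stream frame sequences into one sequence ordered by timestamp.

    Each input stream must already be sorted by timestamp, which is the
    natural order for frames arriving online from a single source.
    """
    return heapq.merge(*streams, key=lambda frame: frame.timestamp)


# Two toy streams with interleaved timestamps (assumed data).
front = [Frame(0.0, "front_cam", "f0"), Frame(1.0, "front_cam", "f1")]
rear = [Frame(0.5, "rear_cam", "r0"), Frame(1.5, "rear_cam", "r1")]

for frame in time_division_multiplex(front, rear):
    print(f"{frame.timestamp:.1f}s [{frame.stream_id}] {frame.payload}")
```

Interleaving by time keeps each frame at full resolution but lengthens the sequence the model must attend over; other schemes, such as tiling simultaneous frames into one image, make the opposite choice. That tension is one plausible example of the kind of trade-off the abstract refers to.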