Streaming Communication in Multi-Agent Reasoning
Authors: Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen, Xander Xu, Ying-Cong Chen
Published: 2026-06-03 (arXiv: 2606.05158)
Project page: https://zhenyangcs.github.io/StreamMA-website/
Code: https://github.com/EnVision-Research/StreamMA
Abstract
StreamMA enables efficient multi-agent reasoning by streaming intermediate results and leveraging reliable early steps to improve both latency and effectiveness across various reasoning tasks.
Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
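To make the latency argument concrete, here is a minimal sketch of a toy timing model for a chain of agents. It is not the paper's closed-form analysis: the `Pipeline` class, its fields, and the assumptions that every step takes the same time, that transfer is free, and that a downstream agent can begin as soon as its upstream neighbor has emitted one step are all illustrative choices made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Toy chain of agents; every field is an illustrative assumption."""
    num_agents: int          # pipeline depth N
    steps_per_agent: int     # reasoning steps S emitted by each agent
    step_time: float = 1.0   # time to generate one step, assumed constant


def serial_latency(p: Pipeline) -> float:
    # Generate-then-transfer: each agent waits for the full upstream chain,
    # so latency grows linearly with pipeline depth.
    return p.num_agents * p.steps_per_agent * p.step_time


def stream_latency(p: Pipeline) -> float:
    # Streaming: agent k starts once agent k-1 has emitted its first step,
    # so adjacent agents overlap like stages of a classic pipeline.
    return (p.steps_per_agent + p.num_agents - 1) * p.step_time


def speedup(p: Pipeline) -> float:
    # Ratio of serial to streamed latency under the toy model.
    return serial_latency(p) / stream_latency(p)


if __name__ == "__main__":
    for steps in (1, 4, 16, 64):
        p = Pipeline(num_agents=4, steps_per_agent=steps)
        print(
            f"S={steps:>2}  serial={serial_latency(p):6.1f}  "
            f"stream={stream_latency(p):6.1f}  speedup={speedup(p):.2f}x"
        )
```

Under these assumptions the speedup is N·S / (S + N − 1), which stays below the number of agents N and approaches it as the per-agent step count S grows. That matches the direction of the efficiency claim in the abstract, but the exact bound and cost ratio derived in the paper may differ from this simplification.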
Community
Streaming reasoning steps between agents makes a multi-agent pipeline both faster and more accurate, and reveals a new step-level scaling law.
We warmly welcome feedback, comments, and constructive criticism from the community.