Hugging Face Daily Papers · 6 min read

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Code: https://github.com/seal-rg/streaming
Project page: https://huggingface.co/JonasGeiping/stream-qwen3.5-27b
Papers
arxiv:2605.12460

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Published on May 12
Submitted by Jonas Geiping on May 13
Authors: Guinan Su, Yanwu Yang, Xueyan Li, Jonas Geiping
Abstract

Language models can be enhanced by transitioning from sequential message-based instruction-tuning to parallel stream processing, enabling simultaneous reading and generation across multiple concurrent data flows.

AI-generated summary

The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer-use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents operate on message exchange formats, successively exchanging messages with users, systems, themselves (i.e., chain-of-thought), and tools in a single stream of computation. This single-stream bottleneck in chat models leads to a number of limitations: the agent cannot act (generate output) while reading, and in reverse, cannot react to new information while writing. Similarly, the agent cannot act while thinking and cannot think while reading or acting on information.

In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies the usability limitations outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns, and can further improve model monitorability.
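To make the core idea concrete, here is a minimal toy sketch (not the authors' implementation, and not their model architecture) of the multi-stream decoding loop the abstract describes: at each timestep the model consumes one token from every input stream and emits one token into every output stream, and each emission may only depend on tokens from earlier timesteps. The stream names (`user`, `thought`, `answer`) and the `toy_model` function are purely illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MultiStreamState:
    # Named parallel streams, e.g. {"user": [...], "tool": [...]}.
    inputs: dict[str, list[str]]
    outputs: dict[str, list[str]] = field(default_factory=dict)

def step(state: MultiStreamState, t: int, model) -> MultiStreamState:
    # Gather the token arriving on each input stream at timestep t (pad if exhausted).
    visible = {name: toks[t] if t < len(toks) else "<pad>"
               for name, toks in state.inputs.items()}
    # One forward pass emits a token for every output stream simultaneously;
    # the model only sees output history from earlier timesteps (causality).
    emitted = model(visible, {k: v[:] for k, v in state.outputs.items()})
    for name, tok in emitted.items():
        state.outputs.setdefault(name, []).append(tok)
    return state

# Hypothetical stand-in model: writes a trace into a "thought" stream while
# simultaneously producing tokens in an "answer" stream -- i.e. it "thinks"
# and "acts" in the same forward pass instead of in sequence.
def toy_model(visible, history):
    return {"thought": f"saw:{visible['user']}",
            "answer": visible["user"].upper()}

state = MultiStreamState(inputs={"user": ["hi", "there"]})
for t in range(2):
    step(state, t, toy_model)
```

The point of the sketch is only the control flow: reading, thinking, and acting advance in lockstep per timestep rather than blocking one another as in a single chat stream.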

Community

Oh my God!

This is one of the best papers I've read in the last month. It's novel, useful, and practical.

CoT was a big leap for AI, sure, but ever since it became the standard, I've been looking for ways to escape it, because, despite its utility, it adds so much overhead and weird training semantics.

The original sin is that CoT made language carry too many roles at once: working memory, algorithm trace, explanation, self-supervision target, debugging surface, inference-time compute, and sometimes user-facing justification. That was useful because it required no architectural change. But it also meant every extra bit of cognition had to be paid for as extra tokens, and every internal computation became entangled with the semantics of natural-language text.

Multi-Stream LLMs is important because it attacks the interface bottleneck directly.

I've been working on a similar project, and am excited to see this kind of research happening.


Get this paper in your agent:

hf papers read 2605.12460
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.12460 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.12460 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.12460 in a Space README.md to link it from this page.

Collections including this paper 1

