See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
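Authors: Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu
arXiv 2606.13594, published 2026-06-11; University of Michigan
Project page: https://chicychen.github.io/dense-hetero-latent-mas/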
Abstract
Aligned KV-cache communication lets heterogeneous agents transfer knowledge, matching or exceeding text-based communication in context-aware settings at lower computational cost.
Multi-agent systems communicate mostly through text, paying a lossy and expensive decode-and-re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and so avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming a shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading", transferring both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense preservation of contextual knowledge. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six sender-receiver directions among {Qwen3-4B, 8B, 14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer, where prior methods collapse.
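The abstract does not specify the architecture of the cross-model cache transformation. As a rough illustration only, the sketch below assumes a per-layer linear map from sender cache vectors to receiver cache vectors, pairs layers by relative depth, and implements the reconstruction phase as a mean-squared error against the receiver's own cache on the same input. `CacheTranslator`, `map_layers`, and all dimensions are hypothetical, keys and values are treated as generic vectors, and the generation phase (which would further train the translator through the receiver's next-token loss) needs a real model and is omitted.

```python
import random


def map_layers(n_sender: int, n_receiver: int) -> list[int]:
    """Hypothetical pairing: each receiver layer reads the sender layer at the same relative depth."""
    if n_receiver == 1:
        return [0]
    scale = (n_sender - 1) / (n_receiver - 1)
    return [round(i * scale) for i in range(n_receiver)]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Plain-Python matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class CacheTranslator:
    """Hypothetical per-layer linear map from sender cache space to receiver cache space."""

    def __init__(self, n_sender, n_receiver, d_sender, d_receiver, seed=0):
        rng = random.Random(seed)
        self.layer_map = map_layers(n_sender, n_receiver)
        # One (d_receiver x d_sender) weight matrix per receiver layer.
        self.weights = [
            [[rng.gauss(0.0, 0.1) for _ in range(d_sender)] for _ in range(d_receiver)]
            for _ in range(n_receiver)
        ]

    def translate(self, sender_cache):
        """sender_cache[layer][token] is one cache vector; returns a receiver-shaped cache."""
        return [
            [matvec(self.weights[r], tok) for tok in sender_cache[s]]
            for r, s in enumerate(self.layer_map)
        ]

    def reconstruction_loss(self, sender_cache, receiver_cache):
        """Assumed phase-1 objective: MSE against the receiver's own cache for the same input."""
        pred = self.translate(sender_cache)
        errs = [
            (p - t) ** 2
            for pl, tl in zip(pred, receiver_cache)
            for pv, tv in zip(pl, tl)
            for p, t in zip(pv, tv)
        ]
        return sum(errs) / len(errs)

    def sgd_step(self, sender_cache, receiver_cache, lr=0.05):
        """One full-batch gradient step on the MSE loss (gradient derived by hand)."""
        pred = self.translate(sender_cache)  # predictions from the pre-step weights
        n = sum(len(tl) * len(tl[0]) for tl in receiver_cache)  # number of scalar targets
        for r, s in enumerate(self.layer_map):
            w = self.weights[r]
            for x, p, t in zip(sender_cache[s], pred[r], receiver_cache[r]):
                for i, (pi, ti) in enumerate(zip(p, t)):
                    g = 2.0 * (pi - ti) / n  # d(loss)/d(prediction_i)
                    for j, xj in enumerate(x):
                        w[i][j] -= lr * g * xj


# Toy usage: synthetic caches stand in for real sender and receiver caches.
rng = random.Random(1)
n_s, n_r, d_s, d_r, tokens = 4, 6, 3, 2, 5
sender = [[[rng.gauss(0, 1) for _ in range(d_s)] for _ in range(tokens)] for _ in range(n_s)]
target_map = CacheTranslator(n_s, n_r, d_s, d_r, seed=42)  # plays the "true" receiver cache
receiver = target_map.translate(sender)

model = CacheTranslator(n_s, n_r, d_s, d_r, seed=0)
before = model.reconstruction_loss(sender, receiver)
for _ in range(200):
    model.sgd_step(sender, receiver, lr=0.5)
after = model.reconstruction_loss(sender, receiver)
print(f"reconstruction MSE: {before:.4f} -> {after:.4f}")
```

Pure Python keeps the sketch dependency-free; a practical version would more likely use an autodiff framework, handle keys and values separately, and respect per-head structure.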