We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experiences to individual agents, forfeiting cross-agent learning, or broadcast symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by 32% relative on math and confirming asymmetric cross-agent transfer as the primary driver in ablation. Starting from several identically initialized agents, four to five stable niche specialists spontaneously emerge, a structural signature of multi-agent evolution that no single-agent learner can express.</p>\n","updatedAt":"2026-05-13T23:39:11.981Z","author":{"_id":"648d2e2e514bf0ce32ba729f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/648d2e2e514bf0ce32ba729f/VPL1rehLxkvixz5oRD6u_.jpeg","fullname":"Yaolun Zhang","name":"Mercury7353","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":9,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9094582796096802},"editors":["Mercury7353"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/648d2e2e514bf0ce32ba729f/VPL1rehLxkvixz5oRD6u_.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2605.11136","authors":[{"_id":"6a050b58b1a8cbabc9f08625","name":"Yaolun Zhang","hidden":false},{"_id":"6a050b58b1a8cbabc9f08626","name":"Tianyi Xu","hidden":false},{"_id":"6a050b58b1a8cbabc9f08627","name":"Shengyu Dai","hidden":false},{"_id":"6a050b58b1a8cbabc9f08628","name":"Zhenwen Shao","hidden":false},{"_id":"6a050b58b1a8cbabc9f08629","name":"Qingyun Wu","hidden":false},{"_id":"6a050b58b1a8cbabc9f0862a","name":"Huazheng Wang","hidden":false}],"publishedAt":"2026-05-11T00:00:00.000Z","submittedOnDailyAt":"2026-05-13T00:00:00.000Z","title":"EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales","submittedOnDailyBy":{"_id":"648d2e2e514bf0ce32ba729f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/648d2e2e514bf0ce32ba729f/VPL1rehLxkvixz5oRD6u_.jpeg","isPro":false,"fullname":"Yaolun Zhang","user":"Mercury7353","type":"user","name":"Mercury7353"},"summary":"We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experiences to individual agents, forfeiting cross-agent learning, or broadcast symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by 32% relative on math and confirming asymmetric cross-agent transfer as the primary driver in ablation. Starting from several identically initialized agents, four to five stable niche specialists spontaneously emerge, a structural signature of multi-agent evolution that no single-agent learner can express. See our code at: https://github.com/Mercury7353/EvoChamber","upvotes":5,"discussionId":"6a050b59b1a8cbabc9f0862b","githubRepo":"https://github.com/Mercury7353/EvoChamber","githubRepoAddedBy":"user","ai_summary":"Multi-agent test-time evolution framework EVOCHAMBER enables emergent specialization through collaborative reflection and asymmetric knowledge transfer across coevolving agents.","ai_keywords":["multi-agent test-time evolution","single-agent evolution","coevolving agent pool","CODREAM","collaborative dreaming","asymmetric knowledge transfer","niche-conditioned teams","collaboration structures","population-level lifecycle operators","emergent specialization"],"githubStars":1},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"648d2e2e514bf0ce32ba729f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/648d2e2e514bf0ce32ba729f/VPL1rehLxkvixz5oRD6u_.jpeg","isPro":false,"fullname":"Yaolun Zhang","user":"Mercury7353","type":"user"},{"_id":"6245285af59b8d262df3321b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6245285af59b8d262df3321b/dvy__dTf-miJ60IbveDg4.jpeg","isPro":false,"fullname":"Yifan Zeng","user":"yokey","type":"user"},{"_id":"63b6c4adccebeadccc8783b0","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1672922226272-noauth.jpeg","isPro":false,"fullname":"Jose Efraim Aguilar Escamilla","user":"aguilarjose11","type":"user"},{"_id":"6518a144a28f86d3e9e67c34","avatarUrl":"/avatars/f2aed39e971cffe6c9d0b9c2f7a0df70.svg","isPro":false,"fullname":"Tianyi Xu","user":"tianyi0216","type":"user"},{"_id":"67e617d4470f96a302734e16","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/QHrYmNlTRxKR1KRS50pkf.png","isPro":false,"fullname":"Xuan Ouyang","user":"YoungXuan","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2605/2605.11136.md"}">
EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
Abstract
Multi-agent test-time evolution framework EVOCHAMBER enables emergent specialization through collaborative reflection and asymmetric knowledge transfer across coevolving agents.
AI-generated summary
We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experiences to individual agents, forfeiting cross-agent learning, or broadcast symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by 32% relative on math and confirming asymmetric cross-agent transfer as the primary driver in ablation. Starting from several identically initialized agents, four to five stable niche specialists spontaneously emerge, a structural signature of multi-agent evolution that no single-agent learner can express. See our code at: https://github.com/Mercury7353/EvoChamber
Community
We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experiences to individual agents, forfeiting cross-agent learning, or broadcast symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by 32% relative on math and confirming asymmetric cross-agent transfer as the primary driver in ablation. Starting from several identically initialized agents, four to five stable niche specialists spontaneously emerge, a structural signature of multi-agent evolution that no single-agent learner can express.
Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images
Cite arxiv.org/abs/2605.11136 in a model README.md to link it from this page.
Cite arxiv.org/abs/2605.11136 in a dataset README.md to link it from this page.
Cite arxiv.org/abs/2605.11136 in a Space README.md to link it from this page.
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.