DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs
Authors: Yi Li, Songtao Wei, Dongming Jiang, Zhichun Guo, Qiannan Li, Bingzhe Li
Code: https://github.com/PearLoveTana/DarkForest_Review
Published on May 24 · Submitted by dj on May 27
Abstract
DarkForest is a controlled-communication framework that enhances multi-agent LLM reasoning by clustering semantic candidates and using calibrated belief distributions to reduce error propagation and communication overhead.
AI-generated summary
Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error propagation and high communication overhead. When agents exchange raw responses or reasoning traces, incorrect intermediate reasoning may be adopted and amplified, leading to confident but wrong consensus; multi-round communication also increases token consumption, latency, and inference cost. In this paper, we propose a controlled-communication coordination framework named DarkForest. DarkForest first keeps agents independent, so each agent produces an answer without seeing the others' outputs. It then parses the raw responses into structured candidate records, groups semantically equivalent candidates into clusters, and estimates a calibrated belief distribution over these clusters using agent reliability, confidence, parse quality, support-pattern reliability, and independence corrections. A coordinator receives only policy-permitted evidence from this belief state with controlled communication. Experiments on six reasoning benchmarks show that DarkForest achieves leading overall quality, improves the strongest baseline by up to 30.7% on benchmark metrics, and reduces token consumption by up to 6.5× compared with communication-heavy baselines.
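The pipeline described in the abstract (independent answers, clustering, a belief distribution, then filtered evidence for the coordinator) can be illustrated with a small sketch. This is not the authors' implementation: the `Candidate` record, the string-normalization stand-in for semantic clustering, the product weighting, and the `top_k` / `min_belief` policy are all assumptions made for illustration, and the support-pattern reliability and independence corrections mentioned in the abstract are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical structured record parsed from one agent's raw response."""
    agent: str
    answer: str           # parsed final answer, not the raw reasoning trace
    confidence: float     # self-reported confidence in [0, 1]
    parse_quality: float  # 1.0 if the answer parsed cleanly, lower otherwise


def normalize(answer: str) -> str:
    # Crude stand-in for semantic-equivalence clustering (assumption):
    # answers that match after normalization share a cluster.
    return " ".join(answer.lower().strip().rstrip(".").split())


def belief_over_clusters(candidates, reliability):
    """Sum a per-candidate weight into each cluster, then normalize to sum to 1."""
    scores = defaultdict(float)
    for c in candidates:
        # Assumed weighting: product of agent reliability, confidence, parse quality.
        weight = reliability.get(c.agent, 0.5) * c.confidence * c.parse_quality
        scores[normalize(c.answer)] += weight
    total = sum(scores.values())
    if total == 0:
        return {}
    return {cluster: score / total for cluster, score in scores.items()}


def permitted_evidence(belief, top_k=2, min_belief=0.2):
    """Toy policy: expose at most top_k clusters whose belief clears a threshold."""
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    return [(answer, round(p, 3)) for answer, p in ranked[:top_k] if p >= min_belief]


# Three agents answer independently; none sees another agent's output.
candidates = [
    Candidate("a1", "42", 0.9, 1.0),
    Candidate("a2", " 42. ", 0.6, 1.0),
    Candidate("a3", "41", 0.8, 0.5),
]
reliability = {"a1": 0.8, "a2": 0.7, "a3": 0.6}

belief = belief_over_clusters(candidates, reliability)
evidence = permitted_evidence(belief)
print(evidence)  # the coordinator sees cluster summaries only, never raw traces
```

In this toy run the two "42" responses fall into one cluster, and the low-belief "41" cluster is withheld from the coordinator, which is the kind of filtering the abstract attributes to policy-permitted evidence.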
Community
DarkForest is a controlled-communication framework for multi-agent LLM reasoning. Instead of letting agents exchange raw reasoning traces, it keeps agents independent, clusters their candidate answers, estimates a calibrated belief distribution, and only passes policy-permitted evidence to the coordinator.
The goal is to reduce error propagation while preserving useful diversity. Experiments across six reasoning benchmarks show stronger accuracy and much lower token consumption than communication-heavy multi-agent baselines.
Cite arxiv.org/abs/2605.25188 in a model, dataset, or Space README.md to link it from this page.