Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
Authors: Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li, Chao Song, Yaorui Shi, Yibo Miao, Hanqi Gao, Kai Zhang
Organization: Zhejiang University; Published: 2026-06-23; arXiv:2606.24428
Abstract
EDV is a three-stage Execute-Distill-Verify framework in which multiple heterogeneous agents collaboratively construct reliable experience for LLM agents, guarding against self-confirmatory errors before that experience is written to memory.
Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tasks, summarizes outcomes, and decides what is written to memory. This setup makes agents vulnerable to the Self-Confirmation Trap: wrong-but-self-consistent trajectories are misidentified as successful experience, and the resulting errors accumulate as that experience is retrieved and reused. To address this issue, we propose EDV, an Execute-Distill-Verify framework for reliable experience learning. In the Execute stage, multiple heterogeneous agents explore the same task space in parallel to generate diverse candidate trajectories. In the Distill stage, a dedicated third-party agent comparatively analyzes these trajectories to produce candidate experiences, reducing executor-centric summarization bias. In the Verify stage, the execution group validates candidates through a consensus mechanism, and only approved experiences are written into shared or private memory. By decoupling the three stages, EDV transforms experience learning from isolated self-reflection into collaborative construction, filtering out erroneous and noisy content before memory insertion. We evaluate EDV on three challenging long-horizon benchmarks: τ2-bench, Mind2Web, and MMTB. Results show that EDV consistently outperforms strong baselines, supporting the view that reliable experience construction is essential for robust agent self-evolution. Our code is available at https://github.com/shidingz/EDV.
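The three stages can be read as a single per-task update step. The sketch below is a minimal, hypothetical illustration and not the authors' implementation (the linked repository is the reference for that). It assumes that each agent is a plain Python callable standing in for an LLM call, that the distiller tags each candidate with an owner (`None` for shared memory, an executor name for private memory), and that "consensus" means the fraction of approving executors must reach a `quorum` threshold. The names `edv_step`, `Memory`, `distill`, `vote`, and `quorum` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces: each "agent" is a callable standing in for an LLM call.
Executor = Callable[[str], str]                                       # task -> trajectory
Distiller = Callable[[str, Dict[str, str]], List[Tuple[str, Optional[str]]]]
Voter = Callable[[str, str, str], bool]                               # (executor, task, candidate) -> approve?


@dataclass
class Memory:
    shared: List[str] = field(default_factory=list)
    private: Dict[str, List[str]] = field(default_factory=dict)


def edv_step(task: str, executors: Dict[str, Executor], distill: Distiller,
             vote: Voter, memory: Memory, quorum: float = 2 / 3) -> List[str]:
    # Execute: every executor attempts the same task independently.
    trajectories = {name: run(task) for name, run in executors.items()}
    # Distill: a third-party distiller sees all trajectories side by side and
    # proposes (experience, owner) pairs; owner None means "shared memory".
    candidates = distill(task, trajectories)
    accepted = []
    for experience, owner in candidates:
        # Verify: the whole execution group votes; below quorum, nothing is written.
        approvals = sum(vote(name, task, experience) for name in executors)
        if approvals / len(executors) < quorum:
            continue
        if owner is None:
            memory.shared.append(experience)
        else:
            memory.private.setdefault(owner, []).append(experience)
        accepted.append(experience)
    return accepted


# Toy run: executor "c" is wrong but self-consistent.
executors = {"a": lambda t: "5", "b": lambda t: "5", "c": lambda t: "6"}


def distill(task: str, trajs: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    # One shared candidate per distinct final answer, compared across executors.
    return [(f"{task} = {ans}", None) for ans in sorted(set(trajs.values()))]


def vote(name: str, task: str, experience: str) -> bool:
    # Toy voter: approve when the candidate matches this executor's own answer.
    return experience.endswith(executors[name](task))


mem = Memory()
print(edv_step("2+3", executors, distill, vote, mem))  # ['2+3 = 5']
```

In this toy run the wrong candidate `2+3 = 6` earns only one of three approvals and never reaches memory, while the majority answer is stored in shared memory. A realistic verifier would presumably inspect trajectories and task feedback rather than compare final answers.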
Community
Experience-driven self-evolution is essential for large language model (LLM) agents to improve through interaction with open-world environments. However, existing experience learning methods largely rely on single-agent loops, in which the same agent executes tasks, summarizes outcomes, and decides what should be written into memory. In such settings, agents are prone to the Self-Confirmation Trap, where wrong-but-self-consistent trajectories are mistakenly treated as successful experience, leading to error accumulation through later retrieval and reuse. To address this challenge, we propose EDV, an Execute-Distill-Verify framework for reliable experience learning. In the Execute stage, multiple heterogeneous agents explore the same task space in parallel, generating diverse candidate trajectories. In the Distill stage, a designated third-party distillation agent comparatively analyzes these trajectories and produces candidate experiences, reducing the bias of executor-centric self-summarization. In the Verify stage, the execution group jointly validates candidate experiences through a consensus-based mechanism, and only experiences that pass strict validation are written into shared or private memory. By decoupling execution, distillation, and validation, EDV turns experience learning from an isolated self-reflection loop into a collaborative experience construction process that suppresses erroneous and noisy experience before memory insertion. We evaluate EDV on challenging long-horizon benchmarks, including τ2-bench, Mind2Web, and MMTB. Experimental results show that EDV consistently outperforms strong baselines, demonstrating the value of improving the reliability of experience construction for agent self-evolution. These findings suggest that robust agent improvement depends not only on richer memory, but also on how experience is constructed before it enters memory. Our code is available at https://github.com/shidingz/EDV.
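To make the Self-Confirmation Trap concrete, here is a hypothetical toy of the single-agent loop described above, not code from the paper. One agent executes a task, "verifies" its own answer by re-deriving it, and stores anything that passes in a task-keyed memory that later calls consult first. The names `single_agent_loop`, `overconfident`, and `corrected` are illustrative assumptions.

```python
from typing import Callable, Dict


def single_agent_loop(task: str, agent: Callable[[str], str], memory: Dict[str, str]) -> str:
    # Retrieval: a stored experience for this task short-circuits execution.
    if task in memory:
        return memory[task]
    answer = agent(task)
    # Self-judgment: the same agent "verifies" by re-deriving its answer,
    # so any deterministic error is self-consistent and passes the check.
    if agent(task) == answer:
        memory[task] = answer
    return answer


def overconfident(task: str) -> str:
    return "6"  # wrong on "2+3", but always the same answer


def corrected(task: str) -> str:
    return "5"  # the agent once it would otherwise answer correctly


memory: Dict[str, str] = {}
first = single_agent_loop("2+3", overconfident, memory)
later = single_agent_loop("2+3", corrected, memory)
print(first, later, memory)  # 6 6 {'2+3': '6'}
```

Because the judge is the executor itself, a consistent error always passes, and retrieval keeps returning it even after the agent would answer correctly. Separating execution from distillation and verification targets exactly this failure mode.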