Hugging Face Daily Papers · June 9, 2026 · 5 min read

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2606.08348

Published on Jun 6 · Submitted by wuxiaojun on Jun 9 · IDEA FinAI

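Authors: Xiaojun Wu, Cehao Yang, Honghao Liu, Xueyuan Lin, Wenjie Zhang, Zhichao Shi, Xuhui Jiang, Chengjin Xu, Jia Li, Jian Guo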

Abstract

Bayesian-Agent presents a framework that treats reusable skills and SOPs as hypotheses for model success, using Bayesian inference to guide agent behavior and improve task performance through posterior-guided harness optimization.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harness feedback. These assets can improve task execution without changing model weights, but they are often revised by heuristic reflection or by reusing observed successes and failures as if counts alone were reliable belief. We introduce Bayesian-Agent, a native and cross-harness framework that treats reusable skills and SOPs as hypotheses about whether a frozen model will succeed under a particular prompt, context, and harness environment. Bayesian-Agent records verified trajectory evidence, maintains a feature-conditioned categorical posterior over each skill, and maps posterior state into inspectable actions such as patch, split, compress, retire, and explore. Model-facing prompts receive executable guardrails and failure-mode patches, while posterior summaries remain available for audit. With deepseek-v4-flash, incremental repair improves SOP-Bench from 80% to 95%, Lifelong AgentBench from 90% to 100%, and RealFin-Bench from 45% to 65%. We further evaluate Bayesian-Agent's native backend and optional GenericAgent, mini-swe-agent, and Claude Code backends. The results include positive, negative, saturated, and case-study settings, suggesting that agent skill evolution is best viewed as posterior-guided harness optimization rather than uncalibrated prompt accumulation. The source code is available at https://github.com/DataArcTech/Bayesian-Agent.
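To make the idea of a feature-conditioned categorical posterior mapped to inspectable actions more concrete, here is a minimal Python sketch. It is not the Bayesian-Agent implementation: the outcome categories, the `SkillPosterior` class, the `choose_action` rule, its thresholds, and the `"keep"` placeholder action are all assumptions made for illustration. It uses a standard Dirichlet-categorical conjugate update, keyed by a hypothetical (skill, feature) pair.

```python
from collections import defaultdict

# Hypothetical outcome categories; the abstract does not enumerate them.
OUTCOMES = ("success", "tool_error", "wrong_answer")


class SkillPosterior:
    """Dirichlet-categorical posterior over outcomes for one (skill, feature) key."""

    def __init__(self, prior=1.0):
        # Symmetric Dirichlet prior: `prior` pseudo-counts per outcome category.
        self.alpha = {o: float(prior) for o in OUTCOMES}
        self.prior_total = float(prior) * len(OUTCOMES)

    def update(self, outcome):
        # Conjugate update: one verified observation adds one count.
        self.alpha[outcome] += 1.0

    def mean(self):
        total = sum(self.alpha.values())
        return {o: a / total for o, a in self.alpha.items()}

    def evidence(self):
        # Observations seen so far, excluding prior pseudo-counts.
        return sum(self.alpha.values()) - self.prior_total


def choose_action(post, min_evidence=5, retire_below=0.2, patch_below=0.7):
    """Toy policy with assumed thresholds; "keep" is a placeholder action."""
    if post.evidence() < min_evidence:
        return "explore"  # too little evidence to commit either way
    p_success = post.mean()["success"]
    if p_success < retire_below:
        return "retire"
    if p_success < patch_below:
        return "patch"
    return "keep"


# One posterior per (skill, feature) key; both key names are made up.
posteriors = defaultdict(SkillPosterior)
key = ("refund_sop", "long_context")
for outcome in ["success", "tool_error", "success", "tool_error", "success", "tool_error"]:
    posteriors[key].update(outcome)

print(key, posteriors[key].mean(), choose_action(posteriors[key]))
```

Because the posterior is just a dictionary of pseudo-counts, it can be logged after every update, which is one plausible reading of what keeping posterior summaries "available for audit" could look like in practice.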

Project page: https://dataarctech.github.io/Bayesian-Agent/

Community

wuxiaojun

Paper submitter about 4 hours ago

Bayesian vs. Frequentist for Skill Evolving: Injecting a Cumulative, Auditable, and Transferable Belief State

The main advantage of the Bayesian approach to skill evolution is that it goes beyond the stateless "observe failure → patch" cycle. Instead, it injects a cumulative, auditable, and transferable belief state into the whole process: each skill's reliability is no longer a bare frequency statistic (e.g., 1/1 = 100%) but a full belief distribution with a prior, a posterior, and quantified uncertainty. This lets the agent remain robust when data is scarce, transfer prior knowledge when the environment changes, and keep every update traceable and explainable. The frequentist approach, by contrast, remains stuck at "count-from-zero, point-estimate, memoryless" patching.
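The 1/1 = 100% example can be worked through with a small sketch. The code below uses a plain Beta-Bernoulli model, which is a simplification of the categorical posterior the paper describes. The function name `beta_posterior`, the earlier-environment counts, and the `discount` factor used for transfer are assumptions for illustration, not details from the paper.

```python
import math


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of the Beta posterior after Bernoulli observations."""
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# Frequentist point estimate after one success in one trial: no uncertainty attached.
frequency = 1 / 1

# Same evidence under a uniform Beta(1, 1) prior: a hedged estimate with spread.
mean, std = beta_posterior(successes=1, failures=0)
print(f"frequency={frequency:.2f}  posterior mean={mean:.2f}  std={std:.2f}")

# Transfer (assumed scheme): reuse belief from an earlier environment as a
# discounted prior, so old evidence informs the estimate without dominating it.
old_a, old_b = 9.0, 3.0  # hypothetical pseudo-counts from the earlier environment
discount = 0.5           # hypothetical trust in the old evidence
t_mean, t_std = beta_posterior(
    successes=1, failures=0, prior_a=discount * old_a, prior_b=discount * old_b
)
print(f"transferred prior: posterior mean={t_mean:.2f}  std={t_std:.2f}")
```

With a single success, the uniform-prior posterior mean is 2/3 rather than 1.0, and the standard deviation makes the remaining uncertainty explicit. The discounted prior shows one simple way earlier evidence could carry over when the environment changes.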

Get this paper in your agent:

hf papers read 2606.08348

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
