Hugging Face Daily Papers · May 25, 2026 · 6 min read

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Papers

arxiv:2605.23904

Published on May 22 · Submitted by taesiri on May 25

#1 Paper of the day

Microsoft Research

Authors: Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo

AI-generated summary

SkillOpt introduces a systematic text-space optimizer for agent skills that trains skills as external agent state with stable updates and zero deployment inference overhead, achieving superior performance across multiple benchmarks and execution environments.

Abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.
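The accept-only-on-strict-improvement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`Edit`, `apply_edits`, `optimize_skill`, the toy scorer and proposer) is hypothetical, the skill is assumed to be a list of text lines, and the "textual learning-rate budget" is interpreted, as an assumption, as a cap on the number of edits applied per step. The optimizer model and the held-out validation set are replaced by deterministic stand-ins, and the epoch-wise slow/meta update is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Skill = List[str]  # hypothetical representation: one instruction per line


@dataclass(frozen=True)
class Edit:
    op: str          # "add", "delete", or "replace"
    index: int       # line position the edit targets
    text: str = ""   # new line for add/replace; ignored for delete


def apply_edits(skill: Skill, edits: Sequence[Edit], budget: int) -> Skill:
    """Return a copy of `skill` with at most `budget` edits applied."""
    lines = list(skill)
    for edit in edits[:budget]:
        if edit.op == "add":
            lines.insert(max(0, min(edit.index, len(lines))), edit.text)
        elif edit.op == "delete" and 0 <= edit.index < len(lines):
            del lines[edit.index]
        elif edit.op == "replace" and 0 <= edit.index < len(lines):
            lines[edit.index] = edit.text
    return lines


def optimize_skill(
    skill: Skill,
    propose: Callable[[Skill, List[Tuple[Edit, ...]]], List[Edit]],
    score: Callable[[Skill], float],
    steps: int,
    budget: int,
) -> Tuple[Skill, float, List[Tuple[Edit, ...]]]:
    """Keep a candidate only if it strictly beats the current validation score."""
    best, best_score = list(skill), score(skill)
    rejected: List[Tuple[Edit, ...]] = []  # assumed form of the rejected-edit buffer
    for _ in range(steps):
        edits = propose(best, rejected)    # stand-in for the optimizer model
        candidate = apply_edits(best, edits, budget)
        candidate_score = score(candidate)  # stand-in for held-out validation
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        else:
            rejected.append(tuple(edits[:budget]))
    return best, best_score, rejected


def toy_score(skill: Skill) -> float:
    """Toy validation score: how many required hints appear as exact lines."""
    required = ("check units", "verify the final answer")
    return float(sum(1 for hint in required if hint in skill))


def make_toy_proposer() -> Callable[[Skill, List[Tuple[Edit, ...]]], List[Edit]]:
    """Toy proposer that replays a fixed script instead of reading rollouts."""
    script = iter([
        [Edit("add", 1, "check units")],
        [Edit("delete", 0)],  # neutral edit: score does not rise, so it is rejected
        [Edit("add", 2, "verify the final answer")],
    ])
    return lambda skill, rejected: next(script, [])


final_skill, final_score, rejected = optimize_skill(
    ["read the problem carefully"], make_toy_proposer(), toy_score, steps=3, budget=2
)
print(final_skill, final_score, len(rejected))
```

In this toy run the neutral delete is rejected because the comparison is strict (`>` rather than `>=`), which mirrors the abstract's statement that an edit is accepted only when it strictly improves the held-out score. The returned skill is plain text, so using it afterwards requires no additional optimizer calls.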

arXiv: https://arxiv.org/abs/2605.23904 · Project page: https://microsoft.github.io/SkillOpt/ · GitHub: https://github.com/microsoft/SkillOpt (8 stars)

Community

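Yif29

about 7 hours ago

Video: https://cdn-uploads.huggingface.co/production/uploads/62d18eb81e36881a57f29bf4/Re-3-XEU4VDFweVgeZuR6.mp4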

Yif29

about 6 hours ago

🚀 SkillOpt: Train agent skills like neural networks — without touching model weights.

What if an agent could improve not by finetuning the LLM, but by self-optimizing its own skill document? 🧠✨

SkillOpt treats a natural-language skill as the agent’s trainable external state:
🧪 rollout → 🔍 reflect → ✍️ edit → ✅ validate → 📈 improve

Across 6 benchmarks, 7 models, and 3 agent harnesses, SkillOpt achieves best or tied-best results in 52/52 settings.

The key idea is simple but powerful:
as AI moves from assistant to worker, the bottleneck is no longer just knowledge — it is procedural capability: tool use, intermediate-state inspection, domain conventions, and recovery from failure. 🛠️🤖

We believe optimized, reusable, and inspectable skills could become a new adaptation layer for future agents.

🌐 Project: https://microsoft.github.io/SkillOpt/
📄 Paper: https://arxiv.org/pdf/2605.23904


Get this paper in your agent:

hf papers read 2605.23904

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

