Hugging Face Daily Papers · June 19, 2026 · 5 min read

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Multi-step LLM pipelines fail through interactions among retrieval, reasoning, and formatting steps, so prompt-only optimization can miss bottlenecks in the chain. We present FAPO (Fully Autonomous Prompt Optimization), a framework that lets Claude Code optimize an LLM pipeline inside a standardized codebase. FAPO evaluates a pipeline, inspects intermediate steps, diagnoses failures, proposes scoped changes, and validates variants repeatedly to optimize against a score function. It tries prompt edits first and changes the chain structure, within the permitted scope, only when prompt optimization appears insufficient and attribution identifies a structural bottleneck.

Across six benchmarks and three task models, FAPO beats the baseline GEPA in 15 of 18 model-benchmark comparisons. In 11 of those comparisons, FAPO wins with non-overlapping mean ± trial-standard-deviation ranges, and the mean FAPO-GEPA gain is +14.1 pp. In the six HoVer and IFBench comparisons where prompt-first search escalated to structural changes, FAPO wins all six with a mean gain of +33.8 pp. FAPO also improves performance on security tasks: on CTIBench-RCM, a security CVE-to-CWE task, prompt-only FAPO lifts test accuracy by +4.0 pp on GPT-5, +7.1 pp on Foundation-Sec-8B-Instruct, and +2.0 pp on Foundation-Sec-8B-Reasoning. These results position FAPO as a state-of-the-art pipeline optimization technique for both general-purpose and security-focused tasks.
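The prompt-first, structure-second search described in the abstract can be illustrated with a minimal sketch. This is not the FAPO implementation: the `Pipeline` type, the `optimize` function, the toy score, and both proposer functions are hypothetical names invented for illustration, and a simple gain threshold (`min_gain`) stands in for the paper's attribution step, whose details the abstract does not give.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical pipeline: one prompt per step plus the step order."""
    prompts: Tuple[str, ...]
    steps: Tuple[str, ...]


Score = Callable[[Pipeline], float]
Proposer = Callable[[Pipeline], Iterable[Pipeline]]


def pick_best(candidates: Iterable[Pipeline], score: Score) -> Optional[Tuple[Pipeline, float]]:
    """Score every candidate once and return the best (candidate, score), or None."""
    scored = [(c, score(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1]) if scored else None


def optimize(pipeline: Pipeline, score: Score, propose_prompt_edits: Proposer,
             propose_structural_edits: Proposer, rounds: int = 5,
             min_gain: float = 1.0) -> Tuple[Pipeline, float]:
    """Try prompt edits first; consider structural edits only when they stall."""
    best, best_score = pipeline, score(pipeline)
    for _ in range(rounds):
        choice = pick_best(propose_prompt_edits(best), score)
        # Assumption: "prompt optimization appears insufficient" is modeled
        # as a prompt-edit gain below min_gain (in percentage points).
        if choice is None or choice[1] - best_score < min_gain:
            structural = pick_best(propose_structural_edits(best), score)
            if structural is not None and (choice is None or structural[1] > choice[1]):
                choice = structural
        if choice is None or choice[1] <= best_score:
            break  # no variant validated as an improvement
        best, best_score = choice
    return best, best_score


# --- Toy stand-ins, purely illustrative ---
def toy_score(p: Pipeline) -> float:
    points = 50.0
    if any("cite evidence" in text.lower() for text in p.prompts):
        points += 10.0
    if "verify" in p.steps:
        points += 30.0
    return points


def toy_prompt_edits(p: Pipeline):
    if "cite evidence" in p.prompts[0].lower():
        return []
    return [replace(p, prompts=(p.prompts[0] + " Cite evidence.",) + p.prompts[1:])]


def toy_structural_edits(p: Pipeline):
    if "verify" in p.steps:
        return []
    return [replace(p, steps=p.steps + ("verify",),
                    prompts=p.prompts + ("Check the answer against the retrieved text.",))]


start = Pipeline(prompts=("Answer the claim.",), steps=("answer",))
best, best_score = optimize(start, toy_score, toy_prompt_edits, toy_structural_edits)
print(best.steps, best_score)
```

In this toy run the first round accepts a prompt edit, the second round finds no further prompt gain and escalates to adding a `verify` step, and the third round stops because neither proposer has anything left to offer.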

arxiv:2606.19605

Published on Jun 17

Submitted by Paul Kassianik on Jun 19 · Cisco Foundation AI

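Authors: Paul Kassianik, Baturay Saglam, Huaibo Zhao, Blaine Nelson, Supriti Vijay, Aman Priyanshu, Amin Karbasi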

Abstract

FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct
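The summary's "superior performance" rests, per the paper's abstract, on comparisons where FAPO wins with non-overlapping mean ± trial-standard-deviation ranges. A minimal sketch of that check follows. The function names and the trial scores are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the abstract does not say which estimator is used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def interval(trials: Sequence[float]) -> Tuple[float, float]:
    """Return (mean - std, mean + std) over per-trial scores.

    Assumption: sample standard deviation; the source does not specify.
    """
    m, s = mean(trials), stdev(trials)
    return m - s, m + s


def wins_without_overlap(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a's lower bound lies strictly above b's upper bound."""
    a_low, _ = interval(a)
    _, b_high = interval(b)
    return a_low > b_high


def gain_pp(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference of mean scores in percentage points."""
    return mean(a) - mean(b)


# Hypothetical per-trial accuracies in percent, not results from the paper.
method_a = [72.0, 74.0, 76.0]
method_b = [60.0, 61.0, 62.0]
print(wins_without_overlap(method_a, method_b), gain_pp(method_a, method_b))
```

A win can still overlap: with hypothetical trials `[70.0, 74.0, 78.0]` against `[68.0, 70.0, 72.0]` the first method has the higher mean, but its lower bound does not clear the second method's upper bound, so the comparison would count toward the wins and not toward the non-overlapping subset.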

GitHub: https://github.com/cisco-foundation-ai/fully-automated-prompt-optimization

Get this paper in your agent:

hf papers read 2606.19605

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
