FlowCompile: An Optimizing Compiler for Structured LLM Workflows
Authors: Junyan Li, Zhang-Wei Hong, Maohao Shen, Yang Zhang, Chuang Gan
Organization: University of Massachusetts Amherst
Published: 2026-05-13 (arXiv: 2605.13647)
Code: https://github.com/UMass-Embodied-AGI/FlowCompile
Abstract
AI-generated summary: FlowCompile is a structured LLM workflow compiler that optimizes multi-agent workflows by exploring workflow configurations at compile time, balancing accuracy and latency without retraining.
Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent to balance accuracy and latency, is challenging due to the combinatorial design space over model choices, reasoning budgets, and workflow structures. Existing cost-aware methods largely treat workflow optimization as a routing problem, selecting a configuration at inference time for each query according to the accuracy-latency objective used during training. We argue that structured LLM workflows can also be optimized from a compilation perspective: before deployment, the system can globally explore the workflow design space and construct a reusable set of workflow-level configurations spanning diverse accuracy-latency trade-offs. Drawing inspiration from machine learning compilers, we introduce FlowCompile, a structured LLM workflow compiler that performs compile-time design space exploration to identify a high-quality, reusable trade-off set. FlowCompile decomposes a workflow into sub-agents, profiles each sub-agent under diverse configurations, and composes these measurements through a structure-aware proxy to estimate workflow-level accuracy and latency. It then identifies diverse high-quality configurations in a single compile-time pass, without retraining or online adaptation. Experiments across diverse workflows and challenging benchmarks show that FlowCompile consistently outperforms heuristically optimized workflow configurations and routing-based baselines, delivering up to 6.4x speedup. The compiled configuration set further serves as a reusable optimization artifact, enabling flexible deployment under varying runtime preferences and supporting downstream selection or routing.
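The pipeline described in the abstract (profile each sub-agent, compose the measurements through a proxy, keep the non-dominated workflow-level configurations) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `PROFILES` numbers are invented, the names `chain_proxy`, `pareto_front`, and `compile_workflow` are hypothetical, and the proxy assumes a purely sequential two-agent chain in which accuracies multiply and latencies add. Brute-force enumeration stands in for whatever search strategy FlowCompile actually uses.

```python
from itertools import product

# Hypothetical per-sub-agent profiles: config name -> (accuracy, latency in seconds).
# All numbers are made up for illustration only.
PROFILES = {
    "planner": {"small": (0.80, 1.0), "large": (0.95, 5.0)},
    "solver": {"small": (0.70, 2.0), "large": (0.90, 3.0)},
}


def chain_proxy(assignment):
    """Estimate workflow-level (accuracy, latency) for a sequential chain.

    Assumption (not from the paper): accuracies multiply, latencies add.
    """
    accuracy, latency = 1.0, 0.0
    for agent, config in assignment.items():
        acc, lat = PROFILES[agent][config]
        accuracy *= acc
        latency += lat
    return accuracy, latency


def pareto_front(points):
    """Keep configurations that no other configuration dominates.

    A point is dominated if another point is at least as accurate and at
    least as fast, and strictly better in one of the two.
    """
    front = []
    for name, (acc, lat) in points.items():
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for other, (a, l) in points.items()
            if other != name
        )
        if not dominated:
            front.append((name, acc, lat))
    # Sort from fastest to slowest so the trade-off curve is easy to read.
    return sorted(front, key=lambda item: item[2])


def compile_workflow():
    """Enumerate every assignment, score it with the proxy, return the front."""
    agents = list(PROFILES)
    estimates = {}
    for choice in product(*(PROFILES[a] for a in agents)):
        estimates[choice] = chain_proxy(dict(zip(agents, choice)))
    return pareto_front(estimates)


if __name__ == "__main__":
    for name, acc, lat in compile_workflow():
        print(name, round(acc, 3), lat)
```

With these invented numbers, the assignment (large planner, small solver) is dropped because (small planner, large solver) is both faster and more accurate under the assumed proxy; the other three assignments form the trade-off set.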
Community
FlowCompile is an optimizing compiler for structured LLM workflows. Given a workflow graph, a validation/profile set, and a design space over sub-agent models, reasoning budgets, and optional workflow structure choices, FlowCompile performs compile-time design-space exploration and emits a reusable set of workflow-level configurations spanning accuracy-latency trade-offs.
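One way to picture the emitted configuration set as a reusable artifact is a small lookup that picks a configuration for a given runtime preference. The sketch below is an assumption about how such a set might be consumed, not FlowCompile's API: `CompiledConfig`, its fields, and `select_config` are hypothetical names, and the selection rule (most accurate configuration within a latency budget) is just one possible preference.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CompiledConfig:
    """Hypothetical record for one workflow-level configuration."""
    name: str
    accuracy: float   # estimated workflow-level accuracy
    latency_s: float  # estimated workflow-level latency in seconds


def select_config(
    configs: Sequence[CompiledConfig], latency_budget_s: float
) -> Optional[CompiledConfig]:
    """Return the most accurate configuration within the latency budget.

    Returns None when no configuration fits the budget.
    """
    feasible = [c for c in configs if c.latency_s <= latency_budget_s]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)
```

Because the set is computed once before deployment, changing the budget at runtime only re-runs this cheap selection step rather than the exploration itself.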