Hugging Face Daily Papers · 4 min read

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.05622

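Published on Jun 4, 2026 · Submitted by Jeff on Jun 5, 2026 · #3 Paper of the day
Authors: Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji
Code: https://github.com/JiayuJeff/AdaPlanBench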

Abstract

AdaPlanBench presents a dynamic interactive benchmark for evaluating LLM agents' ability to adaptively plan under progressively revealed world and user constraints through multi-turn interactions.

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual constraints. To address this gap, we introduce AdaPlanBench, a dynamic interactive benchmark for evaluating whether Large Language Model (LLM) agents can adaptively plan and re-plan under progressively revealed world and user constraints. AdaPlanBench is built on 307 household tasks, with a scalable constraint construction pipeline that augments each task with dual constraints. At runtime, agents interact with the environment in a multi-turn protocol where hidden constraints are revealed only when the agent proposes a plan that violates them, requiring iterative plan revision under accumulating feedback. This makes planning challenging, as agents must infer and track constraints from feedback while re-planning effectively. Experiments on ten leading LLMs show that adaptive planning under dual constraints remains challenging, with the best model reaching only 67.75% accuracy. We further observe that performance degrades as more constraints accumulate, with user constraints posing a particularly large challenge and failures often stemming from weaker physical grounding and reduced effectiveness. These results establish AdaPlanBench as a testbed for dual-constrained interactive planning and highlight the challenge of reliable adaptation to dynamically revealed constraints in LLM agents.
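
The reveal-on-violation protocol described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the AdaPlanBench code: the `Constraint` class, the `run_episode` loop, the turn limit, the choice to reveal every violated constraint at once, and the toy tea-making task. The toy agent also reuses the constraint checkers directly as a shortcut, whereas a real agent would have to infer constraints from the feedback text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Plan = List[str]  # a plan is an ordered list of step descriptions


@dataclass(frozen=True)
class Constraint:
    kind: str                            # "world" or "user"
    feedback: str                        # message revealed on violation
    violated_by: Callable[[Plan], bool]  # hypothetical checker, not the benchmark's


def run_episode(
    propose: Callable[[List[Constraint]], Plan],
    hidden: List[Constraint],
    max_turns: int = 5,  # assumed turn budget
) -> Tuple[bool, int, List[Constraint]]:
    """Return (success, turns_used, revealed_constraints)."""
    revealed: List[Constraint] = []
    for turn in range(1, max_turns + 1):
        plan = propose(revealed)
        violated = [c for c in hidden if c.violated_by(plan)]
        if not violated:
            return True, turn, revealed
        # Assumption: every violated constraint is disclosed in the same turn.
        for c in violated:
            if c not in revealed:
                revealed.append(c)
    return False, max_turns, revealed


# Toy task with one world constraint and one user constraint (both invented).
CONSTRAINTS = [
    Constraint("world", "The kettle is broken.",
               lambda plan: any("kettle" in step for step in plan)),
    Constraint("user", "The user avoids sugar.",
               lambda plan: any("sugar" in step for step in plan)),
]

CANDIDATES: List[Plan] = [
    ["boil water in kettle", "steep tea", "add sugar"],
    ["boil water in pot", "steep tea", "add sugar"],
    ["boil water in pot", "steep tea", "add honey"],
]


def toy_agent(revealed: List[Constraint]) -> Plan:
    """Pick the first candidate plan consistent with all revealed constraints."""
    for plan in CANDIDATES:
        if not any(c.violated_by(plan) for c in revealed):
            return plan
    return CANDIDATES[-1]


success, turns, revealed = run_episode(toy_agent, CONSTRAINTS)
print(success, turns, [c.feedback for c in revealed])
```

In this sketch the agent starts with no knowledge of the constraints, so its first plan violates both; the feedback accumulates in `revealed`, and the second proposal avoids them. Revealing one constraint per turn instead would be an equally plausible reading of the abstract and would lengthen the episode.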

Community

Paper submitter about 6 hours ago

Excited to share AdaPlanBench, a benchmark for studying how LLM agents adaptively re-plan as hidden world constraints and user preferences emerge.


Get this paper in your agent:

hf papers read 2606.05622
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

