Hugging Face Daily Papers · 4 min read

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Code: https://github.com/joykirat18/Agent-BRACE
arxiv:2605.11436

Published on May 12 · Submitted by Singh on May 13

Authors: Joykirat Singh, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Akshay Nambi, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
Abstract

AI-generated summary

Agent-BRACE decomposes LLM agents into belief state and policy models, using structured textual claims with certainty labels to handle partial observability and long-term dependencies in complex environments.

Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. This leads to two challenges: partial observability requires maintaining uncertainty over unobserved world attributes, and long interaction history causes context to grow without bound, diluting task-relevant information. A principled solution to both challenges is a belief state: a posterior distribution over environment states given past observations and actions, which compactly encodes history for decision making regardless of episode length. In LLM agents, however, the open-ended nature of text makes it unclear how to represent such a distribution. Therefore, we introduce Agent-BRACE: Agent Belief state Representation via Abstraction and Confidence Estimation, a method that decouples an LLM agent into a belief state model and a policy model, jointly optimized via reinforcement learning. The belief state model produces a structured approximation of the belief distribution: a set of atomic natural language claims about the environment, each annotated with an ordinal verbalized certainty label ranging from certain to unknown. The policy model conditions on this compact, structured approximate belief rather than the full history, learning to select actions under explicit uncertainty. Across long-horizon, partially observable embodied language environments, Agent-BRACE achieves an average absolute improvement of +14.5% (Qwen2.5-3B-Instruct) and +5.3% (Qwen3-4B-Instruct), outperforming strong RL baselines while maintaining a near-constant context window independent of episode length. Further analysis shows that the learned belief becomes increasingly calibrated over the course of an episode as evidence accumulates.
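The belief state the abstract appeals to is the standard partially observable MDP (POMDP) object. For reference, in textbook notation (not notation taken from the paper), with transition model T and observation model O, the posterior over states is maintained recursively:

```latex
% Standard POMDP belief update (textbook form, not the paper's notation):
% after taking action a_{t-1} and receiving observation o_t,
b_t(s') \propto O(o_t \mid s', a_{t-1}) \sum_{s} T(s' \mid s, a_{t-1}) \, b_{t-1}(s)
```

Agent-BRACE approximates this distribution with text rather than a probability table. The sketch below is a minimal illustration, not the authors' implementation: the exact certainty label set, the claim schema, and the prompt layout are all assumptions, chosen only to make the "atomic claims with ordinal verbalized certainty labels" idea concrete.

```python
from dataclasses import dataclass

# Ordinal verbalized certainty scale. The paper only says labels range
# from "certain" to "unknown"; the intermediate levels are assumptions.
CERTAINTY_LEVELS = ("certain", "likely", "uncertain", "unknown")


@dataclass
class BeliefClaim:
    """One atomic natural-language claim about the environment,
    annotated with an ordinal verbalized certainty label."""
    claim: str
    certainty: str

    def __post_init__(self) -> None:
        if self.certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"unknown certainty label: {self.certainty!r}")


def render_belief(claims: list[BeliefClaim]) -> str:
    """Serialize the structured belief into the compact text block the
    policy model conditions on in place of the full interaction history."""
    return "\n".join(f"- [{c.certainty}] {c.claim}" for c in claims)


# Toy belief for an embodied household task (hypothetical contents).
belief = [
    BeliefClaim("The apple is in the fridge.", "certain"),
    BeliefClaim("The fridge door is closed.", "likely"),
    BeliefClaim("A knife is in drawer 2.", "uncertain"),
    BeliefClaim("The plate's location.", "unknown"),
]

# The policy prompt stays near-constant in size regardless of episode
# length, because it sees this summary rather than the raw history.
policy_prompt = (
    "Task: slice the apple.\n"
    "Belief state:\n"
    f"{render_belief(belief)}\n"
    "Choose the next action, accounting for the stated uncertainty."
)
print(policy_prompt)
```

Because the policy sees only this rendered belief plus the task description, its input does not grow with episode length, which is the mechanism behind the paper's near-constant context window.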

Community

Get this paper in your agent:

hf papers read 2605.11436
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.11436 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.11436 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.11436 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

No comments yet.
