Hugging Face Daily Papers · June 23, 2026 · 4 min read

CalVerT: Augmenting Agents with Calibrated Verifier Telemetry Improves Action and Learning in Knowledge-Intensive Tasks

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Excited to share CalVerT, a flexible and easy method that augments QA agents with telemetry about how certain and grounded their answers are. It works training-free (+3.7 F1 on 2Wiki, +4.7 on WiTQA) and with training (+5.9 on HotpotQA with GRPO), while cutting over-retrieval and redundant actions!

arxiv:2606.21777

Published on Jun 19 · Submitted by Ashwin V on Jun 23 · University of Texas at Austin

Authors: Ashwin Vinod, Ying Ding, Elias Stengel-Eskin

Abstract

Calibrated verifier telemetry enhances LLM agents in knowledge-intensive question answering by providing confidence scores and grounding verification, reducing both over-retrieval and unsupported answers.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

LLM agents in knowledge-intensive question answering take retrieval and reasoning actions with incomplete knowledge about whether their current answer is uncertain, unsupported, or already complete. This produces two failure modes: committing to confident but unsupported answers, which hurts accuracy, and over-retrieving when the evidence in hand already suffices, resulting in wasted compute. To give agents a more complete picture of the state space they are operating in, we introduce calibrated verifier telemetry (CalVerT), which augments the agent's state with additional telemetry: a calibrated self-confidence score and a grounding verifier score. We show that CalVerT can improve agents in both training-free and training-based settings. On four QA benchmarks, we find that CalVerT raises F1 by triggering retrieval in cases where agents over-rely on parametric knowledge, while cutting redundant retrieval in cases where agents have sufficient context to answer. We show that CalVerT can augment existing QA frameworks without training. Moreover, CalVerT also improves trained systems: by simply augmenting an agent's state with telemetry, we observe improvements after reinforcement learning, as compared to an agent with identical training but no CalVerT telemetry.
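To make the idea of telemetry-augmented state concrete, here is a minimal Python sketch. It is not taken from the paper or its repository: the `Telemetry` class, the `render_state` text format, the `suggest_action` rule, and both thresholds are hypothetical, and the scores are assumed to be already computed and calibrated elsewhere. It only illustrates how two scalar signals could be attached to an agent's state and how they map onto the two failure modes described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Telemetry:
    """Two scalar signals in [0, 1] about the agent's current draft answer."""
    confidence: float  # self-confidence, assumed to be calibrated upstream
    grounding: float   # verifier score: how well the evidence supports the answer


def render_state(question: str, draft_answer: str, telemetry: Telemetry) -> str:
    """Append the telemetry to the textual state the agent conditions on.

    The layout below is an illustrative assumption, not the paper's format.
    """
    return (
        f"Question: {question}\n"
        f"Draft answer: {draft_answer}\n"
        f"[telemetry] confidence={telemetry.confidence:.2f} "
        f"grounding={telemetry.grounding:.2f}"
    )


def suggest_action(
    telemetry: Telemetry,
    conf_min: float = 0.7,    # hypothetical threshold
    ground_min: float = 0.6,  # hypothetical threshold
) -> str:
    """Hypothetical threshold rule over the two signals.

    Answer only when the draft is both confident and supported; otherwise
    retrieve. A confident but poorly grounded draft therefore triggers
    retrieval, and a well-grounded, confident draft stops it.
    """
    if telemetry.confidence >= conf_min and telemetry.grounding >= ground_min:
        return "answer"
    return "retrieve"


if __name__ == "__main__":
    unsupported = Telemetry(confidence=0.9, grounding=0.2)
    print(render_state("Who directed the film?", "Jane Doe", unsupported))
    print(suggest_action(unsupported))
```

In this sketch the fixed rule stands in for whatever policy consumes the telemetry; an LLM agent could instead read the rendered state directly and choose its own next action.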

Community

ashwinnv · Paper author · Paper submitter · about 23 hours ago · edited about 23 hours ago

Code: https://github.com/ashwinn-v/CalVerT

Get this paper in your agent:

hf papers read 2606.21777

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
