Hugging Face Daily Papers · June 11, 2026 · 3 min read

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.10156

Published on Jun 8 · Submitted by Bharath Sivaram Narasimhan on Jun 11

Authors: Bharath Sivaram Narasimhan, Karthik R Narasimhan

Abstract

A benchmark for agentic recommender systems is introduced that uses verifiable rewards and controlled dialogue constraints to evaluate conversational agent reliability, revealing significant performance gaps among leading models.

The summary above was generated by Qwen/Qwen2.5-Coder-32B-Instruct.

As recommender systems transition toward agentic, multi-turn conversational interfaces, evaluation paradigms have struggled to keep pace. Current benchmarks often rely on "LLM-as-a-judge" evaluations, which introduce subjectivity, high costs and inconsistency. We present τ-Rec, a benchmark for agentic recommender systems that replaces subjective evaluation with verifiable rewards and a reveal-tagged elicitation (RTE) mechanism that controls how task constraints surface during dialogue. By testing agents against structured catalog predicates and employing a pass^k reliability metric, τ-Rec provides a systematic test for consistent reasoning. Our evaluation of nine configurations across five model families -- GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Flash, DeepSeek V4 Flash, Qwen3-32B and GPT-5 mini -- reveals a steep reliability cliff, where even the best model achieves only ~57% at pass^1 and ~38% at pass^4, highlighting a critical gap in current conversational agent deployment. All code and data are publicly available at https://github.com/nbharaths/tau-rec.
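The abstract names two mechanisms without spelling them out: verifiable rewards checked against structured catalog predicates, and the pass^k reliability metric. The Python sketch below is one way these could fit together, not the authors' implementation. It assumes pass^k follows the definition used in the earlier τ-bench work (the probability that k independent trials of the same task all succeed, estimated per task as C(c, k) / C(n, k) from c passes in n trials). The catalog fields, predicates, and function names are hypothetical and chosen only for illustration.

```python
from math import comb
from typing import Callable

# Hypothetical types: a catalog item is a plain dict, and a predicate is any
# boolean check over it. Neither reflects the actual tau-rec schema.
Item = dict
Predicate = Callable[[Item], bool]


def verify(item: Item, predicates: list[Predicate]) -> bool:
    """A trial passes only if the recommended item satisfies every predicate."""
    return all(p(item) for p in predicates)


def pass_hat_k(num_trials: int, num_passed: int, k: int) -> float:
    """Per-task estimate of P(k independent trials all pass).

    Assumed estimator: C(c, k) / C(n, k) for c passes out of n trials.
    """
    if not 0 < k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    return comb(num_passed, k) / comb(num_trials, k)


def benchmark_pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Average the per-task estimate over all tasks."""
    per_task = [pass_hat_k(len(r), sum(r), k) for r in results]
    return sum(per_task) / len(per_task)


# Illustrative task: the user wants headphones costing at most 50.
predicates = [
    lambda it: it["category"] == "headphones",
    lambda it: it["price"] <= 50,
]
print(verify({"category": "headphones", "price": 40}, predicates))  # True
print(verify({"category": "headphones", "price": 80}, predicates))  # False

# Two tasks, four trials each: one always passes, one passes half the time.
results = [[True, True, True, True], [True, False, True, False]]
print(benchmark_pass_hat_k(results, k=1))  # 0.75
print(benchmark_pass_hat_k(results, k=4))  # 0.5
```

Under this estimator pass^1 is the ordinary average success rate, and pass^k can only stay flat or fall as k grows, because a task counts toward pass^k only to the extent that it succeeds on every one of k trials. That is the shape of the drop the abstract reports, from ~57% at pass^1 to ~38% at pass^4.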

Community

nbharaths · Paper author · Paper submitter · about 7 hours ago

τ-Rec: A verifiable benchmark for agentic recommender systems. Dataset: https://huggingface.co/datasets/nbharaths/tau-rec

Get this paper in your agent:

hf papers read 2606.10156

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
