Hugging Face Daily Papers · May 27, 2026 · 4 min read

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.27141


Published on May 26 · Submitted by Yuxin Chen on May 27 · LongCat


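Authors: Yuxin Chen, Yi Zhang, Zhengzhou Cai, Yaorui Shi, Zhiyuan Yao, Chenhang Cui, Jingnan Zheng, Yaqi Huo, Xi Su, Qi Gu, Xunliang Cai, Xiang Wang, An Zhang, Tat-Seng Chua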

Abstract

AI-generated summary: VitaBench 2.0 evaluates personalized and proactive agent behavior in long-term user interactions by requiring continuous extraction and updating of user preferences from fragmented interactions.

Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interactions and requires both personalized modeling and proactive interaction. However, existing agent benchmarks primarily evaluate reasoning and tool use, largely overlooking the challenges of inferring and leveraging user preferences in realistic scenarios. To address this gap, we introduce VitaBench 2.0, a benchmark for evaluating personalized and proactive agent behavior in long-term user interactions. In VitaBench 2.0, tasks are organized as temporally ordered sequences for individual users, where preferences are embedded in fragmented and heterogeneous interactions. Successful completion of tasks requires the agent to continuously extract, utilize, and update user preferences from these interactions. We further evaluate proactiveness through tasks that require agents to recognize missing information and actively acquire it from users or environments before making decisions. To support systematic analysis, we provide an extensible memory interface that enables controlled comparison across different memory architectures. We benchmark a diverse set of frontier proprietary and open-source LLMs. Results show that real-world personalization remains highly challenging even for state-of-the-art models, revealing a substantial gap between current capabilities and practical requirements. Extensive analysis further reveals the failure modes and capability bottlenecks of current agents in real-world personalized decision-making, providing insights for future model improvements.
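To make the idea of an extensible memory interface concrete, here is a minimal Python sketch. It is not VitaBench 2.0's actual API: the names `Interaction`, `Memory`, `FullHistoryMemory`, `LastKMemory`, and `replay` are all hypothetical, invented for illustration. It only shows how two memory designs could be swapped behind one interface while a user's interactions are replayed in temporal order.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass


@dataclass
class Interaction:
    """One fragment of a user's history (hypothetical schema)."""
    timestamp: int
    text: str


class Memory(ABC):
    """Hypothetical pluggable memory interface, not the benchmark's real one."""

    @abstractmethod
    def update(self, interaction: Interaction) -> None:
        """Absorb one new interaction."""

    @abstractmethod
    def retrieve(self, query: str) -> list[str]:
        """Return the stored context offered to the agent for this query."""


class FullHistoryMemory(Memory):
    """Baseline: keep every interaction and return all of them."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def update(self, interaction: Interaction) -> None:
        self._log.append(interaction.text)

    def retrieve(self, query: str) -> list[str]:
        return list(self._log)


class LastKMemory(Memory):
    """Sliding window: keep only the k most recent interactions."""

    def __init__(self, k: int) -> None:
        self._window: deque[str] = deque(maxlen=k)

    def update(self, interaction: Interaction) -> None:
        self._window.append(interaction.text)

    def retrieve(self, query: str) -> list[str]:
        return list(self._window)


def replay(memory: Memory, interactions: list[Interaction], query: str) -> list[str]:
    """Feed interactions to the memory in time order, then query it."""
    for interaction in sorted(interactions, key=lambda i: i.timestamp):
        memory.update(interaction)
    return memory.retrieve(query)


# Toy data (made up): a preference stated early is later revised.
events = [
    Interaction(2, "now vegetarian"),
    Interaction(1, "likes spicy beef"),
    Interaction(3, "allergic to peanuts"),
]
print(replay(FullHistoryMemory(), events, "order dinner"))
# → ['likes spicy beef', 'now vegetarian', 'allergic to peanuts']
print(replay(LastKMemory(2), events, "order dinner"))
# → ['now vegetarian', 'allergic to peanuts']
```

Because both classes satisfy the same `Memory` contract, the replay loop and the query stay fixed while only the memory design changes, which is the kind of controlled comparison the abstract describes. In this toy data the sliding window drops the outdated "likes spicy beef" preference, whereas the full-history baseline leaves it to the agent to resolve the conflict.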

arXiv: arxiv.org/abs/2605.27141 · GitHub: https://github.com/meituan-longcat/vitabench-2.0

Community

Chen1999 (paper author, paper submitter) · about 3 hours ago · edited about 3 hours ago

Feel free to discuss.


Get this paper in your agent:

hf papers read 2605.27141

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
