Hugging Face Daily Papers · 4 min read

RobotValues: Evaluating Household Robots When Human Values Conflict

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.03312

Published on Jun 2 · Submitted by Jongwon Lim on Jun 5
Authors: Jongwook Han, Hyeongjin Kim, Yohan Jo

Abstract

RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values.

While household robots are often evaluated on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet there are no benchmarks for evaluating robots' value preferences in such scenarios. We introduce RobotValues, a benchmark for evaluating household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.
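The two quantities the abstract reports, how often each value is selected by default and how often a model fails to follow an explicit value instruction, can be illustrated with a minimal sketch. This is not the authors' code or data format: the `Instance` schema, the function names, and the toy records below are all hypothetical, chosen only to show how such rates could be computed from per-instance model choices.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    """Hypothetical record for one scenario (not the benchmark's real schema)."""
    action_values: Dict[str, str]           # candidate action -> value it prioritizes
    chosen_action: str                      # action the model selected
    instructed_value: Optional[str] = None  # value the prompt asked for, if any


def value_selection_rates(instances: List[Instance]) -> Dict[str, float]:
    """Fraction of instances whose chosen action prioritizes each value."""
    counts = Counter(i.action_values[i.chosen_action] for i in instances)
    total = len(instances)
    return {value: n / total for value, n in counts.items()}


def override_failure_rate(instances: List[Instance]) -> float:
    """Share of instructed instances where the chosen action's value
    differs from the value the model was told to prioritize."""
    instructed = [i for i in instances if i.instructed_value is not None]
    if not instructed:
        return 0.0
    failures = sum(
        1 for i in instructed
        if i.action_values[i.chosen_action] != i.instructed_value
    )
    return failures / len(instructed)


# Toy data, invented purely for illustration.
default_runs = [
    Instance({"A": "safety", "B": "privacy"}, "A"),
    Instance({"A": "safety", "B": "privacy"}, "A"),
    Instance({"A": "accommodation", "B": "privacy"}, "A"),
    Instance({"A": "safety", "B": "privacy"}, "B"),
]
instructed_runs = [
    Instance({"A": "safety", "B": "privacy"}, "A", instructed_value="privacy"),
    Instance({"A": "safety", "B": "privacy"}, "B", instructed_value="privacy"),
]

print(value_selection_rates(default_runs))
print(override_failure_rate(instructed_runs))
```

Under this framing, the "80%" figure in the abstract would correspond to an override failure rate of 0.8 on instances where the instructed value conflicts with the model's default choice; how the paper actually defines and aggregates that rate is not specified here.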

Community

Seems like a very timely work, nice job!

Paper submitter about 7 hours ago

RobotValues: Evaluating Household Robots When Human Values Conflict

Important angle. Framing value conflicts as a first-class evaluation target feels overdue, and the privacy underselection finding is especially striking. The 80% failure rate at overriding default preferences even when explicitly instructed is a real concern. Great work!


Get this paper in your agent:

hf papers read 2606.03312
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
