Hugging Face Daily Papers · June 9, 2026 · 5 min read

Human Psychometric Questionnaires Mischaracterize LLM Behavior

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

We examine whether human psychometric questionnaires can serve as reliable tools for characterizing and predicting LLM behavior in everyday user interactions. We analyze eight open-source LLMs by comparing their value and personality profiles derived from two different methods: Likert self-reports on established questionnaires (PVQ-40/21 and BFI-44/10) and generation probabilities over value-laden responses to everyday user queries. The two profiles diverge substantially. Within-construct item consistency, often cited as evidence of stable LLM dispositions, disappears in generation probabilities. We attribute this gap to the fact that explicit lexical cues in established questionnaire items allow models to recognize the target construct and respond in alignment-consistent, socially desirable ways, whereas realistic user queries provide no such cues. In addition, demographic persona prompts shift models' responses to human questionnaires in ways consistent with real human patterns, but no such shifts appear in the generation probabilities of responses to realistic user queries, which indicates that persona prompts have limited ability to simulate the behaviors of target demographics in real-world user interactions. Overall, our study shows that human psychometric questionnaires are insufficient tools for predicting LLM behavior and suggests generation-based profiling as a more accurate measure.
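The two measurement ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a generation-based profile can be approximated by normalizing a model's log-likelihoods over a fixed set of candidate responses, and that within-construct item consistency can be approximated with Cronbach's alpha; the abstract names neither procedure. The function names `response_distribution` and `cronbach_alpha`, the value labels, and all numbers are hypothetical.

```python
import math
from statistics import pvariance


def response_distribution(log_likelihoods):
    """Turn per-response log-likelihoods into probabilities that sum to 1.

    `log_likelihoods` maps a label (here, the value a candidate response
    expresses) to the log-likelihood a model assigns to that response.
    Softmax normalization is an assumption of this sketch.
    """
    peak = max(log_likelihoods.values())  # subtract the max for numerical stability
    weights = {label: math.exp(ll - peak) for label, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


def cronbach_alpha(item_scores):
    """Cronbach's alpha for items that target the same construct.

    `item_scores` is a list of items; each item is a list of scores over the
    same respondents (for example, models or sampling runs), in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(pvariance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-respondent totals
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical log-likelihoods for two candidate replies to one user query.
profile = response_distribution(
    {"benevolence": math.log(0.75), "power": math.log(0.25)}
)

# Two hypothetical items that rank three respondents identically.
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
```

Under these assumptions, a questionnaire-based profile and a generation-based profile for the same construct can be compared item by item, and a drop in alpha when moving from Likert scores to generation probabilities would correspond to the loss of within-construct consistency that the abstract describes.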

arxiv:2509.10078

Published on May 29 · Submitted by Woojung Song on Jun 9 · Seoul National University

Authors: Woojung Song, Dongmin Choi, Yoonah Park, Jongwook Han, Eun-Ju Lee, Yohan Jo

Abstract

Human psychometric questionnaires fail to reliably predict LLM behavior in real-world interactions, while generation-based profiling offers superior accuracy for understanding model responses to everyday user queries.

AI-generated summary by Qwen/Qwen2.5-Coder-32B-Instruct