ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
Authors: Woojung Song, Nalim Kim, Sangjun Song, Chaewon Heo, Jongwon Lim, Yohan Jo
Organization: Seoul National University
Published: 2026-06-04
Abstract
Role-playing language agents must portray characters who change over the course of a narrative, which calls for benchmarks that evaluate alignment with a character's psychological trajectory rather than static factual recall. On the ArcANE benchmark, models perform best when conditioned on Character Arc information.
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across phases, spanning both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc tops every other context strategy on every model, and the gap is largest on scenarios outside the source text where retrieval has nothing to find. We further fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which widen the Arc advantage even more on scenarios outside the source text.
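To make the probe design concrete, here is a minimal Python sketch of how a Character Arc and a cross-phase probe might be represented. The abstract does not specify a schema, so every name here (`Phase`, `CharacterArc`, `build_probe_prompts`, the prompt wording) is a hypothetical illustration rather than the authors' implementation. The Scrooge example was chosen only because the arc is widely known; it is not claimed to be part of the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One segment of a character's arc (hypothetical schema)."""
    index: int
    chapters: tuple[int, int]  # inclusive chapter range (assumed representation)
    state: str                 # short description of the psychological state


@dataclass(frozen=True)
class CharacterArc:
    character: str
    axis: str                  # the psychological axis the arc moves along
    phases: tuple[Phase, ...]

    def phase_at(self, chapter: int) -> Phase:
        """Return the phase whose chapter range contains `chapter`."""
        for phase in self.phases:
            start, end = phase.chapters
            if start <= chapter <= end:
                return phase
        raise ValueError(f"chapter {chapter} is outside the arc")


def build_probe_prompts(arc: CharacterArc, scenario: str) -> list[str]:
    """Pose the same scenario once per phase, conditioned on that phase's state."""
    return [
        f"You are {arc.character}. Arc axis: {arc.axis}. "
        f"Current phase {phase.index}: {phase.state}\n"
        f"Scenario: {scenario}"
        for phase in arc.phases
    ]


# Illustrative arc only; not taken from the ArcANE data.
arc = CharacterArc(
    character="Ebenezer Scrooge",
    axis="miserliness to generosity",
    phases=(
        Phase(1, (1, 1), "cold, miserly, dismissive of others"),
        Phase(2, (2, 4), "shaken and increasingly remorseful"),
        Phase(3, (5, 5), "generous and openly joyful"),
    ),
)

# A scenario the source text never depicts, posed identically in every phase.
scenario = "A stranger asks you to lend them money for a train ticket."
prompts = build_probe_prompts(arc, scenario)
```

Because the scenario text is identical across prompts, any difference in a model's responses can be attributed to the phase it was conditioned on, which is the property a cross-phase probe is meant to isolate.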
Community
Really compelling framing. Shifting evaluation from static factual recall to psychological trajectory alignment feels like the right direction for RPLAs, and the out-of-source-text scenarios are a clever stress test where retrieval genuinely can't help.
I'm curious how the Character Arc conditioning holds up on characters with non-linear or ambiguous arcs (like unreliable narrators), where the psychological axis might be harder to segment cleanly. Nice work!
Paper: arxiv.org/abs/2606.05553