Hugging Face Daily Papers · June 10, 2026 · 5 min read

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment. Existing sycophancy research primarily focuses on factual agreement and instruction-following settings, leaving culturally grounded conversational sycophancy underexplored. We introduce BenSyc, the first benchmark for studying conversational sycophancy in Bengali social contexts. Starting from 11,840 Reddit posts and 170k comments collected from communities across Bangladesh and West Bengal, we construct a human-validated benchmark with binary labels and a fine-grained five-level taxonomy spanning Invalidation, Neutral, Support, Validation, and Escalation. We evaluate more than 15 open and proprietary LLMs on conversational alignment classification and response generation tasks. Results show that distinguishing empathetic support from reinforcement-oriented validation remains challenging even for frontier instruction-tuned models: the best system achieves only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. In generation settings, several models frequently produce strongly validating or escalatory responses in emotionally charged situations. 
Our findings highlight substantial variation across model families and conversational behaviors, underscoring the importance of culturally grounded multilingual benchmarks for evaluating socially aligned conversational AI systems.
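
The Macro-F1 scores quoted in the abstract weight every class equally, so a model that handles the common labels well but misses a rarer one (for example, confusing Support with Validation) is penalized more than plain accuracy would suggest. The snippet below is a minimal sketch of the standard metric over the five taxonomy levels, assuming plain string labels. The function name `macro_f1` and the toy label lists are illustrative only; they are not taken from the paper or its evaluation code.

```python
from collections import Counter

# The five alignment levels named in the abstract.
LABELS = ["Invalidation", "Neutral", "Support", "Validation", "Escalation"]


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # class p was predicted, but incorrectly
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Invented toy labels for illustration, not data from BenSyc.
gold = ["Support", "Validation", "Escalation", "Neutral", "Invalidation", "Support"]
pred = ["Support", "Support", "Escalation", "Neutral", "Invalidation", "Validation"]
print(round(100 * macro_f1(gold, pred), 1))
```

In this toy case four of six predictions are correct (66.7% accuracy), yet the Validation class contributes an F1 of zero and Support only 0.5, which is the kind of Support/Validation confusion the abstract describes as difficult.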


arxiv:2606.10061


Published on Jun 8 · Submitted by Sajib Acharjee Dip on Jun 10

University of Illinois at Urbana-Champaign


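Authors: Kazi Noshin, Sajib Acharjee Dip, Ranat Das Prangon, Fardin Hassan Tamim, Syed Ishtiaque Ahmed, Liqing Zhang, Sharifa Sultana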

Abstract

Researchers create BenSyc, a benchmark for evaluating conversational sycophancy in Bengali contexts, revealing challenges in distinguishing empathetic support from validation and escalation in emotionally sensitive dialogues.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Project page: https://huggingface.co/spaces/Sajib-006/bensyc-project


Get this paper in your agent:

hf papers read 2606.10061

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash



