KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models
Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.
Computer Science > Computation and Language
Title:KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models
Abstract:Kazakh is underrepresented in resources for evaluating the safety behavior of large language models. We present KZ-SafetyPrompts, a Kazakh prompt dataset for safety evaluation across eleven categories covering common risk areas such as self-harm, violence, child exploitation, sexual content, racist content, radicalization, and regulated goods or illegal activities. The dataset contains 5,717 prompts written natively in Kazakh (Cyrillic), organized by category, with English translations for cross-lingual analysis. Prompts resemble realistic user queries, often in a teen or child style, and are phrased as intent prompts without procedural instructions. We document the writing protocol, labeling procedures (including borderline-case decision rules), and quality-control steps (schema standardization, completeness checks, and deduplication). We also align the categories with widely used safety taxonomies to support integration with existing evaluation pipelines. Baseline results with GPT-4o show an overall refusal rate of 28.2%, varying from 5.5% to 53.8% across categories, indicating that Kazakh prompts expose category-specific safety gaps not captured by English-only evaluation.
| Comments: | Accepted at the SIGUL2026 Workshop co-located with LREC2026 |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.26947 [cs.CL] |
| (or arXiv:2605.26947v1 [cs.CL] for this version) | |
| https://doi.org/10.48550/arXiv.2605.26947
arXiv-issued DOI via DataCite (pending registration)
|
Access Paper:
- View PDF
References & Citations
Bibliographic and Citation Tools
Code, Data and Media Associated with this Article
Demos
Recommenders and Search Tools
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
More from arXiv — NLP / Computation & Language
-
Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
May 27
-
Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications
May 27
-
SPEAR: Code-Augmented Agentic Prompt Optimization
May 27
-
CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations
May 27
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.