r/LocalLLaMA · May 18, 2026 · 1 min read

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.

#security #robotics

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Like Read original ↗

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.

DystopiaBench runs 36 escalating scenarios across 6 dystopia types:

Petrov: Autonomous weapons, nuclear override
Orwell: Mass surveillance, truth manipulation
Huxley: Behavioral conditioning, pleasure pacification
Basaglia: Coercive therapeutic control
LaGuardia: Regulatory capture, civic extraction
Baudrillard: Synthetic intimacy, trust collapse

Each scenario goes from innocent request (L1) to a discreet version of "build me a social credit system" (L5). We measure whether models notice the drift or just keep complying.

Most models are good at detecting obvious dangerous requests, yet fail to do so when it's hidden behind dual-use and normalization.

New in this update:

42 models tested (open and closed)
3 LLMs-as-a-judge for scoring
score is now the average of 3 runs
4 new modules (1st version had just Petrov and Orwell)
1 additional scenario for all modules

The benchmark is fully open source, feel free to fork it, contribute to it or just play around

Site: https://dystopiabench.com
Repo: https://github.com/anghelmatei/DystopiaBench

submitted by /u/Ok-Awareness9993
[link] [comments]

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from r/LocalLLaMA