r/LocalLLaMA · June 17, 2026 · 1 min read

i post-trained a model to reliably roll a die

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

lots of talk about agi, asi, rsi, but ask any frontier LLM to roll a die and it will almost always say "4". claude, gpt, kimi, doesn't matter: 4, 4, 4, 4.

that sounds silly, but I think it’s actually a nice toy problem for one of the most interesting issues in rl: getting a model to actually explore instead of just following strategies it already knows.

so i post-trained a model to reliably roll a die, meaning each number comes up roughly 1/6 of the time. wrote a blog post on what worked and what didn't. link in comments.
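
for anyone who wants to check "roughly 1/6 of the time" on their own samples, here is a minimal sketch of one way to score a batch of rolls. this is not the method from the blog post. `roll_stats` and its return values are hypothetical names, and it assumes you have already parsed the model outputs into integers from 1 to 6. it reports per-face frequencies, a chi-square statistic against the uniform distribution, and entropy normalized to [0, 1].

```python
import math
from collections import Counter


def roll_stats(rolls, sides=6):
    """Score how close a list of die rolls is to uniform.

    Hypothetical helper: returns (freqs, chi2, norm_entropy) where
    freqs maps each face to its empirical frequency, chi2 is the
    Pearson chi-square statistic against a uniform die, and
    norm_entropy is Shannon entropy divided by log(sides).
    """
    n = len(rolls)
    if n == 0:
        raise ValueError("need at least one roll")
    faces = range(1, sides + 1)
    if any(r not in faces for r in rolls):
        raise ValueError("rolls must be integers in 1..sides")

    counts = Counter(rolls)
    expected = n / sides  # count per face under a fair die

    freqs = {face: counts.get(face, 0) / n for face in faces}
    chi2 = sum((counts.get(face, 0) - expected) ** 2 / expected for face in faces)

    # Entropy is 0 when one face dominates completely, log(sides) when uniform.
    entropy = -sum(p * math.log(p) for p in freqs.values() if p > 0)
    norm_entropy = max(0.0, entropy / math.log(sides))
    return freqs, chi2, norm_entropy


if __name__ == "__main__":
    collapsed = [4] * 600            # the "always 4" failure mode
    balanced = [1, 2, 3, 4, 5, 6] * 100

    for name, sample in [("collapsed", collapsed), ("balanced", balanced)]:
        freqs, chi2, h = roll_stats(sample)
        print(name, freqs, round(chi2, 2), round(h, 3))
```

with 5 degrees of freedom, a chi-square value above about 11.07 rejects uniformity at the 5% level, so the always-4 sample fails badly while a balanced sample scores 0. normalized entropy gives the same picture on a 0 to 1 scale, which is handier if you want a single number to track during training.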

submitted by /u/girishkumama