i post-trained a model to reliably roll a die
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| lots of talk about agi, asi, rsi but ask any frontier LLM to roll a die and it will almost always say "4." claude, gpt, kimi - doesn't matter, 4.4.4.4. that sounds silly, but I think it’s actually a nice toy problem for one of the most interesting issues in rl: getting a model to actually explore instead of just following strategies it already knows. so i post-trained a model to reliably roll a die, meaning each number comes up roughly 1/6 of the time. wrote a blogpost on what worked and what didn't. link in comments [link] [comments] |
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.