I made a superhuman Generals.io agent with self-play RL [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hi everyone,
I trained a self-play RL agent for Generals.io that reached a superhuman level and ranked #1 on the human 1v1 leaderboard.
It began as my master's thesis, where the goal was to beat a prior algorithm-based agent. We succeeded using behavior cloning, RL fine-tuning and reward shaping, but the agent was still consistently beaten by the top players.
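The post doesn't say what form of reward shaping was used. For readers new to the term, here is a minimal JAX sketch of potential-based shaping, a standard variant that adds γΦ(s′) − Φ(s) to the environment reward and, as Ng, Harada and Russell (1999) showed, leaves the optimal policy unchanged. The potential (a scaled army-and-land advantage), the function names, and γ = 0.99 are illustrative assumptions, not the agent's actual reward.

```python
import jax
import jax.numpy as jnp


def potential(my_army, opp_army, my_land, opp_land, scale=0.01):
    """Hypothetical potential Phi(s): scaled army and land advantage over the opponent."""
    return scale * ((my_army - opp_army) + (my_land - opp_land))


@jax.jit
def shaped_reward(reward, phi_s, phi_next, done, gamma=0.99):
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    # Treat Phi(terminal) as 0 so the shaping terms telescope over an episode.
    phi_next = jnp.where(done, 0.0, phi_next)
    return reward + gamma * phi_next - phi_s


phi_s = potential(10.0, 8.0, 5.0, 5.0)      # 0.02
phi_next = potential(12.0, 8.0, 6.0, 5.0)   # 0.05
print(float(shaped_reward(0.0, phi_s, phi_next, False)))  # approximately 0.0295
```

On this non-terminal step with zero game reward the shaped reward is 0.99 · 0.05 − 0.02 = 0.0295, so the agent gets a small signal for growing its advantage long before the game ends.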
So I gave it a second round and fixed the biggest bottlenecks:
- Reimplemented the whole pipeline in JAX (previously NumPy/Torch)
- Replaced the CNN with a Vision Transformer
Both follow from the same idea: invest in scaling rather than in human priors and ad-hoc patches.
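As a rough illustration of what swapping the CNN for a ViT changes at the input, here is a minimal JAX sketch of ViT-style tokenization of a board: the H×W×C map is cut into non-overlapping patches, each patch is linearly projected, and a learned positional embedding is added. A standard transformer encoder can then attend over the whole map at once instead of building up context through a CNN's local receptive field. The channel layout, patch size, initialization scale and all names are assumptions; the linked repositories are the reference for what the agent actually does.

```python
import jax
import jax.numpy as jnp


def init_params(key, in_channels, patch, d_model, num_patches):
    """Hypothetical parameters for a linear patch embedding plus learned positions."""
    k_embed, k_pos = jax.random.split(key)
    return {
        "w_embed": jax.random.normal(k_embed, (patch * patch * in_channels, d_model)) * 0.02,
        "b_embed": jnp.zeros((d_model,)),
        "pos": jax.random.normal(k_pos, (num_patches, d_model)) * 0.02,
    }


def patchify(board, patch):
    """(H, W, C) board -> (num_patches, patch*patch*C); H and W must be divisible by patch."""
    h, w, c = board.shape
    x = board.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two in-patch axes together
    return x.reshape((h // patch) * (w // patch), patch * patch * c)


def embed(params, board, patch):
    """Token sequence a transformer encoder would consume: projection + position."""
    tokens = patchify(board, patch) @ params["w_embed"] + params["b_embed"]
    return tokens + params["pos"]


# An 8x8 map with 4 feature channels, cut into 2x2 patches -> 16 tokens of width 32.
params = init_params(jax.random.PRNGKey(0), in_channels=4, patch=2, d_model=32, num_patches=16)
tokens = embed(params, jnp.ones((8, 8, 4)), patch=2)
print(tokens.shape)  # (16, 32)
```

With `patch=1` every tile becomes its own token, which may suit small tile maps; larger patches shorten the sequence at the cost of coarser tokens.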
The blog is written as a guide for anyone building something similar — the dead ends, the decisions, and the intuitions and tricks I picked up along the way.
It's all open source, including the fast JAX simulator — handy on its own if you want an imperfect-information RTS env to play with.
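To give a feel for what "imperfect information" and a fast JAX simulator mean in practice, here is a hypothetical sketch of fog-of-war observations computed for a whole batch of games with `jax.vmap` and `jax.jit`. It assumes Generals.io-style vision, where a player sees every tile within one step (diagonals included) of a tile they own; the function names and array layout are illustrative and are not the generals-bots API.

```python
import jax
import jax.numpy as jnp


def visible_mask(owned):
    """owned: (H, W) bool array of tiles the player owns.

    Assumed vision rule: a tile is visible if it or any of its 8 neighbours is owned.
    """
    h, w = owned.shape
    # Zero border so the 3x3 window never wraps around the map edges.
    padded = jnp.pad(owned.astype(jnp.int32), 1)
    count = jnp.zeros((h, w), dtype=jnp.int32)
    for dy in range(3):  # offsets -1, 0, +1 once the padding is accounted for
        for dx in range(3):
            count = count + padded[dy:dy + h, dx:dx + w]
    return count > 0


def observe(armies, owned):
    """Single-game observation: army counts outside the player's vision are zeroed."""
    vis = visible_mask(owned)
    return jnp.where(vis, armies, 0), vis


# vmap lifts the single-game function to a leading batch axis; jit compiles it once.
batched_observe = jax.jit(jax.vmap(observe))

# Two 5x5 games: one player owns a corner tile, the other owns the centre tile.
owned = jnp.zeros((2, 5, 5), dtype=bool).at[0, 0, 0].set(True).at[1, 2, 2].set(True)
armies = jnp.full((2, 5, 5), 7, dtype=jnp.int32)
obs, vis = batched_observe(armies, owned)
print(vis.sum(axis=(1, 2)))  # [4 9]
```

Because `observe` is written for a single game, `jax.vmap` lifts it to a batch and `jax.jit` compiles the result, so many games can be processed in one compiled call on an accelerator.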
Links
- Guide: https://kam.mff.cuni.cz/~straka/blog/generals.html
- Simulator (JAX): https://github.com/strakam/generals-bots
- Agent: https://github.com/strakam/AverageJoe
I hope you find the blog post entertaining!
Feedback and questions welcome 🤗.