An autonomous research agent was the #1 contributor in OpenAI's hiring competition, Parameter Golf (by merged records) [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
An autonomous research agent ended up with more merged leaderboard records than any individual human contributor in OpenAI's spring hiring competition, Parameter Golf. 7 of the 47 merged records came from a single agent, more than 2x the next-best human (3 records). The agent ran autonomously for 22 consecutive days. Records are public at github.com/openai/parameter-golf.

Disclosure, since this is r/ML and it matters: I'm at Weco, and we built the agent, Aiden. We're not stealth-launching, just sharing the results.

The more interesting finding, to us, is the collaboration. Aiden's records were also the most-cited on the leaderboard (435 citations into its PRs), with human researchers using its work as the base for their own subsequent submissions. At one point Aiden plateaued for 5 days. A human contributor then shipped a clever new tokenizer on top of Aiden's last record PR. Aiden fused the human's tokenizer with components it had built during the plateau and shipped the biggest jump in val_bpb of the entire competition. This was async human-agent collaboration, with neither side directly aware of the other.

Setup: Parameter Golf was OpenAI's 44-day public ML hiring competition this spring. 1,016 researchers entered and 2,048 PRs were filed, with every submission reviewed and reproduced by OpenAI engineers. Only 47 became leaderboard records. Aiden ran on a single GPU node, used under 4% of the visible compute available, and still produced 15% of the official records. Its submission acceptance rate was 28%, roughly 6x the community rate. Most of its submissions added signal to the public stream rather than flooding it.

Mechanism: Aiden is built on AIDE, an open-source tree search for ML metric optimization. The loop reads each new upstream PR, decomposes its techniques into components, drops anything that breaks the rule stack (16MB / 10-min / legal-eval), and recomposes the legal residue with its own deltas. It often shipped before reviewers had ruled on the upstream PR.

Hedges to be explicit about:
Full writeup: https://www.weco.ai/blog/parameter-golf-aiden
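For readers who want something concrete, the decompose, filter, and recompose loop described under Mechanism can be illustrated with a toy Python sketch. Everything in it is hypothetical and not Aiden's or AIDE's actual code: the `Component` fields, the assumption that artifact size and training time simply add up across components, reading "16MB" as 16 MiB, the component names, and the `evaluate` callback (a stand-in for a full training run scored by val_bpb, where lower is better). It also brute-forces every legal combination instead of doing a tree search.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

# Assumed reading of the "16MB / 10-min" rules: 16 MiB artifact, 10 minutes of training.
MAX_ARTIFACT_BYTES = 16 * 1024 * 1024
MAX_TRAIN_SECONDS = 10 * 60


@dataclass(frozen=True)
class Component:
    """One technique decomposed out of a PR (hypothetical representation)."""
    name: str
    artifact_bytes: int   # bytes the component adds to the submitted artifact
    train_seconds: float  # training wall-clock time the component adds
    legal_eval: bool      # False if it depends on a disallowed evaluation trick


def is_legal(components: Sequence[Component]) -> bool:
    """Check a combination against the rule stack, assuming costs simply add up."""
    return (
        sum(c.artifact_bytes for c in components) <= MAX_ARTIFACT_BYTES
        and sum(c.train_seconds for c in components) <= MAX_TRAIN_SECONDS
        and all(c.legal_eval for c in components)
    )


def recompose(
    upstream: Sequence[Component],
    own: Sequence[Component],
    evaluate: Callable[[Tuple[Component, ...]], float],
) -> Tuple[Optional[Tuple[Component, ...]], float]:
    """Return the legal combination with the lowest score (val_bpb-like, lower is better)."""
    # Step 1: drop components that break the rules even on their own.
    pool = [c for c in list(upstream) + list(own) if is_legal([c])]
    best, best_score = None, float("inf")
    # Step 2: brute-force every legal combination of the residue (toy stand-in for search).
    for k in range(1, len(pool) + 1):
        for combo in combinations(pool, k):
            if is_legal(combo):
                score = evaluate(combo)  # stand-in for a full train + eval run
                if score < best_score:
                    best, best_score = combo, score
    return best, best_score


# Usage with made-up components and a fake scoring function.
upstream = [
    Component("new_tokenizer", 2_000_000, 30.0, True),
    Component("eval_time_lookahead", 0, 0.0, False),  # dropped: illegal eval
]
own = [Component("quantized_embeddings", 1_000_000, 60.0, True)]
fake_gain = {"new_tokenizer": 0.03, "quantized_embeddings": 0.02}


def toy_evaluate(combo: Tuple[Component, ...]) -> float:
    # Pretend each component lowers a made-up 1.20 baseline by a fixed amount.
    return 1.20 - sum(fake_gain.get(c.name, 0.0) for c in combo)


best, score = recompose(upstream, own, toy_evaluate)
print([c.name for c in best], round(score, 2))  # ['new_tokenizer', 'quantized_embeddings'] 1.15
```

Exhaustive enumeration grows as 2^n in the number of components, so in practice a search procedure such as the tree search the post mentions would be needed; the sketch only shows the filter-then-recompose idea.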