Scrap the LLMs. Scoring 4.76% on the brand-new ARC-AGI-3 using pure code, a 2012 AMD CPU, and zero AI tokens. [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hey everyone,

The ARC Prize 2026 just launched the interactive ARC-AGI-3 track, and the collective AI world is panic-renting massive H100 clusters, trying to get multi-billion-parameter LLMs to navigate these dynamic environments. Predictably, out-of-the-box LLMs are faceplanting, pulling in flat 0.00% scores on a ton of these tasks because they can't handle real-time, blind spatial loops.

I decided to go in the completely opposite direction. Zero LLMs. Zero transformer networks. Just a raw Python script. And to make it a true engineering challenge, I ran the local evaluation harness on a machine that belongs in a museum:
Here is what happened on Run 12 playing game environment

The Strategy: Deterministic Computer Vision

Instead of prompting an AI to "look at the grid and guess an action," I treated the ARC-AGI-3 terminal feed like a classic matrix-manipulation problem. I wrote a highly specialized

The Result: Outperforming Frontier Models with Math

My pure-code script successfully beat Level 1, locking in 15.00 points and registering an overall score of 4.76%. For context, a 4.76% on a blind, zero-instruction interactive environment using nothing but mid-2010s computer-vision math completely exposes how reliant modern "agents" are on static pattern matching. A script that takes up only a few kilobytes of RAM did what massive data-center models are struggling to process.

The Bottleneck: The Efficiency Penalty

So why only 4.76%? Because ARC-AGI-3 tracks learning efficiency against a human baseline.
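The segmentation code itself is not shown (the sentence describing it is cut off in this mirror), so what follows is only a minimal sketch of the general idea, not the author's script: treat each frame as a 2D grid of integer color indices, flood-fill same-colored 4-connected regions into objects, and click an object's centroid. The background value of 0, the 4-connectivity, and the "click the largest object" rule are assumptions for illustration, and `connected_components`, `centroid`, and `pick_click_target` are hypothetical names rather than any ARC-AGI-3 API.

```python
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[int]]  # assumed frame format: rows of integer color indices
Cell = Tuple[int, int]  # (row, col)


def connected_components(grid: Grid, background: int = 0) -> List[List[Cell]]:
    """Flood-fill same-colored, 4-connected, non-background cells into objects."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    objects: List[List[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            color = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            cells: List[Cell] = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(cells)
    return objects


def centroid(cells: List[Cell]) -> Cell:
    """Mean (row, col) of an object, rounded to a grid cell with Python's round()."""
    n = len(cells)
    return (round(sum(r for r, _ in cells) / n), round(sum(c for _, c in cells) / n))


def pick_click_target(grid: Grid, background: int = 0) -> Optional[Tuple[int, int]]:
    """Return (x, y) click coordinates for one object, or None if the frame is empty."""
    objects = connected_components(grid, background)
    if not objects:
        return None
    target = max(objects, key=len)  # hypothetical rule: click the largest object
    row, col = centroid(target)
    return (col, row)  # x = column, y = row
```

Because the centroid is recomputed from scratch on every frame, a one-cell shift of the object moves the click target by about one cell, which is exactly the situation in which a memoryless loop keeps re-clicking the same object.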
My heuristic has no memory of what it did a split-second ago, so it got trapped in a micro-jitter loop: it calculates a centroid, clicks it, the grid shifts by a pixel, it recalculates, and it clicks the exact same object again. It burned 19 clicks on a 2-click puzzle, and the benchmark's scoring algorithm heavily penalized the over-clicking.

Why This Is the Ultimate Forcing Function

Building a rule-based agent on a spinning Toshiba HDD and a 14-year-old CPU architecture is unironically brilliant for optimization. I don't have the VRAM or the clock cycles to write sloppy, bloated code. If my pixel segmentation takes too long, the environment times out.

The Fix for Run 13

I'm writing a hard-coded spatial memory gate. If the newly calculated centroid is within a 2-pixel radius of the last click, and the global grid matrix hasn't drastically changed, the code will suppress the action. If I can drop that 19-action count down to a lean 2 or 3 clicks, the efficiency multiplier is going to skyrocket the score, all running at 2,000+ FPS on a legendary retro rig.

Is anyone else bypassing the LLM-wrapper trend entirely for the interactive tracks and treating ARC-AGI-3 like a pure algorithmic logic puzzle? Let's discuss.
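The Run 13 spatial memory gate can be sketched in the same hedged spirit. The 2-pixel radius comes from the post; "drastically changed" is not defined there, so this sketch assumes it means more than 5% of cells differing from the frame at the last accepted click (the `change_threshold` value is a hypothetical choice). `SpatialMemoryGate` and `changed_fraction` are illustrative names, and frames are assumed to be equally sized 2D lists of integers.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[int]]  # assumed frame format, same size from frame to frame
Point = Tuple[int, int]  # (x, y) click coordinates


def changed_fraction(a: Grid, b: Grid) -> float:
    """Fraction of cells that differ between two equally sized frames."""
    total = sum(len(row) for row in a)
    diff = sum(1 for ra, rb in zip(a, b) for va, vb in zip(ra, rb) if va != vb)
    return diff / total if total else 0.0


class SpatialMemoryGate:
    """Suppress a click that lands near the last click on an essentially unchanged frame."""

    def __init__(self, radius: float = 2.0, change_threshold: float = 0.05) -> None:
        self.radius = radius  # "within a 2-pixel radius" (from the post)
        self.change_threshold = change_threshold  # hypothetical reading of "drastically changed"
        self.last_click: Optional[Point] = None
        self.last_frame: Optional[Grid] = None

    def should_click(self, target: Point, frame: Grid) -> bool:
        if self.last_click is not None and self.last_frame is not None:
            near = math.dist(target, self.last_click) <= self.radius
            static = changed_fraction(self.last_frame, frame) <= self.change_threshold
            if near and static:
                return False  # same spot, same scene: skip the redundant click
        # Accept the click and remember where it went and what the frame looked like.
        self.last_click = target
        self.last_frame = [row[:] for row in frame]
        return True
```

Comparing against the frame from the last accepted click, rather than the immediately previous frame, keeps slow one-pixel drift from slipping past the gate. A suppressed click still leaves the agent needing some other action to take, which this sketch does not decide.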