r/MachineLearning · June 5, 2026 · 3 min read

Scrap the LLMs. Scoring 4.76% on the brand new ARC-3 using pure code, a 2012 AMD CPU, and zero AI tokens. [P]


Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

Hey everyone,

The ARC Prize 2026 just launched the interactive ARC-AGI-3 track, and the collective AI world is panic-renting massive H100 clusters, trying to get multi-billion parameter LLMs to navigate these dynamic environments. Predictably, out-of-the-box LLMs are faceplanting, pulling in flat 0.00% scores on a ton of these tasks because they can't handle real-time, blind spatial loops.

https://preview.redd.it/xv62h96r6d5h1.png?width=1920&format=png&auto=webp&s=d26412173de8f864efbcc7938d0b4292e86085c0

I decided to go completely in the opposite direction. Zero LLMs. Zero transformer networks. Just a raw Python script.

And to make it a true engineering challenge, I ran the local evaluation harness on a machine that belongs in a museum:

CPU: AMD FX-8350 (Released in 2012)
GPU: NVIDIA GeForce GTX 970 (Released in 2014, mostly sitting idle because there's no model to load)
Storage: An ancient 1.82 TB Toshiba spinning hard drive (HDD)

Here is what happened on Run 12 playing game environment r11l-495a7899:

The Strategy: Deterministic Computer Vision

Instead of prompting an AI to "look at the grid and guess an action," I treated the ARC-AGI-3 terminal feed like a classic matrix manipulation problem.

I wrote a highly specialized heuristic-clicker agent. Every time the environment ticks, it parses the grid arrays, segments color blobs, calculates the exact spatial center of mass via object-centroid detection, and maps that mathematical pixel coordinate to a GameAction click.
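A minimal sketch of that segment-then-centroid step, not the actual agent: it assumes the frame arrives as a list of rows of integer color codes with 0 as background, and it assumes the largest blob is the click target (the post doesn't say how the target is chosen). The names `find_blobs`, `centroid`, and `choose_click` are made up for illustration, and the final hand-off to a GameAction click is left out because the harness API isn't shown here.

```python
from collections import deque


def find_blobs(grid, background=0):
    """Group 4-connected cells of the same non-background color into blobs."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            color = grid[r][c]
            cells = []
            queue = deque([(r, c)])
            seen[r][c] = True
            # Breadth-first flood fill over same-colored neighbors.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append((color, cells))
    return blobs


def centroid(cells):
    """Mean (x, y) of a blob's cells, rounded to integer grid coordinates."""
    n = len(cells)
    cy = sum(y for y, _ in cells) / n
    cx = sum(x for _, x in cells) / n
    return round(cx), round(cy)


def choose_click(grid):
    """Assumed policy: click the centroid of the largest blob, or None if empty."""
    blobs = find_blobs(grid)
    if not blobs:
        return None
    _, cells = max(blobs, key=lambda blob: len(blob[1]))
    return centroid(cells)


grid = [
    [0, 0, 0, 0, 0],
    [0, 3, 3, 3, 0],
    [0, 3, 3, 3, 0],
    [0, 3, 3, 3, 0],
    [0, 0, 0, 0, 5],
]
print(choose_click(grid))  # (2, 2)
```

Everything here is standard library, so there's nothing to load on the GPU and nothing to install on the old box.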

The Result: Outperforming Frontier Models with Math

My pure-code script successfully beat Level 1, locking in 15.00 points and registering an overall score of 4.76%.

For context, a 4.76% on a blind, zero-instruction interactive environment using nothing but mid-2010s computer vision math completely exposes how reliant modern "agents" are on static pattern matching. A script only a few kilobytes in size did what massive data-center models are struggling to process.

The Bottleneck: The Efficiency Penalty

So why only 4.76%? Because ARC-AGI-3 tracks learning efficiency against a human baseline.

Human Baseline for Level 1: 2 actions.
My Agent's Actions: 19 actions.
Result: Total session hit GAME_OVER at action #411.
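To put the Level 1 gap in numbers, here is a quick bit of arithmetic on the two action counts above. This is only a raw ratio for illustration. It is not the benchmark's scoring formula, which isn't spelled out in this post, and the variable names are made up.

```python
human_baseline_actions = 2   # from the post
agent_actions = 19           # from the post

# Fraction of the agent's clicks that a human would have needed.
ratio = human_baseline_actions / agent_actions
# How many times more actions the agent used than the baseline.
overhead = agent_actions / human_baseline_actions

print(f"{ratio:.1%}")      # 10.5%
print(f"{overhead:.1f}x")  # 9.5x
```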

Because my heuristic has no memory of what it did a split-second ago, it got trapped in a micro-jitter loop. It calculates a centroid, clicks it, the grid shifts by a pixel, it recalculates, and clicks the exact same object. It burned 19 clicks on a 2-click puzzle, and the benchmark's scoring algorithm heavily penalized the over-clicking.

Why This Is the Ultimate Forcing Function

Building a rule-based agent on a spinning Toshiba HDD and a 14-year-old CPU architecture is unironically brilliant for optimization. I don't have the VRAM or the clock cycles to write sloppy, bloated code. If my pixel segmentation takes too long, the environment times out.

The Fix for Run 13: I’m writing a hard-coded spatial memory gate. If the newly calculated centroid is within a 2-pixel radius of the last click, and the global grid matrix hasn't drastically changed, the code will suppress the action.
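A rough sketch of what such a gate could look like, not the Run 13 code itself. The 2-pixel radius comes from the description above. The 10% changed-cell threshold is an assumption standing in for "drastically changed", and `ClickGate` and `should_click` are hypothetical names.

```python
import math


class ClickGate:
    """Suppress a click that lands near the last one while the grid is stable."""

    def __init__(self, radius=2.0, change_threshold=0.10):
        self.radius = radius                      # pixels, from the post
        self.change_threshold = change_threshold  # assumed cutoff, not from the post
        self.last_click = None
        self.last_grid = None

    def _changed_fraction(self, grid):
        """Share of cells that differ from the grid seen at the last real click."""
        total = sum(len(row) for row in grid)
        diff = sum(
            a != b
            for old_row, new_row in zip(self.last_grid, grid)
            for a, b in zip(old_row, new_row)
        )
        return diff / total

    def should_click(self, point, grid):
        """Return True to click at point (x, y), False to suppress it."""
        if self.last_click is not None:
            near = math.dist(point, self.last_click) <= self.radius
            same_shape = (len(grid) == len(self.last_grid)
                          and len(grid[0]) == len(self.last_grid[0]))
            stable = same_shape and self._changed_fraction(grid) < self.change_threshold
            if near and stable:
                return False
        # Memory is updated only on real clicks, so slow drift accumulates
        # against the last clicked frame and eventually reopens the gate.
        self.last_click = point
        self.last_grid = [list(row) for row in grid]
        return True


before = [[0] * 5 for _ in range(5)]
before[2][2] = 3
after = [[0] * 5 for _ in range(5)]
after[2][3] = 3  # same object, shifted one pixel to the right

gate = ClickGate()
print(gate.should_click((2, 2), before))  # True: first click always goes through
print(gate.should_click((3, 2), after))   # False: 1 px away, only 2 of 25 cells changed
```

Comparing against the frame at the last real click, not the previous tick, is a design choice: a string of one-pixel shifts can't sneak past the gate one tick at a time.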

If I can drop that 19-action count down to a lean 2 or 3 clicks, the efficiency multiplier is going to skyrocket the score, all while running at 2,000+ FPS on a legendary retro rig.

Is anyone else bypassing the LLM wrapper trend entirely for the interactive tracks and treating ARC-AGI-3 like a pure algorithmic logic puzzle? Let's discuss.

submitted by /u/-SLOW-MO-JOHN-D