TinyTPU: SystemVerilog systolic array compiled to WASM, running live in browser - RTL golden-verified against numpy [P]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

Most explanations of TPUs and systolic arrays are either hand-wavy diagrams or papers. I wanted to see the thing actually run, so I built it.

TinyTPU is a 4×4 weight-stationary systolic array in real SystemVerilog, compiled to WebAssembly, with a step-by-step browser visualization.

You enter two matrices, hit run, and watch the actual hardware execute: weights loading into PEs, matrix A streaming in diagonally (the "skew" that makes systolic arrays work), partial sums accumulating down the grid, results draining from the bottom.
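For readers who want to trace that dataflow by hand, here is a minimal cycle-by-cycle sketch in plain Python. It is not the project's RTL. It assumes one common weight-stationary arrangement: the PE at row k, column n holds W[k][n], row m of A enters array row k at cycle m + k (the skew), activations move right, and partial sums move down. The name `systolic_matmul` and the register layout are hypothetical.

```python
def systolic_matmul(A, W):
    """Simulate C = A @ W on a K x N weight-stationary array, one cycle at a time.

    Assumed layout (illustrative, not taken from the TinyTPU source):
    PE (k, n) holds W[k][n]; activations flow left to right,
    partial sums flow top to bottom, results leave the bottom row.
    """
    M, K, N = len(A), len(W), len(W[0])
    a_reg = [[0] * N for _ in range(K)]  # activation register in each PE
    p_reg = [[0] * N for _ in range(K)]  # partial-sum register in each PE
    C = [[0] * N for _ in range(M)]
    cycles = M + K + N - 2  # the last result leaves at cycle (M-1)+(K-1)+(N-1)

    for t in range(cycles):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Skewed feed: array row k receives A[m][k] at cycle m + k.
                    m = t - k
                    a_in = A[m][k] if 0 <= m < M else 0
                else:
                    a_in = a_reg[k][n - 1]  # from the left neighbour
                p_in = p_reg[k - 1][n] if k > 0 else 0  # from the PE above
                new_a[k][n] = a_in
                new_p[k][n] = p_in + a_in * W[k][n]  # the MAC
        a_reg, p_reg = new_a, new_p

        # Drain: the bottom PE of column n finishes C[m][n] at cycle m + (K-1) + n.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                C[m][n] = p_reg[K - 1][n]
    return C, cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)  # [[19, 22], [43, 50]] 4
```

Under these assumptions a 4×4 array multiplying a 4×4 input takes 4 + 4 + 4 - 2 = 10 cycles from the first activation entering to the last result draining. Without the skew, A[m][k] would reach its PE before the partial sum from the row above was ready.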

It has three levels:

  • L1 - isolate a single MAC cell, watch one multiply-accumulate happen
  • L2 - the full 4×4 array executing a real matmul
  • L3 - tiling: what happens when your matrix is bigger than the hardware
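The tiling idea in L3 can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's scheduler: the weight matrix is cut into tiles of at most 4×4, each tile is loaded once, the matching columns of A are streamed through, and partial results are accumulated into the output. `TILE`, `array_pass`, and `tiled_matmul` are hypothetical names, and `array_pass` is an ordinary matmul standing in for one pass through the hardware.

```python
TILE = 4  # assumed array size, matching the 4x4 array described in the post


def array_pass(a_block, w_tile):
    """Stand-in for one pass through the array: a_block (M x k) times w_tile (k x n)."""
    k_dim, n_dim = len(w_tile), len(w_tile[0])
    assert k_dim <= TILE and n_dim <= TILE, "weight tile must fit in the array"
    return [[sum(row[k] * w_tile[k][n] for k in range(k_dim)) for n in range(n_dim)]
            for row in a_block]


def tiled_matmul(A, W):
    """Compute C = A @ W using only weight tiles that fit in a TILE x TILE array."""
    M, K, N = len(A), len(W), len(W[0])
    C = [[0] * N for _ in range(M)]
    passes = 0
    for n0 in range(0, N, TILE):        # which output columns this tile produces
        for k0 in range(0, K, TILE):    # which slice of the reduction dimension it covers
            w_tile = [row[n0:n0 + TILE] for row in W[k0:k0 + TILE]]
            a_block = [row[k0:k0 + TILE] for row in A]
            partial = array_pass(a_block, w_tile)
            # Tiles that share n0 but differ in k0 contribute to the same outputs.
            for m in range(M):
                for j, value in enumerate(partial[m]):
                    C[m][n0 + j] += value
            passes += 1
    return C, passes
```

With a 6×5 weight matrix this needs ceil(6/4) × ceil(5/4) = 4 weight loads. The accumulation across `k0` tiles is the part that a single fixed-size array cannot do on its own, so it has to happen in an accumulator outside the PE grid.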

Nothing on screen is faked. The visualization reads state directly from compiled RTL.

If you're trying to understand how matrix multiply maps to hardware (why TPUs are efficient, what "weight-stationary" actually means, why the diagonal stagger exists), this might make it click in a way papers don't.

Repo: tiny-tpu

Live demo: Live

If this project interests you, please star the repo, and if you find something that needs improving, open a PR. I hope y'all check it out and give me some feedback 🙏

submitted by /u/Horror-Flamingo-2150
