Kuma: compiling PyTorch models into self-contained WebGPU executables [P]
Mirrored from r/MachineLearning.
I've been experimenting with a compiler/runtime project that I'm not entirely sure is a good idea, so I'd love some feedback from people who've worked on deployment systems.
The idea is to compile an exported PyTorch model into a self-contained package that contains:
- graph
- binary weights
- backend kernels (currently WGSL)
- runtime metadata
A lightweight runtime loads that package and executes it directly in the browser with WebGPU. No Python, no server inference, and no dependency on a heavyweight runtime.
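To make the "single artifact" idea concrete, here is a minimal sketch of how the four pieces listed above could be bundled into one file and read back. This is not Kuma's actual format: the `KPKG` magic bytes, the header layout, the `pack`/`unpack` helpers, and the toy graph and metadata fields are all assumptions made up for illustration, and the WGSL string is a placeholder rather than a working kernel.

```python
import json
import struct

MAGIC = b"KPKG"  # hypothetical magic bytes, not Kuma's real format


def pack(graph, weights, kernels, metadata):
    """Bundle four sections into one blob.

    Layout (assumed): magic | uint32 header length | JSON index | section bytes.
    """
    sections = {
        "graph": json.dumps(graph).encode("utf-8"),
        "weights": weights,  # raw little-endian tensor bytes
        "kernels": json.dumps(kernels).encode("utf-8"),  # WGSL source by name
        "metadata": json.dumps(metadata).encode("utf-8"),
    }
    # Record where each section lives, relative to the end of the header.
    index = {}
    offset = 0
    for name, blob in sections.items():
        index[name] = {"offset": offset, "length": len(blob)}
        offset += len(blob)
    header = json.dumps(index).encode("utf-8")
    return MAGIC + struct.pack("<I", len(header)) + header + b"".join(sections.values())


def unpack(blob):
    """Read the index, then slice each section back out of the blob."""
    if blob[:4] != MAGIC:
        raise ValueError("not a package")
    (header_len,) = struct.unpack_from("<I", blob, 4)
    body_start = 8 + header_len
    index = json.loads(blob[8:body_start].decode("utf-8"))
    out = {}
    for name, entry in index.items():
        start = body_start + entry["offset"]
        raw = blob[start:start + entry["length"]]
        # Weights stay as bytes; everything else is JSON.
        out[name] = raw if name == "weights" else json.loads(raw.decode("utf-8"))
    return out


# Toy example: one 2x2 float32 weight and one placeholder kernel.
weights = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
pkg = pack(
    graph={"nodes": [{"op": "matmul", "inputs": ["x", "w0"], "output": "y"}]},
    weights=weights,
    kernels={"matmul": "@compute @workgroup_size(64) fn main() {}"},
    metadata={"weights": {"w0": {"offset": 0, "shape": [2, 2], "dtype": "f32"}}},
)
restored = unpack(pkg)
```

In a layout like this, a browser runtime would only need to parse the index, hand each kernel string to WebGPU as a shader module, and upload slices of the weight section into GPU buffers. A real format would probably also want to version the header and pad the weight section so buffer uploads are aligned.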
Right now the attached demos are just neural video representations because they were easy to test, but the motivation is actually operator networks and scientific ML, where I like the idea of distributing a single portable artifact.
The repo is here:
https://github.com/Slater-Victoroff/Kuma
I'm mostly looking for architectural feedback.
Some questions I'm wrestling with:
- Is embedding backend kernels in the artifact a terrible idea?
- Is this solving a real deployment problem or just reinventing ONNX Runtime?
- Are there existing systems I should study that take a similar approach?
- If you were designing a deployment format today, what would you change?
I'd especially appreciate thoughts from people who've worked on ONNX, IREE, TVM, ExecuTorch, MLIR, or similar compiler/runtime projects.