Working on a cgo-free CUDA binding in Go for ML stuff Week 3 - open source [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
At our work we use CUDA in Rust since the company switched to it recently. Rust has pretty good Driver API bindings but it made me wonder why the hell we cant have something decent in Go without cgo.
I mostly build ML tools in the last month and Go is my main language for pretty much everything.
Problem is most Go CUDA projects still need cgo and the full toolkit at build time. That breaks cross compilation and makes Docker images huge which sucks when working on machine learning projects.
So last month I started messing around with a proof of concept that loads libcuda.so at runtime using purego. No cgo at all.
Biggest pain was thread affinity. CUDA keeps context per thread so goroutines switching around kept breaking things. I built a simple executor that locks an OS thread with runtime.LockOSThread and funnels all calls through a channel.
Heres roughly what using it looks like right now:
func run() error { cuda.Init() dev, _ := cuda.GetDevice(0) ctx, _ := dev.Primary() defer ctx.Close() a, _ := cuda.Alloc[float32](ctx, 1024) b, _ := cuda.Alloc[float32](ctx, 1024) c, _ := cuda.Alloc[float32](ctx, 1024) stream, _ := ctx.NewStream() start, _ := ctx.NewEvent() stop, _ := ctx.NewEvent() start.Record(stream) fn.LaunchOn(bg, stream, cfg, cuda.Arg(a), cuda.Arg(b), cuda.Arg(c), cuda.ArgValue(int32(1024)), ) stop.Record(stream) stop.Synchronize() duration, _ := start.Elapsed(stop) fmt.Printf("GPU time: %v\n", duration) return nil } On my 4070 Ti a 10M vector add showed CPU timer at like 160us but actual GPU event timing was 434us. That difference surprised me.
The project is still super early and moves slow cuz i only code on weekends and im a total noob with CUDA. Slowly adding Graphs and multi gpu support.
THIS IS SO early , so treat it more like a learning cuda repo, but im having fun learning cuda. Thought some of you might find it interesting too.
repo is github.com/eitamring/gocudrv if you wanna take a look.
Would be cool if anyone with 5xxx series cards wants to try it wink wink
[link] [comments]
More from r/MachineLearning
-
PapersWithCode new features - week 1 [P]
May 24
-
Expedia ML Scientist II interview experience anyone ? [D]
May 24
-
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA [D]
May 24
-
Per-pixel bounding-box regression + DBSCAN for handwritten word detection - visual walkthrough of WordDetectorNet [P]
May 23
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.