r/LocalLLaMA · June 2, 2026 · 1 min read

What's the status of non-CUDA inference?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

After quite a while, I got a reminder email from eBay about an MI50 I had put on my watch list. Aside from needing to jury-rig a blower onto the back and bootstrap ROCm, how is it?

More generally, what is LLM inference like outside of CUDA? I know that image-gen is veeeeery hit or miss (although ComfyUI tries its very best), and TTS is, as far as I know, CUDA-bound right now. STT (like whisper.cpp) runs well enough on CPUs, so that's a non-issue imo.

Just curious; I'm trying to spec out a build for my homelab. All my previous specs would have blown way past 4k€, so I keep looking and waiting, trying to stay at 2-3k€ at most. I mostly just want 2-3 parallel inference streams on a decent (~30B) model; I doubt I'll ever get hardware good enough for parallel 100B inference. xD

So yeah, what's the current situation in non-CUDA-land? Thanks!

submitted by /u/IngwiePhoenix