What's the status of non-CUDA inference?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I got a reminder email from eBay, after quite a while, about an MI50 I had put on my watch list. Aside from needing to jury-rig a blower onto the back and bootstrap ROCm - how is it?
In fact, what's LLM inference like on non-CUDA hardware in general? I know that image-gen is veeeeery hit or miss (although ComfyUI tries its very best) and TTS is, for all I know, CUDA-bound right now. STT - like whisper.cpp - runs well enough on CPUs, so that's a non-issue imo.
Just curious; I'm trying to spec out a build for my homelab. All my previous attempts would've blown way past 4k€ - so I keep looking and waiting, trying to hit 2-3k at most. I mostly just want 2-3 parallel inferences on a decent (~30B) model - doubtful I'll ever get good enough hardware for parallel 100B inference. xD
So yeah, what's the current situation in non-CUDA-land? Thanks!