r/LocalLLaMA

I compared all specs of the major GPUs/machines that are being used here, because bandwidth is not everything. Some of y'all need a reality check.

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hot takes:
- The Mac Studio is an overpriced Raspberry Pi that is far less efficient than people think (like most Macs). The M5 MBP is better thanks to its "tensor" MMA, but not by much.
- Spark was actually decent when it was just 3-4k. Strix is obviously much better now.
- 3090s are complete overkill for single-stream usage; V100s are much better value if you can find them cheap. P40s are very niche, but decent if you want exactly 48GB of VRAM, run MoE models, and don't have money for MI50s or V100s.
- P100s are extremely underrated entry-level LLM GPUs that are not talked about enough. 200 bucks (dual GPU) for a combined 32GB of 700GB/s memory and 70% of M3 Ultra compute is crazy.
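
As a rough sanity check on the P100 value claim, the arithmetic can be sketched in a few lines. This is only an illustration: the price and capacity are the figures quoted in the list above, "bucks" is assumed to mean US dollars, and the helper name `usd_per_gb` is made up for this example.

```python
# Figures quoted in the post for a dual-P100 setup.
# Assumption: "200 bucks" means 200 USD for both cards together.
price_usd = 200.0
total_vram_gb = 32.0  # combined VRAM across the two cards


def usd_per_gb(price: float, vram_gb: float) -> float:
    """Price per gigabyte of VRAM (hypothetical helper for illustration)."""
    return price / vram_gb


p100_cost = usd_per_gb(price_usd, total_vram_gb)
print(f"Dual P100: ${p100_cost:.2f} per GB of VRAM")
```

The same helper can be applied to any other card once you have a price you trust; no prices for the other hardware are given in the post, so none are filled in here.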

I understand that this sub is now filled with gamers who do nothing but ERP with anime waifus on their setups, but for people who do something actually productive, prefill is still very important, and it is completely hidden by the "generate a 1000-word story" benchmarks that most posts and big AI YouTube channels run. This matters especially with multimodal models, which eat up context like mad.
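
To illustrate why story-style benchmarks hide prefill, here is a minimal sketch of a common back-of-the-envelope model: total time is prompt tokens divided by prefill speed, plus generated tokens divided by generation speed. All throughput numbers and workload sizes below are hypothetical placeholders, not measurements of any hardware mentioned in the post.

```python
def response_time_s(prompt_tokens: int, gen_tokens: int,
                    prefill_tps: float, decode_tps: float) -> tuple[float, float]:
    """Return (prefill seconds, generation seconds) for one request.

    Simplified model: ignores batching, caching and any overlap
    between the two phases.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = gen_tokens / decode_tps
    return prefill_s, decode_s


# Hypothetical machine: placeholder speeds chosen only for illustration.
PREFILL_TPS = 200.0  # prompt tokens processed per second
DECODE_TPS = 20.0    # tokens generated per second

# (prompt tokens, generated tokens) for two made-up workloads.
workloads = {
    "short prompt, long story": (50, 1300),
    "long document, short answer": (32000, 300),
}

for name, (prompt_tokens, gen_tokens) in workloads.items():
    prefill_s, decode_s = response_time_s(
        prompt_tokens, gen_tokens, PREFILL_TPS, DECODE_TPS
    )
    share = prefill_s / (prefill_s + decode_s)
    print(f"{name}: prefill {prefill_s:.1f}s, generation {decode_s:.1f}s, "
          f"prefill share {share:.0%}")
```

With a short prompt almost all of the time is spent generating, so a benchmark of that shape says very little about prefill. Once the prompt runs to tens of thousands of tokens, the prefill term dominates under the same assumed speeds.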

I'm still collecting data for the prefill and generation charts I'd like to do in the future. I also couldn't find much reliable power data, so if you could share numbers from your own setups in the comments, I'd be grateful.

Thanks for coming to my TED talk.

submitted by /u/Ok_Top9254