Dumb question: How would performance be if you took a used server with around 80 lanes of PCIe 5.0 and filled them with NVMe drives to run models?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
So for LLMs, VRAM bandwidth is king.
But what if you bought a used server with, for example, 80 lanes of PCIe 5.0 available, and bifurcated them to hold 40 SSDs at x2 lanes each? With each NVMe drive doing 15 GB/s, a mirror of 40 2TB drives could potentially do 600 GB/s for a 2TB model. Or if you ran 80 NVMe drives at x1 lane each, you'd get 1.2 TB/s.
That seems pretty good, right? You could get pretty good speeds across any model size.
So why don't people do that and self-host the giant 1-2TB models?
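A rough sanity check on the arithmetic above, as a minimal sketch rather than a benchmark. It assumes the standard PCIe 5.0 figures (32 GT/s per lane, 128b/130b encoding), caps each drive at whichever is lower, its own rated speed or the bandwidth of the lanes it is given, and ignores protocol overhead, CPU/RAM limits, and filesystem cost. The function names are made up for illustration.

```python
# Back-of-envelope read bandwidth for an NVMe array hanging off PCIe 5.0 lanes.
# Assumptions: theoretical link rates only, no protocol or software overhead.

PCIE5_GT_PER_S = 32.0            # PCIe 5.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def lane_gb_per_s() -> float:
    """Theoretical one-direction throughput of a single PCIe 5.0 lane, in GB/s."""
    return PCIE5_GT_PER_S * ENCODING_EFFICIENCY / 8


def array_gb_per_s(drives: int, lanes_per_drive: int, drive_max_gb_per_s: float) -> float:
    """Aggregate read bandwidth: each drive is capped by its own rated
    speed or by the lanes it is attached to, whichever is lower."""
    per_drive = min(drive_max_gb_per_s, lanes_per_drive * lane_gb_per_s())
    return drives * per_drive


def seconds_per_full_read(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to stream the whole model from the array once."""
    return model_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    # The two layouts from the post, with 15 GB/s assumed as the drive's rated speed.
    for drives, lanes in ((40, 2), (80, 1)):
        bw = array_gb_per_s(drives, lanes, drive_max_gb_per_s=15.0)
        t = seconds_per_full_read(2000, bw)
        print(f"{drives} drives @ x{lanes}: {bw:.0f} GB/s, {t:.1f} s per pass over 2 TB")
```

Under these assumptions both layouts come out the same, about 315 GB/s, because 80 lanes at roughly 3.94 GB/s each is the ceiling no matter how the lanes are split. A drive rated at 15 GB/s only reaches that on an x4 link. If every generated token had to stream all 2 TB of weights (a dense-model assumption; mixture-of-experts models touch less per token), that is several seconds per token.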