r/LocalLLaMA · June 10, 2026 · 1 min read

Dumb question: How would performance be if you took a used server with around 80 lanes of PCIe 5.0 and filled them with NVMe drives to run models from?


So for LLMs, VRAM speed is king.

But what if you bought a used server with, for example, 80 lanes of PCIe 5.0 available, and bifurcated them to hold 40 SSDs at 2 lanes each? With each NVMe drive doing 15 GB/s, a mirror of 40 2TB drives could potentially do 600 GB/s for a 2TB model. Or with 80 NVMe drives at 1 PCIe lane each, you'd get 1.2 TB/s.
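As a rough sanity check on the arithmetic above, here is a minimal sketch that computes the theoretical link ceiling from the PCIe 5.0 signaling rate (32 GT/s per lane with 128b/130b encoding). The function names are illustrative, and the figures ignore protocol overhead, per-drive limits, and the CPU and memory on the other side of the lanes, so treat them as upper bounds rather than measurements.

```python
# Theoretical PCIe 5.0 link bandwidth. Real drives deliver less than this.
PCIE5_GT_PER_S = 32.0   # giga-transfers per second, per lane
ENCODING = 128 / 130    # 128b/130b line-encoding efficiency


def lane_gb_per_s() -> float:
    """One-direction bandwidth of a single PCIe 5.0 lane, in GB/s."""
    return PCIE5_GT_PER_S * ENCODING / 8  # bits -> bytes


def aggregate_gb_per_s(drives: int, lanes_per_drive: int) -> float:
    """Combined link ceiling for `drives` devices with `lanes_per_drive` each."""
    return drives * lanes_per_drive * lane_gb_per_s()


# The two layouts from the post: 40 drives at x2, or 80 drives at x1.
for drives, lanes in [(40, 2), (80, 1)]:
    total = aggregate_gb_per_s(drives, lanes)
    print(f"{drives} drives at x{lanes}: about {total:.0f} GB/s link ceiling")
```

Under these assumptions a single lane tops out just under 4 GB/s, and both layouts land on the same ceiling of roughly 315 GB/s, because the total lane count is fixed at 80 either way.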

That seems pretty good, right? You could get pretty good speeds across any model size.
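To put "pretty good speeds" into numbers, a common rule of thumb is that bandwidth-bound decoding cannot generate tokens faster than the storage bandwidth divided by the bytes of weights read per token. The sketch below applies that rule to the post's own figures. It assumes single-stream decoding with nothing cached in RAM or VRAM, and `active_fraction` is a hypothetical knob for models that only touch part of their weights per token (for example mixture-of-experts); the 0.05 value is purely illustrative.

```python
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float,
                     active_fraction: float = 1.0) -> float:
    """Upper bound on decode speed when every token must stream
    `active_fraction` of the model's weights from storage."""
    gb_read_per_token = model_gb * active_fraction
    return bandwidth_gb_s / gb_read_per_token


MODEL_GB = 2000.0  # the 2TB model from the post

# Bandwidth figures taken from the post as-is, not measured.
for label, bandwidth in [("600 GB/s", 600.0), ("1.2 TB/s", 1200.0)]:
    dense = max_tokens_per_s(bandwidth, MODEL_GB)
    sparse = max_tokens_per_s(bandwidth, MODEL_GB, active_fraction=0.05)
    print(f"{label}: dense <= {dense:.2f} tok/s, "
          f"5% active (hypothetical) <= {sparse:.1f} tok/s")
```

Even taking the post's bandwidth numbers at face value, a dense 2TB model would be capped below one token per second by this estimate; the picture only improves if a small fraction of the weights is read per token.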

So why don't people do that and self host the giant 1-2TB models?

submitted by /u/StartupTim