r/LocalLLaMA · 1 min read

For dual GPUs, will there be any big impact to inference speeds when running in PCIe 5.0 x8/x4 vs x8/x8?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I bought the Biostar Z890 Valkyrie because it was on sale and has three PCIe 5.0 slots connected to the CPU (x16, x8/x8, or x8/x4/x4), which I thought would be great for running dual GPUs for LLM inference. The problem is that I now want to add a SATA expansion card to the bottom PCIe slot, which will drop the middle slot to x4. Would I see a performance hit for inference with the two GPUs running in x8/x4 mode, both when the model is fully loaded into VRAM and when I have to use partial offloading?
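For a rough sense of scale, here is a minimal sketch of the theoretical link arithmetic only. It assumes the standard PCIe 5.0 figures (32 GT/s per lane, 128b/130b encoding), which put x8 at about 31.5 GB/s and x4 at about 15.75 GB/s per direction. The function names and the 12 GB payload are hypothetical, chosen for illustration, and real throughput will be lower because of protocol overhead. It does not measure inference speed, which depends on how much data actually crosses the bus.

```python
# Theoretical PCIe 5.0 bandwidth per direction. This is link arithmetic only,
# not a benchmark; measured throughput is lower due to protocol overhead.

GT_PER_LANE = 32.0      # PCIe 5.0 raw signalling rate, gigatransfers/s per lane
ENCODING = 128 / 130    # 128b/130b line encoding efficiency


def link_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 5.0 link."""
    return GT_PER_LANE * ENCODING * lanes / 8  # divide by 8: bits -> bytes


def transfer_seconds(size_gb: float, lanes: int) -> float:
    """Idealised time to move size_gb gigabytes across the link."""
    return size_gb / link_gb_per_s(lanes)


if __name__ == "__main__":
    payload_gb = 12.0  # hypothetical amount of data, e.g. weights copied to one GPU
    for lanes in (4, 8, 16):
        print(
            f"x{lanes}: {link_gb_per_s(lanes):.2f} GB/s, "
            f"{transfer_seconds(payload_gb, lanes):.2f} s for {payload_gb:.0f} GB"
        )
```

The x4 link has exactly half the theoretical bandwidth of x8, so any step that is limited by bus transfers (loading the model, or moving data between system RAM and the GPU during partial offloading) could take up to twice as long on that slot. Work that stays entirely in VRAM does not go over the link at all.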

submitted by /u/PhantomWolf83