
Maximizing performance of 2x3090 + NVLink

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hey all, I have built myself a decent rig with the following specs:

- Ubuntu 24.04
- 2x RTX 3090 Founders Edition with NVLink
- Ryzen 9 7950X3D
- 64GB DDR5

I am currently routing my display through an eGPU to maximize available VRAM. My current go-to is Qwen 3.6 27B Q8_0 with MTP and ik_llama’s graph split + ngl 99. It works very well with pi and I get very good output, but I can only manage ~60 tok/s at the absolute maximum, in very short bursts, and it averages around 40-45 tok/s.

I imagine that my setup, minus maybe the NVLink, is pretty common on this sub, so I’m curious to hear how people are squeezing more performance out of their cards, or whether the numbers I’m seeing are par for the course.
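For a rough sense of what "par for the course" could mean here, below is a back-of-the-envelope sketch of the memory-bandwidth ceiling for token generation. Everything in it is an assumption rather than a measurement: roughly 936 GB/s of memory bandwidth per 3090 (spec-sheet figure), Q8_0 at 8.5 bits per weight, exactly 27B parameters, every weight read once per generated token, and a split that spreads those reads perfectly across both cards. It ignores KV-cache reads, inter-GPU sync overhead, and MTP, which can exceed this bound when drafted tokens are accepted. The function name is made up for illustration.

```python
def decode_ceiling_tok_s(params_b, bits_per_weight, gpu_bw_gb_s, n_gpus):
    """Rough upper bound on tokens/s if decoding is purely memory-bandwidth bound.

    params_b        -- parameter count in billions (assumed: 27)
    bits_per_weight -- storage cost per weight (Q8_0: 32 int8 values plus one
                       fp16 scale per block = 8.5 bits per weight)
    gpu_bw_gb_s     -- memory bandwidth per GPU in GB/s (assumed: 936 for a 3090)
    n_gpus          -- GPUs reading their share of the weights in parallel
    """
    weight_gb = params_b * bits_per_weight / 8  # billions of params * bytes each = GB
    return gpu_bw_gb_s * n_gpus / weight_gb


# Both cards streaming half of the weights each, in parallel (idealized)
ceiling = decode_ceiling_tok_s(27, 8.5, 936, 2)

# Only one card's bandwidth in play at a time (e.g. a sequential layer split)
single = decode_ceiling_tok_s(27, 8.5, 936, 1)

print(f"parallel split ceiling: {ceiling:.1f} tok/s")
print(f"sequential ceiling:     {single:.1f} tok/s")
```

Under those assumptions the idealized two-card ceiling comes out in the mid-60s tok/s and the one-card-at-a-time figure in the low 30s, so 40-45 tok/s sustained with bursts near 60 would sit between the two. Treat it as a sanity check on the order of magnitude only.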

submitted by /u/IUseClifford