
Finding the 4x 3090 Sweet Spot

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

https://preview.redd.it/8o43bjhe9d1h1.png?width=5346&format=png&auto=webp&s=1c87c2ee8b8ffff43495f543266056b0e26d3947

In another post someone asked me about the power draw of the 4x 3090 setup, so I'm sharing a full test I conducted to map the efficiency curve. I used this blog post (not mine) as a reference.

Setup:

  • GPUs: 4x RTX 3090 (Dell OEM, EVGA XC3, 2x ASUS Strix)
  • PCIe Topology: Gen 3 (Bifurcated: x16 / x8 / x8 / x4)
  • Model: Qwen3.6-27B (FP16)
  • Backend: vLLM v0.20.2 (TP=4)
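
For anyone who wants to reproduce the sweep, here's a minimal sketch of how to script it, assuming sudo access; the benchmark step is a placeholder, so swap in whatever client you use:

```python
import subprocess

# Per-GPU power limits to sweep (watts). Stock is 350 W on most 3090s
# (390 W on the Strix); nvidia-smi clamps values to the card's allowed range.
POWER_LIMITS = [350, 300, 275, 250, 220, 200]
NUM_GPUS = 4

def set_power_limit(watts: int) -> None:
    """Apply the same power cap to every GPU via nvidia-smi (needs root)."""
    for gpu in range(NUM_GPUS):
        subprocess.run(
            ["sudo", "nvidia-smi", "-i", str(gpu), "-pl", str(watts)],
            check=True,
        )

if __name__ == "__main__":
    # Persistence mode keeps the limits applied between benchmark runs.
    subprocess.run(["sudo", "nvidia-smi", "-pm", "1"], check=True)
    for pl in POWER_LIMITS:
        set_power_limit(pl)
        # Placeholder: run your benchmark client against the vLLM server
        # here and record output t/s, prompt processing t/s, and total t/s.
```

The server itself runs unchanged the whole time (e.g. `vllm serve <model> --tensor-parallel-size 4`); only the power cap moves between measurements.

Results: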
| Power Limit (W) | Output (t/s) | Prompt Processing (t/s) | Total Throughput (t/s) | Efficiency (t/joule) |
|---|---|---|---|---|
| 350/390 (Unrestricted) | 29 | 239 | 269 | 0.77 |
| 300 | 29 | 238 | 268 | 0.89 |
| 275 | 29 | 236 | 265 | 0.96 |
| 250 | 29 | 232 | 261 | 1.04 |
| 220 | 27 | 220 | 248 | 1.13 |
| 200 | 24 | 196 | 221 | 1.11 |
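
If you're wondering about the units: the efficiency column works out to total throughput divided by the per-GPU power limit (t/s per W = tokens per joule, counted against a single card's budget). A quick check in Python:

```python
# Sanity-check the efficiency column: total throughput / per-GPU power limit.
# t/s per W is tokens per joule, counted against a single card's budget.
rows = [
    (350, 269),  # unrestricted row, using the 350 W cards' stock limit
    (300, 268),
    (275, 265),
    (250, 261),
    (220, 248),
    (200, 221),
]
for watts, total_tps in rows:
    print(f"{watts} W -> {total_tps / watts:.2f} t/J")
# Peaks at 220 W (~1.13 t/J). The 200 W row lands on a rounding boundary
# (1.105), hence the 1.11 in the table.
```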

Takeaways:

  1. The 220W Sweet Spot: Peak efficiency at 1.13 t/J (matching the blog's findings), for only ~8% less total throughput than unrestricted.
  2. Diminishing Returns: Output t/s is already saturated at 29 by 250W; raising the limit further buys ~3% more total throughput for 40%+ more power.

Hope this helps someone. Happy to answer any questions.

I'm VERY satisfied with Qwen 3.6 27B as a daily driver, but I would still like to know if there are any better/bigger models I can run on this setup. My understanding is that the best I can do is DSv4 at Q2, though I'm not sure it's fully supported yet.

Additional context: it's an open build on a generic mining frame, cooled by 10x TL-C12C-S fans (5 on each side, blowing perpendicular to the GPUs). I finished the build very recently, so I'm open to suggestions on how to improve it.

Edit: Added prompt processing to the table

submitted by /u/anitamaxwynnn69
