
Finding the 4x 3090 Sweet Spot

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

https://preview.redd.it/8o43bjhe9d1h1.png?width=5346&format=png&auto=webp&s=1c87c2ee8b8ffff43495f543266056b0e26d3947

In another post someone asked me about the power draw of the 4x 3090 setup, so I'm sharing a full test I conducted to map the efficiency curve. I used this blog post (not mine) as a reference.

Setup:

  • GPUs: 4x RTX 3090 (Dell OEM, EVGA XC3, 2x ASUS Strix)
  • PCIe Topology: Gen 3 (Bifurcated: x16 / x8 / x8 / x4)
  • Model: Qwen3.6-27B (FP16)
  • Backend: vLLM v0.20.2 (TP=4)
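
For anyone who wants to reproduce the sweep, here's a minimal sketch of how to script it, assuming sudo access; the benchmark step is a placeholder, so swap in whatever client you use:

```python
import subprocess

# Per-GPU power limits to sweep (watts). Stock is 350 W on most 3090s
# (390 W on the Strix); nvidia-smi clamps values to the card's allowed range.
POWER_LIMITS = [350, 300, 275, 250, 220, 200]
NUM_GPUS = 4

def set_power_limit(watts: int) -> None:
    """Apply the same power cap to every GPU via nvidia-smi (needs root)."""
    for gpu in range(NUM_GPUS):
        subprocess.run(
            ["sudo", "nvidia-smi", "-i", str(gpu), "-pl", str(watts)],
            check=True,
        )

if __name__ == "__main__":
    # Persistence mode keeps the limits applied between benchmark runs.
    subprocess.run(["sudo", "nvidia-smi", "-pm", "1"], check=True)
    for pl in POWER_LIMITS:
        set_power_limit(pl)
        # Placeholder: run your benchmark client against the vLLM server
        # here and record output t/s, prompt processing t/s, and total t/s.
```

The server itself runs unchanged the whole time (e.g. `vllm serve <model> --tensor-parallel-size 4`); only the power cap moves between measurements.

Results: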
| Power Limit (W) | Output (t/s) | Prompt Processing (t/s) | Total Throughput (t/s) | Efficiency (t/joule) |
|---|---|---|---|---|
| 350/390 (Unrestricted) | 29 | 239 | 269 | 0.77 |
| 300 | 29 | 238 | 268 | 0.89 |
| 275 | 29 | 236 | 265 | 0.96 |
| 250 | 29 | 232 | 261 | 1.04 |
| 220 | 27 | 220 | 248 | 1.13 |
| 200 | 24 | 196 | 221 | 1.11 |
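
If you're wondering about the units: the efficiency column works out to total throughput divided by the per-GPU power limit (t/s per W = tokens per joule, counted against a single card's budget). A quick check in Python:

```python
# Sanity-check the efficiency column: total throughput / per-GPU power limit.
# t/s per W is tokens per joule, counted against a single card's budget.
rows = [
    (350, 269),  # unrestricted row, using the 350 W cards' stock limit
    (300, 268),
    (275, 265),
    (250, 261),
    (220, 248),
    (200, 221),
]
for watts, total_tps in rows:
    print(f"{watts} W -> {total_tps / watts:.2f} t/J")
# Peaks at 220 W (~1.13 t/J). The 200 W row lands on a rounding boundary
# (1.105), hence the 1.11 in the table.
```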

Takeaways:

  1. The 220W Sweet Spot: Peak efficiency at 1.13 t/J (matching the blog's findings), for only ~8% less total throughput than unrestricted.
  2. Diminishing Returns: Output t/s is already saturated at 29 by 250W; raising the limit further buys ~3% more total throughput for 40%+ more power.

Hope this helps someone. Happy to answer any questions.

I'm VERY satisfied with Qwen 3.6 27B as a daily driver, but I would still like to know if there are any better/bigger models I can run on this setup. My understanding is that the best I can do is DSv4 at Q2, though I'm not sure it's fully supported yet.

Additional context: it's an open build on a generic mining frame, cooled by 10x TL-C12C-S fans (5 on each side, blowing perpendicular to the GPUs). I finished the build very recently, so I'm open to suggestions on how to improve it.

Edit: Added prompt processing to the table

submitted by /u/anitamaxwynnn69
