Finding the 4x 3090 Sweet Spot
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
In another post someone asked me about the power draw of the 4x 3090 setup, so I'm sharing a full test I conducted to understand the efficiency curve. Used this blog post (not mine) as a reference. Setup:
Takeaways:
Hope this helps someone. Happy to answer any questions. I'm VERY satisfied with Qwen 3.6 27B as a daily driver, but I would still like to know if there are any better/bigger models I can run on this setup. My understanding is that the best I can do is DSv4 at Q2 - not sure if it's fully supported yet though. Additional context: it's an open build on a generic mining frame. I'm cooling it with 10x TL-C12C-S fans (5 on each side of the GPUs, mounted perpendicularly). I finished building this very recently, so I'm open to suggestions on how to improve it. Edit: Added prompt processing to the table.
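For anyone wanting to reproduce this kind of efficiency sweep: the usual approach is to cap each card's power limit (e.g. `sudo nvidia-smi -i <idx> -pl <watts>`), run the same generation benchmark at each cap, and pick the lowest cap that still retains most of peak throughput. Here's a minimal sketch of the selection step; the (watts, tokens/s) pairs are made-up placeholders, not the measurements from this post:

```python
def sweet_spot(samples, min_frac=0.9):
    """Return the (watts, tokens/s) point with the lowest power limit
    that still delivers at least min_frac of peak throughput."""
    peak = max(tps for _, tps in samples)
    acceptable = [(w, tps) for w, tps in samples if tps >= min_frac * peak]
    return min(acceptable)  # tuples sort by wattage first

# Illustrative sweep data only (per-GPU power cap in W, generation tokens/s):
samples = [(200, 18.0), (250, 23.5), (300, 26.0), (350, 27.0), (420, 27.5)]
limit, tps = sweet_spot(samples)
print(limit, tps)  # the cap that keeps >= 90% of peak throughput
```

The `min_frac` knob is the trade-off you're tuning: lower it and the script picks a more aggressive cap with a bigger throughput sacrifice.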