Best models in 3x3090 (72GB VRAM) in Q2 2026?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Sometime around the beginning of the year I set up my LLM computer: 3x3090 in a very old DDR4 machine, so I only use the 72GB of VRAM to load the models (for speed).
I’ve been mostly using these three models:
- GPT-OSS 120b: still pretty solid
- Qwen3.5 122b: very (very!!) good for one-shot coding, but it overthinks far too much in my opinion
- GLM Air 4.5 106B: runs in non-thinking mode by default, which I use a lot for quick replies
Occasionally I also use:
- Gemma 4 31B or Qwen3.6 27B, as they are quick to load and offload, and sometimes I need a video card for other tasks: I keep the LLM on 2x3090 and use the remaining 3090 for audio/image stuff. Because they also fit nicely in 48GB at Q8, I trust them over the bigger models in some instances.
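For anyone checking the same fit, here is a rough back-of-the-envelope sketch of the arithmetic in Python. It assumes about 8.5 bits per weight for Q8 (the GGUF Q8_0 layout stores 32 int8 weights plus one fp16 scale per block) and an arbitrary 4 GB allowance for KV cache and buffers. The names `weight_gb`, `fits`, and `overhead_gb` are made up for illustration and do not come from any library, and real usage depends on context length and backend.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM.

    overhead_gb is a guess covering KV cache and runtime buffers.
    """
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


Q8_BITS = 8.5  # assumption: GGUF Q8_0, 34 bytes per block of 32 weights

for name, size_b in [("31B dense", 31), ("27B dense", 27)]:
    gb = weight_gb(size_b, Q8_BITS)
    print(f"{name}: ~{gb:.1f} GB at Q8, fits in 48 GB: {fits(size_b, Q8_BITS, 48)}")
# 31B dense: ~32.9 GB at Q8, fits in 48 GB: True
# 27B dense: ~28.7 GB at Q8, fits in 48 GB: True
```

The same function shows why the 100B+ models need a much lower bit width to load on 72GB: at 8.5 bits per weight, 122B parameters alone come to roughly 130 GB.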
Honorable mentions I stopped using without any valid reason:
- Nemotron Nano Omni 30B A3B: very good, but I just never use it because I default to the big ones for most general tasks
- Devstral Small 2 24B: used to be my favorite before Qwen 27B completely replaced it as my go-to dev-focused LLM, mixed with the big Qwen 122B for “architectural” decisions
Is there anything newer or better that would fit in 72GB?