r/LocalLLaMA · June 13, 2026

Best models in 3x3090 (72GB VRAM) in Q2 2026?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Sometime around the beginning of the year I set up my LLM computer: 3x3090 in a very old DDR4 machine, so I load models only into the 72GB of VRAM (for speed).

I’ve been mostly using these three models:

- GPT-OSS 120b: still pretty solid
- Qwen3.5 122b: very (very!!) good for one-shot coding, but it overthinks far too much in my opinion
- GLM Air 4.5 106B: in non-think mode by default, which I use a lot for quick replies

Occasionally I also use:

- Gemma 4 31B or Qwen3.6 27B, as they are quick to load and offload, and sometimes I need a video card for other tasks: I keep the LLM on 2x3090 and use 1x3090 for audio-image stuff. Because they also fit nicely in 48GB in Q8, I do trust them over the bigger models in some instances.
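For anyone wondering why the ~30B models fit on two cards at Q8, here is a rough back-of-the-envelope sketch. It assumes Q8 is roughly 8 bits per weight (real quant formats carry some extra overhead) and counts weights only, so the `headroom_gb` value for KV cache and buffers is a made-up placeholder, not a measured number. The function names are hypothetical and not from any library.

```python
# Rough VRAM estimate for model weights. Assumption: size is about
# parameters * bits-per-weight / 8, ignoring quant-format overhead.

GPU_VRAM_GB = 24  # one RTX 3090


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM."""
    needed = weight_footprint_gb(params_billion, bits_per_weight) + headroom_gb
    return needed <= vram_gb


if __name__ == "__main__":
    two_cards = 2 * GPU_VRAM_GB  # 48 GB, the split described above
    for name, params in [("Gemma 4 31B", 31), ("Qwen3.6 27B", 27)]:
        size = weight_footprint_gb(params, 8)
        # headroom_gb=8 is an arbitrary illustrative margin for context
        ok = fits(params, 8, two_cards, headroom_gb=8)
        print(f"{name}: ~{size:.0f} GB of weights at 8 bits, fits in 48 GB: {ok}")
```

Under these assumptions both models leave a comfortable margin on 2x3090, while neither would fit on a single 24GB card at 8 bits.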

Honorable mentions that I stopped using without any valid reason:

- Nemotron Nano Omni 30B A3B is very good, but I just never use it because I default to the big ones for most general tasks
- Devstral Small 2 24B used to be my favorite before Qwen 27B completely replaced it as my go-to dev-focused LLM, mixed with the big Qwen 122B for “architectural” decisions

Is there anything newer or better that would fit in 72GB?

submitted by /u/liviuberechet