Looking for Suggestions: Single 5090 & 64GB DDR5
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hi Reddit,
I'm planning to run Qwen 3.6 27B in NVFP4 via vLLM on my 5090, but I was wondering whether something like the 35B-A3B at Q8 in llama.cpp would produce better results for agentic coding while also making use of the system memory. My research says no, but if that's the case, what would y'all do to put the system memory to use?
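A rough, weight-only size check helps frame the question. The sketch below is an illustration under stated assumptions, not a benchmark: it takes nominal parameter counts of 27B and 35B, assumes about 4.5 bits per weight for NVFP4 (4-bit values plus an 8-bit scale per 16-value block) and 8.5 bits per weight for GGUF Q8_0 (8-bit values plus a 16-bit scale per 32-value block), and uses the 5090's 32 GB of VRAM. It ignores KV cache, activations, runtime overhead, and any layers kept at higher precision, so real usage will be higher. The function names are hypothetical.

```python
# Rough weight-only memory estimate (assumption-laden sketch).
# Ignores KV cache, activations, runtime overhead, and any layers
# kept at higher precision, so real usage will be higher.

# Assumed effective bits per weight, including per-block scale overhead:
#   NVFP4: 4-bit values + one 8-bit scale per 16 values  -> 4 + 8/16  = 4.5
#   Q8_0:  8-bit values + one 16-bit scale per 32 values -> 8 + 16/32 = 8.5
BITS_PER_WEIGHT = {"NVFP4": 4.5, "Q8_0": 8.5}

VRAM_GB = 32.0        # RTX 5090
SYSTEM_RAM_GB = 64.0  # from the post title


def weight_gb(params_billions: float, fmt: str) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    bits = params_billions * 1e9 * BITS_PER_WEIGHT[fmt]
    return bits / 8 / 1e9


def placement(params_billions: float, fmt: str) -> dict:
    """Split the weights between VRAM and system RAM, filling VRAM first."""
    total = weight_gb(params_billions, fmt)
    on_gpu = min(total, VRAM_GB)
    spill = total - on_gpu
    return {
        "total_gb": round(total, 2),
        "gpu_gb": round(on_gpu, 2),
        "cpu_gb": round(spill, 2),
        "fits_in_ram": spill <= SYSTEM_RAM_GB,
    }


if __name__ == "__main__":
    for name, params, fmt in [
        ("27B dense, NVFP4", 27, "NVFP4"),
        ("35B-A3B MoE, Q8_0", 35, "Q8_0"),
    ]:
        print(name, placement(params, fmt))
```

Under these assumptions the 27B NVFP4 weights come to roughly 15.2 GB and fit in VRAM with room left for context, while the 35B-A3B at Q8_0 comes to roughly 37.2 GB and would spill about 5 GB onto system RAM before any KV cache is counted.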