r/LocalLLaMA · June 20, 2026 · 1 min read

Best Settings for 48GB VRAM + Qwen 3.6 27B


Hey everyone, I've been running Qwen3.6 27B (Q8_0) across an RTX 4090 + RTX 3090 setup using llama.cpp with tensor split, and I wanted to share what's been working best for me so far. Let me know if anyone has better settings.

Hardware: RTX 4090 (24GB) + RTX 3090 (24GB), 48GB VRAM total

OS: Arch Linux (using the iGPU for display)

Settings:

Quant: Q8_0
Split mode: tensor
Layers on GPU: -ngl 999
Context: 250k (-c 250000)
Speculative decoding: --spec-type draft-mtp --spec-draft-n-max 4
Parallel requests: -np 3
Unified KV cache: -kvu
Chat template: --chat-template-kwargs '{"preserve_thinking": true}'
Flags: --no-mmap -fa on --jinja -fit off --no-op-offload
Vision: mmproj-F16 with --no-mmproj-offload
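
For anyone who wants to try the list above as a single launch command, here is a minimal sketch that assembles it in Python and prints a shell-safe command line. Several things are assumptions, not from the post: the binary name `llama-server` (the post only says llama.cpp), `-sm` as the flag behind "Split mode", and both file paths, which are hypothetical placeholders. Flag spellings are copied as written in the list and may differ between llama.cpp builds.

```python
import shlex

# Hypothetical paths: the post does not name the actual GGUF files.
MODEL_PATH = "models/Qwen3.6-27B-Q8_0.gguf"
MMPROJ_PATH = "models/mmproj-F16.gguf"


def build_command(model=MODEL_PATH, mmproj=MMPROJ_PATH):
    """Return an argv list mirroring the settings listed in the post."""
    return [
        "llama-server",  # assumption: the post only says "llama.cpp"
        "-m", model,
        "--mmproj", mmproj, "--no-mmproj-offload",  # vision projector, not offloaded to GPU
        "-sm", "tensor",  # assumption: -sm is the split-mode flag; value as given in the post
        "-ngl", "999",  # put all layers on the GPUs
        "-c", "250000",  # 250k context
        "--spec-type", "draft-mtp", "--spec-draft-n-max", "4",  # as written in the post
        "-np", "3",  # parallel requests
        "-kvu",  # unified KV cache
        "--chat-template-kwargs", '{"preserve_thinking": true}',
        "--no-mmap", "-fa", "on", "--jinja", "-fit", "off", "--no-op-offload",
    ]


if __name__ == "__main__":
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(build_command()))
```

Keeping the arguments in a list makes it easy to swap a single value (for example the context size) and compare runs without retyping the whole command.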

This gives me 75-100 t/s token generation (tg) and about 1500 t/s prompt processing (pp), with 250k of unquantized context, vision, and MTP all enabled.
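
As a rough sanity check on why this fits in 48GB, here is a back-of-the-envelope estimate of the weight footprint. Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, so 34 bytes per 32 weights (8.5 bits per weight). The rest is assumption: "27B" is treated as exactly 27 billion weights, every tensor is treated as Q8_0, and GB is taken as decimal, so the real numbers will differ somewhat.

```python
TOTAL_VRAM_GB = 48  # from the post: 24 GB + 24 GB
PARAMS = 27e9  # assumption: "27B" taken as exactly 27 billion weights
Q8_0_BYTES_PER_WEIGHT = 34 / 32  # 32 int8 values plus one fp16 scale per block


def weight_footprint_gb(params=PARAMS, bytes_per_weight=Q8_0_BYTES_PER_WEIGHT):
    """Approximate size of the quantized weights in decimal gigabytes."""
    return params * bytes_per_weight / 1e9


weights_gb = weight_footprint_gb()
remaining_gb = TOTAL_VRAM_GB - weights_gb

print(f"weights: ~{weights_gb:.1f} GB")
print(f"left for KV cache, MTP and compute buffers: ~{remaining_gb:.1f} GB")
```

Under these assumptions the weights take a bit under 29 GB, leaving roughly 19 GB across both cards for the 250k unquantized KV cache and everything else.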

submitted by /u/viperx7