r/LocalLLaMA

Qwen 27b MTP Config, Llama.cpp Single 3090

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

What setup are you using for qwen 27b on a single 3090?

Here's what I started using today. The context fills up and it has to compact often, but I'm worried about giving up accuracy and reliability if I drop to a lower quant to free up room:

llama-server \
  -m /Models/q3.6/Qwen3.6-27B-Q5_K_S.gguf \
  -c 65536 -ngl -1 -t 8 \
  -ctk q8_0 -ctv q8_0 \
  --chat-template-kwargs "{\"preserve_thinking\": true}" \
  --spec-type draft-mtp --spec-draft-n-max 2 \
  --fit off \
  --mmproj /Models/q3.6/mmproj-Qwen3.6-27B-f16.gguf \
  --no-mmproj-offload
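One way to reason about the context-vs-quant tradeoff is to estimate how much VRAM the KV cache itself eats at -c 65536, since that's what competes with model weights for the 3090's 24 GB. A minimal sketch of the arithmetic (the layer/head/dim numbers below are hypothetical placeholders, not the real Qwen3.6-27B dimensions; q8_0 stores 34 bytes per 32-element block, ≈1.0625 bytes/element):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elt):
    # K and V each hold n_ctx * n_kv_heads * head_dim elements per layer,
    # hence the factor of 2.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elt

GIB = 1024 ** 3
# Hypothetical dims for illustration only: 48 layers, 8 KV heads, head_dim 128.
f16  = kv_cache_bytes(48, 8, 128, 65536, 2.0)      # f16 cache: 2 B/elt
q8_0 = kv_cache_bytes(48, 8, 128, 65536, 34 / 32)  # q8_0 cache: ~1.0625 B/elt

print(f"f16:  {f16 / GIB:.1f} GiB")   # 12.0 GiB
print(f"q8_0: {q8_0 / GIB:.1f} GiB")  # 6.4 GiB
```

With those placeholder dims, -ctk/-ctv q8_0 roughly halves the cache versus f16, which is why it can be worth more than a quant step on the weights.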

I'm getting around 65 tok/s.

I've also seen these recommendations: https://github.com/noonghunna/club-3090/blob/master/docs/SINGLE_CARD.md

They seem to be using the q4 quant. How are you weighing the tradeoffs?
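Rather than guessing at the accuracy cost, you can measure it with llama.cpp's llama-perplexity tool. A sketch, assuming you have both quants downloaded and a local copy of the wikitext-2 test set (paths and the Q4_K_M filename are placeholders):

```shell
# Compare perplexity across quants; lower final PPL = less quality
# lost to quantization.
for q in Q5_K_S Q4_K_M; do
  echo "=== $q ==="
  llama-perplexity \
    -m /Models/q3.6/Qwen3.6-27B-$q.gguf \
    -f wiki.test.raw \
    -ngl 99 -c 4096
done
```

Perplexity on wikitext isn't a perfect proxy for instruction-following or tool use, but it's a cheap first-order check on whether the Q4 drop is material for this model.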

submitted by /u/GotHereLateNameTaken

