qwen3.6-35b-a3b-mtp running on GTX 1060 6GB
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I have an old, roughly 10-year-old Dell T5810 workstation with 32GB of RAM (DDR4; I wasn't sure, but the E5-2600 v3 platform doesn't take DDR3), an E5-2698 v3 (16 cores, 32 threads), and a GTX 1060 6GB that was used for mining back in the day (it paid for itself many times over). I managed to get the model running with LM Studio on Windows(!). My settings are:
Model: unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4_K_XL
Context length: 131072
GPU offload: 41 layers
CPU thread pool size: 16
Max concurrent: 4
Number of experts: 8
Number of MoE layers offloaded to CPU: 41
MTP max draft: 3
KV cache quantization: Q4_0 for both K and V
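A long context on a 6GB card is mostly a question of KV cache size, and Q4_0 cache quantization is what shrinks it. The sketch below only shows the arithmetic. The layer count, KV head count and head size are hypothetical placeholders, NOT the real Qwen3.6-35B-A3B values, so read those from the GGUF metadata before trusting any number. The per-element sizes follow the ggml block formats: q8_0 stores 32 values as a 2-byte fp16 scale plus 32 int8 values (34 bytes), and q4_0 stores 32 values as a 2-byte scale plus 16 bytes of 4-bit values (18 bytes).

```python
# Bytes per cached element for a few ggml cache types.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # fp16 scale + 32 x int8 per 32-value block
    "q4_0": 18 / 32,  # fp16 scale + 16 bytes of 4-bit quants per 32-value block
}


def kv_cache_bytes(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, k_type: str = "f16", v_type: str = "f16") -> float:
    """Total K + V cache size in bytes for n_ctx tokens of a single sequence."""
    elems = n_ctx * n_attn_layers * n_kv_heads * head_dim  # elements in K (same for V)
    return elems * BYTES_PER_ELEM[k_type] + elems * BYTES_PER_ELEM[v_type]


# HYPOTHETICAL architecture numbers for illustration only.
N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM = 10, 2, 256

f16 = kv_cache_bytes(131072, N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM)
q4 = kv_cache_bytes(131072, N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM, "q4_0", "q4_0")
print(f"f16: {f16 / 2**30:.2f} GiB, q4_0: {q4 / 2**30:.2f} GiB")
# prints "f16: 2.50 GiB, q4_0: 0.70 GiB"
```

Whatever the real dimensions are, the ratio holds: Q4_0 uses 18/64 of the f16 footprint (about 3.6x smaller), at some cost in cache precision.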
Prefill (16k tokens): about 130-150 tok/s
Decode (4k tokens): about 16 tok/s
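To put those rates in wall-clock terms, here is a small sketch that assumes "16k" and "4k" mean 16,384 and 4,096 tokens and that throughput stays constant over each phase (real runs usually slow down somewhat as the context fills).

```python
def phase_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time for one phase at a constant throughput."""
    return n_tokens / tokens_per_second


PREFILL_TOKENS = 16 * 1024  # assumption: "16k" = 16,384 tokens
DECODE_TOKENS = 4 * 1024    # assumption: "4k" = 4,096 tokens
DECODE_TPS = 16.0

for prefill_tps in (130.0, 150.0):
    prefill = phase_seconds(PREFILL_TOKENS, prefill_tps)
    decode = phase_seconds(DECODE_TOKENS, DECODE_TPS)
    print(f"prefill @ {prefill_tps:.0f} tok/s: {prefill:.0f} s, "
          f"decode: {decode:.0f} s, total: {(prefill + decode) / 60:.1f} min")
```

That works out to roughly 110-126 s to ingest a 16k prompt and 256 s to generate 4k tokens, a bit over 6 minutes end to end. A typical chat turn is far shorter than either, which fits with it being very usable for chat.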
Very usable for chat.