r/LocalLLaMA · · 1 min read

qwen3.6-35b-a3b-mtp running on GTX 1060 6GB


I have an old Dell T5810 workstation, about 10 years old, with 32GB of DDR3(?) memory, an E5-2698v3 (16 cores, 32 threads), and a GTX 1060 6GB that was used for mining back in the day (it paid for itself many times over). I managed to get the model running with LM Studio on Windows(!). My settings are:

Model: unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4_K_XL

Ctx length: 131072

GPU offload 41

CPU threadpool size 16

Max concurrent 4

Number of experts 8

Number of MOE layers offloaded to CPU 41

MTP max draft 3

KV quantization both Q4_0
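For anyone running llama.cpp directly instead of LM Studio, here is a rough sketch of how the settings above might translate into `llama-server` flags. The mapping is my assumption, not something confirmed in the post: the dictionary keys and the `model.gguf` path are hypothetical, and "Number of experts" and "MTP max draft" are left out because I'm not sure of an equivalent flag. A quantized V cache may also need flash attention enabled, depending on the build.

```python
import shlex

# Values copied from the settings listed above; key names are hypothetical.
SETTINGS = {
    "ctx_length": 131072,
    "gpu_offload_layers": 41,
    "cpu_threads": 16,
    "max_concurrent": 4,
    "moe_layers_on_cpu": 41,
    "kv_cache_type": "q4_0",
}


def llama_server_args(model_path, s):
    """Build an argument list for llama-server (assumed mapping, see lead-in)."""
    return [
        "llama-server",
        "--model", model_path,
        "--ctx-size", str(s["ctx_length"]),
        "--n-gpu-layers", str(s["gpu_offload_layers"]),
        # Keep the MoE expert weights of this many layers in system RAM.
        "--n-cpu-moe", str(s["moe_layers_on_cpu"]),
        "--threads", str(s["cpu_threads"]),
        "--parallel", str(s["max_concurrent"]),
        # Same quantization type for both the K and V caches.
        "--cache-type-k", s["kv_cache_type"],
        "--cache-type-v", s["kv_cache_type"],
    ]


if __name__ == "__main__":
    # "model.gguf" is a placeholder path, not the real file name.
    print(shlex.join(llama_server_args("model.gguf", SETTINGS)))
```

The idea behind this split is that the attention and shared weights go to the 6GB card while the expert weights stay in system RAM.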

Prefill (16k): about 130-150 tps

Decode (4k): about 16 tps

Very usable for chat.
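To put the throughput numbers above in wall-clock terms, here is a quick back-of-the-envelope calculation. It assumes "16k" and "4k" mean 16384 and 4096 tokens and that the rates hold steady across the whole run, which is a simplification.

```python
def seconds_for(tokens, tokens_per_second):
    """Time to process `tokens` at a constant rate."""
    return tokens / tokens_per_second


PREFILL_TOKENS = 16 * 1024  # assumed meaning of "16k"
DECODE_TOKENS = 4 * 1024    # assumed meaning of "4k"

prefill_slow = seconds_for(PREFILL_TOKENS, 130)  # lower end of reported range
prefill_fast = seconds_for(PREFILL_TOKENS, 150)  # upper end of reported range
decode_time = seconds_for(DECODE_TOKENS, 16)

if __name__ == "__main__":
    print(f"prefill 16k: {prefill_fast:.0f}-{prefill_slow:.0f} s")
    print(f"decode 4k:   {decode_time:.0f} s")
```

So a full 16k prompt takes roughly two minutes to ingest and a 4k reply a bit over four minutes, while typical chat turns with shorter prompts and replies come back much faster.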

submitted by /u/xxvegas
