r/LocalLLaMA · · 1 min read

Qwen3.6 huge quality gain from Q4 to Q6 for coding agent

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

So, last week I tried to update my unused local LLM setup. I had to stop using it because quality was too low and deepseek was too cheap.

First thing I stopped using Ollama and now I only use llama.cpp built in server that works really great.

The quality improvement from Q4 to Q6 is outstanding and finally a local LLM server can work very similarly to paid APIs.

That's great! And MTP makes a big performance gain, on a dual 3090 (downvolted and limited to 65°C) it generates from 20 to 50 tokens per second with minimal heat generation.

So yes, that time has finally arrived! Local coding agents are a thing and they work 😎

submitted by /u/Yes-Scale-9723
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA