Qwen3.6 huge quality gain from Q4 to Q6 for coding agent
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
So, last week I tried to update my unused local LLM setup. I had to stop using it because quality was too low and deepseek was too cheap.
First thing I stopped using Ollama and now I only use llama.cpp built in server that works really great.
The quality improvement from Q4 to Q6 is outstanding and finally a local LLM server can work very similarly to paid APIs.
That's great! And MTP makes a big performance gain, on a dual 3090 (downvolted and limited to 65°C) it generates from 20 to 50 tokens per second with minimal heat generation.
So yes, that time has finally arrived! Local coding agents are a thing and they work 😎
[link] [comments]
More from r/LocalLLaMA
-
Behold! Probably the most ghetto local AI server:
May 27
-
260K-param LLM running on an emulated 90s CPU inside an 18-year-old RTOS
May 27
-
Qwen3.6 35B-A3B successfully completed the FoodTruck Bench!
May 27
-
SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More
May 27
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.