r/LocalLLaMA · 2 min read

Qwen3.6-27B UD Q3 with KV cache at q8 is quite amazing for simple proofs of concept

Preface: technology is not my industry, but I am a very passionate poor man. I only discovered 'AI' (ChatGPT) at the beginning of 2025, so go easy on me. I only try.

I kind of understand MoE vs. dense models. MoEs are much more forgiving to run because only a subset of the experts is activated for each token at inference, if I understand correctly, whereas in a dense model every parameter is used for every token, so depending on the model size it pushes the hardware to whatever its limits are.
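To put rough numbers on the MoE vs. dense difference, here is a minimal Python sketch. The dense 27B entry matches the model in this post; the MoE entry (35B total, 3B active per token) and the 3.5 bits-per-weight figure for a Q3-class quant are illustrative assumptions, not measured values. Keep in mind that an MoE still has to hold all of its weights in memory; only the per-token compute shrinks.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters used per token, in billions

    def active_fraction(self) -> float:
        """Share of the weights that take part in computing one token."""
        return self.active_params_b / self.total_params_b


def weights_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone (no KV cache), in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30


# Dense: every parameter is used for every token.
dense = ModelSpec("dense 27B", total_params_b=27.0, active_params_b=27.0)
# MoE: hypothetical split, chosen only to illustrate the idea.
moe = ModelSpec("MoE 35B (assumed 3B active)", total_params_b=35.0, active_params_b=3.0)

ASSUMED_BITS_PER_WEIGHT = 3.5  # rough guess for a Q3-class quant

for m in (dense, moe):
    print(
        f"{m.name}: {m.active_fraction():.0%} of weights active per token, "
        f"~{weights_gib(m.total_params_b, ASSUMED_BITS_PER_WEIGHT):.1f} GiB of weights to store"
    )
```

Under these assumptions the MoE touches far fewer weights per token, which is why it decodes faster on the same hardware, even though it needs more memory than the dense model to load.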

That being said, I have an MI50 32 GB in a T5610 with 64 GB of DDR3 RAM and a 256 GB SATA SSD. That's all I could afford. On it, I ran Qwen3.6-27B at Q3 with the KV cache at q8 and got usable speed: about 180+ tps for prompt processing and 9 tps for decoding/text generation. Sad, yes, but I wanted to see if it could help me create proofs of concept.
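As a back-of-the-envelope check on what those speeds mean in practice, the sketch below estimates wall-clock time for one request from the two throughput numbers above. The 4,000-token prompt and 1,500-token reply are made-up example sizes, and the estimate ignores model load time and any slowdown at long context.

```python
def estimate_seconds(prompt_tokens, output_tokens, prefill_tps=180.0, decode_tps=9.0):
    """Return (prefill, decode, total) time in seconds for one request.

    prefill_tps and decode_tps default to the speeds reported in the post.
    """
    prefill = prompt_tokens / prefill_tps  # time to read the prompt
    decode = output_tokens / decode_tps    # time to generate the reply
    return prefill, decode, prefill + decode


# Example sizes are assumptions, not measurements.
prefill_s, decode_s, total_s = estimate_seconds(prompt_tokens=4000, output_tokens=1500)
print(f"prefill: {prefill_s:.1f} s, decode: {decode_s:.1f} s, total: {total_s / 60:.1f} min")
```

With these example sizes, decoding accounts for most of the wait (about 167 of roughly 189 seconds), so shorter outputs help more than shorter prompts.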

My industry is construction, and there is literally no accounting software made for it, so I got pissed and went on an adventure 3 days ago. I have had a SaaS in development for about 8 years with no VC, investors, or anyone, just me and my piss-poor self and 2 engineers (home country, super cheap). What I do is create these POCs and have a meeting with my actual coders, and they are then able to build a solution that I like. Anyway, Q3 produced what's in this GitHub repo. You can bring it up with Docker; just make sure you rename .env.example to .env. Q3 isn't the best, nor would anyone recommend it (Q4 at minimum), but compared to the 35B MoE, I really liked it. I like sharing what I create. I think I am going to ask my team to improve this and keep it open source. There are too many contractors struggling to find proper god damn accounting software, and the shit out there is expensive for no reason.

https://github.com/ikantkode/exaMath

Honestly, just wanted to share my sad story.

submitted by /u/exaknight21