Stop asking what model to run. There are literally only two.
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Can we please ban the daily "I have an RTX 3060, what should I run?" slop threads? It’s not complicated. As of right now, Hugging Face is empty and exactly two local models exist on this entire planet:
- Qwen 3.6 35b a3b
- Qwen 3.6 27b
That is the entire list. Your specs don’t matter. Your use case doesn’t matter.
Stop coping with your pristine, near-lossless Q8s of tiny 1B models just because they "fit perfectly in your VRAM." You look ridiculous. Grab a heavily brain-damaged, ultra-low quant of the 35B, force-feed it to your GPU, and let your system RAM bleed. A garbage quant of a massive model is a bajillion times better than your precious micro-models anyway. Just cram it in.
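For anyone who wants to see how much "let your system RAM bleed" actually means, here is a rough back-of-the-envelope sketch. It assumes the weights take roughly params × bits-per-weight / 8 bytes and ignores KV cache, activations, and the per-block overhead real quant formats carry. The function names, the 12 GiB VRAM figure, and the bit widths are illustrative assumptions, not numbers from the post.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB.

    Assumption: size = parameter count * bits per weight / 8 bytes.
    Real quantized files are somewhat larger (scales, metadata).
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def spill_gib(params_billion: float, bits_per_weight: float, vram_gib: float) -> float:
    """How many GiB of weights would not fit in VRAM (0.0 if everything fits)."""
    return max(0.0, weights_gib(params_billion, bits_per_weight) - vram_gib)


if __name__ == "__main__":
    vram = 12.0  # hypothetical card; swap in your own number
    for name, params, bits in [
        ("1B at 8 bits", 1, 8),
        ("35B at 4 bits", 35, 4),
        ("35B at 2 bits", 35, 2),
    ]:
        size = weights_gib(params, bits)
        spill = spill_gib(params, bits, vram)
        print(f"{name}: ~{size:.1f} GiB weights, ~{spill:.1f} GiB spills to system RAM")
```

Treat the output as a lower bound: context length and runtime buffers eat VRAM on top of the weights.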
And if you're going to whine that open source is dead because a local model won't instantly rewrite your entire enterprise codebase? Fine. Give up, pull out your credit card, and go spend your money on Claude Code like the rest of the contrarians.
Can we pin this so everyone can finally shut up and stop posting? Thanks.
Now that that's been solved, let's go touch grass.