How many of you use Q1 or Q2 quants of big models (100-250B)? How is it?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Sharing some popular (and recent) models for reference:
151-250B:
- DeepSeek-V4-Flash
- Step-3.X-Flash
- Command-a-plus-05-2026
- Laguna-M.1
- MiniMax-M2.X
- Qwen3-235B-A22B
100-150B:
- GLM-4.5-Air
- Qwen3.5-122B-A10B
- NVIDIA-Nemotron-3-Super-120B-A12B
- Mistral-Small-4-119B-2603
- Devstral-2-123B-Instruct-2512
- Mistral-Medium-3.5-128B
- Llama-4-Scout-17B-16E-Instruct (Yay! got your attention)
<100B:
- Llama-3.3-70B-Instruct
- Qwen3-Coder-Next
- Qwen3-Next-80B-A3B
I see that some people use Q3 (even down to IQ3_XXS) when they can't run Q4 on their rig. For example, I noticed that some DGX/SH users run Q3 of MiniMax-M2 models because Q4 is so tight.
I guess Q1/Q2 won't be good for small/medium models (~40B), at least for agentic coding. Chatting would probably be semi-usable quality-wise, though I'm not sure.
But I believe it's the opposite for big/large models because of their size. So how many of you use Q1 or Q2 of big models (100-250B)? How is it, and are those quants enough for you now? Please share your feedback on agentic coding, writing, and chatting with such quants of the models above. Also, please let us know what issues you're facing with Q1/Q2 quants, e.g. looping, repetition, tool calling issues, etc.
Personally, I don't go below Q4 for small/medium models even though I have only 8GB VRAM on my current laptop. My upcoming rig comes with 96GB VRAM + 128GB RAM, which is why I posted this thread. I'm thinking of trying Q1/Q2 of models like NVIDIA-Nemotron-3-Ultra-550B-A55B, GLM-5.X, etc.
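For anyone who wants to sanity-check what might fit before downloading, here is a rough back-of-the-envelope sketch. It only estimates weight size as parameters times bits-per-weight. The bits-per-weight numbers are approximate nominal values I'm assuming for illustration (real GGUF files mix tensor types, so actual sizes differ), and the `overhead_gb` allowance for KV cache and runtime buffers is a made-up placeholder, not a measured figure.

```python
# Approximate nominal bits-per-weight per quant type.
# ASSUMED values for illustration; real GGUF files vary by model and recipe.
APPROX_BPW = {
    "IQ1_S": 1.56,
    "IQ2_XXS": 2.06,
    "IQ3_XXS": 3.06,
    "Q4_K_M": 4.8,
}


def weights_gb(params_billion: float, bpw: float) -> float:
    """Estimate weight size in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bpw / 8


def fits(params_billion: float, quant: str, budget_gb: float,
         overhead_gb: float = 8.0) -> bool:
    """True if estimated weights plus a hypothetical overhead fit the budget."""
    return weights_gb(params_billion, APPROX_BPW[quant]) + overhead_gb <= budget_gb


if __name__ == "__main__":
    vram_gb, ram_gb = 96, 128  # the upcoming rig from the post
    for name, size_b in [("Qwen3-235B-A22B", 235),
                         ("NVIDIA-Nemotron-3-Ultra-550B-A55B", 550)]:
        for quant, bpw in APPROX_BPW.items():
            est = weights_gb(size_b, bpw)
            print(f"{name} {quant}: ~{est:.0f} GB, "
                  f"VRAM only: {fits(size_b, quant, vram_gb)}, "
                  f"VRAM+RAM: {fits(size_b, quant, vram_gb + ram_gb)}")
```

Treat the output as a first filter only. Context length, offloading strategy, and the exact quant recipe all move the real number.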