r/LocalLLaMA · 1 min read

2-bit QAT model releases

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

So far, model releases that take advantage of Quantization-Aware Training (QAT) have focused on 4-bit.

I’m curious what 2-bit QAT could accomplish with a larger MoE model, from around 120B up to 400B parameters. Obviously the model could not approach 8/16-bit performance, but perhaps this could be a better alternative to training a ternary LLM (1.58-bit) from scratch. At these sizes you could fit the model into consumer computers with 64/128 GB of RAM, and perhaps it could outperform a 4-bit model at about half the size (80B/235B).
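For a rough sense of the sizes involved, here is a back-of-the-envelope sketch in Python. It only counts raw weight storage as parameters times bits per weight. The helper name `weight_gb` is made up for illustration, and the numbers ignore quantization scales, KV cache, activations, and OS overhead, so real files and real memory use would be somewhat larger.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in GB (1 GB = 10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits,
    with no extra space for scales, zero-points, or higher-precision layers.
    """
    # params_billion * 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


# Sizes taken from the post: 2-bit at 120B/400B vs 4-bit at 80B/235B.
for params_b, bits in [(120, 2), (400, 2), (80, 4), (235, 4)]:
    print(f"{params_b}B @ {bits}-bit ~ {weight_gb(params_b, bits):.1f} GB")
```

Under these assumptions, the 2-bit 120B and 400B models come out smaller on disk than the 4-bit 80B and 235B models, which is why the comparison seems worth asking about.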

I suspect the reason it wouldn’t be tried is that tooling and coding might suffer too much. I’m thinking about it in the context of creative writing. In my experience, 2-bit can still perform there.

What do you think?

submitted by /u/silenceimpaired