2-bit QAT model releases
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
So far, model releases that take advantage of Quantization-Aware Training (QAT) have focused on 4-bit.
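As background, QAT simulates low-precision weights during training. The forward pass uses weights that are quantized and immediately dequantized ("fake quantization"), while gradients update full-precision master weights, commonly through a straight-through estimator. Below is a minimal sketch of that forward step at 2 bits. It assumes a simple per-tensor min-max scheme, and `fake_quantize_2bit` is a hypothetical helper, not a function from any library:

```python
def fake_quantize_2bit(weights: list[float]) -> list[float]:
    """Round weights to 4 evenly spaced levels (2 bits) and map them back to floats.

    Hypothetical helper: uses per-tensor min-max scaling. Real QAT recipes
    often use per-group scales and learned or clipped ranges instead.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # constant tensor: nothing to quantize
    levels = 2 ** 2 - 1  # 3 steps between the 4 representable values
    scale = (hi - lo) / levels
    out = []
    for w in weights:
        q = round((w - lo) / scale)  # integer code in {0, 1, 2, 3}
        q = max(0, min(levels, q))   # clamp to the 2-bit range
        out.append(lo + q * scale)   # dequantized value used in the forward pass
    return out


print(fake_quantize_2bit([-1.0, -0.2, 0.1, 0.4, 1.0]))
```

With only 4 representable values per tensor (versus 16 at 4-bit), training has to find weights that still work after much coarser rounding.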
I’m curious what could be accomplished with a larger MoE model, somewhere from 120B up to 400B parameters. Obviously such a model could not approach 8-bit or 16-bit performance, but perhaps this could be a better alternative to training a ternary (1.58-bit) LLM from scratch. At these sizes the model could fit on consumer computers with 64 GB or 128 GB of RAM, and perhaps it could outperform a model of roughly half the size (80B/235B) at 4-bit precision.
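The size comparison comes down to weight storage ≈ parameters × bits per weight ÷ 8. Here is a rough sketch in Python. It assumes every weight uses the same bit width and ignores quantization scales, layers kept at higher precision, the KV cache, and runtime overhead. The function name `weight_footprint_gb` is hypothetical:

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumes a uniform bit width for all weights; real memory use will be higher.
    """
    # params (billions) * bits / 8 bits-per-byte gives billions of bytes, i.e. GB
    return params_billions * bits_per_weight / 8


for name, params, bits in [
    ("120B @ 2-bit", 120, 2),
    ("80B @ 4-bit", 80, 4),
    ("400B @ 2-bit", 400, 2),
    ("235B @ 4-bit", 235, 4),
]:
    print(f"{name}: {weight_footprint_gb(params, bits):.1f} GB")
```

Under these assumptions, a 120B model at 2-bit needs about 30 GB versus about 40 GB for an 80B model at 4-bit. A 400B model at 2-bit needs about 100 GB versus about 117.5 GB for a 235B model at 4-bit. Practical 2-bit formats usually spend somewhat more than 2 bits per weight once scales are counted, so treat these figures as lower bounds.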
I suspect the reason it isn’t being tried is that tooling and coding performance might suffer too much. I’m thinking about it in the context of creative writing, and in my experience 2-bit can still perform well there.
What do you think?