Does it make sense to use alternative quantizations of QAT models? [D]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
From TensorFlow's website:
Quantization aware training emulates inference-time quantization, creating a model that downstream tools will use to produce actually quantized models.
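To make "emulates inference-time quantization" concrete, here is a minimal NumPy sketch of the fake-quantization step such training typically applies in the forward pass: weights are rounded to a low-bit integer grid and immediately converted back to floats. The function name `fake_quantize`, the symmetric per-tensor scheme, and the 4-bit width are illustrative assumptions, not what TensorFlow or Gemma actually use.

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Symmetric per-tensor fake quantization (illustrative assumption):
    snap values to an integer grid, then map them back to floats."""
    qmax = 2 ** (num_bits - 1) - 1                   # 7 for 4 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0   # float step between grid points
    q = np.clip(np.round(w / scale), -qmax, qmax)    # integer codes in [-qmax, qmax]
    return q * scale, scale                          # dequantized weights and the step

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8)).astype(np.float32)  # stand-in for a weight matrix
w_fq, scale = fake_quantize(weights, num_bits=4)

# The rounding error per weight is at most half a grid step.
max_err = float(np.max(np.abs(weights - w_fq)))
```

During training, the model sees `w_fq` instead of `weights`, so the loss already accounts for the rounding error of this particular grid.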
So is QAT designed to work with one specific quantization method (for Gemma-4, presumably Google's own)? Or would it make sense to use alternative quantization methods?
According to the benchmarks Unsloth released, its (alternative) quantizations of Gemma-4-QAT are closer to the QAT fine-tunes. Is that a good thing, or does it defeat the purpose of QAT?
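One way to probe this is to measure how far a given quantizer moves the weights compared with the scheme emulated during training. The toy sketch below does that with NumPy. It makes an idealized assumption that the QAT weights already sit exactly on a 4-bit grid with group size 32. The helper `quantize_groups` and both group sizes are hypothetical and do not correspond to Google's or Unsloth's actual methods.

```python
import numpy as np

def quantize_groups(w, group_size, num_bits=4):
    """Symmetric per-group quantize-dequantize (hypothetical helper)."""
    qmax = 2 ** (num_bits - 1) - 1
    groups = w.reshape(-1, group_size)                    # one scale per group
    max_abs = np.max(np.abs(groups), axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)
    q = np.clip(np.round(groups / scale), -qmax, qmax)
    return (q * scale).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=1024)

# Idealized assumption: the QAT checkpoint landed exactly on the group-size-32 grid.
w_qat = quantize_groups(w, group_size=32)

# Re-quantizing with the same scheme is (numerically) lossless ...
err_matched = float(np.mean((quantize_groups(w_qat, 32) - w_qat) ** 2))
# ... while a different grid (here, group size 64) has to move some weights.
err_other = float(np.mean((quantize_groups(w_qat, 64) - w_qat) ** 2))
```

In this toy setting the mismatched quantizer adds error that the matched one does not. Whether that matters for a real model depends on how robust the QAT weights are to the new grid, which is what downstream benchmarks would have to show.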