r/LocalLLaMA · 1 min read

Quality evaluation of quants with limited time or tokens


About a year ago, people were publishing a lot of benchmarks comparing various quants of models. I understand that this is not really feasible with the current (and otherwise welcome) pace of new model releases, but on the other hand, it would still be useful to be able to check locally whether q3 of this model is better than q6 of that model.

I've checked a few benchmarks, but they seem to be broad, general-purpose suites, and the models may generate millions of tokens while running them. With a 300B+ MoE model on a home setup at 10-20 t/s, that does not seem feasible. I'd rather have a benchmark that limits the focus to the tasks with the most predictive power (e.g. tasks that may pass on q6 but fail on q5).
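To put rough numbers on the feasibility concern, here is a small back-of-the-envelope sketch. The function names and the 8-hour budget are my own illustrative assumptions; only the 10-20 t/s range and the "millions of tokens" scale come from the post.

```python
def generation_hours(total_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock hours needed to generate total_tokens at a steady rate."""
    return total_tokens / tokens_per_second / 3600.0


def tokens_within_budget(hours: float, tokens_per_second: float) -> int:
    """How many tokens fit into a time budget (hypothetical helper)."""
    return int(hours * 3600 * tokens_per_second)


# Time per million generated tokens at the speeds mentioned above.
for tps in (10, 20):
    print(f"{tps} t/s: {generation_hours(1_000_000, tps):.1f} h per 1M tokens")

# Token budget for an assumed overnight run of 8 hours at 10 t/s.
print(tokens_within_budget(8, 10), "tokens in 8 h at 10 t/s")
```

So each million generated tokens costs roughly 14 to 28 hours at these speeds, and prompt processing time comes on top of that. A benchmark that fits into one overnight run would have to stay within a few hundred thousand generated tokens.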

Of course there is always the DIY approach, but I am wondering whether people have already tackled this problem somehow. I'd even settle for an automatic way to say that q5 is roughly 95.56% of q8, or something along those lines.
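For the DIY route, a minimal sketch of what such a "q5 is X% of q8" number could look like, assuming you already have per-task pass/fail results for two quants on the same task set. The task IDs and results below are made-up placeholders, not measurements, and both function names are hypothetical.

```python
def relative_score(low: dict[str, bool], high: dict[str, bool]) -> float:
    """Passes of the lower quant divided by passes of the higher quant,
    counted only on tasks that both quants were run on."""
    shared = low.keys() & high.keys()
    high_passed = sum(high[t] for t in shared)
    if high_passed == 0:
        raise ValueError("higher quant passed no shared task; ratio undefined")
    return sum(low[t] for t in shared) / high_passed


def discriminating_tasks(low: dict[str, bool], high: dict[str, bool]) -> list[str]:
    """Tasks the higher quant passes but the lower quant fails."""
    return sorted(t for t in low.keys() & high.keys() if high[t] and not low[t])


# Placeholder results (assumed for illustration only).
q8 = {"task_a": True, "task_b": True, "task_c": True, "task_d": False}
q5 = {"task_a": True, "task_b": False, "task_c": True, "task_d": False}

print(f"q5 is roughly {relative_score(q5, q8):.2%} of q8")
print("discriminating tasks:", discriminating_tasks(q5, q8))
```

The tasks returned by `discriminating_tasks` are candidates for a shorter, focused benchmark. With only a handful of tasks the ratio is noisy, so it is better read as a rough indicator than as a precise percentage.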

submitted by /u/isoos
