r/LocalLLaMA

New sampler + verifier *drastically* improves tiny 0.5b model coding performance


I read it with a little bit of effort

The tiny-model result is insane: theoretically this could make a 0.5B model on par with a 2-4B-ish class model at coding, with no weight changes*. And for large models it could maybe fix, let's say, 30-50% of hallucination problems (educated guesstimate here).

Don't expect this to ever come to vLLM or SGLang, but llama.cpp could integrate this easily* like `--top-n-sigma`.

*Now there's this one... small... okay, big catch. This is a backtracking sampler, so there's an automatic 5-30% decode speed hit, because the model has to go back and re-generate if it fucks up. You also need to train a small verifier model... and by small I mean roughly the same size as the original model. So it doubles VRAM requirements, more than doubles memory bandwidth, and increases the compute requirement by somewhere in the range of 1.5-3x. Sorry not sorry, research is still cool though. More importantly, this is proof that a better backtracking sampler (like this one) can actually fix a lot of LLM issues, and two more papers down the line we could have VGB but fast as fuck. That, or the AI labs will find a way around the limitations in the paper and co-train a smaller verifier along with the model.
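For anyone wondering what "go back and re-generate" looks like mechanically, here is a minimal toy sketch in Python. To be clear, this is NOT the paper's VGB algorithm, just a generic reject-and-step-back loop. The names `backtrack_sample` and `balanced_prefix_verifier` are made up for illustration, and the "verifier" is a hand-written rule (no unmatched closing parenthesis) standing in for a trained verifier model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def balanced_prefix_verifier(tokens: Sequence[str]) -> float:
    """Toy stand-in for a learned verifier (assumption, not from the paper):
    returns 1.0 if no ')' ever closes more than was opened, else 0.0."""
    depth = 0
    for tok in tokens:
        depth += 1 if tok == "(" else -1
        if depth < 0:
            return 0.0
    return 1.0


def backtrack_sample(
    propose: Callable[[Sequence[str]], str],
    verifier: Callable[[Sequence[str]], float],
    max_len: int,
    threshold: float = 0.5,
    max_steps: int = 1000,
) -> Tuple[List[str], int]:
    """Extend the prefix while the verifier approves. On a rejection,
    discard the candidate token and also step back one accepted token."""
    prefix: List[str] = []
    backtracks = 0
    for _ in range(max_steps):
        if len(prefix) >= max_len:
            break
        candidate = prefix + [propose(prefix)]
        if verifier(candidate) >= threshold:
            prefix = candidate  # verifier is happy, keep the new token
        else:
            backtracks += 1  # wasted work: this is where the speed hit comes from
            if prefix:
                prefix.pop()
    return prefix, backtracks


if __name__ == "__main__":
    rng = random.Random(0)
    # The "model" here is just a coin flip between "(" and ")".
    tokens, n_back = backtrack_sample(
        lambda prefix: rng.choice("()"), balanced_prefix_verifier, max_len=8
    )
    print("".join(tokens), "backtracks:", n_back)
```

Every rejected candidate is a forward pass that gets thrown away, and every candidate also needs a verifier call, which is roughly where the decode slowdown and the extra compute in the post come from.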

Two small saving graces are:
1. The verifier model generalises across its weight class OR LOWER. So a verifier for a 30B model will work on any 30B model OR LOWER, as long as it saw the same distribution of data diversity (i.e. domains: if it saw math it will generalise to math, but if it didn't see Wikipedia it won't generalise to it).
2. Training the verifier costs almost nothing compared to full pre-training. You just take the original model and train it on special training data (which already exists, like that PMK one) equivalent to ~0.01% of the pre-training token count.
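To put the catch and the second saving grace into rough numbers, here is a back-of-envelope sketch. The multipliers are the guesstimates from this post (2x VRAM, 5-30% decode hit from backtracking, 1.5-3x compute, ~0.01% of pre-training tokens). The function name `verifier_overhead` and the example inputs are hypothetical, picked only to make the arithmetic visible.

```python
from typing import Dict, Tuple, Union

Number = Union[float, Tuple[float, float]]


def verifier_overhead(
    model_vram_gb: float,
    base_tokens_per_s: float,
    pretrain_tokens: float,
) -> Dict[str, Number]:
    """Apply the post's rough multipliers; ranges are (worst, best) or (low, high)."""
    return {
        # Verifier is roughly the same size as the model, so weights double.
        "vram_gb": 2.0 * model_vram_gb,
        # 5-30% decode speed hit from backtracking alone: (worst, best).
        "tokens_per_s": (base_tokens_per_s * 0.70, base_tokens_per_s * 0.95),
        # Compute requirement goes up by somewhere in this range: (low, high).
        "compute_multiplier": (1.5, 3.0),
        # ~0.01% of the pre-training token count to train the verifier.
        "verifier_train_tokens": pretrain_tokens * 1e-4,
    }


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not from the paper or the post):
    # 1 GB of weights, 100 tok/s baseline, 1e12 pre-training tokens.
    for key, value in verifier_overhead(1.0, 100.0, 1e12).items():
        print(key, value)
```

With those made-up inputs, the verifier training set works out to about 1e8 tokens, which is the "almost nothing compared to full pre-training" part.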

submitted by /u/Dany0