r/LocalLLaMA · May 30, 2026 · 1 min read

Has anyone experimented with stabilizing low quant models with lower temp and top p?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I was thinking about trying some bigger models on my 80GB VRAM setup, but every MoE model is too slow once CPU offload is involved. Beyond that, there aren't many models purpose-built for 80GB of VRAM, so most of the bigger ones only fit as heavily quantized versions. While looking at some benchmarks run with the same top p, I realized there may be something to gain here, but I haven't seen anyone post about it recently. Playing with some LLM sampling visualization tools suggests that lowering temp and top p might cut down on some of the wild outputs. I'll be trying it this evening.

Tool example (not mine): https://artefact2.github.io/llm-sampling/index.xhtml
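
For anyone who wants to see the idea in numbers rather than in a visualizer, here is a rough sketch of what lowering temp and top p does to a next-token distribution. The logits are made up for illustration and don't come from any real model or quant. The function names are mine, and applying temperature before top p is an assumption, since the sampler order differs between inference backends.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits / temperature (lower temperature = sharper distribution)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_top_p(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out

# Hypothetical logits: two plausible tokens and a long tail of unlikely ones.
logits = [4.0, 3.5, 2.0, 0.5, 0.0, -1.0]

# Assumed order: temperature first, then top p.
loose = apply_top_p(apply_temperature(logits, 1.0), 1.0)
tight = apply_top_p(apply_temperature(logits, 0.6), 0.8)

for name, dist in (("temp=1.0 top_p=1.0", loose), ("temp=0.6 top_p=0.8", tight)):
    survivors = sum(1 for p in dist if p > 0)
    print(name, "tokens that can still be sampled:", survivors)
    print("  ", [round(p, 3) for p in dist])
```

With the tighter settings the tail tokens can no longer be sampled at all, which is the effect I'm hoping reduces wild outputs. Whether that actually compensates for quantization error is the part that still needs testing.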

submitted by /u/fragment_me