[3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better]
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
These last few weeks have been a godsend for GPU-poor folks with 24GB of VRAM or less.
We're at the tipping point where GPU-poor people (24GB and below) are actually NOT poor anymore. I was already happy with Gemma 4 31b running at 40 tok/s, but now it's at 70-80 tok/s. It's no wonder 3090 prices are increasing. For ref: • Hardware
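
For anyone who wants to sanity-check the speedup implied by the 40 tok/s and 70-80 tok/s figures quoted above, here is a minimal sketch. The function names are made up for illustration, and it only covers this one data point, not the full set of runs behind the range in the title.

```python
# Quick arithmetic on the throughput numbers quoted in the post.
# speedup() and ms_per_token() are hypothetical helper names, not from any library.

def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new_tps / baseline_tps


def ms_per_token(tps: float) -> float:
    """Average time per generated token in milliseconds."""
    return 1000.0 / tps


baseline = 40.0                 # tok/s before, as quoted in the post
improved = (70.0, 80.0)         # tok/s range after, as quoted in the post

for tps in improved:
    print(
        f"{baseline:.0f} -> {tps:.0f} tok/s: "
        f"{speedup(baseline, tps):.2f}x, "
        f"{ms_per_token(baseline):.1f} ms/token -> {ms_per_token(tps):.1f} ms/token"
    )
```

For this single comparison the ratio works out to between 1.75x and 2.0x; other prompts, context lengths, or settings will likely land elsewhere.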