Gemma 4 MTP vs DFlash on 1x H100: dense vs MoE results
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Benchmarked Gemma 4 MTP and z-lab's DFlash on a single H100 80GB using vLLM and NVIDIA's SPEED-Bench qualitative dataset. Setup:
Results:
For a real deployment, try both approaches on your own setup and workload instead of assuming one will always be better; the results can change with the model, prompts, hardware, and serving configuration. Hope these numbers give people a useful reference point. All the setup and scripts used to run the benchmarks and reproduce these results are in the GitHub repository. You can read more results and in-depth analysis in our blog: https://jarvislabs.ai/blog/gemma-4-mtp-vs-dflash-benchmark
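For anyone comparing two serving configurations on their own workload, the core measurement is tokens per second for each config over the same prompt set, then the ratio between them. A minimal sketch of that harness is below; `generate_fn` is a hypothetical stand-in for your actual generation call (e.g. a vLLM client), not the repo's benchmark script:

```python
import time


def measure_throughput(generate_fn, prompts):
    """Time one pass over a prompt set and return tokens/sec.

    generate_fn: callable taking a list of prompts and returning the
    total number of generated tokens. This is a placeholder for a real
    generation call (e.g. against a vLLM server), not the repo's script.
    """
    start = time.perf_counter()
    total_tokens = generate_fn(prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def relative_speedup(candidate_tps, baseline_tps):
    """Speedup of one config over another, e.g. MTP vs. DFlash."""
    return candidate_tps / baseline_tps
```

Run both configs against the identical prompt set and compare `relative_speedup(mtp_tps, dflash_tps)`; anything systematically above 1.0 on your workload means the candidate config wins for you, regardless of someone else's numbers.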