Diffusion Gemma is 4x faster, but makes 6x more mistakes!
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| Benchmarked the new Gemma diffusion model against its autoregressive twin on a single H100 (FP8). We gave each the same three tasks: write a Steve Jobs biography, the history of Tetris, and the story of BeOS - every next topic less popular than the previous one. Then we fact-checked every claim in every answer. Gemma4 got 45 facts right, 5 wrong. DiffusionGemma got 33 right, 28 wrong. The less popular the topic, the worse it got: 4 mistakes on Jobs, 12 on Tetris, 12 on BeOS. It named Clara Clley as Steve Jobs' mother, invented a colleague for Pajitnov named Geri Gulovik and priced the BeBox at $9,999. The real one cost $1,600. Outputs: The reason is simple. DiffusionGemma throws 256 tokens on the screen at once and polishes them pass after pass until the text sounds smooth. Smooth is all it cares about: a fake name, date or number sounds just as smooth as a real one, so it stays. Regular Gemma4 meanwhile writes one word at a time and checks every new word against everything before it. Google says it themselves in the launch post: quality is lower, use regular Gemma 4 when facts matter. Open source Local Ai models harness: Atomic.Chat (I'm founder, we support GGUF models, MLX Apple Silicon, MTP and Google TurboQuant for long context window, working on Diffusion support via llama.cpp) [link] [comments] |
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.