r/LocalLLaMA · 1 min read

DeepMind Just Dropped "DiffusionGemma" — Text Generation via Image-Style Diffusion Model

Another open-weight model dropped today, this one from DeepMind. Seems like a good day for the OSS geeks.

Released under Apache 2.0

Instead of generating text sequentially, token by token, like almost every other LLM on the market, it uses a text diffusion head.

- Starts from a 256-token "canvas" of random placeholder noise.

- Uses Uniform State Diffusion to iteratively refine and denoise the entire block of text all at once.

- Every token can attend to every other token, so attention is bidirectional rather than causal.

- It even features Error Correction via Re-Noising: if its confidence drops mid-generation, it re-introduces noise so it can correct its own mistakes in real time.

- Because it processes entire blocks at once, it shifts the local inference bottleneck away from memory bandwidth and onto raw compute. Claimed throughput: 1,000+ tokens per second on a single NVIDIA H100 and 700+ tokens per second locally on an RTX 5090.
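To make the loop above concrete, here is a toy sketch in plain Python. It is not DeepMind's code and not Uniform State Diffusion itself: `toy_denoiser` is a made-up stand-in that already "knows" the answer, the canvas is a short character string rather than 256 tokens, and the 0.5 confidence threshold is an arbitrary assumption. It only shows the control flow: start from noise, refine every position in parallel, and re-noise whatever comes back with low confidence.

```python
import random

VOCAB = "abcdefghijklmnopqrstuvwxyz "
TARGET = "diffusion refines the whole block at once"  # what the toy "model" converges to


def toy_denoiser(canvas, rng):
    """Hypothetical stand-in for the network: one parallel pass over the canvas.

    Returns a (token, confidence) proposal for every position. Proposals get
    more reliable as more of the canvas is already correct, mimicking a model
    that benefits from cleaner context.
    """
    clean = sum(c == t for c, t in zip(canvas, TARGET)) / len(TARGET)
    p_right = 0.3 + 0.7 * clean
    proposals = []
    for t in TARGET:
        if rng.random() < p_right:
            # Good guess, always above the acceptance threshold.
            proposals.append((t, rng.uniform(0.6, 1.0)))
        else:
            # Bad guess, usually (not always) below the threshold.
            proposals.append((rng.choice(VOCAB), rng.uniform(0.0, 0.7)))
    return proposals


def generate(max_steps=100, threshold=0.5, seed=0):
    """Iteratively denoise a random canvas; returns (text, steps_used)."""
    rng = random.Random(seed)
    canvas = [rng.choice(VOCAB) for _ in TARGET]  # start from pure noise
    for step in range(1, max_steps + 1):
        for i, (token, confidence) in enumerate(toy_denoiser(canvas, rng)):
            if confidence >= threshold:
                canvas[i] = token              # accept the refinement
            else:
                canvas[i] = rng.choice(VOCAB)  # re-noise and retry next step
        if "".join(canvas) == TARGET:
            return "".join(canvas), step
    return "".join(canvas), max_steps


if __name__ == "__main__":
    text, steps = generate()
    print(f"{steps} steps -> {text!r}")
```

Note that each step touches every position at once, so the number of model passes depends on how many refinement steps are needed, not on how long the block is. That is the intuition behind the bandwidth-to-compute shift described in the post.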

Hardware footprint

It’s a 26B Mixture of Experts (MoE) model built on the Gemma 4 architecture, but it activates only 3.8B parameters during inference. When quantized, it comfortably fits inside an 18 GB VRAM footprint, making it very accessible for local PC workflows.
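A quick back-of-the-envelope check on that footprint, as a rough sketch. The post doesn't say which quantization gets it under 18 GB, so the bit widths below are assumptions. The estimate counts weights only (no activations, caches, or runtime overhead) and assumes all experts stay resident in VRAM even though only 3.8B parameters are active at a time.

```python
TOTAL_PARAMS = 26e9  # total parameter count, from the post


def weight_gb(params, bits_per_weight):
    """Weight storage in decimal GB, ignoring activations and runtime overhead."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Bit widths are illustrative assumptions, not figures from the post.
    for bits in (16, 8, 5, 4):
        print(f"{bits:>2}-bit weights: {weight_gb(TOTAL_PARAMS, bits):.2f} GB")
```

By this estimate, 4-bit weights come to 13 GB and 5-bit weights to a bit over 16 GB, so an 18 GB total is plausible for a 4-bit to 5-bit quant once overhead is added. Unquantized 16-bit weights would need 52 GB.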

It's already live on Hugging Face and has native day-zero integration with vLLM, Unsloth (for fine-tuning), and Hugging Face Transformers.

submitted by /u/beasthunterr69