r/LocalLLaMA · June 11, 2026 · 1 min read

nvidia/diffusiongemma-26B-A4B-it-NVFP4 · Hugging Face

#multimodal #open-source #gpu

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.



Model Overview

Description:

DiffusionGemma 26B A4B IT is an open-weights multimodal generative model developed by Google DeepMind that processes text, image, and video inputs to produce text output via discrete diffusion. Built on the Gemma 4 26B A4B Mixture-of-Experts (MoE) architecture with 25.2B total parameters and 3.8B active parameters, the model employs an encoder-decoder design with bidirectional attention that generates tokens in parallel 256-token blocks, enabling high-speed generation exceeding 1,100 tokens per second at low batch sizes on NVIDIA Hopper H100 (FP8). DiffusionGemma 26B A4B IT supports a 256K token context window, configurable thinking (reasoning) mode, native function calling, and multilingual inference across 35+ languages. The NVIDIA DiffusionGemma 26B A4B IT NVFP4 model is quantized with Model Optimizer.
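To put the figures above in perspective, here is a rough back-of-the-envelope sketch. The parameter counts, block size, and throughput come from the description; the helper names are hypothetical. The 4-bit weight estimate is an assumption: it ignores NVFP4 scale factors and any layers left in higher precision, so treat it as a lower bound and not a measured footprint. The timing helper also assumes the quoted H100 (FP8) rate is sustained, which may not hold for this NVFP4 checkpoint or for other hardware.

```python
import math

# Figures quoted in the model card description.
TOTAL_PARAMS = 25.2e9       # total parameters
ACTIVE_PARAMS = 3.8e9       # active parameters per token (MoE)
BLOCK_TOKENS = 256          # tokens generated in parallel per block
TOKENS_PER_SECOND = 1100    # quoted as "exceeding" this on H100 (FP8), low batch


def active_fraction(total=TOTAL_PARAMS, active=ACTIVE_PARAMS):
    """Share of the weights that participate in a single forward pass."""
    return active / total


def blocks_needed(num_tokens, block=BLOCK_TOKENS):
    """Number of parallel blocks needed to emit num_tokens output tokens."""
    return math.ceil(num_tokens / block)


def seconds_for(num_tokens, tps=TOKENS_PER_SECOND):
    """Generation time if the quoted throughput were sustained (assumption)."""
    return num_tokens / tps


def weight_bytes_at_4_bits(total=TOTAL_PARAMS):
    """Lower-bound weight size at 4 bits per parameter (assumption: no
    scale-factor overhead and every layer quantized)."""
    return total * 4 / 8


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")
    print(f"blocks for 1000 tokens: {blocks_needed(1000)}")
    print(f"time for 1000 tokens: {seconds_for(1000):.2f} s")
    print(f"4-bit weights: {weight_bytes_at_4_bits() / 1e9:.1f} GB (lower bound)")
```

Under these assumptions, roughly 15% of the weights are active per token, and a 1,000-token reply spans four 256-token blocks.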

This model is ready for commercial and non-commercial use.

Use Case:

DiffusionGemma 26B A4B IT is designed for developers, researchers, and enterprises requiring high-speed multimodal text generation. Supported use cases include conversational AI and chatbots, text summarization, code generation and step-by-step reasoning, image and document understanding (OCR, chart comprehension, PDF parsing, screen and UI parsing), video content analysis, agentic workflows with native function calling, and multilingual NLP tasks across 35+ languages.

submitted by /u/pmttyji
