r/LocalLLaMA · June 23, 2026 · 2 min read

Boogu Base, Turbo, Edit - open-source unified image generation and editing model series

#image-gen #open-source #funding #security

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Boogu-Image-0.1 is a competitive, Apache-2.0 open-source family of unified image generation and editing models. It includes Base, Turbo, Edit, and other variants, covering high-quality text-to-image generation, fast generation, image editing, and Chinese-English text rendering. Closed-source multimodal understanding and generation systems such as Nano Banana Pro and GPT-Image-2 owe their performance not to a single model but to a tightly unified suite of system capabilities. Our training compute is extremely limited compared with those systems, and our training data is roughly one order of magnitude smaller than that of some existing open-source models. Even so, we find that systematically improving a model's understanding ability, data quality, and training pipeline can still significantly improve image generation and editing performance. We hope our empirical study and open-source release will help advance the open-source ecosystem for multimodal generation and understanding.

📸 Photography with reliable text rendering — Boogu-Image-0.1-Turbo delivers realistic photography, while also offering solid performance on both simple and dense text rendering.
📝 Strong dense text rendering — Boogu-Image-0.1-Base shows competitive results on dense, layout-heavy text scenarios such as posters, documents, brand guides, and complex bilingual designs.
💡 Recommendation — When your workload is dominated by dense / ultra-dense text rendering needs, we recommend running Boogu-Image-0.1-Base at 2K output resolution for the best layout fidelity and character accuracy.
Boogu-Image-0.1-Base: Foundation model with strong diversity and controllability — ideal for fine-tuning and downstream development. Mainly intended for ultra-dense text rendering; for photorealism, Turbo is usually the better default.
Boogu-Image-0.1-Edit: Image editing and transformation variant.
Boogu-Image-0.1-Turbo: Distilled variant with the same parameter count as Base, typically requiring only 3 to 4 steps. Focuses on high-quality generation and photorealism while preserving bilingual text rendering and prompt adherence.

Model size: 10B parameters (12-80 GB of VRAM needed, depending on configuration; check the model card for details)
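For a rough sense of where a number like that comes from, here is a minimal back-of-the-envelope sketch of weight-only memory for a 10B-parameter model at a few common precisions. The precision list and the `weight_memory_gb` helper are illustrative assumptions, not something from the Boogu release. The estimate covers the main model's weights only and ignores the text encoder, VAE, activations, and any offloading, so it will not reproduce the 12-80 GB range; the model card remains the reference.

```python
# Weight-only memory estimate. Assumption: every parameter is stored at the
# given precision. Real usage is higher (activations, other components).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(10e9, dtype):.0f} GB")
```

Lower-precision weights shrink the footprint roughly in proportion to bytes per parameter, which is one reason the required VRAM can vary so much between configurations.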

Models:

GitHub:

Misc:

https://huggingface.co/Comfy-Org/Boogu-Image

submitted by /u/pmttyji