r/LocalLLaMA · 2 min read

Boogu Base, Turbo, Edit - open-source unified image generation and editing model series

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Boogu-Image-0.1 is a competitive, Apache-2.0-licensed open-source family of unified image generation and editing models. Its Base, Turbo, Edit, and other variants provide stable, practical capabilities for high-quality text-to-image generation, fast generation, image editing, and Chinese-English text rendering.

Closed-source multimodal understanding and generation systems such as Nano Banana Pro and GPT-Image-2 owe their remarkable performance not to a single model but to a highly unified suite of system capabilities. Even with training compute that is extremely limited compared with those systems, we find that systematically improving a model's understanding ability, data quality, and training pipeline can still significantly improve image generation and editing performance. In particular, our training data is roughly one order of magnitude smaller than that of some existing open-source models. We hope our empirical study and open-source release will help advance the open-source ecosystem for multimodal generation and understanding.

  • 📸 Photography with reliable text rendering — Boogu-Image-0.1-Turbo delivers realistic photography, while also offering solid performance on both simple and dense text rendering.
  • 📝 Strong dense text rendering — Boogu-Image-0.1-Base shows competitive results on dense, layout-heavy text scenarios such as posters, documents, brand guides, and complex bilingual designs.
  • 💡 Recommendation — When your workload is dominated by dense / ultra-dense text rendering needs, we recommend running Boogu-Image-0.1-Base at 2K output resolution for the best layout fidelity and character accuracy.

  • Boogu-Image-0.1-Base: Foundation model with strong diversity and controllability — ideal for fine-tuning and downstream development. Mainly intended for ultra-dense text rendering; for photorealism, Turbo is usually the better default.

  • Boogu-Image-0.1-Edit: Image editing and transformation variant.

  • Boogu-Image-0.1-Turbo: Distilled variant with the same parameter count as Base, typically requiring only 3 to 4 sampling steps. Focuses on high-quality generation and photorealism while preserving bilingual text rendering and prompt adherence.
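
The guidance in the lists above can be summarized as a small lookup. This is only a sketch of the post's recommendations: `pick_variant` and `VariantChoice` are hypothetical names made up for illustration and are not part of any Boogu release or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantChoice:
    model: str  # variant name as given in the post
    note: str   # the post's guidance for that variant


def pick_variant(task: str, dense_text: bool = False) -> VariantChoice:
    """Map a workload to the variant the post recommends.

    task: "generate" or "edit" (hypothetical labels for this sketch).
    dense_text: True when dense or ultra-dense text rendering dominates.
    """
    if task == "edit":
        return VariantChoice("Boogu-Image-0.1-Edit",
                             "image editing and transformation")
    if task != "generate":
        raise ValueError(f"unknown task: {task!r}")
    if dense_text:
        # The post recommends Base at 2K output for dense text layouts.
        return VariantChoice("Boogu-Image-0.1-Base",
                             "run at 2K output resolution")
    # For photorealism the post names Turbo as the usual default.
    return VariantChoice("Boogu-Image-0.1-Turbo",
                         "typically 3 to 4 steps")


print(pick_variant("generate", dense_text=True).model)
```

Fine-tuning and downstream development are not covered by this helper; for those the post points to Base.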

Model size: 10B parameters. VRAM requirements range from 12 GB to 80 GB depending on the configuration; check the model card for details.
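
For a rough sense of why the VRAM range is so wide, the sketch below computes the memory taken by the weights alone for a 10B-parameter model at several numeric precisions. The precisions listed are assumptions chosen for illustration, not the configurations documented on the model card, and real usage also includes activations and any auxiliary components, so treat the numbers as lower bounds.

```python
PARAMS = 10e9  # 10B parameters, as stated in the post

# Bytes per parameter for common precisions (assumed, not from the model card).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(PARAMS, size):.0f} GB")
# fp32: 40 GB
# bf16: 20 GB
# int8: 10 GB
# int4: 5 GB
```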

submitted by /u/pmttyji