Tag: Image Gen (#image-gen)
96 articles archived under #image-gen. Page 2 of 2.

Moment Matching Q-Learning
arXiv — Machine Learning · research · 1mo ago · arXiv:2605.29033v1
Abstract: Score-based and flow-based generative models exhibit remarkable expressive capacity in capturing complex distributions, and have been extensively deployed in tasks ranging from image generation to reinforcement learning.…

GenClaw: Code-Driven Agentic Image Generation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: GenClaw presents a code-driven agentic image generation framework that enables precise visual construction through conceptualization, sketching, and coloring stages, integrating programmatic logic with generative models.
Abstract: Image generation models have…

Qwen/Qwen-Image-Bench · Hugging Face
r/LocalLLaMA · community · 1mo ago
Model description: Q-Judger is a vision-language model fine-tuned specifically for automated evaluation of text-to-image generated images. Given a text prompt and a generated image, the model evaluates the image on fine-grained quality criteria organized in a 3-level hierarchy…

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
arXiv — NLP / Computation & Language · research · 1mo ago · arXiv:2605.27374v1
Abstract: Recent advances in multimodal large language models (MLLMs) and diffusion models (DMs) have opened new possibilities for AI-generated content. Yet, personalized cover image generation remains underexplored, despite its critical…

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI
arXiv — NLP / Computation & Language · research · 1mo ago · arXiv:2605.27545v1
Abstract: Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe text and current defenses are relatively immature. We introduce PAST2HARM, a simple…

MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A 20B-parameter masked region diffusion model enables scalable multi-layer transparent image generation and editing through unified task handling and efficient canvas management.
Abstract: Layered image generation and editing is a fundamental capability that…

Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
arXiv — Machine Learning · research · 1mo ago · arXiv:2605.26491v1
Abstract: Preference optimization has emerged as an efficient alternative to online reinforcement learning from human feedback (RLHF) for aligning text-to-image diffusion models. However, existing methods largely reduce supervision to binary…

On the Error-Correcting Effects of Stochasticity in Discrete Diffusion
arXiv — Machine Learning · research · 1mo ago · arXiv:2605.26582v1
Abstract: Discrete diffusion models achieve strong performance in text and image generation, but their inference remains slow and must inherently balance sampling efficiency and sample quality. In this work, we present a systematic study of…

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models
arXiv — Machine Learning · research · 1mo ago · arXiv:2605.26632v1
Abstract: Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly…

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A novel approach conditions diffusion models on multimodal large language models for subject-driven image generation, combining text and reference image encoding with VAE-based identity conditioning to improve both semantic understanding and identity preservation.…

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Diffusion Transformers achieve strong image generation performance but face high inference costs; this work proposes RT-Lynx, which uses activation sparsification and optimized CUDA kernels to accelerate inference while maintaining generation quality.…

PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.
r/LocalLLaMA · community · 1mo ago
The PrismML team really cooked with these models. They're only ~3GB in size (compared to FLUX.2 Klein 4B, which is ~16GB). Apache-2.0! Official collection on HF: https://huggingface.co/collections/prism-ml/bonsai-image Link to demo:…

Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Visual Concept Fusion enables dual text and image conditioning in diffusion models through feature alignment and fusion strategies without requiring retraining.
Abstract: Text-to-image diffusion models like Stable Diffusion generate high-quality images from…

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: RTDMD is a two-stage framework that combines distribution matching distillation with reward-guided reinforcement learning to improve few-step image generation alignment with human preferences.
Abstract: Recent advances in few-step diffusion distillation have…

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Lens is a compact 3.8B-parameter text-to-image model achieving superior performance with reduced training compute through dense caption datasets, multi-resolution batching, efficient architecture, and optimization techniques.
Abstract: We introduce Lens, a…

ETCHR: Editing To Clarify and Harness Reasoning
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A novel image editing approach called ETCHR is introduced that decouples visual reasoning from image generation, improving multimodal language model performance across multiple visual reasoning tasks through a two-stage training process.
Abstract: Multimodal…

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: AutoRubric-T2I automatically generates and selects explicit rubrics to guide Vision-Language Model judges for text-to-image generation, achieving high-quality reward signals with minimal human annotation while improving generation quality in downstream tasks.…

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Discrete autoregressive text-to-image models suffer from latent covariate shift during policy optimization, which RankE addresses through end-to-end co-evolution of policy and decoder components.
Abstract: Discrete autoregressive (AR) text-to-image (T2I)…

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: SEGA improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.
Abstract: Diffusion transformers (DiTs) have emerged as a dominant architecture for…

GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A self-evolving image generation framework uses tool-orchestrated trajectories and visual experience distillation to improve generative capabilities through iterative learning and reference-based prompting.
Abstract: Open-ended image generation is no longer a…

not much happened today
Smol AI News · news-outlet · 1mo ago
RAEv2 advances representation-first tokenization with >10x faster convergence and improved generation, tested on text-to-image and world models. NVIDIA's Gated DeltaNet-2 innovates linear attention with channel-wise gates, outperforming KDA and…

OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: OcclusionFormer addresses inter-object occlusion challenges in layout-to-image generation by modeling explicit Z-order priority through diffusion transformers and volume rendering techniques.
Abstract: Recent layout-to-image models have achieved remarkable…

PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A large-scale UHR image-text dataset and evaluation benchmark are introduced to advance ultra-high-resolution text-to-image generation capabilities.
Abstract: Text-to-Image (T2I) models have recently seen notable progress around 1K and 2K resolution. With the…

ByteDance released an open-source model that attempts to do just about anything with only 3B parameters
r/LocalLLaMA · community · 1mo ago
Lance is a lightweight native unified multimodal model that supports image and video understanding, generation, and editing within a single framework. Efficient at 3B scale. With only 3B active parameters, Lance delivers strong performance across image generation, image…

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
arXiv — Machine Learning · research · 1mo ago · arXiv:2605.16259v1
Abstract: While real-time image generation using diffusion models has advanced rapidly on NVIDIA GPUs, systematic optimization research on non-CUDA platforms such as Apple Silicon remains extremely limited. In this study, we conducted…

Efficient Image Synthesis with Sphere Latent Encoder
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A decoupled framework for few-step image generation that improves efficiency and performance by separating pixel-space operations from latent denoising training.
Abstract: Few-step image generation has seen rapid progress, with consistency and meanflow-based…

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: InsightTok improves discrete visual tokenization for better text and face reconstruction through content-aware perceptual losses, enhancing autoregressive image generation quality.
Abstract: Text and faces are among the most perceptually salient and…

Aligning Latent Geometry for Spherical Flow Matching in Image Generation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Geodesic flow matching improves image generation by projecting latents onto fixed-radius spheres and using spherical linear interpolation instead of linear paths, preserving semantic content through angular components.
Abstract: Latent flow matching for image…

Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Realiz3D addresses the domain gap between synthetic renders and real images in 3D-consistent image generation by decoupling visual domain from control signals through residual adapters and layer-specific denoising strategies.
Abstract: We often aim to…

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: A closed-loop visual reasoning framework integrates visual-language planning with diffusion generation to improve complex image synthesis while addressing latency and optimization challenges.
Abstract: Despite rapid advancements, current text-to-image (T2I)…

Does Synthetic Layered Design Data Benefit Layered Design Decomposition?
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Synthetic layered image data improves graphic design decomposition by enabling scalable training and better layer distribution control compared to traditional methods.
Abstract: Recent advances in image generation have made it easy to produce high-quality…

Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline
r/LocalLLaMA · community · 1mo ago
Shipped this for the AMD x lablab hackathon. The attached video is one of the actual reels the pipeline produced: one English sentence in, a finished mp4 with characters, story, music, and voice-over out (fast demo video, not the best quality). ~45 minutes end-to-end on a single AMD…

Asymmetric Flow Models
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: Asymmetric Flow Modeling enables efficient high-dimensional flow-based generation by restricting noise prediction to low-rank subspaces while maintaining full-dimensional data prediction, achieving superior performance in pixel-space text-to-image generation through…

Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
Hugging Face Daily Papers · research · 1mo ago
AI-generated summary: INSET is a unified multimodal model that embeds images as native vocabulary within textual instructions, enabling better handling of complex interleaved inputs through transformer-based contextual locality and supporting both image generation and editing tasks.…

Image generation models running locally on limited resources [P]
r/MachineLearning · community · 1mo ago
I have a project that generates high-quality free ebook covers from each book's content. On my 16GB-RAM machine with no GPU, I have tested the open-source Stable Diffusion models without any success. All of them return low-quality covers with blurred faces and scenes that do not…

Efficient Adjoint Matching for Fine-tuning Diffusion Models
arXiv — Machine Learning · research · 1mo ago · arXiv:2605.11480v1
Abstract: Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled…

Introducing ChatGPT Images 2.0
OpenAI · news · 2mo ago
ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning.

GPT-Image-2
Smol AI News · news-outlet · 2mo ago
OpenAI launched GPT-Image-2, enhancing image generation with improved text rendering, layout fidelity, editing, multilingual support, and "thinking" capabilities. It supports generating slides, infographics, diagrams, UI mockups, and QR codes, and integrates with tools…

Codex for (almost) everything
OpenAI · news · 2mo ago
The updated Codex app for macOS and Windows adds computer use, in-app browsing, image generation, memory, and plugins to accelerate developer workflows.

PRX Part 3 — Training a Text-to-Image Model in 24h!
Hugging Face · official-blog · 3mo ago
Team article, published March 3, 2026, by David Bertoin (Bertoin, Photoroom), Roman Frigg (photoroman, Photoroom), and Jon Almazán (jon-almazan, Photoroom).
Introduction: Welcome back 👋 In the last two posts (Part 1 and…

Nano Banana 2 aka Gemini 3.1 Flash Image Preview: the new SOTA Imagegen model
Smol AI News · news-outlet · 4mo ago
Google and DeepMind launched Nano Banana 2 (aka Gemini 3.1 Flash Image Preview), a leading image generation and editing model integrated across multiple Google products with features like 4K upscaling, multi-subject consistency, and real-time…

Training Design for Text-to-Image Models: Lessons from Ablations
Hugging Face · official-blog · 4mo ago
Team article, published February 3, 2026, by David Bertoin (Bertoin, Photoroom), Roman Frigg (photoroman, Photoroom), and Jon Almazán (jon-almazan, Photoroom).
Welcome back! This is the second part of our…

Diffusers welcomes FLUX-2
Hugging Face · official-blog · 7mo ago
Welcome FLUX.2 - BFL’s new open image generation model 🤗. Published November 25, 2025, by YiYi Xu (YiYiXu), Daniel Gu (dg845), Sayak Paul (sayakpaul), Alvaro Somoza (OzzyGT), Dhruv Nair (dn6), Aritra Roy Gosthipaty (ariG23498), Linoy Tsaban (linoyts)…

Build with Nano Banana Pro, our Gemini 3 Pro Image model
Google DeepMind · official-blog · 7mo ago
Here’s how developers can use Nano Banana Pro (Gemini 3 Pro Image), a powerful new image generation and editing model with advanced features and creative control. By Alisa Fortin, Product…

Experiment with Gemini 2.0 Flash native image generation
Google DeepMind · official-blog · 15mo ago
Native image output is available in Gemini 2.0 Flash for developers to experiment with in Google AI Studio and the Gemini API.

Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space
Eugene Yan · research · 43mo ago
The fundamentals of text-to-image generation, relevant papers, and experimenting with DDPM.