Tag: Image Gen
96 articles archived under #image-gen

arXiv — Machine Learning research 1h ago Can AI Draw Science? A Benchmark for Evaluating Scientific Figure Generation by Text-to-Image and Multimodal Models arXiv:2606.28406v1 Announce Type: new Abstract: Text-to-image and multimodal generative models are increasingly used to produce scientific figures such as mechanism diagrams, experimental-design schematics, conceptual frameworks, and graphical abstracts. Yet existing… 36

TechCrunch — AI news-outlet 9h ago Gemini’s personalized AI image generation is now free for US users Google is expanding Gemini’s personalized AI image generation to eligible free users in the U.S., allowing the chatbot to create images based on your interests and data from connected Google apps. 29

r/LocalLLaMA community 2d ago clark-labs/clark-air-sana-1.6b-1.58bit · Hugging Face A Sana 1.6B text-to-image transformer compressed to ternary (~1.85 bits/weight): 8.6× smaller than FP16, near-FP16 quality. Footprint (measured) Artifact Size vs FP16 What it is FP16 transformer 3.21 GB 1× (100%) reference Clark Air (packed) 374 MB 8.6× (≈12%) packed ternary (… 36

Hugging Face Daily Papers research 3d ago Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation Abstract A unified agentic framework called Qwen-Image-Agent is proposed to address the context gap in text-to-image generation by progressively constructing complete generation context through planning, reasoning, searching, and memory mechanisms. Generated by… 22

arXiv — NLP / Computation & Language research 4d ago DanceOPD: On-Policy Generative Field Distillation arXiv:2606.27377v1 Announce Type: cross Abstract: Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict.
For… 18

Hugging Face Daily Papers research 4d ago DanceOPD: On-Policy Generative Field Distillation Abstract A novel on-policy generative field distillation framework called DanceOPD is proposed to unify text-to-image generation, local editing, and global editing capabilities in flow-matching models through capability-specific routing and velocity-based training. Generated by… 10

Hugging Face Daily Papers research 5d ago IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation Abstract Implicit Visual Chain-of-Thought decomposes visual conditioning into structural and semantic cascades for improved structure-aware image generation with sketch supervision. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Unified multi-modal large language models (MLLMs)… 7

r/LocalLLaMA community 5d ago SDXL running locally in the browser on WebGPU, open-source I needed simple local image generation without the usual setup. No virtual environments, no ComfyUI with a complex graph and installation as an exe. So i tried to push the whole thing into the browser and run it on WebGPU. It's a browser extension. You install it, then it loads… 13

Hugging Face Daily Papers research 5d ago Semantic Browsing: Controllable Diversity for Image Generation Abstract Text-to-image models are enhanced with controlled diversity through semantic browsing capabilities that enable structured navigation of image variations based on meaningful semantic decisions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Modern text-to-image models… 4

Hugging Face Daily Papers research 5d ago FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation Abstract FLUX3D addresses limitations in image-to-3D Gaussian Splatting generation by improving representation learning and cross-modal alignment through specialized architectures and attention mechanisms.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Sparse voxel representation… 34

arXiv — Machine Learning research 6d ago Information-Theoretic Classifier-Free Guidance with Adaptive Schedule Optimization arXiv:2606.24025v1 Announce Type: new Abstract: Diffusion models have achieved strong performance in image, text-to-image, and video generation, where conditional generation is often controlled by classifier-free guidance (CFG). CFG improves condition consistency by increasing a… 35

Hugging Face Daily Papers research 6d ago Are Text-to-Image Models Inductivist Turkeys? A Counterfactual Benchmark for Causal Reasoning Abstract Text-to-image models fail to generate counterfactual scenes because they rely on tightly coupled visual-textual patterns rather than causal reasoning, demonstrating limited understanding beyond pattern matching. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-to-image… 26

Hugging Face Daily Papers research 7d ago Safe Few-Step Generation via Velocity Editing Abstract VESFlow is a training-free safety method for flow matching-based text-to-image generation that edits velocity fields to ensure safe output while maintaining prompt integrity.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Flow matching has recently emerged as a strong… 16

r/LocalLLaMA community 7d ago Boogu Base, Turbo, Edit - open-source unified image generation and editing model series Boogu-Image-0.1 is a competitive Apache-2.0 open-source unified image generation and editing model family, including Base, Turbo, Edit, and other variants that provide stable, practical capabilities for high-quality text-to-image generation, fast generation, image editing,… 22

Hugging Face Daily Papers research 7d ago Exploring the Design Space of Reward Backpropagation for Flow Matching Abstract FlowBP addresses limitations in flow matching model alignment by using a surrogate trajectory framework that reduces memory usage and gradient chaining while maintaining performance across multiple text-to-image models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 23

Hugging Face Daily Papers research 8d ago BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation Abstract A 3D brain MRI generative model uses a masked-autoencoder tokenizer to create clinically informative embeddings that support both medical task performance and controlled image generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Three-dimensional (3D) brain MRI is… 6

r/LocalLLaMA community 8d ago Local text to image model comparaison: The ultimate test. I selected 192 prompts to evaluate text-to-image model various capabilities and generated images for all the local models I was able to make work on my GX10 Spark. For instance: Is the model good at text? At faces? At human anatomy? At respecting spatial composition, etc...? You… 4

r/MachineLearning community 9d ago Studying FLUX in diffusers library was hard, so I built a smaller open-source version [P] If you've tried to study modern diffusion models by digging through the official diffusers library, you know it can be overwhelming with its complexity and abstractions.
I wanted to simplify FLUX diffusion models, so I built minFLUX: a PyTorch implementation focused on its core… 38

Hugging Face Daily Papers research 10d ago The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation Abstract Analysis of FID variance across different training and sampling seeds reveals significant reproducibility issues in image generation evaluation, with retraining causing larger fluctuations than resampling, and recommends updated evaluation protocols with error bars and… 21

arXiv — NLP / Computation & Language research 11d ago NAMESAKES: Probing Identity Memorization in Text-to-Image Models arXiv:2606.20155v1 Announce Type: cross Abstract: Text-to-image (T2I) models generate realistic likenesses of some individuals when prompted with their names, raising privacy concerns. However, distinguishing whether a generated face is memorized or fabricated currently requires… 10

Latent.Space news-outlet 12d ago [AINews] Midjourney Medical: scan your organs like you step on a scale The only bootstrapped frontier lab announces its second product and second 12

Hacker News — AI on Front Page community 12d ago Midjourney Medical https://www.midjourney.com/medical Video: https://x.com/midjourney/status/2067422898407837797 Comments URL: https://news.ycombinator.com/item?id=48579650 Points: 228 # Comments: 203 10

Smol AI News news-outlet 12d ago Midjourney Medical: scan your organs like you step on a scale Midjourney unveiled a new medical imaging/scanning system called the Midjourney Scanner, described as radiation-free, magnet-free, fast, and low-cost, but requiring a water immersion tank and having coarser resolution than CT/MRI.
The announcement… 12

Hugging Face Daily Papers research 13d ago Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification Abstract UniAR presents a unified autoregressive framework that uses a single discrete visual tokenizer to bridge visual understanding and generation, achieving state-of-the-art results in image generation and editing through multi-level feature fusion, bitwise quantization, and… 19

r/LocalLLaMA community 17d ago Open Dungeon: local roleplay with Gemma 4 QAT + inline Uncen-FLUX images, running at full 256K context under 8GB RAM (OS) I wanted AI Dungeon but fully local and actually private, so I built it. The narrator is Gemma 4 (QAT Q4) through Ollama, and when a scene is worth showing it draws the picture too, locally, with FLUX. No API keys, no cloud, nothing leaves your machine. The part that surprised… 26

Hugging Face Daily Papers research 17d ago Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback Abstract Structured Defect Grounding (SDG) addresses limitations in text-to-image model diagnosis by modeling defects as structured sets and using vision-language models for detection and reward-based alignment. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite generating… 22

Hugging Face Daily Papers research 17d ago High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation Abstract A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization.
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 33

Hugging Face Daily Papers research 18d ago Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents Abstract Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods. Generated by… 20

arXiv — Machine Learning research 19d ago Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs arXiv:2606.12280v1 Announce Type: new Abstract: Post-training quantization lets large text-to-image diffusion transformers run on consumer GPUs, yet the hardware-specific trade-offs are seldom measured directly. We quantize Ideogram 4.0 - a 9.3B flow-matching diffusion… 17

Hugging Face Daily Papers research 19d ago Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Abstract A teacher-student framework decouples complex reasoning from efficient reward deployment in text-to-image training, achieving superior preference accuracy and optimization performance. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Reward models are central to… 22

Hugging Face Daily Papers research 19d ago i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models Abstract A comprehensive experimental study of text-to-image diffusion models reveals key design choices and training insights leading to the development of i1, a 3B-parameter model that matches leading performance while maintaining full openness. Generated by… 21

Ars Technica — AI news-outlet 19d ago Google DeepMind releases DiffusionGemma, a model that runs local AI 4x faster Diffusion AI is most common in image generation, but it can make text outputs much faster.
29

Hacker News — AI on Front Page community 19d ago Mercedes‑Benz starts large‑scale production of electric axial flux motor Article URL: https://media.mercedes-benz.com/en/article/bebac2af-acdc-465a-9538-adb0bf3d8ccf Comments URL: https://news.ycombinator.com/item?id=48472877 Points: 262 # Comments: 139 21

Hugging Face Daily Papers research 20d ago Text-to-Image Models Need Less from Text Encoders Than You Think Abstract Text-to-image models primarily utilize basic text representation aspects like word merging and order rather than complex contextual information encoded in full text embeddings. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Text-to-image models rely on text prompts as… 36

r/MachineLearning community 21d ago Open image generation models are closer to closed-source quality than this sub thinks [D] I run evaluations on generative image models as part of my workflow, mostly comparing coherence, prompt adherence, and compositional accuracy across different architectures. The consensus here seems to be that open models are still a generation behind closed APIs. Based on my… 25

Hacker News — AI on Front Page community 25d ago Ask HN: What was your "oh shit" moment with GenAI? Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws. Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much. Using LLMs for coding initially was a only small… 26

Hugging Face Daily Papers research 25d ago Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting Abstract Multi-concept customization in text-to-image generation is improved through prompt-aware weighting strategies that reduce interference between learned visual concepts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Low-Rank Adaptation (LoRA) successfully enables… 5

Hugging Face Daily Papers research 25d ago Do Text Edits Generalize to Visual Generation?
Benchmarking Cross-Modal Knowledge Editing in UMMs Abstract Research reveals significant disparities between text and image generation capabilities in multimodal models, with effective textual knowledge editing not transferring reliably to visual output, necessitating modality-aware editing approaches. Generated by… 9

r/MachineLearning community 25d ago Research in Image/Video Gen AI models [D] I've been going down a rabbit hole with image/video generation/editing models for a few months now, started with playing around with Stable Diffusion and ComfyUI, then got genuinely hooked on understanding why things work, not just that they do. I have an Engineering background… 20

The Information — AI news-outlet 26d ago Cybersecurity’s AI Paradox It's no secret that criminals are using AI to streamline computer hacks in hopes of emptying out people’s bank accounts (never has it looked more appealing to stash cash under the mattress!). Cybersecurity executives, meanwhile, are rubbing their hands with glee at the influx of… 32

Hugging Face Daily Papers research 26d ago Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation Abstract Decoupled Residual Denoising Diffusion models (DRDD) improve unified image-to-image translation by separating noise diffusion for domain harmonization from residual diffusion for semantic mapping, enhancing data efficiency and performance. Generated by… 32

r/LocalLLaMA community 27d ago 1-bit Bonsai Image 4B and Ternary Bonsai Image 4B Image Generation for Local Devices with just 0.93 GB and 1.21 GB respectively of Diffusion Transformer Footprint. So tiny!
https://prismml.com/news/bonsai-image-4b 6

Hacker News — AI on Front Page community 27d ago Adafruit Receives Demand Letter from Fenwick Legal Counsel on Behalf of Flux.ai Article URL: https://blog.adafruit.com/ Comments URL: https://news.ycombinator.com/item?id=48368121 Points: 255 # Comments: 87 11

arXiv — Machine Learning research 28d ago CHAM-net: A Contrastive Hierarchical Adaptive Meta-network for Robust Global Methane Flux Prediction arXiv:2606.00338v1 Announce Type: new Abstract: Methane is a potent greenhouse gas that significantly contributes to global warming. However, accurately estimating global methane emissions and consumption remains challenging due to the complex interactions among environmental… 28

Hugging Face Daily Papers research 28d ago Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization Abstract BiDPO enhances text-to-image models for complex compositional prompts through preference-based fine-tuning and region-level guidance. AI-generated summary Despite the rapid progress of text-to-image (T2I) models, generating images that accurately reflect complex… 18

Hugging Face Daily Papers research 28d ago Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization Abstract GCPO enables per-token credit assignment in reinforcement learning by contrasting model predictions under positive and negative prompts, improving performance in text-to-image generation and chain-of-thought reasoning tasks.
AI-generated summary Group-advantage-based… 27

Hugging Face Daily Papers research 29d ago Representation Forcing for Bottleneck-Free Unified Multimodal Models Abstract Representation Forcing enables unified multimodal models to perform both perception and generation tasks end-to-end without relying on external latent spaces, matching state-of-the-art performance in image generation while improving understanding capabilities.… 27

Hacker News — AI on Front Page community 29d ago 1-Bit Bonsai Image 4B Image Generation for Local Devices Article URL: https://prismml.com/news/bonsai-image-4b Comments URL: https://news.ycombinator.com/item?id=48346257 Points: 228 # Comments: 81 36

r/LocalLLaMA community 29d ago Should I buy this RTX 2060 12GB graphics card at around $260 for AI purpose ? I’m interested in running Gemma 4 model/s for text only . It runs smooth even on my laptop but gets crazy hot. Initially wanted to buy an 8 GB card. But I find this price for 12 GB good. (Maybe I can run some image generation models too. But its not important.) It has 6 Month… 10

r/LocalLLaMA community 1mo ago Could someone make some ggufs for Qwen-Image-Bench? I'd like to try it out for automating image generation quality output, I haven't had great luck with that using 27b base or gemma. If this can reliably detect 6 fingered generations and other undesirable outputs it would be a great boon. I took a swing and quantizing it myself… 30

Page 1 of 2 · 96 articles