Stable Diffusion Guide 2025: Install, Run & Generate Images

Stable Diffusion is free, open-source, and runs on your own hardware — no subscriptions, no content filters, no API costs. This guide covers ComfyUI setup, VRAM requirements, prompt engineering, and how SD compares to Midjourney and DALL-E.

1. What is Stable Diffusion?

Stable Diffusion is an open-source text-to-image AI model originally released by Stability AI in 2022. Unlike cloud services, it runs entirely on your local GPU — meaning no recurring subscription, no usage limits, and no data sent to servers. The community has built thousands of fine-tuned models, LoRA adapters, and extensions on top of it.

Current model generations

SD 1.5: older but with a huge community library, runs on 4GB VRAM, 512x512 native
SDXL (Stable Diffusion XL): higher quality, 1024x1024 native, 6GB+ VRAM recommended
SD 3.5 (Medium / Large): latest Stability AI release, better anatomy, 8GB+ VRAM
Flux (Black Forest Labs): strong photorealism, 12GB+ VRAM for full quality. Licensing varies by variant: FLUX.1 [schnell] is Apache 2.0, while FLUX.1 [dev] uses a non-commercial license.

Why use Stable Diffusion?

Free forever: no subscription, no per-image cost after setup
No content restrictions: generate anything (within local legal limits)
Full control: ControlNet for pose/depth/edge, LoRA for style
Thousands of community models on Civitai and Hugging Face
API access: build apps on local inference with the ComfyUI API, or on hosted inference via Replicate

2. Installation options

There are three popular UIs for running Stable Diffusion locally. ComfyUI is the current recommended choice — it's faster, more powerful, and actively maintained.

Recommended

ComfyUI

Node-based workflow editor. Fastest performance, best SDXL/Flux/SD3 support, highly customizable. Steeper initial learning curve but most powerful.

Install: git clone https://github.com/comfyanonymous/ComfyUI

AUTOMATIC1111 (webui)

The original and most popular UI. Simpler interface, 500+ extensions including ControlNet. Slower than ComfyUI but easier to start with.

Install: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

Forge (AUTOMATIC1111 fork)

Fork of AUTOMATIC1111 with improved memory efficiency and faster inference. Good choice if you have limited VRAM (4–6GB) and want the A1111 interface.

Install: git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
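
The clone is only the first step of a manual install. As a rough sketch, the helper below lists the usual follow-up commands for ComfyUI without running them. It assumes Python and a PyTorch build matching your GPU are already installed. `comfyui_install_commands` is a hypothetical name used only for illustration, and the requirements.txt and main.py steps follow the ComfyUI repository's manual-install instructions, so check its README for your platform.

```python
import shlex

COMFYUI_REPO = "https://github.com/comfyanonymous/ComfyUI"  # URL from the guide


def comfyui_install_commands(target_dir="ComfyUI"):
    """Return the shell commands for a manual ComfyUI install (hypothetical helper).

    Nothing is executed here; the list is meant to be read or pasted into a terminal.
    """
    quoted = shlex.quote(target_dir)
    return [
        f"git clone {COMFYUI_REPO} {quoted}",
        f"cd {quoted}",
        "pip install -r requirements.txt",  # assumes PyTorch is already installed
        "python main.py",                   # starts the local web UI
    ]


if __name__ == "__main__":
    for command in comfyui_install_commands():
        print(command)
```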

3. Hardware requirements

VRAM is the primary bottleneck for Stable Diffusion. More VRAM = higher resolution, faster generation, and access to larger models.

VRAM | What you can run | Example GPU | Approx. speed
4 GB | SD 1.5 at 512x512, Forge SDXL with optimizations | RTX 3050 (4 GB laptop variant) | ~15s / image
6 GB | SDXL at 1024x1024 with tiled VAE | RTX 3060 (6 GB laptop variant) | ~10s / image
8 GB | SDXL, SD 3.5 Medium, Flux Schnell (quantized) | RTX 3070 / 4060 | ~5s / image
12 GB+ | Flux Dev, SD 3.5 Large, high-res 2048px | RTX 3080 (12 GB) / 4070 | ~3s / image
16 GB (Mac) | SDXL, Flux (slower via MPS/Metal), SD 3.5 | M1/M2/M3 Pro | ~8s / image

CPU inference is possible but very slow (minutes per image). To try it, launch ComfyUI in CPU-only mode with python main.py --cpu.
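
To make the VRAM tiers easier to apply, here is a minimal sketch that maps an amount of VRAM to the models listed for that tier. The thresholds come from the table in this section. The function name `models_for_vram` is made up for this example, and real limits depend on resolution, optimizations, and the UI you use.

```python
# Thresholds copied from the VRAM table in this section (highest tier first).
VRAM_TIERS = [
    (12, ["Flux Dev", "SD 3.5 Large"]),
    (8, ["SDXL", "SD 3.5 Medium", "Flux Schnell (quantized)"]),
    (6, ["SDXL (tiled VAE)"]),
    (4, ["SD 1.5"]),
]


def models_for_vram(vram_gb):
    """Return the model suggestions for the highest tier that fits vram_gb."""
    for min_gb, models in VRAM_TIERS:
        if vram_gb >= min_gb:
            return models
    return []  # below 4 GB: consider CPU mode or a hosted service


if __name__ == "__main__":
    for gb in (4, 6, 8, 12, 24):
        print(gb, "GB ->", ", ".join(models_for_vram(gb)))
```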

4. Basic workflow in ComfyUI

ComfyUI uses a node graph. The default workflow is loaded on first launch — here's what each node does:

Step 1: Load Checkpoint (model file)

Select your .safetensors model from the dropdown. This is the core model that determines the overall style.

Step 2: CLIP Text Encode (positive and negative prompts)

Enter your positive prompt (what you want) and negative prompt (what to avoid) in the two text nodes. Each prompt is tokenized and encoded by the checkpoint's CLIP text encoder.

Step 3: KSampler (sampler settings)

Configure steps (20–30), CFG scale (6–8), sampler name (dpmpp_2m, which is DPM++ 2M), scheduler (karras), and seed. These control quality and reproducibility.

Step 4: VAE Decode + Save Image

The VAE decodes the latent tensor to a pixel image. The final node saves it to ComfyUI/output/. Click “Queue Prompt” to generate.

Workflow summary

# Basic ComfyUI workflow (summary of the default node graph)
# 1. Load checkpoint
# 2. Set positive prompt
# 3. Set negative prompt
# 4. Choose sampler (DPM++ 2M) and scheduler (Karras)
# 5. Set steps=25, CFG=7, and a seed (choose "randomize" for a new seed each run)
# 6. Queue prompt → image saved to ComfyUI/output/
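
The same graph can also be driven from a script. The sketch below builds the default text-to-image workflow in ComfyUI's API (JSON) format and posts it to a local server. It assumes ComfyUI is running at its default address, http://127.0.0.1:8188. The node class names follow ComfyUI's built-in nodes, while the helper names (`build_workflow`, `queue_prompt`), the 1024x1024 size, and the filename prefix are illustrative choices.

```python
import json
import urllib.request


def build_workflow(ckpt_name, positive, negative, seed=0, steps=25, cfg=7.0):
    """Build the default text-to-image graph in ComfyUI's API (JSON) format.

    Each key is a node id; links are written as [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",  # positive prompt
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",  # negative prompt
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "guide_example"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server (assumed default address)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(f"{server}/prompt", data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the server running, calling queue_prompt(build_workflow("your_model.safetensors", "a golden retriever", "blurry")) should queue one image. The checkpoint filename is a placeholder and must match a file in ComfyUI/models/checkpoints/.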

5. Key parameters explained

Steps (20–30)

The number of denoising iterations. More steps give a more refined image but take longer. 20–25 steps is usually sufficient with DPM++ 2M Karras. Going beyond 30 steps yields diminishing returns.

CFG Scale (6–8)

Classifier-Free Guidance: how strongly the model follows your prompt. Lower values (5–6) give more creative, varied results. Higher values (9–12) follow the prompt more literally but can oversaturate. 7 is a good default.
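
For intuition, classifier-free guidance combines two noise predictions at each step: one conditioned on the prompt and one unconditioned (in practice, conditioned on the negative prompt). The toy sketch below applies the standard formula to short lists standing in for the real tensors. The function name `apply_cfg` and the sample values are illustrative only.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: move the prediction from uncond toward cond.

    guided = uncond + cfg_scale * (cond - uncond), applied element-wise.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # stand-in for the unconditioned prediction
    cond = [1.0, 3.0]    # stand-in for the prompt-conditioned prediction
    for scale in (1.0, 7.0, 12.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 1 returns the conditioned prediction unchanged. Larger scales extrapolate past it, which is why very high values can oversaturate.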

Seed

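Controls the random noise pattern. The same seed with the same settings produces the same image. In AUTOMATIC1111 and Forge, use −1 for a random seed; in ComfyUI, set the KSampler's control_after_generate option to randomize. Lock a seed when you find a good composition, then tweak other parameters without losing it.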
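
The reason a seed is reproducible is that it fixes the starting noise the sampler denoises. The sketch below shows the idea with Python's standard-library generator as a stand-in. Real UIs draw the latent noise with PyTorch, and `initial_noise` is a hypothetical helper.

```python
import random


def initial_noise(seed, size=4):
    """Draw `size` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    print(initial_noise(42) == initial_noise(42))  # same seed, same noise
    print(initial_noise(42) == initial_noise(43))  # different seed, different noise
```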

Sampler: DPM++ 2M Karras

A solid all-around choice for SDXL and SD 1.5: fast convergence, good detail, and minimal artifacts at 20+ steps. Euler a and DDIM are common alternatives. Flux models are typically run with a different sampler and scheduler combination rather than Karras.
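
The "Karras" part of the name refers to how the noise levels (sigmas) are spaced across the steps. The sketch below follows the schedule formula from Karras et al. (2022) with rho = 7. The sigma_min and sigma_max defaults are assumed, commonly quoted values for SD 1.x models, not values taken from this guide, and samplers usually append a final sigma of 0.

```python
def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min (Karras spacing)."""
    if n < 2:
        raise ValueError("need at least two steps")
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the power rho.
    return [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]


if __name__ == "__main__":
    for sigma in karras_sigmas(10):
        print(round(sigma, 4))
```

The spacing puts more of the steps at low noise levels, where fine detail is resolved. That is one plausible reason 20 to 25 steps is often enough with this scheduler.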

6. Prompt engineering for Stable Diffusion

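SD prompts work differently from Midjourney: SD 1.5 and SDXL respond well to comma-separated tags rather than natural sentences. Newer models with T5 text encoders (SD 3.5, Flux) also handle natural-language descriptions well. Use this formula: subject + style + quality tags for the positive prompt, plus a separate negative prompt.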

# Prompt formula: subject + style + quality tags + negative
Positive: "a golden retriever on a mountain trail, photorealistic,
  sharp focus, 8k, cinematic lighting, DSLR photography"

Negative: "blurry, deformed, bad anatomy, watermark, text,
  low quality, jpeg artifacts, cropped"

Common quality tags

masterpiece, best quality — general quality boost
8k, sharp focus — high resolution look
photorealistic, DSLR — photo style
cinematic lighting — dramatic light
studio lighting — clean even light
anime style — 2D illustration look

Standard negative prompt

blurry, deformed, bad anatomy, extra fingers, watermark, text, low quality, jpeg artifacts, cropped, ugly, worst quality
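
If you generate prompts from a script, the formula above can be sketched as a small helper that joins the parts into one comma-separated string. `build_prompt` and `STANDARD_NEGATIVE` are hypothetical names. The tag values come from this section.

```python
def build_prompt(subject, style_tags=(), quality_tags=()):
    """Join subject, style, and quality tags into one comma-separated prompt."""
    parts = [subject, *style_tags, *quality_tags]
    # Drop empty entries and stray whitespace so the output has no ", ," gaps.
    return ", ".join(p.strip() for p in parts if p and p.strip())


STANDARD_NEGATIVE = ("blurry, deformed, bad anatomy, extra fingers, watermark, text, "
                     "low quality, jpeg artifacts, cropped, ugly, worst quality")


if __name__ == "__main__":
    positive = build_prompt(
        "a golden retriever on a mountain trail",
        style_tags=["photorealistic", "DSLR photography"],
        quality_tags=["sharp focus", "8k", "cinematic lighting"],
    )
    print("Positive:", positive)
    print("Negative:", STANDARD_NEGATIVE)
```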

7. Where to download models

Civitai (civitai.com)

The largest community hub for Stable Diffusion models. Thousands of checkpoints, LoRAs, embeddings, and VAEs. Includes photorealistic, anime, stylized, and concept art models. Free to download.

Note: Civitai has NSFW content. Create an account and enable content filters if you want to restrict what appears. Always check model licenses before commercial use.

Hugging Face (huggingface.co)

Official source for Stability AI models (SDXL, SD 3.5) and for Black Forest Labs' Flux models. Also hosts hundreds of fine-tunes and academic models. Requires a free account for gated models (accept license terms). Use huggingface-cli download or the web UI to download .safetensors files.

Where to put downloaded models

Checkpoints: ComfyUI/models/checkpoints/

LoRAs: ComfyUI/models/loras/

VAEs: ComfyUI/models/vae/

Embeddings: ComfyUI/models/embeddings/
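
For scripted downloads, the folder layout above can be captured in a small lookup. This is only a sketch: `destination` and the kind names ("checkpoint", "lora", and so on) are invented for the example, and the paths are the ones listed in this section, relative to your ComfyUI folder.

```python
from pathlib import Path

# Sub-folders under the ComfyUI root, as listed in this section.
MODEL_DIRS = {
    "checkpoint": "models/checkpoints",
    "lora": "models/loras",
    "vae": "models/vae",
    "embedding": "models/embeddings",
}


def destination(comfy_root, kind, filename):
    """Return where a downloaded file of the given kind belongs under comfy_root."""
    if kind not in MODEL_DIRS:
        raise ValueError(f"unknown model kind: {kind!r}")
    # Keep only the file name so a full download path can be passed in.
    return Path(comfy_root) / MODEL_DIRS[kind] / Path(filename).name


if __name__ == "__main__":
    print(destination("ComfyUI", "lora", "downloads/my_style.safetensors"))
```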

8. Stable Diffusion vs Midjourney vs DALL-E 3

Each tool has different strengths. The core tradeoff is control and cost (SD) vs ease and quality (Midjourney) vs ecosystem integration (DALL-E).

Feature | Stable Diffusion | Midjourney | DALL-E 3
Cost | Free (GPU required) | $10–$120/mo | Free via ChatGPT / $0.04/img API
Image quality | Excellent (Flux, SDXL) | Best artistic quality | Good, best for text
Ease of use | Harder (local setup) | Easiest (Discord) | Easy (ChatGPT)
Content control | Full (local) | Restricted | Restricted
ControlNet | ✓ Yes | ✗ No | ✗ No
Custom models | ✓ Thousands | ✗ No | ✗ No
API | ✓ Replicate, fal.ai | ✗ No public API | ✓ OpenAI API
Best for | Control, free, developers | Artistic quality | Text-in-images, ease

Quick pick guide

SD: You want full control, no cost, developer API, or ControlNet pose control
MJ: You want the best cinematic / artistic quality with minimal effort
DALL-E: You want text rendered in images, or integration with ChatGPT workflow
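
As a quick piece of arithmetic on the cost row, the sketch below works out how many DALL-E 3 API images per month cost the same as the lowest Midjourney plan. It uses only the $0.04 per image and $10 per month figures from the table. It ignores plan limits, GPU purchase cost, and electricity, so treat it as a rough comparison rather than a pricing model. Amounts are kept in cents to avoid floating-point rounding.

```python
DALLE_CENTS_PER_IMAGE = 4      # $0.04 per image, from the table
MIDJOURNEY_BASIC_CENTS = 1000  # $10 per month, lowest plan in the table


def dalle_monthly_cost_cents(images_per_month):
    """API spend in cents for a given number of images per month."""
    return images_per_month * DALLE_CENTS_PER_IMAGE


def break_even_images():
    """Images per month at which API spend equals the $10 plan."""
    return MIDJOURNEY_BASIC_CENTS // DALLE_CENTS_PER_IMAGE


if __name__ == "__main__":
    print("Break-even:", break_even_images(), "images per month")
    print("500 images via API:", dalle_monthly_cost_cents(500) / 100, "USD")
```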

FAQ

What is Stable Diffusion?

Stable Diffusion is a free, open-source AI image generation model you can run locally on your own GPU. Unlike Midjourney or DALL-E, it requires no subscription, just a GPU with enough VRAM. The ecosystem includes SD 1.5, SDXL, SD 3.5, and community-created models from Civitai and Hugging Face.

How much VRAM do I need for Stable Diffusion?

4GB VRAM can run SD 1.5 models at 512x512. 6GB handles SDXL with optimizations. 8GB is comfortable for SDXL at 1024x1024. 12GB+ runs Flux and SD 3.5 without compromise. Apple Silicon Mac users can use unified memory — 16GB is sufficient for most models.

ComfyUI vs AUTOMATIC1111: which is better?

ComfyUI is the current recommended choice — it has a node-based workflow editor, runs faster, and is actively maintained with better SDXL/Flux support. AUTOMATIC1111 is older but has a simpler interface and more extensions. Forge is a fork of AUTOMATIC1111 with improved memory efficiency.

Where can I download Stable Diffusion models?

The two main sources are Civitai (civitai.com) and Hugging Face (huggingface.co). Civitai has thousands of community-trained models. Hugging Face hosts official Stability AI models. Note: some Civitai models have NSFW content — check model pages carefully.