
Stable Diffusion Guide 2025: Install, Run & Generate Images

Stable Diffusion is free, open-source, and runs on your own hardware — no subscriptions, no content filters, no API costs. This guide covers ComfyUI setup, VRAM requirements, prompt engineering, and how SD compares to Midjourney and DALL-E.

1. What is Stable Diffusion?

Stable Diffusion is an open-source text-to-image AI model originally released by Stability AI in 2022. Unlike cloud services, it runs entirely on your local GPU — meaning no recurring subscription, no usage limits, and no data sent to servers. The community has built thousands of fine-tuned models, LoRA adapters, and extensions on top of it.

Current model generations

SD 1.5: older but with a huge community library, runs on 4GB VRAM, 512x512 native

SDXL (Stable Diffusion XL): higher quality, 1024x1024 native, 6GB+ VRAM recommended

SD 3.5 (Medium / Large): latest Stability AI release, better anatomy, 8GB+ VRAM

Flux (Black Forest Labs): highest photorealism; Flux Schnell is Apache 2.0 licensed, while Flux Dev uses a non-commercial license; 12GB+ VRAM recommended (full-precision Dev weights need more)

Why use Stable Diffusion?

✓ Free forever: no subscription, no per-image cost after setup

✓ No content restrictions: generate anything (within local legal limits)

✓ Full control: ControlNet for pose/depth/edge, LoRA for style

✓ Thousands of community models on Civitai and Hugging Face

✓ API access: build apps on local inference through the ComfyUI API, or on hosted inference through Replicate

2. Installation options

There are three popular UIs for running Stable Diffusion locally. ComfyUI is the current recommended choice — it's faster, more powerful, and actively maintained.

Recommended

ComfyUI

Node-based workflow editor. Fastest performance, best SDXL/Flux/SD3 support, highly customizable. Steeper initial learning curve but most powerful.

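Install: git clone https://github.com/comfyanonymous/ComfyUI

Then install PyTorch for your GPU, run pip install -r requirements.txt inside the ComfyUI folder, and start the server with python main.py.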

AUTOMATIC1111 (webui)

The original and most popular UI. Simpler interface, 500+ extensions including ControlNet. Slower than ComfyUI but easier to start with.

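Install: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

Then run webui-user.bat (Windows) or ./webui.sh (Linux/macOS). The first launch installs the dependencies.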

Forge (AUTOMATIC1111 fork)

Fork of AUTOMATIC1111 with improved memory efficiency and faster inference. Good choice if you have limited VRAM (4–6GB) and want the A1111 interface.

Install: git clone https://github.com/lllyasviel/stable-diffusion-webui-forge

Forge is launched with the same webui-user.bat / webui.sh scripts as AUTOMATIC1111.

3. Hardware requirements

VRAM is the primary bottleneck for Stable Diffusion. More VRAM = higher resolution, faster generation, and access to larger models.

VRAM	What you can run	Example GPU	Approx. speed
4 GB	SD 1.5 at 512x512, Forge SDXL with optimizations	RTX 3050 Laptop	~15s / image
6 GB	SDXL at 1024x1024 with tiled VAE	RTX 2060 / RTX 3060 Laptop	~10s / image
8 GB	SDXL, SD 3.5 Medium, Flux Schnell (quantized)	RTX 3070 / 4060	~5s / image
12 GB+	Flux Dev, SD 3.5 Large, high-res 2048px	RTX 3080 12GB / 4070	~3s / image
16 GB (Mac)	SDXL, Flux (slower via MPS/Metal), SD 3.5	M1/M2/M3 Pro	~8s / image

CPU inference is possible but very slow (minutes per image). In ComfyUI, enable it by launching with the --cpu flag.
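As a rough illustration, the table above can be turned into a small lookup. This is only a sketch: the function name recommend_models is hypothetical, and the tiers are copied from the table rather than measured (Flux Pro is left out because it is not distributed as downloadable weights).

```python
# Tiers adapted from the VRAM table above, highest first.
# (minimum VRAM in GB, what the table says you can run)
VRAM_TIERS = [
    (12, "Flux Dev, SD 3.5 Large, high-res 2048px"),
    (8, "SDXL, SD 3.5 Medium, Flux Schnell (quantized)"),
    (6, "SDXL at 1024x1024 with tiled VAE"),
    (4, "SD 1.5 at 512x512, Forge SDXL with optimizations"),
]


def recommend_models(vram_gb: float) -> str:
    """Return the highest table tier that fits in the given amount of VRAM."""
    for min_gb, models in VRAM_TIERS:
        if vram_gb >= min_gb:
            return models
    # Below 4 GB the table has no entry, so fall back to CPU-only mode.
    return "CPU-only mode (minutes per image)"


print(recommend_models(8))
```

Treat the result as a starting point; actual limits depend on resolution, precision, and the optimizations your UI applies.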

4. Basic workflow in ComfyUI

ComfyUI uses a node graph. The default workflow is loaded on first launch — here's what each node does:

Load Checkpoint (model file)

Select your .safetensors model from the dropdown. This is the core model that determines the overall style.

CLIP Text Encode (positive and negative prompts)

Enter your positive prompt (what you want) and negative prompt (what to avoid) in the two text nodes. Both are tokenized by the CLIP encoder.

KSampler (sampler settings)

Configure steps (20–30), CFG scale (6–8), sampler name (DPM++ 2M, listed as dpmpp_2m), scheduler (Karras, listed as karras), and seed. These control quality and reproducibility.

VAE Decode + Save Image

The VAE decodes the latent tensor to a pixel image. The final node saves it to ComfyUI/output/. Click “Queue Prompt” to generate.

Workflow summary

# Basic ComfyUI workflow (steps in the node editor)
# 1. Load checkpoint
# 2. Set positive prompt
# 3. Set negative prompt
# 4. Choose sampler dpmpp_2m (DPM++ 2M) with scheduler karras
# 5. Set steps=25, CFG=7, seed control set to randomize
# 6. Queue prompt → image saved to ComfyUI/output/
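The same graph can also be submitted to a running ComfyUI server as JSON. The sketch below builds the default text-to-image workflow in ComfyUI's API format and defines a helper that posts it to the /prompt endpoint. The function names, the checkpoint filename, the fixed seed, and the default address http://127.0.0.1:8188 are assumptions for illustration. The checkpoint name must match a file that actually exists in your checkpoints folder.

```python
import json
import urllib.request


def build_workflow(positive, negative,
                   ckpt_name="sd_xl_base_1.0.safetensors", seed=42):
    """Return the default text-to-image graph in ComfyUI's API (JSON) format.

    Each key is a node id. A link is written as [source_node_id, output_index].
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        # The checkpoint loader outputs MODEL (0), CLIP (1), and VAE (2).
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server and return its reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the server running, queue_prompt(build_workflow("a golden retriever on a mountain trail", "blurry, low quality")) should queue one image. Nothing is sent until queue_prompt is called.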

5. Key parameters explained

Steps (20–30)

The number of denoising iterations. More steps give a more refined image but take longer. 20–25 steps is usually sufficient with DPM++ 2M Karras. Going beyond 30 steps yields diminishing returns.

CFG Scale (6–8)

Classifier-Free Guidance sets how strongly the model follows your prompt. Lower values (5–6) give more creative and varied results. Higher values (9–12) follow the prompt more literally but can oversaturate the image. 7 is a good default.
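To make the scale concrete: classifier-free guidance is commonly computed by extrapolating from the model's unconditional prediction toward its prompt-conditioned prediction. The sketch below applies that formula to plain lists of numbers. The function name apply_cfg and the sample values are made up for illustration, and real samplers operate on tensors.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Combine two noise predictions: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy predictions standing in for the model's outputs (illustrative values).
uncond = [0.0, 1.0, 2.0]   # prediction for the empty or negative prompt
cond = [1.0, 1.0, 4.0]     # prediction for the positive prompt

print(apply_cfg(uncond, cond, 1.0))   # scale 1 reproduces cond
print(apply_cfg(uncond, cond, 7.0))   # scale 7 pushes further along cond - uncond
```

A larger scale amplifies the difference between the two predictions, which is why very high values tend to oversaturate.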

Seed

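Controls the random noise pattern. Same seed + same settings = same image. For a random seed, use −1 in AUTOMATIC1111 or Forge; in ComfyUI, set the KSampler's “control after generate” option to randomize. Lock a seed (set it to fixed in ComfyUI) when you find a good composition, then tweak other parameters without losing it.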
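The reproducibility claim comes down to seeded pseudo-random noise. The sketch below uses Python's standard random module as a stand-in for the latent noise generator. The helper make_noise is hypothetical, and real pipelines draw a much larger tensor from their own generator.

```python
import random


def make_noise(seed, n=4):
    """Draw n Gaussian samples from a generator initialised with the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = make_noise(1234)
same_b = make_noise(1234)
other = make_noise(1235)

print(same_a == same_b)   # identical seed, identical starting noise
print(same_a == other)    # different seed, different starting noise
```

Because the sampler starts from this noise, fixing the seed and every other setting leads to the same image.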

Sampler: DPM++ 2M Karras

A strong all-around sampler for SDXL and SD 1.5. It converges quickly, with good detail and minimal artifacts at 20+ steps. Euler a and DDIM are common alternatives. Flux models use their own scheduler (not Karras).
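For context, the "Karras" part of the name refers to how noise levels are spaced across the steps. A minimal sketch of that schedule follows, assuming the usual formulation with rho = 7. The sigma_min and sigma_max defaults here are illustrative values rather than numbers read from any particular model.

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min (needs n >= 2).

    The levels are evenly spaced in sigma**(1/rho), which places more of
    the steps at low noise, where fine detail is resolved.
    """
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [
        (max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]


for sigma in karras_sigmas(5):
    print(round(sigma, 4))
```

The printed values fall quickly at first and then crowd together near sigma_min, which is one reason 20 to 25 steps is often enough with this scheduler.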

6. Prompt engineering for Stable Diffusion

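SD prompts work differently from Midjourney prompts. SD 1.5 and SDXL respond well to comma-separated tags rather than natural sentences, while SD 3.5 and Flux generally handle full sentences better. Use this formula for the positive prompt: subject + style + quality tags. Put everything you want to avoid in a separate negative prompt.

# Prompt formula: positive = subject + style + quality tags; negative = things to avoid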
Positive: "a golden retriever on a mountain trail, photorealistic,
  sharp focus, 8k, cinematic lighting, DSLR photography"

Negative: "blurry, deformed, bad anatomy, watermark, text,
  low quality, jpeg artifacts, cropped"
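If you generate prompts from code, the formula maps onto a small helper. This is a sketch only: build_prompt is a hypothetical name, and the tags are the ones from the example above.

```python
def build_prompt(subject, style_tags, quality_tags):
    """Join subject, style tags, and quality tags into one comma-separated prompt."""
    parts = [subject, *style_tags, *quality_tags]
    # Skip empty entries so a missing tag does not leave a stray comma.
    return ", ".join(p.strip() for p in parts if p.strip())


positive = build_prompt(
    "a golden retriever on a mountain trail",
    style_tags=["photorealistic", "DSLR photography"],
    quality_tags=["sharp focus", "8k", "cinematic lighting"],
)
negative = ", ".join(["blurry", "deformed", "bad anatomy", "watermark", "text"])

print(positive)
print(negative)
```

Keeping the three groups separate makes it easy to swap the style while holding the subject and quality tags fixed.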

Common quality tags

masterpiece, best quality — general quality boost

8k, sharp focus — high resolution look

photorealistic, DSLR — photo style

cinematic lighting — dramatic light

studio lighting — clean even light

anime style — 2D illustration look

Standard negative prompt

blurry, deformed, bad anatomy, extra fingers, watermark, text, low quality, jpeg artifacts, cropped, ugly, worst quality

7. Where to download models

Civitai (civitai.com)

The largest community hub for Stable Diffusion models. Thousands of checkpoints, LoRAs, embeddings, and VAEs. Includes photorealistic, anime, stylized, and concept art models. Free to download.

Note: Civitai has NSFW content. Create an account and enable content filters if you want to restrict what appears. Always check model licenses before commercial use.

Hugging Face (huggingface.co)

Official source for Stability AI models (SDXL, SD 3.5) and for Black Forest Labs' Flux models. Also hosts hundreds of fine-tunes and academic models. Gated models require a free account and acceptance of the license terms. Use huggingface-cli download or the web UI to download .safetensors files.

Where to put downloaded models

Checkpoints: ComfyUI/models/checkpoints/

LoRAs: ComfyUI/models/loras/

VAEs: ComfyUI/models/vae/

Embeddings: ComfyUI/models/embeddings/
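The folder layout above can be captured in a small helper, which is handy when scripting downloads. This is a sketch under assumptions: model_dir and download_sdxl_base are hypothetical names, the download relies on the huggingface_hub package being installed, and the repository and filename shown are the public SDXL base checkpoint used purely as an example.

```python
from pathlib import Path

# Sub-folders under ComfyUI/models/, as listed above.
MODEL_SUBDIRS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "embedding": "embeddings",
}


def model_dir(kind, comfy_root="ComfyUI"):
    """Return the folder where a model of the given kind belongs."""
    return Path(comfy_root) / "models" / MODEL_SUBDIRS[kind]


def download_sdxl_base(comfy_root="ComfyUI"):
    """Download the SDXL base checkpoint into the checkpoints folder."""
    # Imported here so the helpers above work without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="stabilityai/stable-diffusion-xl-base-1.0",
        filename="sd_xl_base_1.0.safetensors",
        local_dir=model_dir("checkpoint", comfy_root),
    )


print(model_dir("lora"))
```

Calling download_sdxl_base() fetches several gigabytes, so it is left uncalled here.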

8. Stable Diffusion vs Midjourney vs DALL-E 3

Each tool has different strengths. The core tradeoff is control and cost (SD) vs ease and quality (Midjourney) vs ecosystem integration (DALL-E).

Feature	Stable Diffusion	Midjourney	DALL-E 3
Cost	Free (GPU required)	$10–$120/mo	Free via ChatGPT / $0.04/img API
Image quality	Excellent (Flux, SDXL)	Best artistic quality	Good, best for text
Ease of use	Harder (local setup)	Easiest (Discord)	Easy (ChatGPT)
Content control	Full (local)	Restricted	Restricted
ControlNet	✓ Yes	✗ No	✗ No
Custom models	✓ Thousands	✗ No	✗ No
API	✓ Replicate, fal.ai	✗ No public API	✓ OpenAI API
Best for	Control, free, developers	Artistic quality	Text-in-images, ease
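As a quick worked example of the cost row, the sketch below counts how many DALL-E 3 API images cost the same as Midjourney's cheapest tier, using only the prices in the table. The function name is hypothetical, and prices are expressed in cents to keep the arithmetic exact.

```python
def images_for_same_cost(monthly_cents, per_image_cents):
    """Number of per-image API calls that cost as much as one month's subscription."""
    return monthly_cents // per_image_cents


# $10/mo Midjourney tier vs $0.04 per DALL-E 3 API image (prices from the table).
print(images_for_same_cost(1000, 4))  # 250
```

Below roughly that monthly volume the per-image API is the cheaper paid option, while local Stable Diffusion has no per-image cost once you own the GPU.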

Quick pick guide

SD: You want full control, no cost, developer API, or ControlNet pose control

MJ: You want the best cinematic / artistic quality with minimal effort

DALL-E: You want text rendered in images, or integration with ChatGPT workflow


FAQ

What is Stable Diffusion?

Stable Diffusion is a free, open-source AI image generation model you can run locally on your own GPU. Unlike Midjourney or DALL-E, it requires no subscription, just a GPU with enough VRAM. Current versions include SDXL and SD 3.5, and there are thousands of community-created models on Civitai and Hugging Face.

How much VRAM do I need for Stable Diffusion?

4GB VRAM can run SD 1.5 models at 512x512. 6GB handles SDXL with optimizations. 8GB is comfortable for SDXL at 1024x1024. 12GB+ runs Flux Dev and SD 3.5 Large, usually with FP8 or quantized weights. Apple Silicon Mac users can use unified memory, and 16GB is sufficient for most models.

ComfyUI vs AUTOMATIC1111: which is better?

ComfyUI is the current recommended choice — it has a node-based workflow editor, runs faster, and is actively maintained with better SDXL/Flux support. AUTOMATIC1111 is older but has a simpler interface and more extensions. Forge is a fork of AUTOMATIC1111 with improved memory efficiency.

Where can I download Stable Diffusion models?

The two main sources are Civitai (civitai.com) and Hugging Face (huggingface.co). Civitai has thousands of community-trained models. Hugging Face hosts official Stability AI models. Note: some Civitai models have NSFW content — check model pages carefully.
