Stable Diffusion Guide 2025: Install, Run & Generate Images
Stable Diffusion is free, open-source, and runs on your own hardware — no subscriptions, no content filters, no API costs. This guide covers ComfyUI setup, VRAM requirements, prompt engineering, and how SD compares to Midjourney and DALL-E.
1. What is Stable Diffusion?
Stable Diffusion is an open-source text-to-image AI model originally released by Stability AI in 2022. Unlike cloud services, it runs entirely on your local GPU — meaning no recurring subscription, no usage limits, and no data sent to servers. The community has built thousands of fine-tuned models, LoRA adapters, and extensions on top of it.
Current model generations
Why use Stable Diffusion?
2. Installation options
There are three popular UIs for running Stable Diffusion locally. ComfyUI is the current recommended choice — it's faster, more powerful, and actively maintained.
ComfyUI
Node-based workflow editor. Fastest performance, best SDXL/Flux/SD3 support, highly customizable. Steeper initial learning curve but most powerful.
Install: git clone https://github.com/comfyanonymous/ComfyUI
AUTOMATIC1111 (webui)
The original and most popular UI. Simpler interface, 500+ extensions including ControlNet. Slower than ComfyUI but easier to start with.
Install: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
Forge (AUTOMATIC1111 fork)
Fork of AUTOMATIC1111 with improved memory efficiency and faster inference. Good choice if you have limited VRAM (4–6GB) and want the A1111 interface.
Install: git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
3. Hardware requirements
VRAM is the primary bottleneck for Stable Diffusion. More VRAM = higher resolution, faster generation, and access to larger models.
| VRAM | What you can run | Example GPU | Speed |
|---|---|---|---|
| 4 GB | SD 1.5 at 512x512, Forge SDXL with optimizations | RTX 3050 | ~15s / image |
| 6 GB | SDXL at 1024x1024 with tiled VAE | RTX 3060 | ~10s / image |
| 8 GB | SDXL, SD 3.5 Medium, Flux Schnell (quantized) | RTX 3070 / 4060 | ~5s / image |
| 12 GB+ | Flux Dev, SD 3.5 Large, high-res 2048px | RTX 3080 12GB / 4070 | ~3s / image |
| 16 GB (Mac) | SDXL, Flux (slower via MPS/Metal), SD 3.5 | M1/M2/M3 Pro | ~8s / image |
CPU inference is possible but very slow (minutes per image). In ComfyUI, launch in CPU-only mode with the --cpu flag: python main.py --cpu
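The table above can be read as a simple lookup. The following is a minimal sketch that maps a VRAM figure to the highest matching tier; `VRAM_TIERS` and `what_can_i_run` are hypothetical names, the tiers are copied from the discrete-GPU rows of the table, and the Mac row is left out because unified memory does not compare directly.

```python
# Tiers copied from the hardware table: (minimum VRAM in GB, what fits).
VRAM_TIERS = [
    (4, "SD 1.5 at 512x512, Forge SDXL with optimizations"),
    (6, "SDXL at 1024x1024 with tiled VAE"),
    (8, "SDXL, SD 3.5 Medium, Flux Schnell (quantized)"),
    (12, "Flux Dev, SD 3.5 Large, high-res 2048px"),
]


def what_can_i_run(vram_gb: float) -> str:
    """Return the description of the highest tier that fits in vram_gb."""
    best = "CPU-only mode (minutes per image)"
    for min_gb, description in VRAM_TIERS:
        if vram_gb >= min_gb:
            best = description
    return best


if __name__ == "__main__":
    for gb in (2, 4, 8, 24):
        print(f"{gb} GB -> {what_can_i_run(gb)}")
```

Treat the result as a starting point only: actual headroom depends on resolution, batch size, and which optimizations the UI applies.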
4. Basic workflow in ComfyUI
ComfyUI uses a node graph. The default workflow is loaded on first launch — here's what each node does:
Load Checkpoint (model file)
Select your .safetensors model from the dropdown. This is the core model that determines the overall style.
CLIP Text Encode (positive and negative prompts)
Enter your positive prompt (what you want) and negative prompt (what to avoid) in the two text nodes. Both are tokenized by the CLIP encoder.
KSampler (sampler settings)
Configure steps (20–30), CFG scale (6–9), sampler name (DPM++ 2M), scheduler (Karras), and seed. These control quality and reproducibility.
VAE Decode + Save Image
The VAE decodes the latent tensor to a pixel image. The final node saves it to ComfyUI/output/. Click “Queue Prompt” to generate.
Workflow summary
# Basic ComfyUI workflow (terminal / API-driven)
# 1. Load checkpoint
# 2. Set positive prompt
# 3. Set negative prompt
# 4. Choose sampler (DPM++ 2M Karras)
# 5. Set steps=25, CFG=7, seed (randomize for a new image each run)
# 6. Queue prompt → image saved to ComfyUI/output/
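For the API-driven route, the same steps can be sketched as a node graph in ComfyUI's JSON API format and sent to a locally running server. This is a sketch under assumptions, not a definitive client: `build_workflow` and `queue_prompt` are hypothetical helper names, the server address assumes ComfyUI's default local port, and the node class and input names should be checked against a workflow exported from your own install with the API-format save option.

```python
import json
import urllib.request


def build_workflow(checkpoint, positive, negative, seed=0,
                   steps=25, cfg=7.0, width=1024, height=1024):
    """Build a text-to-image graph in ComfyUI's API (JSON) format.

    Each key is a node id; a link is [source_node_id, output_index].
    The checkpoint loader exposes MODEL, CLIP and VAE as outputs 0, 1, 2.
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": checkpoint}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "guide"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example (requires a running server and a real checkpoint file name):
# queue_prompt(build_workflow("your_model.safetensors",
#                             "a golden retriever on a mountain trail",
#                             "blurry, low quality"))
```

Building the graph is separate from sending it, so the dictionary can be inspected or saved to disk before anything is queued.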
5. Key parameters explained
Steps (20–30)
The number of denoising iterations. More steps → more refined but slower. 20–25 steps is usually sufficient with DPM++ 2M Karras. Beyond 30 steps has diminishing returns.
CFG Scale (6–9)
Classifier-Free Guidance — how strongly the model follows your prompt. Lower (5–6) = more creative and varied. Higher (9–12) = more literal but can oversaturate. 7 is a good default.
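The arithmetic behind CFG is short enough to show directly. At each step the model makes two noise predictions, one conditioned on the prompt and one on the negative (or empty) prompt, and the scale extrapolates from the second toward the first. The sketch below uses plain Python lists as stand-ins for the real tensors; `apply_cfg` is a hypothetical name.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    unconditional = [0.0, 2.0]
    conditional = [1.0, 2.0]
    for scale in (1.0, 7.0, 12.0):
        print(scale, apply_cfg(unconditional, conditional, scale))
```

A scale of 1 reproduces the conditioned prediction unchanged. Larger values push further along the difference between the two predictions, which is why very high settings can overshoot into oversaturated results.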
Seed
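Controls the random noise pattern. Same seed + same settings = same image. In AUTOMATIC1111 and Forge, a seed of −1 picks a random seed; in ComfyUI, set the KSampler's control_after_generate option to randomize instead. Lock the seed (fixed in ComfyUI) when you find a good composition, then tweak other parameters without losing it.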
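Why a seed makes results repeatable can be illustrated with any seeded random generator. The sketch below uses Python's standard library as a stand-in for the latent noise tensor a real sampler would draw; `initial_noise` is a hypothetical name and the four-value output is only for display.

```python
import random


def initial_noise(seed, count=4):
    """Draw `count` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


if __name__ == "__main__":
    print(initial_noise(42) == initial_noise(42))  # same seed, same noise
    print(initial_noise(42) == initial_noise(43))  # different seed, different noise
```

Because denoising starts from this noise, fixing the seed fixes the starting point, and any change in the image then comes from the parameters you altered.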
Sampler: DPM++ 2M Karras
A strong all-around sampler for SDXL and SD 1.5: fast convergence, good detail, and minimal artifacts at 20+ steps. Euler a and DDIM are common alternatives. Flux models use their own scheduler (not Karras).
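The "Karras" part of the name refers to how noise levels are spaced across the steps. A minimal sketch of that spacing follows, assuming the commonly cited form of the schedule with rho = 7. The `sigma_min` and `sigma_max` defaults are illustrative values chosen for this example, not figures from this guide, and `karras_sigmas` is a hypothetical name.

```python
def karras_sigmas(steps, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Return `steps` noise levels, descending from sigma_max to sigma_min.

    Levels are spaced evenly in sigma**(1/rho), which places more of the
    steps at low noise levels than a linear spacing would.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [
        (max_inv + i / (steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(steps)
    ]


if __name__ == "__main__":
    for sigma in karras_sigmas(10):
        print(f"{sigma:.4f}")
```

Printing the values shows large jumps early and small ones late, so most of the step budget goes to refining detail once the composition has settled.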
6. Prompt engineering for Stable Diffusion
SD prompts work differently from Midjourney — they respond well to comma-separated tags rather than natural sentences. Use this formula: subject + style + quality tags + negative.
# Prompt formula: subject + style + quality tags + negative
Positive: "a golden retriever on a mountain trail, photorealistic, sharp focus, 8k, cinematic lighting, DSLR photography"
Negative: "blurry, deformed, bad anatomy, watermark, text, low quality, jpeg artifacts, cropped"
Common quality tags
masterpiece, best quality: general quality boost
8k, sharp focus: high-resolution look
photorealistic, DSLR: photo style
cinematic lighting: dramatic light
studio lighting: clean, even light
anime style: 2D illustration look
Standard negative prompt
blurry, deformed, bad anatomy, extra fingers, watermark, text, low quality, jpeg artifacts, cropped, ugly, worst quality
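The formula in this section (subject + style + quality tags, with a separate negative prompt) is easy to assemble programmatically when generating many variations. The following is a small sketch; `build_prompt` and `STANDARD_NEGATIVE` are hypothetical names, and the tag values are taken from the lists above.

```python
STANDARD_NEGATIVE = (
    "blurry, deformed, bad anatomy, extra fingers, watermark, text, "
    "low quality, jpeg artifacts, cropped, ugly, worst quality"
)


def build_prompt(subject, style=(), quality=()):
    """Join subject, style tags and quality tags into one comma-separated prompt."""
    parts = [subject, *style, *quality]
    # Skip empty entries so stray blanks do not produce ", ," in the prompt.
    return ", ".join(part.strip() for part in parts if part.strip())


if __name__ == "__main__":
    positive = build_prompt(
        "a golden retriever on a mountain trail",
        style=["photorealistic", "DSLR photography"],
        quality=["sharp focus", "8k", "cinematic lighting"],
    )
    print("Positive:", positive)
    print("Negative:", STANDARD_NEGATIVE)
```

Keeping style and quality tags as separate lists makes it simple to swap one group while holding the subject constant.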
7. Where to download models
Civitai (civitai.com)
The largest community hub for Stable Diffusion models. Thousands of checkpoints, LoRAs, embeddings, and VAEs. Includes photorealistic, anime, stylized, and concept art models. Free to download.
Hugging Face (huggingface.co)
Official source for Stability AI models (SDXL, SD 3.5) and for the Flux models from Black Forest Labs. Also hosts hundreds of fine-tunes and academic models. Gated models require a free account and acceptance of the license terms. Use huggingface-cli download or the web UI to download .safetensors files.
Where to put downloaded models
Checkpoints: ComfyUI/models/checkpoints/
LoRAs: ComfyUI/models/loras/
VAEs: ComfyUI/models/vae/
Embeddings: ComfyUI/models/embeddings/
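The folder layout above can be captured in a small helper that works out where a downloaded file belongs. This is a sketch only: `MODEL_DIRS` and `destination` are hypothetical names, and the root path is whatever directory you cloned ComfyUI into.

```python
from pathlib import Path

# Model kind -> subfolder under ComfyUI/models/, as listed above.
MODEL_DIRS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "embedding": "embeddings",
}


def destination(comfy_root, kind, filename):
    """Return the path where a model file of the given kind should be placed."""
    try:
        subdir = MODEL_DIRS[kind]
    except KeyError:
        raise ValueError(f"unknown model kind: {kind!r}") from None
    return Path(comfy_root) / "models" / subdir / filename


if __name__ == "__main__":
    print(destination("ComfyUI", "lora", "example_style.safetensors"))
```

The function only computes a path; moving the file (for example with `shutil.move`) is left to the caller so nothing on disk changes by accident.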
8. Stable Diffusion vs Midjourney vs DALL-E 3
Each tool has different strengths. The core tradeoff is control and cost (SD) vs ease and quality (Midjourney) vs ecosystem integration (DALL-E).
| Feature | Stable Diffusion | Midjourney | DALL-E 3 |
|---|---|---|---|
| Cost | Free (GPU required) | $10–$120/mo | Free via ChatGPT / $0.04/img API |
| Image quality | Excellent (Flux, SDXL) | Best artistic quality | Good, best for text |
| Ease of use | Harder (local setup) | Easiest (Discord) | Easy (ChatGPT) |
| Content control | Full (local) | Restricted | Restricted |
| ControlNet | ✓ Yes | ✗ No | ✗ No |
| Custom models | ✓ Thousands | ✗ No | ✗ No |
| API | ✓ Replicate, fal.ai | ✗ No public API | ✓ OpenAI API |
| Best for | Control, free, developers | Artistic quality | Text-in-images, ease |
Quick pick guide
FAQ
What is Stable Diffusion?
Stable Diffusion is a free, open-source AI image generation model you can run locally on your own GPU. Unlike Midjourney or DALL-E, it requires no subscription, just a GPU with enough VRAM. The model family includes SDXL and SD 3.5, plus community-created models from Civitai and Hugging Face.
How much VRAM do I need for Stable Diffusion?
4GB VRAM can run SD 1.5 models at 512x512. 6GB handles SDXL with optimizations. 8GB is comfortable for SDXL at 1024x1024. 12GB+ runs Flux and SD 3.5 without compromise. Apple Silicon Mac users can use unified memory — 16GB is sufficient for most models.
ComfyUI vs AUTOMATIC1111: which is better?
ComfyUI is the current recommended choice — it has a node-based workflow editor, runs faster, and is actively maintained with better SDXL/Flux support. AUTOMATIC1111 is older but has a simpler interface and more extensions. Forge is a fork of AUTOMATIC1111 with improved memory efficiency.
Where can I download Stable Diffusion models?
The two main sources are Civitai (civitai.com) and Hugging Face (huggingface.co). Civitai has thousands of community-trained models. Hugging Face hosts official Stability AI models. Note: some Civitai models have NSFW content — check model pages carefully.