Ollama Not Working?
Server not starting, model not found, port 11434 in use, CUDA out of memory, or GPU not being used? Step-by-step fix for Mac, Windows, and Linux.
Ollama — live status
Updated every 5 minutes. Monitors ollama.com API and model registry at prismix.dev/service/ollama.
Quick diagnosis
# 1. Is the server running?
curl http://localhost:11434
# 2. What models are downloaded?
ollama list
# 3. Run a quick test
ollama run llama3.2:3b "say hi"
# 4. Check GPU detection (Nvidia)
nvidia-smi
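The first check above can be scripted so it gives a one-word answer. This is a minimal sketch, not part of Ollama: the function names `server_state` and `probe_ollama` are hypothetical, and it assumes `curl` is installed and that a healthy server answers its root URL with the text "Ollama is running".

```shell
#!/bin/sh
# server_state: classify the body returned by the server's root URL.
# (Hypothetical helper; a healthy server replies "Ollama is running".)
server_state() {
  case "$1" in
    *"Ollama is running"*) echo "up" ;;
    "")                    echo "down" ;;
    *)                     echo "unknown" ;;
  esac
}

# probe_ollama: fetch the root URL (default http://localhost:11434) and classify it.
# An empty body means nothing answered, e.g. connection refused or a timeout.
probe_ollama() {
  url="${1:-http://localhost:11434}"
  body=$(curl -s --max-time 3 "$url" 2>/dev/null) || body=""
  server_state "$body"
}
```

Calling `probe_ollama` prints `up`, `down`, or `unknown`. A result of `unknown` usually means some other program is answering on the port.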
What's wrong? Diagnose fast
Server not starting
Port 11434 is in use by another Ollama instance, or there is a permission error on the ~/.ollama directory. Find the existing process with lsof -i :11434 (Linux/Mac) or netstat -ano | findstr 11434 (Windows), then stop it with kill {PID} or taskkill /PID {PID} /F.
Model not found
Model not downloaded yet. Run: ollama pull modelname (e.g. ollama pull llama3.2). Model names are case-sensitive. Check available models at ollama.com/models.
CUDA out of memory
GPU VRAM too small. Rule of thumb: an 8B model needs ~5GB VRAM at Q4_K_M. Use a smaller model (llama3.2:3b instead of llama3.1:8b), or force CPU mode by hiding the GPU from the server (on Nvidia: CUDA_VISIBLE_DEVICES=-1 ollama serve).
Running slowly (CPU mode)
GPU not being used. Check Ollama logs for "using CUDA" or "using Metal", or run ollama ps while a model is loaded and read the PROCESSOR column. On Windows/Linux: CUDA drivers must be installed. Apple Silicon uses Metal automatically.
Windows / WSL2 issues
On Windows, prefer the native .exe installer over WSL2 for GPU access. If using WSL2, ensure CUDA drivers are installed on the Windows host and GPU passthrough is configured.
API not responding to requests
Set OLLAMA_HOST=0.0.0.0:11434 before starting the server if you need to access Ollama from another machine or a Docker container. By default, Ollama only binds to localhost (127.0.0.1), so requests from anywhere else are refused.
Ollama model VRAM requirements
| Model | Parameters | VRAM (Q4_K_M) | Ollama pull name | Speed (Apple M4) |
|---|---|---|---|---|
| Llama 3.2 | 3B | ~2 GB | llama3.2:3b | ~60 tok/s |
| Llama 3.1 | 8B | ~5 GB | llama3.1:8b | ~30 tok/s |
| Mistral | 7B | ~4 GB | mistral | ~35 tok/s |
| Gemma 3 | 12B | ~7 GB | gemma3:12b | ~20 tok/s |
| Llama 3.3 | 70B | ~40 GB | llama3.3:70b | ~5 tok/s |
VRAM estimates at Q4_K_M quantization. CPU RAM needed is roughly the same as VRAM. Speed varies by hardware — M4 Pro is ~2× faster than M1.
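The figures in the table work out to roughly 0.6 GB per billion parameters. The sketch below turns that into a quick estimate. The 0.6 factor is an assumption fitted to the table's rows, not an official figure, and long contexts need additional memory on top of it. The function names are hypothetical, and `gpu_vram_mib` assumes an Nvidia GPU with `nvidia-smi` available.

```shell
#!/bin/sh
# estimate_vram_gb: rough VRAM need in GB for a model with $1 billion parameters.
# ASSUMPTION: ~0.6 GB per billion parameters at Q4_K_M, fitted to the table above.
estimate_vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.6 }'
}

# fits_in_vram: succeed if a model of $1 billion parameters fits in $2 GB of VRAM.
fits_in_vram() {
  awk -v p="$1" -v v="$2" 'BEGIN { exit !(p * 0.6 <= v) }'
}

# gpu_vram_mib: total memory of each Nvidia GPU in MiB, one line per GPU.
gpu_vram_mib() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
}
```

For example, `fits_in_vram 8 6 && echo ok` checks whether an 8B model is likely to fit on a 6 GB card. Treat the result as a first guess and confirm by actually loading the model.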
Step-by-step fix
1. Check if the server is running
Run `curl http://localhost:11434`. If you see "Ollama is running", the server is fine. If the connection is refused, start the server with `ollama serve` in a terminal. On Mac, click the Ollama menu-bar app to start it. Keep the terminal open, because closing it stops the server.
2. Fix port 11434 already in use
On Linux/Mac, `lsof -i :11434` finds the blocking process. Stop it with `kill {PID}`, and fall back to `kill -9 {PID}` only if it does not exit. If Ollama was installed on Linux as a systemd service, use `sudo systemctl stop ollama` instead, since a killed service may be restarted automatically. On Windows, run `netstat -ano | findstr 11434`, then `taskkill /PID {PID} /F`. To run Ollama on a different port: `OLLAMA_HOST=127.0.0.1:11435 ollama serve` (use 0.0.0.0 instead of 127.0.0.1 only if other machines need access).
3. Pull the model before running
If you see "model not found", run `ollama pull modelname` first. The pull downloads the model (1–40GB depending on the model). Then verify with `ollama list`. Model names must be lowercase and match exactly; check ollama.com/models for the exact name.
4. Fix VRAM / out of memory
If the model runs out of VRAM: (1) use a smaller model, e.g. `ollama pull llama3.2:3b` instead of `llama3.1:8b`; (2) use a smaller quantization by pulling a more heavily quantized tag from the model's page (tag names vary by model); (3) force CPU: on Nvidia, start the server with `CUDA_VISIBLE_DEVICES=-1 ollama serve`, or inside an `ollama run modelname` session enter `/set parameter num_gpu 0` (slow, but not limited by VRAM).
5. Enable GPU acceleration
Check whether Ollama is using the GPU: look for "using CUDA" or "using Metal" in the `ollama serve` output. If it is not: on Nvidia, install CUDA toolkit 11.8+ and update the Nvidia drivers, then run `nvidia-smi` to verify. On AMD (Linux), use the ROCm-enabled Ollama build. On Apple Silicon, Metal is automatic and needs no config.
6. Windows-specific: fix WSL2 GPU passthrough
On Windows, prefer the native Ollama installer (ollama.com/download) over WSL2 for GPU access. If you need WSL2: (1) install CUDA drivers on Windows (not inside WSL2); (2) ensure the WSL2 kernel version is 5.10+; (3) inside WSL2, install the CUDA toolkit via apt, not the Windows installer. Ollama detects available GPUs on its own at startup, so no extra environment variable is needed once the drivers are in place.
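The port check from the steps above can be sketched as a small script for Linux/Mac. The function names are hypothetical, and `port_of` is a simplification: it assumes `OLLAMA_HOST` is in plain `host:port` form with no URL scheme or IPv6 brackets.

```shell
#!/bin/sh
# port_of: extract the port from a host:port value such as OLLAMA_HOST.
# Falls back to Ollama's default port 11434 when no port is given.
# ASSUMPTION: the value has no scheme (http://) and no IPv6 brackets.
port_of() {
  case "$1" in
    *:*) echo "${1##*:}" ;;
    *)   echo "11434" ;;
  esac
}

# port_owner_pids: print the PIDs listening on TCP port $1 (needs lsof).
port_owner_pids() {
  lsof -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# explain_port: report whether the port Ollama would use is free.
explain_port() {
  port=$(port_of "${OLLAMA_HOST:-}")
  pids=$(port_owner_pids "$port") || pids=""
  if [ -z "$pids" ]; then
    echo "port $port is free"
  else
    # $pids is left unquoted on purpose so several PIDs print on one line.
    echo "port $port is held by PID(s):" $pids
  fi
}
```

Run `explain_port` before `ollama serve`. If it reports a PID, inspect that process before stopping it, since it may be an Ollama instance you still want.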
Common Ollama error messages
| Error | Cause | Fix |
|---|---|---|
| connection refused (port 11434) | Ollama not running | ollama serve in terminal |
| model not found | Not downloaded yet | ollama pull modelname |
| address already in use | Port 11434 taken | Find the process (lsof -i :11434) and stop it |
| CUDA out of memory | VRAM too small | Use smaller model or force CPU (CUDA_VISIBLE_DEVICES=-1) |
| failed to load model | Corrupted download | ollama rm modelname && ollama pull modelname |
| no such model | Wrong model name | Check exact name at ollama.com/models |
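The table above can be expressed as a lookup function, which is handy in wrapper scripts. This is an illustrative sketch: `suggest_fix` is a hypothetical name, the patterns are the phrases from the table, and real error output may be worded differently.

```shell
#!/bin/sh
# suggest_fix: map an error message ($1) to the matching fix from the table.
# Matching is case-insensitive; MODEL is a placeholder for the model name.
suggest_fix() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"connection refused"*)     echo "start the server: ollama serve" ;;
    *"address already in use"*) echo "free port 11434: lsof -i :11434, then stop that process" ;;
    *"out of memory"*)          echo "use a smaller model or run on CPU" ;;
    *"failed to load model"*)   echo "re-download: ollama rm MODEL && ollama pull MODEL" ;;
    *"model not found"*)        echo "download it: ollama pull MODEL" ;;
    *"no such model"*)          echo "check the exact name at ollama.com/models" ;;
    *)                          echo "no match; read the ollama serve output" ;;
  esac
}
```

One way to use it: `out=$(ollama list 2>&1) || suggest_fix "$out"` prints a hint only when the command fails.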
Frequently asked questions
Why is Ollama not working?
Most Ollama issues are local: (1) the server is not running, so start it with ollama serve; (2) the model has not been pulled, so run ollama pull modelname first; (3) port 11434 is taken, so stop the existing process; (4) VRAM is too small, so use a smaller model or force CPU mode (CUDA_VISIBLE_DEVICES=-1 on Nvidia); (5) GPU drivers are not installed: CUDA for Nvidia, ROCm for AMD.
Ollama model not found — how to fix?
Run "ollama pull modelname" to download the model first. Find exact model names at ollama.com/models. Model names are case-sensitive and include tags (e.g. llama3.2:3b for the 3B version). After pulling, verify with "ollama list".
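The "pull, then verify" routine can be automated by checking the first column of the `ollama list` output. A minimal sketch, assuming the output starts with a header row followed by one model per line; `has_model` is a hypothetical helper name.

```shell
#!/bin/sh
# has_model: read `ollama list` output on stdin; succeed if model $1 is listed.
# A name without a tag is treated as NAME:latest, which is how it appears in the list.
has_model() {
  want="$1"
  case "$want" in
    *:*) ;;                      # already has a tag, e.g. llama3.2:3b
    *)   want="$want:latest" ;;  # bare name, e.g. mistral
  esac
  # Skip the header row (NR > 1) and compare the NAME column exactly.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

For example, `ollama list | has_model llama3.2:3b || ollama pull llama3.2:3b` downloads the model only if it is missing.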
Ollama CUDA out of memory — how to fix?
Use a smaller model: 3B models need ~2GB VRAM, 8B needs ~5GB, 70B needs ~40GB at Q4_K_M. Try "ollama pull llama3.2:3b" if you have limited VRAM. Or force CPU mode by hiding the GPU from the server, for example CUDA_VISIBLE_DEVICES=-1 ollama serve on Nvidia (much slower, but not limited by VRAM).
Ollama not using GPU — how to fix?
On Nvidia: install CUDA toolkit 11.8+ and update GPU drivers. On AMD (Linux): use the ROCm build. On Apple Silicon: GPU acceleration is automatic via Metal — no config needed. Check logs for "using CUDA/Metal" when starting ollama serve.
Ollama API not accessible from other machines — how to fix?
By default Ollama only binds to localhost. To expose it on your network: set OLLAMA_HOST=0.0.0.0:11434 in your environment before running ollama serve. Also allow port 11434 in your firewall. Note: this exposes Ollama without authentication — use only on trusted networks.
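To confirm the server is reachable after changing `OLLAMA_HOST`, query it from the other machine. The sketch below uses the API's model-list endpoint (`GET /api/tags`). The function names are hypothetical, and 192.168.1.50 in the usage note is a placeholder for your server's real address.

```shell
#!/bin/sh
# base_url: build the URL a client should use for a server at host $1, port $2.
# The port defaults to 11434. Pass the server's real address, not 0.0.0.0.
base_url() {
  echo "http://$1:${2:-11434}"
}

# check_remote: request the model list from a remote server via GET /api/tags.
# Prints "reachable" if the request succeeds within 5 seconds.
check_remote() {
  if curl -fsS --max-time 5 "$(base_url "$@")/api/tags" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable (check OLLAMA_HOST on the server and the firewall)"
  fi
}
```

Run `check_remote 192.168.1.50` from the client machine. If it prints "unreachable" while `curl http://localhost:11434` works on the server itself, the server is probably still bound to localhost or the firewall is blocking the port.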