
Ollama Not Working?

Server not starting, model not found, port 11434 in use, CUDA out of memory, or GPU not being used? Step-by-step fix for Mac, Windows, and Linux.

Note: Ollama runs locally, so most issues are on your machine (model not pulled, port conflict, insufficient VRAM) rather than server-side. An outage of the Ollama model registry (ollama.com) only affects model downloads.

Quick diagnosis

# 1. Is the server running?

curl http://localhost:11434

# 2. What models are downloaded?

ollama list

# 3. Run a quick test

ollama run llama3.2:3b "say hi"

# 4. Check GPU detection (Nvidia)

nvidia-smi
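
The checks above can be chained into one script. The following is a minimal sketch, not an official tool: describe_probe, quick_check, and the OLLAMA_URL variable are hypothetical names introduced here, and the script assumes curl is installed.

```shell
#!/bin/sh
# Sketch: chain the quick-diagnosis checks. Function names and OLLAMA_URL
# are illustrative helpers; Ollama itself does not read OLLAMA_URL.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Map a probe result to a one-line verdict.
#   $1 = curl exit status, $2 = response body
describe_probe() {
    if [ "$1" -ne 0 ]; then
        echo "server not reachable: start it with 'ollama serve'"
    elif [ "$2" = "Ollama is running" ]; then
        echo "server ok"
    else
        echo "unexpected reply: another program may be using the port"
    fi
}

quick_check() {
    # 1. Is the server running?
    body=$(curl -s --max-time 3 "$OLLAMA_URL")
    describe_probe "$?" "$body"

    # 2. What models are downloaded?
    if command -v ollama >/dev/null 2>&1; then
        ollama list
    else
        echo "ollama CLI not found on PATH"
    fi

    # 3. GPU detection (Nvidia only; skipped when nvidia-smi is absent)
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
    fi
}
```

Source the file and run quick_check. Nothing runs at load time, so the script is safe to keep in a shell profile.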

What's wrong? Diagnose fast

🔴

Server not starting

Port 11434 is already in use by another Ollama instance, or there is a permission error on the ~/.ollama directory. Find the process holding the port with lsof -i :11434 (Linux/Mac) or netstat -ano | findstr 11434 (Windows), then stop it.

📦

Model not found

The model has not been downloaded yet. Run ollama pull modelname (e.g. ollama pull llama3.2). Model names are case-sensitive and must match exactly, including the tag. Check available models at ollama.com/models.

🧠

CUDA out of memory

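The model does not fit in GPU VRAM. Rule of thumb: an 8B model needs ~5 GB of VRAM at Q4_K_M. Use a smaller model (llama3.2:3b instead of an 8B model such as llama3.1:8b), or fall back to CPU, either by hiding the GPU from the server (on Nvidia: CUDA_VISIBLE_DEVICES=-1 ollama serve) or by setting the num_gpu model parameter to 0.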

🐢

Running slowly (CPU mode)

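The GPU is not being used. Run ollama ps while a model is loaded and read the PROCESSOR column, or check the Ollama server logs for GPU detection lines (CUDA, ROCm, or Metal). On Windows/Linux with an Nvidia card, an up-to-date Nvidia driver must be installed. Apple Silicon uses Metal automatically.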

💻

Windows / WSL2 issues

On Windows, prefer the native .exe installer over WSL2 for GPU access. If you use WSL2, make sure the Nvidia driver is installed on the Windows host and that GPU passthrough works (nvidia-smi should run inside WSL2).

🔗

API not responding to requests

Set OLLAMA_HOST=0.0.0.0:11434 before starting the server if you need to reach Ollama from another machine or a Docker container. By default, Ollama binds only to localhost (127.0.0.1).

Ollama model VRAM requirements

Model       Parameters   VRAM (Q4_K_M)   Ollama pull name   Speed (Apple M4)
Llama 3.2   3B           ~2 GB           llama3.2:3b        ~60 tok/s
Llama 3.1   8B           ~5 GB           llama3.1:8b        ~30 tok/s
Mistral     7B           ~4 GB           mistral            ~35 tok/s
Gemma 3     12B          ~7 GB           gemma3:12b         ~20 tok/s
Llama 3.3   70B          ~40 GB          llama3.3:70b       ~5 tok/s

VRAM estimates assume Q4_K_M quantization. When running on CPU, the system RAM needed is roughly the same as the VRAM figure. Speed varies by hardware; an M4 Pro is ~2× faster than an M1.
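
The VRAM figures follow from simple arithmetic: parameter count times bits per weight, divided by 8 to get bytes. The sketch below assumes roughly 4.8 bits per weight for Q4_K_M (an approximation, not a figure from this article) and ignores the KV cache and runtime overhead, so treat the result as a lower bound. estimate_gb is a hypothetical helper name.

```shell
# Rough weight size in GB for a quantized model.
#   $1 = parameters in billions
#   $2 = bits per weight (default 4.8, an assumed average for Q4_K_M)
# Excludes KV cache and runtime overhead, so real usage is higher.
estimate_gb() {
    awk -v params="$1" -v bits="${2:-4.8}" \
        'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

estimate_gb 3     # prints 1.8
estimate_gb 8     # prints 4.8
estimate_gb 70    # prints 42.0
```

These results are roughly in line with the ~2 GB, ~5 GB, and ~40 GB rows of the table. Pass a second argument to try another quantization, for example estimate_gb 8 8 for an 8-bit variant.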

Step-by-step fix

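  1. Check if the server is running

    Run curl http://localhost:11434. If you see "Ollama is running", the server is fine. If the connection is refused, start the server with ollama serve in a terminal and keep that terminal open, because closing it stops the server. On Mac, launching the Ollama menu-bar app starts the server for you. On Linux installs set up by the official install script, Ollama normally runs as a systemd service, so sudo systemctl start ollama is the equivalent.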

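  2. Fix port 11434 already in use

    On Linux/Mac, lsof -i :11434 shows the process holding the port. Stop it with kill {PID}, and use kill -9 {PID} only if it refuses to exit. On Windows, run netstat -ano | findstr 11434, then taskkill /PID {PID} /F. To run Ollama on a different port instead: OLLAMA_HOST=127.0.0.1:11435 ollama serve. Set the same OLLAMA_HOST for the ollama CLI so that it talks to the new port. Use 0.0.0.0 in place of 127.0.0.1 only if other machines need access.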

  3. Pull the model before running

    If you see "model not found", run ollama pull modelname first. The download can be 1–40 GB depending on the model. Then verify with ollama list. The name and tag must match exactly; check ollama.com/models for the exact spelling.

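  4. Fix VRAM / out of memory

    If the model runs out of VRAM: (1) use a smaller model, e.g. ollama pull llama3.2:3b instead of an 8B model such as llama3.1:8b; (2) use a lower-bit quantization by pulling a tag with a smaller quant from the model's tag list (default tags are usually already 4-bit); (3) force CPU by hiding the GPU from the server (on Nvidia: CUDA_VISIBLE_DEVICES=-1 ollama serve) or by setting the num_gpu parameter to 0. CPU mode is slow, and the model must still fit in system RAM.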

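  5. Enable GPU acceleration

    Check whether Ollama is using the GPU: run ollama ps while a model is loaded and read the PROCESSOR column, or look for GPU detection lines (CUDA, ROCm, Metal) in the ollama serve output. If the GPU is not used: Nvidia: update the Nvidia driver and run nvidia-smi to verify that it sees the card; Ollama bundles its own CUDA runtime, so a separate CUDA toolkit install is normally not needed. AMD (Linux): use the ROCm-enabled Ollama build. Apple Silicon: Metal is automatic, no config needed.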

  6. Windows-specific: fix WSL2 GPU passthrough

    On Windows, prefer the native Ollama installer (ollama.com/download) over WSL2 for GPU access. If you need WSL2: (1) install the Nvidia driver on Windows (not inside WSL2); (2) ensure the WSL2 kernel version is 5.10+; (3) inside WSL2, do not install a Linux display driver, and if you need the CUDA toolkit, install it via apt rather than with the Windows installer. Ollama detects available GPUs automatically at startup, so no extra variable is needed once nvidia-smi works inside WSL2.
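
For step 4, CPU fallback can also be requested per call through the REST API, using the num_gpu option (the number of layers offloaded to the GPU). The following is a minimal sketch, assuming a server on the default port and a model name and prompt without double quotes or backslashes. cpu_payload is a hypothetical helper name.

```shell
# Build a /api/generate request body that offloads zero layers to the GPU.
# No JSON escaping is done: $1 (model) and $2 (prompt) must not contain
# double quotes or backslashes.
cpu_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_gpu":0}}\n' "$1" "$2"
}

# Usage (needs a running server and a pulled model):
#   curl -s http://localhost:11434/api/generate -d "$(cpu_payload llama3.2:3b 'say hi')"
```

Setting the option per request leaves the server's GPU configuration untouched, which is convenient for checking whether a failure is GPU-related.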

Common Ollama error messages

Error                             Cause                Fix
connection refused (port 11434)   Ollama not running   Run ollama serve in a terminal
model not found                   Not downloaded yet   ollama pull modelname
address already in use            Port 11434 taken     Find the process (lsof -i :11434) and stop it
CUDA out of memory                VRAM too small       Use a smaller model or fall back to CPU (num_gpu 0)
failed to load model              Corrupted download   ollama rm modelname && ollama pull modelname
no such model                     Wrong model name     Check the exact name at ollama.com/models

Frequently asked questions

Why is Ollama not working?

Most Ollama issues are local: (1) the server is not running (start it with ollama serve); (2) the model has not been pulled (run ollama pull modelname first); (3) port 11434 is taken (stop the existing process); (4) VRAM is too small (use a smaller model or fall back to CPU); (5) GPU drivers are missing (Nvidia driver for CUDA, ROCm for AMD).

Ollama model not found — how to fix?

Run "ollama pull modelname" to download the model first. Find exact model names at ollama.com/models. Model names are case-sensitive and include tags (e.g. llama3.2:3b for the 3B version). After pulling, verify with "ollama list".

Ollama CUDA out of memory — how to fix?

Use a smaller model: at Q4_K_M, 3B models need ~2 GB of VRAM, 8B models ~5 GB, and 70B models ~40 GB. Try "ollama pull llama3.2:3b" if you have limited VRAM. Or force CPU mode by starting the server with the GPU hidden, e.g. CUDA_VISIBLE_DEVICES=-1 ollama serve on Nvidia (much slower, and the model must still fit in system RAM).

Ollama not using GPU — how to fix?

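On Nvidia: update the GPU driver and confirm that nvidia-smi sees the card; Ollama bundles its CUDA runtime, so a separate CUDA toolkit is normally not required. On AMD (Linux): use the ROCm build. On Apple Silicon: GPU acceleration is automatic via Metal, with no config needed. To confirm, run ollama ps while a model is loaded, or check the ollama serve logs for GPU detection lines (CUDA, ROCm, Metal).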
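
One way to confirm where a loaded model is running is the PROCESSOR column of ollama ps. The sketch below classifies one line of that output. It assumes the column contains strings such as "100% GPU", "100% CPU", or a "CPU/GPU" split, which may differ between Ollama versions. gpu_verdict is a hypothetical helper name.

```shell
# Classify one line of `ollama ps` output by its PROCESSOR column.
#   $1 = a single output line (header line excluded)
gpu_verdict() {
    case "$1" in
        *"100% GPU"*) echo "fully on GPU" ;;
        *"100% CPU"*) echo "CPU only" ;;
        *"CPU/GPU"*)  echo "split between CPU and GPU" ;;
        *)            echo "unknown" ;;
    esac
}

# Usage (needs a running server with a model loaded):
#   ollama ps | tail -n +2 | while IFS= read -r line; do gpu_verdict "$line"; done
```

A "split" verdict usually means the model is larger than the available VRAM, so part of it runs on the CPU and generation slows down.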

Ollama API not accessible from other machines — how to fix?

By default Ollama only binds to localhost. To expose it on your network: set OLLAMA_HOST=0.0.0.0:11434 in your environment before running ollama serve. Also allow port 11434 in your firewall. Note: this exposes Ollama without authentication — use only on trusted networks.
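
As a sketch of the client side, the snippet below builds the base URL for a remote Ollama server and shows, in comments, how the server might be started and queried. The address 192.168.1.50 is a made-up example and ollama_url is a hypothetical helper. /api/tags is the endpoint that lists the models stored on the server.

```shell
# Server machine: listen on all interfaces (no authentication, trusted
# networks only):
#   OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Client machine: build the base URL for a remote server.
#   $1 = server address, $2 = port (default 11434)
ollama_url() {
    printf 'http://%s:%s\n' "$1" "${2:-11434}"
}

# List the models available on the remote server (example address):
#   curl -s "$(ollama_url 192.168.1.50)/api/tags"
```

If the curl call times out from the client but works on the server itself, the firewall rule for port 11434 is the likely culprit.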
