Ollama server not starting — how to fix?

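If `ollama serve` fails: (1) check whether port 11434 is already in use with `lsof -i :11434` (Linux/Mac) or `netstat -ano | findstr 11434` (Windows), then stop the conflicting process. (2) On Mac, check whether the Ollama menu-bar app is already running; it starts its own server, so quit it before running `ollama serve`. (3) Check file permissions on the Ollama data directory (~/.ollama). (4) Update Ollama to the latest version: there is no `ollama update` command, so re-download the installer from ollama.com/download, or on Linux re-run the install script (`curl -fsSL https://ollama.com/install.sh | sh`).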

Ollama Not Working?

Server not starting, model not found, port 11434 in use, CUDA out of memory, or GPU not being used? Step-by-step fix for Mac, Windows, and Linux.

Ollama — live status

Updated every 5 minutes. Monitors ollama.com API and model registry at prismix.dev/service/ollama.

Note: Ollama runs locally — most issues are on your machine (model not pulled, port conflict, VRAM), not server-side. The status badge monitors the Ollama model registry (ollama.com) for download issues.

Quick diagnosis

# 1. Is the server running?

curl http://localhost:11434

# 2. What models are downloaded?

ollama list

# 3. Run a quick test

ollama run llama3.2:3b "say hi"

# 4. Check GPU detection (Nvidia)

nvidia-smi

What's wrong? Diagnose fast

🔴

Server not starting

Port 11434 is in use by another Ollama instance, or there is a permission error on the ~/.ollama directory. Find the process holding the port with lsof -i :11434 (Linux/Mac) or netstat -ano | findstr 11434 (Windows), then stop it.

📦

Model not found

Model not downloaded yet. Run: ollama pull modelname (e.g. ollama pull llama3.2). Model names are case-sensitive. Check available models at ollama.com/models.

🧠

CUDA out of memory

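GPU VRAM too small. Rule of thumb: an 8B model needs ~5GB VRAM at Q4_K_M. Use a smaller model (llama3.2:3b instead of llama3.1:8b), or force CPU mode by setting the num_gpu parameter to 0 (for example, /set parameter num_gpu 0 inside an ollama run session).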

🐢

Running slowly (CPU mode)

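GPU not being used. Run ollama ps while a model is loaded: the PROCESSOR column shows whether it runs on GPU or CPU. The ollama serve log also reports which GPU, if any, was detected at startup. On Windows/Linux, NVIDIA drivers must be installed. Apple Silicon uses Metal automatically.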

💻

Windows / WSL2 issues

On Windows, prefer the native .exe installer over WSL2 for GPU access. If you use WSL2, make sure the CUDA drivers are installed on the Windows host and that GPU passthrough is configured.

🔗

API not responding to requests

Set OLLAMA_HOST=0.0.0.0:11434 before starting the server if you need to reach Ollama from another machine or a Docker container. By default, Ollama binds only to localhost (127.0.0.1), so requests from other hosts are refused.
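A minimal sketch of checking reachability from a second machine, assuming the server was started with OLLAMA_HOST=0.0.0.0:11434 ollama serve. The function names api_url and probe_remote and the example IP address are illustrative; /api/tags is the endpoint that lists local models.

```shell
# Build the URL of the model-list endpoint for a given host.
api_url() {
  # $1 = host name or IP of the machine running Ollama
  # $2 = port (defaults to 11434, the port used throughout this guide)
  printf 'http://%s:%s/api/tags\n' "$1" "${2:-11434}"
}

# Report whether that endpoint answers within 3 seconds.
probe_remote() {
  if curl -s --max-time 3 "$(api_url "$@")" >/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

For example, probe_remote 192.168.1.20 run on the client machine prints reachable once the server binds to all interfaces and no firewall blocks the port.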

Ollama model VRAM requirements

Model	Parameters	VRAM (Q4_K_M)	Ollama pull name	Speed (Apple M4)
Llama 3.2	3B	~2 GB	llama3.2:3b	~60 tok/s
Llama 3.1	8B	~5 GB	llama3.1:8b	~30 tok/s
Mistral	7B	~4 GB	mistral	~35 tok/s
Gemma 3	12B	~7 GB	gemma3:12b	~20 tok/s
Llama 3.3	70B	~40 GB	llama3.3:70b	~5 tok/s

VRAM estimates assume Q4_K_M quantization. When a model runs on CPU, it needs roughly the same amount of system RAM. Speed varies by hardware: M4 Pro is ~2× faster than M1.
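The figures in the table are close to what you get by multiplying parameter count by bits per weight. A rough sketch of that arithmetic, assuming about 4.8 bits per weight for Q4_K_M (an approximation, not a figure from this guide) and counting model weights only, without the context cache. The function name estimate_vram_gb is illustrative.

```shell
# Estimate the memory taken by model weights, in GB.
estimate_vram_gb() {
  # $1 = parameter count in billions
  # $2 = bits per weight (default 4.8 is an assumed average for Q4_K_M)
  awk -v p="$1" -v b="${2:-4.8}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

estimate_vram_gb 8 prints 4.8 and estimate_vram_gb 70 prints 42.0, in line with the ~5 GB and ~40 GB rows above. Longer contexts add to these numbers.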

Step-by-step fix

1

Check if the server is running

Run curl http://localhost:11434. If you see "Ollama is running", the server is fine. If the connection is refused, start the server with ollama serve in a terminal and keep that terminal open, because closing it stops the server. On Mac, you can instead launch the Ollama menu-bar app, which runs the server in the background.
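The check in this step can be wrapped in a small script. This is a sketch, not an official tool: the names OLLAMA_URL, server_status and check_server are made up for illustration.

```shell
# Address of the local server; override OLLAMA_URL if yours listens elsewhere.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Classify the body returned by the root endpoint.
server_status() {
  # $1 = response body (empty when the connection failed)
  case "$1" in
    *"Ollama is running"*) echo "up" ;;
    "")                    echo "down" ;;
    *)                     echo "unexpected" ;;
  esac
}

# Query the server with a 3-second timeout and print up, down or unexpected.
check_server() {
  body=$(curl -s --max-time 3 "$OLLAMA_URL" 2>/dev/null)
  server_status "$body"
}
```

Calling check_server prints down when nothing answers on the port; unexpected suggests that some other program is listening there.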
2

Fix port 11434 already in use

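On Linux/Mac: lsof -i :11434 finds the blocking process. Stop it with kill {PID}, and use kill -9 {PID} only if it does not exit. On Windows: netstat -ano | findstr 11434, then taskkill /PID {PID} /F. To run Ollama on a different port instead: OLLAMA_HOST=127.0.0.1:11435 ollama serve. Set the same OLLAMA_HOST value in the shell where you run ollama commands so the CLI connects to the new port.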
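For Linux/Mac, the find-then-kill sequence can be sketched as follows. The function names port_pids and stop_port_owner are illustrative; the sketch assumes lsof is installed and narrows the search to listening TCP sockets so that client connections are not killed by mistake.

```shell
# Print the PIDs listening on a TCP port (lsof -t prints bare PIDs).
port_pids() {
  lsof -t -iTCP:"${1:-11434}" -sTCP:LISTEN 2>/dev/null
}

# Stop whatever holds the port: SIGTERM first, SIGKILL only if needed.
stop_port_owner() {
  port="${1:-11434}"
  pids=$(port_pids "$port")
  if [ -z "$pids" ]; then
    echo "port $port is free"
    return 0
  fi
  echo "port $port is held by PID(s): $pids"
  kill $pids 2>/dev/null        # unquoted on purpose: one argument per PID
  sleep 2
  pids=$(port_pids "$port")
  if [ -n "$pids" ]; then
    kill -9 $pids 2>/dev/null   # force only if something is still listening
  fi
}
```

Run stop_port_owner with no argument to target port 11434, then start ollama serve again.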
3

Pull the model before running

If you see "model not found": run ollama pull modelname first. The pull downloads the model (can be 1–40GB). Then verify with ollama list. Model names must be lowercase and match exactly — check ollama.com/models for the exact name.
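A sketch of a pull-only-if-missing check built on ollama list. It assumes the first column of that output is the model name with its tag, and that a name given without a tag refers to the latest tag. The function name model_in_list is illustrative.

```shell
# Succeed if the model appears in `ollama list` output read from stdin.
model_in_list() {
  # $1 = model name, with or without a tag
  want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$want" in
    *:*) ;;                      # tag given, use as is
    *)   want="$want:latest" ;;  # no tag: assume the default tag
  esac
  # Skip the header row, then compare the first column exactly.
  awk -v m="$want" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

Usage: ollama list | model_in_list llama3.2:3b || ollama pull llama3.2:3b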
4

Fix VRAM / out of memory

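If the model runs out of VRAM: (1) use a smaller model, for example ollama pull llama3.2:3b instead of llama3.1:8b; (2) use a lower-bit quantization: default tags are usually already 4-bit, so pick a smaller quantization tag from the model's page on ollama.com; (3) force CPU by offloading zero layers to the GPU: start ollama run modelname and enter /set parameter num_gpu 0, or on NVIDIA start the server with CUDA_VISIBLE_DEVICES=-1 ollama serve (slow, but it works as long as the model fits in system RAM).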
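Before picking a model, it can help to compare free VRAM with the requirement from the table. A sketch for NVIDIA cards, assuming nvidia-smi is on the PATH. The function names free_vram_mib and fits_in_vram are illustrative, and the comparison treats 1 GB as 1024 MiB, which is close enough for a rough check.

```shell
# Free memory on the first GPU, in MiB, as reported by nvidia-smi.
free_vram_mib() {
  nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -n 1
}

# Succeed if the required amount fits into the free amount.
fits_in_vram() {
  # $1 = required VRAM in GB (for example 5 for an 8B model at Q4_K_M)
  # $2 = free VRAM in MiB
  awk -v need="$1" -v free="$2" 'BEGIN { exit !(free >= need * 1024) }'
}
```

Usage: fits_in_vram 5 "$(free_vram_mib)" && echo "8B model should fit" || echo "try llama3.2:3b"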
5

Enable GPU acceleration

Check whether Ollama is using the GPU: run ollama ps while a model is loaded and read the PROCESSOR column, or look for CUDA or Metal in the ollama serve output. If the GPU is not in use, the fix depends on the vendor. NVIDIA: install CUDA toolkit 11.8+ and update the NVIDIA drivers, then run nvidia-smi to verify. AMD (Linux): use the ROCm-enabled Ollama build. Apple Silicon: Metal is automatic, no config needed.
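One way to script this check is to read the output of ollama ps, which lists loaded models together with the processor they run on. The sketch below assumes that a model using the GPU has the word GPU somewhere in its row; the function name gpu_in_use is illustrative.

```shell
# Succeed if any loaded model reports GPU usage.
# Reads `ollama ps` output from stdin; the header row is skipped.
gpu_in_use() {
  awk 'NR > 1 && /GPU/ { found = 1 } END { exit !found }'
}
```

Usage: ollama ps | gpu_in_use && echo "GPU active" || echo "CPU only, or no model loaded"

A model that is split between CPU and GPU also counts as using the GPU here.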
6

Windows-specific: fix WSL2 GPU passthrough

On Windows, prefer the native Ollama installer (ollama.com/download) over WSL2 for GPU access. If you need WSL2: (1) install CUDA drivers on Windows (not inside WSL2); (2) ensure the WSL2 kernel version is 5.10+; (3) inside WSL2, install the CUDA toolkit via apt, not the Windows installer. Ollama detects available GPUs on startup, so no extra setting is needed for that; run nvidia-smi inside WSL2 to confirm the GPU is visible there.
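Point (2) can be checked from inside WSL2 with uname -r. A small sketch that compares the reported release against 5.10; the function name kernel_ok is illustrative.

```shell
# Succeed if a kernel release string is version 5.10 or newer.
kernel_ok() {
  # $1 = release string, for example the output of `uname -r`
  major=${1%%.*}            # text before the first dot
  rest=${1#*.}              # text after the first dot
  minor=${rest%%[!0-9]*}    # leading digits of the remainder
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 10 ]; }
}
```

Usage: kernel_ok "$(uname -r)" && echo "kernel is new enough" || echo "run wsl --update from Windows"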

Common Ollama error messages

Error	Cause	Fix
connection refused (port 11434)	Ollama not running	ollama serve in terminal
model not found	Not downloaded yet	ollama pull modelname
address already in use	Port 11434 taken	kill existing process (lsof -i :11434)
CUDA out of memory	VRAM too small	Use smaller model or force CPU (num_gpu 0)
failed to load model	Corrupted download	ollama rm modelname && ollama pull modelname
no such model	Wrong model name	Check exact name at ollama.com/models
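The table can also be expressed as a lookup that takes an error message and prints the matching fix. This is only a sketch of the table above: the function name suggest_fix is illustrative, and real error text may be worded differently from the patterns used here.

```shell
# Map an error message to the fix listed in the table.
suggest_fix() {
  # $1 = error text copied from the terminal or the server log
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"connection refused"*)
      echo "start the server: ollama serve" ;;
    *"address already in use"*)
      echo "free port 11434: lsof -i :11434" ;;
    *"model not found"*|*"no such model"*)
      echo "check the name, then: ollama pull modelname" ;;
    *"out of memory"*)
      echo "use a smaller model or run on CPU" ;;
    *"failed to load model"*)
      echo "re-download: ollama rm modelname && ollama pull modelname" ;;
    *)
      echo "no match in the table" ;;
  esac
}
```

Usage: suggest_fix "CUDA out of memory" prints the line for the VRAM case.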
