Free 3 min read

Is Ollama Down?

Check live Ollama status — local LLM runner for Llama, Mistral, Gemma, and dozens of open models. Troubleshoot server connection issues, slow generation, and OpenAI-compatible API problems. Set up free email alerts.

Ollama live status

Ollama — live status

Ollama runs locally — this badge tracks the ollama.ai service availability. Full history at prismix.dev/service/ollama.

Full status →

Quick check: is Ollama responding right now?

  1. Prismix: prismix.dev/service/ollama — live status + 30-day uptime + incidents.
  2. API call: curl https://prismix.dev/api/v1/statuses | jq '.services[] | select(.id=="ollama")'
  3. Direct test (local): curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"hello"}' — a streaming response confirms your local Ollama server is running.

Common causes of "Ollama not working"

  • "Could not connect to Ollama server" — Ollama runs a local server at localhost:11434; start it with ollama serve (or via the menu bar app on macOS/Windows). Verify it's running with curl http://localhost:11434/api/tags. On Linux without the desktop app, you must run ollama serve manually or configure it as a systemd service.
  • Model not found (404 when calling API) — Models must be downloaded before use: ollama pull llama3.2. List available models with ollama list. Model names are case-sensitive and version-specific — llama3 is not the same as llama3.2; use the exact name shown by ollama list.
  • Very slow generation on CPU — Ollama uses GPU acceleration automatically when available (NVIDIA/AMD via CUDA/ROCm, Apple Silicon via Metal). If generation is painfully slow (under 1 token/sec), check whether the GPU is being used: OLLAMA_DEBUG=1 ollama run model. On macOS, GPU acceleration requires Apple Silicon (M1 or later).
  • Out of memory (OOM) or model unloads mid-response — Large models (13B+) require significant VRAM. Ollama automatically unloads models when VRAM is full; reduce model size by switching to a smaller quantization (try Q4_K_M instead of Q8_0). On machines with less than 8 GB VRAM, stick to 7B-class models. Set OLLAMA_MAX_LOADED_MODELS=1 to prevent multiple models from loading simultaneously.
  • Port 11434 already in use — Another process is occupying the port. Run ollama stop to kill the running daemon, then verify with netstat -an | grep 11434. On macOS, the menu bar app starts automatically at login — quit it before running ollama serve manually.
  • OpenAI-compatible API not working with third-party apps — Ollama exposes OpenAI-compatible endpoints at localhost:11434/v1; set your app's base_url to http://localhost:11434/v1 and pass any string as the API key (Ollama ignores it). Not all OpenAI features are supported — function calling is absent in many older models, and embeddings require pulling a dedicated embedding model separately.

Set up free email alerts for Ollama

  1. 1

    Sign in to Prismix

    Go to prismix.dev/sign-in — email OTP or GitHub sign-in.

  2. 2

    Star Ollama

    On prismix.dev/service/ollama, click the ☆ star icon.

  3. 3

    Alerts are live

    You'll get an email within minutes of any status change.

🔔

Stop manually checking — get alerts instead

Star Ollama on Prismix and get emailed the moment status changes. Free, no credit card.

Monitor other LLM APIs?

Full status dashboard: prismix.dev/status