Free 3 min read
Is Ollama Down?
Check live Ollama status — local LLM runner for Llama, Mistral, Gemma, and dozens of open models. Troubleshoot server connection issues, slow generation, and OpenAI-compatible API problems. Set up free email alerts.
Ollama — live status
Ollama runs locally — this badge tracks the ollama.ai service availability. Full history at prismix.dev/service/ollama.
Quick check: is Ollama responding right now?
- Prismix: prismix.dev/service/ollama — live status + 30-day uptime + incidents.
- API call:
curl https://prismix.dev/api/v1/statuses | jq '.services[] | select(.id=="ollama")' - Direct test (local):
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"hello"}'— a streaming response confirms your local Ollama server is running.
Common causes of "Ollama not working"
- "Could not connect to Ollama server" — Ollama runs a local server at
localhost:11434; start it withollama serve(or via the menu bar app on macOS/Windows). Verify it's running withcurl http://localhost:11434/api/tags. On Linux without the desktop app, you must runollama servemanually or configure it as a systemd service. - Model not found (404 when calling API) — Models must be downloaded before use:
ollama pull llama3.2. List available models withollama list. Model names are case-sensitive and version-specific —llama3is not the same asllama3.2; use the exact name shown byollama list. - Very slow generation on CPU — Ollama uses GPU acceleration automatically when available (NVIDIA/AMD via CUDA/ROCm, Apple Silicon via Metal). If generation is painfully slow (under 1 token/sec), check whether the GPU is being used:
OLLAMA_DEBUG=1 ollama run model. On macOS, GPU acceleration requires Apple Silicon (M1 or later). - Out of memory (OOM) or model unloads mid-response — Large models (13B+) require significant VRAM. Ollama automatically unloads models when VRAM is full; reduce model size by switching to a smaller quantization (try
Q4_K_Minstead ofQ8_0). On machines with less than 8 GB VRAM, stick to 7B-class models. SetOLLAMA_MAX_LOADED_MODELS=1to prevent multiple models from loading simultaneously. - Port 11434 already in use — Another process is occupying the port. Run
ollama stopto kill the running daemon, then verify withnetstat -an | grep 11434. On macOS, the menu bar app starts automatically at login — quit it before runningollama servemanually. - OpenAI-compatible API not working with third-party apps — Ollama exposes OpenAI-compatible endpoints at
localhost:11434/v1; set your app'sbase_urltohttp://localhost:11434/v1and pass any string as the API key (Ollama ignores it). Not all OpenAI features are supported — function calling is absent in many older models, and embeddings require pulling a dedicated embedding model separately.
Set up free email alerts for Ollama
- 1
Sign in to Prismix
Go to prismix.dev/sign-in — email OTP or GitHub sign-in.
- 2
Star Ollama
On prismix.dev/service/ollama, click the ☆ star icon.
- 3
Alerts are live
You'll get an email within minutes of any status change.
🔔
Stop manually checking — get alerts instead
Star Ollama on Prismix and get emailed the moment status changes. Free, no credit card.
Monitor other LLM APIs?
Full status dashboard: prismix.dev/status