Replicate Not Working?
Replicate prediction stuck (cold start), API 401 error, model version 404, billing failed, or webhook not firing? Check live status and fix it fast.
Replicate — live status
Updated every 5 minutes. Full history at prismix.dev/service/replicate.
What's wrong? Diagnose fast
Prediction slow / cold start
Cold start = container boot after idle. Small models: 5–30s. Large models (70B+): 1–3 min. Solution: use a Replicate Deployment to keep a warm instance. For low-traffic use cases, accept cold starts.
API 401 error
Token must start with r8_. Header: Authorization: Bearer r8_TOKEN. Generate at replicate.com/account/api-tokens. Check that the token has not been revoked. Org tokens require org membership.
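The token checks above can be run before any request is sent. A minimal Python sketch, assuming the r8_ prefix rule stated above; build_auth_headers is a hypothetical helper name, and a local check like this cannot tell whether a token has been revoked.

```python
def build_auth_headers(token: str) -> dict:
    """Return headers for a Replicate API call, or raise if the token looks wrong.

    build_auth_headers is a hypothetical helper, not part of any Replicate client.
    """
    # Whitespace or a trailing newline copied from an env file can break the header.
    token = token.strip()
    if not token.startswith("r8_"):
        raise ValueError("token does not start with 'r8_'; generate a new one")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

If this raises, fix the token before debugging anything else. If it passes and the API still returns 401, the token is probably revoked or belongs to the wrong account or org.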
Model 404 / not found
The version hash in your request is outdated. Get the latest hash from replicate.com/owner/model → Versions tab. Or omit the version hash to use the latest version automatically.
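To avoid hard-coding a stale hash, the latest version can be looked up at run time. A rough Python sketch of the string handling involved, assuming the versions listing is ordered newest-first and that each version object carries an "id" field; the function names are illustrative only.

```python
from typing import Optional, Tuple

API_BASE = "https://api.replicate.com/v1"


def split_model_ref(ref: str) -> Tuple[str, str, Optional[str]]:
    """Split 'owner/name' or 'owner/name:version' into (owner, name, version)."""
    model, _, version = ref.partition(":")
    owner, sep, name = model.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    return owner, name, version or None


def versions_url(ref: str) -> str:
    """URL of the versions listing for the model named in ref."""
    owner, name, _ = split_model_ref(ref)
    return f"{API_BASE}/models/{owner}/{name}/versions"


def newest_version_id(versions: list) -> str:
    """Pick the first entry, assuming the list is ordered newest-first."""
    if not versions:
        raise LookupError("model has no published versions")
    return versions[0]["id"]
```

Pinning a hash keeps results reproducible. Resolving the newest version at run time avoids 404s from stale hashes, but output can change whenever the model owner pushes a new version.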
Billing / 402 error
402 = payment failed or credits exhausted. Check replicate.com/account/billing. Pay-as-you-go — no subscription. Add payment method under Billing. Replicate charges per GPU-second.
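Because billing is per GPU-second, expected spend is simple arithmetic. A small Python sketch; the rate passed in is a placeholder you would read off the model page, not a real Replicate price.

```python
def estimate_cost_usd(gpu_seconds: float, usd_per_gpu_second: float, runs: int = 1) -> float:
    """Estimated spend: GPU-seconds per run x price per GPU-second x number of runs.

    usd_per_gpu_second is an assumed input; look up the real rate for your hardware.
    """
    if gpu_seconds < 0 or usd_per_gpu_second < 0 or runs < 0:
        raise ValueError("inputs must be non-negative")
    return gpu_seconds * usd_per_gpu_second * runs
```

For example, with a hypothetical rate of 0.001 USD per GPU-second, 100 runs of 12 seconds each come to roughly 1.20 USD. Comparing such an estimate against your spending limit shows whether a 402 is likely before a batch job starts.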
Webhook not firing
Webhook URL must be publicly reachable from Replicate. Use ngrok or similar for local testing. Webhook receives POST with prediction object. Verify URL returns 200 — Replicate retries on non-200.
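A minimal receiver for local testing might look like the Python sketch below, using only the standard library. It assumes the POST body is a JSON prediction object with "id" and "status" fields; the handler and function names are made up for illustration, the "canceled" status is included alongside the ones listed in this guide, and the sketch does no signature verification.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Statuses after which no further updates are expected.
TERMINAL = {"succeeded", "failed", "canceled"}


def summarize_prediction(body: bytes) -> dict:
    """Parse the POSTed prediction object and pull out the fields we act on."""
    prediction = json.loads(body)
    status = prediction.get("status")
    return {"id": prediction.get("id"), "status": status, "done": status in TERMINAL}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_prediction(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not a JSON object: reject it.
            self.send_response(400)
            self.end_headers()
            return
        print("webhook:", summary)
        # Acknowledge quickly; a non-200 response leads to a retry.
        self.send_response(200)
        self.end_headers()


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a server that accepts webhook POSTs."""
    return HTTPServer(("0.0.0.0", port), WebhookHandler)
```

Start it with make_server(8000).serve_forever(), expose the port through ngrok or a similar tunnel, and pass the public URL as the webhook when creating the prediction.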
Cog / custom model deploy issues
For Cog model pushes: confirm cog.yaml has correct python_version and run commands. Run cog build locally before push. Check replicate.com/deployments for build logs. GPU type selection affects cost.
Understanding Replicate cold starts
| Model type | Cold start time | Solution |
|---|---|---|
| Small image models (SD 1.5, SDXL) | 5–15s | Accept or use Deployment |
| SDXL-Lightning, SD Turbo | 3–10s | Fast enough for most use cases |
| Llama 3 8B (text) | 15–30s | Use Deployment for production |
| Llama 3 70B (text) | 60–180s | Deployment or async polling required |
| Video gen (Mochi, Kling) | 30–120s | Async polling, long timeout |
| Custom Cog model | Varies (image size) | Optimize image size, Deployment |
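The upper bounds in the table can feed a client-side timeout. A hedged Python sketch: the dictionary keys are made-up labels for the table rows, and the 1.5x safety margin is an arbitrary assumption, not a Replicate recommendation.

```python
# Worst-case cold start per row of the table above, in seconds.
# Keys are hypothetical labels; custom Cog models are omitted because they vary.
COLD_START_MAX_SECONDS = {
    "small_image": 15,
    "fast_image": 10,
    "llama3_8b": 30,
    "llama3_70b": 180,
    "video": 120,
}


def client_timeout(model_type: str, expected_run_seconds: float, margin: float = 1.5) -> float:
    """Worst-case cold start plus expected run time, padded by a safety margin."""
    cold = COLD_START_MAX_SECONDS[model_type]
    return (cold + expected_run_seconds) * margin
```

With these assumptions, a Llama 3 70B call expected to run for 20 seconds gets a 300-second budget, which is why a short synchronous HTTP timeout tends to fail on large models.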
Replicate API quick reference
Run a prediction (async)
# Create prediction
curl -X POST https://api.replicate.com/v1/predictions \
-H "Authorization: Bearer r8_YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"version": "stability-ai/sdxl:39ed52f2319f9f807d0e...4d",
"input": {"prompt": "a photorealistic cat on a mountain"}
}'
# Returns: { "id": "abc123", "status": "starting", ... }
# Poll until done
curl https://api.replicate.com/v1/predictions/abc123 \
-H "Authorization: Bearer r8_YOUR_TOKEN"
# status: "starting" | "processing" | "succeeded" | "failed"
Get latest model version
curl https://api.replicate.com/v1/models/stability-ai/sdxl/versions \
-H "Authorization: Bearer r8_YOUR_TOKEN"
# Returns array of versions sorted newest-first
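The same two prediction calls can be made from Python without extra dependencies. This is a sketch using the standard library's urllib, mirroring the curl commands above; the function names are illustrative and the official replicate client library may be a better fit for real projects.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def create_prediction_request(token: str, version: str, model_input: dict) -> urllib.request.Request:
    """Build the POST /predictions request shown in the curl example."""
    payload = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/predictions",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def get_prediction_request(token: str, prediction_id: str) -> urllib.request.Request:
    """Build the GET /predictions/{id} request used for polling."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{prediction_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the JSON body; HTTP errors raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes the headers and payload easy to inspect when chasing a 401 or 404.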
Step-by-step fix
- 1
Check live Replicate status
Visit prismix.dev/service/replicate. If Replicate is operational and your prediction is slow, the cause is most likely a cold start rather than a platform issue. Large models (70B+) can take up to 3 minutes to warm up.
- 2
Fix slow predictions / cold starts
For production traffic: use Replicate Deployments (replicate.com/deployments) to keep at least one warm instance. For development/low traffic: accept cold starts and implement async polling with a 5-second poll interval. For time-sensitive apps: choose smaller, faster models (SDXL-Lightning instead of SDXL, Llama 3 8B instead of 70B).
- 3
Fix API 401 authentication
Confirm: (1) token starts with r8_; (2) header is Authorization: Bearer r8_TOKEN; (3) token has not been revoked. Generate a fresh token at replicate.com/account/api-tokens. If using an organization's model: confirm you have been added to the org.
- 4
Fix model 404 / version not found
The version hash embedded in your code is outdated. Go to replicate.com/owner/model → Versions tab → copy the latest version SHA. Update your code. Alternative: omit the version hash in your API call and Replicate will use the latest version automatically (less deterministic but always current).
- 5
Fix billing errors
Check replicate.com/account/billing. Replicate is pay-as-you-go — charged per GPU-second of compute. If payment failed: update the payment method under Billing. If you hit the spending limit: increase it in account settings. Estimated cost per model run is shown on each model page under "Run time and cost".
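The async polling from step 2 can be sketched in Python as follows. The 5-second interval comes from the step above; the 300-second default timeout, the function name, and the injected fetch callable are assumptions for illustration, and "canceled" is treated as a terminal status alongside "succeeded" and "failed".

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() every `interval` seconds until the prediction reaches a terminal status.

    fetch is any zero-argument callable that returns the current prediction object,
    for example a wrapper around GET /v1/predictions/{id}.
    """
    deadline = clock() + timeout
    while True:
        prediction = fetch()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        # Give up if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(
                f"prediction still {prediction.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

Passing fetch in as a callable keeps the loop independent of the HTTP client. Set timeout above the worst-case cold start for your model, so a 70B model that needs 3 minutes to warm up is not abandoned early.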
Frequently asked questions
Why is Replicate not working?
Common causes: (1) cold start delay (1–3 min for large models; use Deployments); (2) API 401 (token must start with r8_, header Authorization: Bearer r8_TOKEN); (3) model 404 (version hash outdated; get the latest from the Versions tab); (4) billing error (check replicate.com/account/billing); (5) outage (check prismix.dev/service/replicate).
Is Replicate down right now?
Check prismix.dev/service/replicate for live status. Cold start slowness is not a true outage — the model container is warming up.
Why is my Replicate prediction taking so long?
Cold starts are the main cause of slow predictions. Large models (70B+): 1–3 minutes. Medium (7–13B): 15–60s. Small (image gen): 5–15s. Use Replicate Deployments for production to keep warm instances. Use async polling with 5s intervals rather than synchronous calls with timeouts.
Replicate API 401 — how to fix?
API 401: (1) token must start with r8_ — the format changed from older tokens; (2) header must be Authorization: Bearer r8_TOKEN — not Token; (3) generate a new token at replicate.com/account/api-tokens if revoked; (4) for org models, confirm you are a member of that org.
Replicate model 404 — how to fix?
Replicate 404: the version hash in your code is outdated. Get the current version hash from replicate.com/owner/model → Versions tab → latest version SHA. Update your API call. Alternatively, call the model without specifying a version hash — Replicate uses the latest version automatically, trading determinism for currency.