Fireworks AI Not Working?

API 401, model not found (accounts/fireworks/models/ format), rate limit 429, streaming cut off, or speculative decoding errors? Check live status and fix it fast.

Fireworks AI live status

Updated every 5 minutes. Full history at prismix.dev/service/fireworks.

What's wrong? Diagnose fast

API 401 — authentication failed

Fireworks AI keys start with fw_. Generate at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1". Do NOT use https://api.openai.com/v1 as the base URL.
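A minimal sketch of a pre-flight check for the points above, assuming keys carry the fw_ prefix as described here. The helper name auth_headers is illustrative and not part of any SDK; it only catches obviously wrong keys before a request is sent.

```python
FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"


def auth_headers(api_key: str) -> dict:
    """Build request headers, failing early on an obviously wrong key."""
    key = api_key.strip()  # stray whitespace or newlines are a common copy-paste slip
    if not key.startswith("fw_"):
        # Assumption taken from this page: Fireworks keys start with "fw_".
        raise ValueError("key does not start with 'fw_'; is this a Fireworks key?")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

A key that passes this check can still be revoked or belong to another account, so a 401 after this point suggests regenerating the key.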

Model not found (404)

Model IDs require the full path: accounts/fireworks/models/llama-v3p3-70b-instruct. Note "v3p3" not "v3.3" (dots not allowed). Get exact IDs from fireworks.ai/models or GET /v1/models. Custom fine-tuned models use accounts/YOUR_ORG/models/YOUR_MODEL.
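The format rules above can be sketched as a local sanity check. The pattern below is inferred from the IDs shown on this page (an assumption, not an official grammar), and check_model_id is a hypothetical helper name. It cannot tell whether a model actually exists; only the models endpoint can confirm that.

```python
import re

# Inferred shape: accounts/<account>/models/<name>, with no dots in the name.
MODEL_ID_RE = re.compile(r"^accounts/[^/\s]+/models/[^/\s.]+$")


def check_model_id(model_id: str) -> list[str]:
    """Return a list of likely problems with a model ID (empty if none found)."""
    problems = []
    if not model_id.startswith("accounts/"):
        problems.append("missing 'accounts/<account>/models/' prefix")
    if "." in model_id:
        problems.append("contains a dot; versions are written like 'v3p3', not 'v3.3'")
    if not problems and not MODEL_ID_RE.match(model_id):
        problems.append("does not match accounts/<account>/models/<name>")
    return problems
```

For example, check_model_id("llama-v3.3-70b-instruct") reports both the missing prefix and the dot.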

Rate limit 429

Free tier: 10 RPM, 100k tokens/day. Paid: scales with spend. Implement exponential backoff: retry after 2^attempt seconds on 429. For sustained throughput, use a Fireworks Deployment (dedicated GPU allocation) — it bypasses shared rate limits.
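A minimal sketch of the backoff described above, written as a generic wrapper so it does not depend on any particular client. The names call_with_backoff and is_rate_limited are hypothetical, and the small random jitter is an addition that this page does not mention.

```python
import random
import time


def call_with_backoff(call, is_rate_limited, max_attempts=6, max_delay=32.0,
                      sleep=time.sleep):
    """Call `call()` and retry with exponential backoff while it is rate limited.

    `is_rate_limited(exc)` decides whether an exception represents a 429.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_try = attempt == max_attempts - 1
            if last_try or not is_rate_limited(exc):
                raise  # give up, or not a rate-limit error at all
            delay = min(2 ** attempt, max_delay)   # 1, 2, 4, 8, ... capped at max_delay
            sleep(delay + random.uniform(0, 0.5))  # jitter so clients do not retry in lockstep
```

With the OpenAI SDK, one plausible predicate is lambda e: isinstance(e, openai.RateLimitError), since that SDK raises RateLimitError on a 429 response.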

Streaming cut off mid-response

Pass stream=True to the SDK call (or "stream": true in a raw JSON body). If the stream still cuts off, the usual cause is an HTTP client timeout that is too short (many libraries default to about 30s); raise it to 60-120s. Also check for proxies or CDNs that buffer SSE streams, and test over a direct connection to rule them out.
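A sketch of the timeout fix using the OpenAI SDK, assuming the openai package (v1 or later) is installed. The helper names make_client and collect_stream are illustrative. The 120-second value is taken from the range suggested above and may need tuning for long generations.

```python
def make_client(api_key: str, timeout_s: float = 120.0):
    """Create an OpenAI-SDK client pointed at Fireworks with a longer timeout."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    return OpenAI(
        api_key=api_key,
        base_url="https://api.fireworks.ai/inference/v1",
        timeout=timeout_s,  # seconds
    )


def collect_stream(chunks) -> str:
    """Concatenate the text deltas from a chat-completions stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:   # some chunks carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:               # role-only and final chunks have no text
            parts.append(delta)
    return "".join(parts)


# Usage (performs a network call):
# client = make_client("fw_YOUR_KEY")
# stream = client.chat.completions.create(
#     model="accounts/fireworks/models/llama-v3p3-70b-instruct",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# print(collect_stream(stream))
```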

Function calling / tools not working

Fireworks uses the OpenAI tools format (not the deprecated functions format). Pass tools=[{"type": "function", "function": {"name": ..., "parameters": ...}}] and tool_choice="auto". Not all models support function calling — check the model card at fireworks.ai/models for "Function Calling" badge.
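To make the tools shape concrete, here is a small sketch. The get_weather tool and both helper names are hypothetical and exist only for illustration. It assumes the response follows the OpenAI SDK message shape, where tool call arguments arrive as a JSON string.

```python
import json


def weather_tool() -> dict:
    """A hypothetical tool definition in the OpenAI `tools` format."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative name, not a built-in tool
            "description": "Look up the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }


def parse_tool_calls(message) -> list[tuple[str, dict]]:
    """Extract (name, arguments) pairs from an assistant message, if any."""
    calls = getattr(message, "tool_calls", None) or []
    # Arguments arrive as a JSON string and need decoding before use.
    return [(c.function.name, json.loads(c.function.arguments)) for c in calls]


# Usage (performs a network call; model_id must be a model that supports function calling):
# response = client.chat.completions.create(
#     model=model_id,
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     tools=[weather_tool()],
#     tool_choice="auto",
# )
# print(parse_tool_calls(response.choices[0].message))
```

An empty result from parse_tool_calls means the model answered in plain text, which is expected with tool_choice="auto" when no tool is needed.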

Account quota / billing issues

Check usage at fireworks.ai/account/billing. Free tier has daily token limits. Paid: pay-as-you-go per token, billed monthly. If you get 402, add a payment method and add credits. Deployments (dedicated GPU) are billed per hour whether idle or active.

Fireworks AI API quick reference

Python (OpenAI SDK drop-in)

from openai import OpenAI

client = OpenAI(
    api_key="fw_YOUR_KEY",
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

curl (with streaming)

curl -X POST "https://api.fireworks.ai/inference/v1/chat/completions" \
  -H "Authorization: Bearer fw_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

List available models

curl "https://api.fireworks.ai/inference/v1/models" \
  -H "Authorization: Bearer fw_YOUR_KEY"
# Returns full list with model IDs in accounts/fireworks/models/ format
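
A Python equivalent of the curl call above, using only the standard library. It assumes the endpoint returns an OpenAI-style list response of the form {"data": [{"id": ...}]}; that shape is an assumption here, so inspect the raw JSON if the result comes back empty. The function names are illustrative.

```python
import json
import urllib.request

MODELS_URL = "https://api.fireworks.ai/inference/v1/models"


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style list response, sorted by name."""
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


def fetch_model_ids(api_key: str, timeout_s: float = 30.0) -> list[str]:
    """GET the models endpoint and return the IDs it reports (network call)."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_model_ids(json.load(response))
```

Comparing the model string in a failing request against this list is a quick way to spot a typo or a retired model.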

Popular Fireworks AI model IDs

Model ID                                                  Context  Notes
accounts/fireworks/models/llama-v3p3-70b-instruct         128K     Top general purpose (note v3p3)
accounts/fireworks/models/llama-v3p1-405b-instruct        128K     Largest Llama, highest quality
accounts/fireworks/models/llama-v3p2-11b-vision-instruct  128K     Vision model, multimodal
accounts/fireworks/models/mixtral-8x7b-instruct           32K      Fast, cost-efficient
accounts/fireworks/models/deepseek-r1                     128K     Reasoning model (slow, thorough)
accounts/fireworks/models/qwen2p5-72b-instruct            32K      Strong multilingual
accounts/fireworks/models/phi-3-vision-128k-instruct      128K     Small vision model

Step-by-step fix

  1.

    Check live Fireworks AI status

    Visit prismix.dev/service/fireworks. Full incident history at that link.

  2.

    Fix API 401 errors

    Your key must start with fw_. Generate at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1".

  3.

    Fix model not found (404)

    Use the full model path: accounts/fireworks/models/llama-v3p3-70b-instruct. Note v3p3 (not v3.3 — dots not valid). Get current model IDs from the table above or GET /v1/models.

  4.

    Fix rate limit 429

    Free tier = 10 RPM, 100k tokens/day. Add exponential backoff: catch 429, then retry after 2^n seconds, where n is the attempt number (max 32s). For sustained throughput: upgrade at fireworks.ai/pricing, or use a Fireworks Deployment (dedicated GPU, bypasses shared limits).

  5.

    Fix streaming and function calling

    Streaming cut off: set HTTP timeout to 120s+. Function calling: use tools parameter (not deprecated functions). Not all models support function calling — check the model card at fireworks.ai/models for the "Function Calling" badge before using.

Frequently asked questions

Why is Fireworks AI not working?

Fireworks AI issues: (1) 401 (fw_ key, Authorization: Bearer fw_KEY, base_url https://api.fireworks.ai/inference/v1); (2) model 404 (use full path accounts/fireworks/models/MODEL_NAME — browse at fireworks.ai/models); (3) rate limit 429 (free 10 RPM/100k daily — backoff or upgrade); (4) streaming cut off (set HTTP timeout to 120s+); (5) outage (check prismix.dev/service/fireworks).

Is Fireworks AI down right now?

Check prismix.dev/service/fireworks for live Fireworks AI status. Also see status.fireworks.ai.

Fireworks AI API 401 — how to fix?

Fireworks AI 401: key must start with fw_. Generate at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1". Common mistake: using the OpenAI base URL (api.openai.com/v1) instead of Fireworks'.

Fireworks AI model IDs — correct format?

Fireworks model IDs use full path: accounts/fireworks/models/llama-v3p3-70b-instruct. "v3p3" means Llama 3.3 (dots replaced with p). Common IDs: llama-v3p1-405b-instruct, mixtral-8x7b-instruct, deepseek-r1, qwen2p5-72b-instruct. Full list at fireworks.ai/models or GET https://api.fireworks.ai/inference/v1/models.

Fireworks AI vs Groq vs Together AI — which is best?

Fireworks AI: best for speculative decoding, vision models, large model selection (Llama 405B, DeepSeek R1), and custom fine-tuned model hosting via Deployments. Groq: best raw speed (750-1500 tok/s on LPU), small model catalog. Together AI: best model catalog breadth including fine-tuning. All three are OpenAI-SDK compatible — switch by changing base_url and api_key.
