Fireworks AI Not Working?
API 401, model not found (accounts/fireworks/models/ format), rate limit 429, streaming cut off, or speculative decoding errors? Check live status and fix it fast.
Fireworks AI — live status
Updated every 5 minutes. Full history at prismix.dev/service/fireworks.
What's wrong? Diagnose fast
API 401 — authentication failed
Fireworks AI keys start with fw_. Generate at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1". Do NOT use https://api.openai.com/v1 as the base URL.
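The two most common 401 causes above can be caught before any request is sent. Below is a minimal stdlib sketch, not part of any Fireworks SDK. `check_key` and `build_chat_request` are hypothetical helpers. The `fw_` prefix rule is taken from the text above. The sketch builds the header and URL but does not send the request.

```python
import json
import urllib.request

# Fireworks' OpenAI-compatible base URL (NOT https://api.openai.com/v1).
FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"


def check_key(api_key: str) -> str:
    """Hypothetical helper: reject strings that cannot be Fireworks keys."""
    api_key = api_key.strip()  # strip whitespace that sneaks in from copy-paste or .env files
    if not api_key.startswith("fw_"):
        raise ValueError("Fireworks AI keys start with fw_; check which key you copied")
    return api_key


def build_chat_request(api_key: str, payload: dict,
                       base_url: str = FIREWORKS_BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to /chat/completions."""
    if "api.openai.com" in base_url:
        raise ValueError(f"base_url points at OpenAI; use {FIREWORKS_BASE_URL}")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {check_key(api_key)}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=60)` and read `json.load(resp)["choices"][0]["message"]["content"]`.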
Model not found (404)
Model IDs require the full path: accounts/fireworks/models/llama-v3p3-70b-instruct. Note "v3p3" not "v3.3" (dots not allowed). Get exact IDs from fireworks.ai/models or GET /v1/models. Custom fine-tuned models use accounts/YOUR_ORG/models/YOUR_MODEL.
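The naming rule above (full `accounts/.../models/` path, with dots in version numbers written as `p`) can be applied mechanically. `to_model_id` is a hypothetical convenience helper. It only fixes the format and cannot tell whether a model exists. A guessed name may still 404, so confirm the result against fireworks.ai/models or `GET /v1/models`.

```python
import re


def to_model_id(name: str, account: str = "fireworks") -> str:
    """Hypothetical helper: turn a short model name into a full Fireworks path.

    Dots between digits become "p" (v3.3 -> v3p3, 2.5 -> 2p5), following the
    convention described above. Names that are already full paths pass through.
    """
    if name.startswith("accounts/"):
        return name  # already a full path such as accounts/YOUR_ORG/models/YOUR_MODEL
    short = re.sub(r"(?<=\d)\.(?=\d)", "p", name.strip())
    return f"accounts/{account}/models/{short}"
```

For example, `to_model_id("llama-v3.3-70b-instruct")` gives `accounts/fireworks/models/llama-v3p3-70b-instruct`. For a fine-tuned model, pass `account="YOUR_ORG"`.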
Rate limit 429
Free tier: 10 RPM, 100k tokens/day. Paid: scales with spend. Implement exponential backoff: retry after 2^attempt seconds on 429. For sustained throughput, use a Fireworks Deployment (dedicated GPU allocation) — it bypasses shared rate limits.
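The backoff rule above (wait 2^attempt seconds after each 429) can be sketched with the standard library alone. `with_backoff` is a hypothetical helper. It retries only on HTTP 429, caps the wait at 32s (the cap used in the step list below), and re-raises every other error unchanged. The `sleep` parameter exists so the schedule can be tested without actually waiting.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(send: Callable[[], T], max_retries: int = 5,
                 max_delay: float = 32.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(); on HTTP 429 wait 2**attempt seconds (capped) and try again.

    Any other HTTP error, or a 429 on the final attempt, is re-raised unchanged.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(min(2 ** attempt, max_delay))  # 1, 2, 4, 8, ... capped at max_delay
    raise AssertionError("unreachable")  # the loop always returns or raises
```

A typical call is `with_backoff(lambda: urllib.request.urlopen(req, timeout=60).read())`. If many clients share one key, adding random jitter to each delay keeps them from retrying in lockstep.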
Streaming cut off mid-response
Set stream=True (OpenAI SDK) or "stream": true (raw JSON body). If the stream cuts off, the usual cause is an HTTP client timeout that is too short, and defaults vary widely between libraries. Set the timeout to 60-120s. Also check for proxies or CDN intermediaries that buffer SSE streams, and bypass them with a direct connection.
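The sketch below reads a raw stream and makes a cut-off visible instead of silently returning partial text. It assumes Fireworks streams OpenAI-style server-sent events (`data:` lines carrying `choices[].delta.content`, ending with `data: [DONE]`), as OpenAI-compatible endpoints generally do. `iter_stream_text` and `stream_chat` are hypothetical helpers. `req` is any `urllib.request.Request` that POSTs to `/chat/completions` with `"stream": true`.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style server-sent-event stream.

    Raises ConnectionError if the stream ends without the final "data: [DONE]"
    line, which is what a cut-off stream looks like from the client side.
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separators, ": keep-alive" comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # clean end of stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
    raise ConnectionError("stream ended before data: [DONE]; it was likely cut off")


def stream_chat(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send a request whose JSON body has "stream": true and join the text."""
    # urllib applies `timeout` to each blocking socket operation (connect and
    # every read), so it must exceed the longest pause between chunks.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return "".join(iter_stream_text(resp))
```

Keeping the parser separate from the network call lets you test it on captured stream lines. It also shows whether a truncated answer came from the connection dropping or from the model finishing early.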
Function calling / tools not working
Fireworks uses the OpenAI tools format (not the deprecated functions format). Pass tools=[{"type": "function", "function": {"name": ..., "parameters": ...}}] and tool_choice="auto". Not all models support function calling — check the model card at fireworks.ai/models for "Function Calling" badge.
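Here is a sketch of the tools format described above, plus reading the model's tool calls back out of a response. The `get_weather` tool and the helper names are hypothetical. The response shape (`message.tool_calls[].function.name` and a JSON-string `arguments`) is assumed to follow the OpenAI format, since the text above says Fireworks uses it. Whether any tool calls come back at all depends on the model supporting function calling.

```python
import json
from typing import Any


def weather_tool() -> dict:
    """Hypothetical tool definition in the OpenAI tools format."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }


def build_tool_request(model: str, prompt: str) -> dict:
    """Request body using "tools", not the deprecated "functions" field."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool()],
        "tool_choice": "auto",
    }


def extract_tool_calls(response: dict) -> list[tuple[str, dict[str, Any]]]:
    """Return (name, parsed_arguments) for each tool call in a chat completion."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # arguments arrive as a JSON *string*, not an object
        calls.append((fn["name"], json.loads(fn["arguments"] or "{}")))
    return calls
```

An empty list from `extract_tool_calls` means the model answered in plain text. That can happen with `tool_choice="auto"`, or with a model that lacks the "Function Calling" badge.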
Account quota / billing issues
Check usage at fireworks.ai/account/billing. Free tier has daily token limits. Paid: pay-as-you-go per token, billed monthly. If you get a 402, add a payment method and purchase credits. Deployments (dedicated GPU) are billed per hour whether idle or active.
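The status codes covered in this section can be mapped to their first fix in one place, for example inside an `except urllib.error.HTTPError` handler. This is a hypothetical triage table that condenses the advice above. It is not an official Fireworks error catalogue. The 5xx branch reflects the general HTTP meaning of server errors.

```python
# Hypothetical triage table condensing the checks above; not an official
# Fireworks error catalogue.
FIRST_CHECK = {
    401: "Key must start with fw_; send 'Authorization: Bearer fw_...' "
         "to https://api.fireworks.ai/inference/v1, not api.openai.com",
    402: "Add a payment method and credits at fireworks.ai/account/billing",
    404: "Use the full model path, e.g. "
         "accounts/fireworks/models/llama-v3p3-70b-instruct",
    429: "Retry with exponential backoff, or move sustained load to a Deployment",
}


def first_check(status: int) -> str:
    """Return the first thing to check for an HTTP status from the API."""
    if 500 <= status < 600:
        # 5xx means the server failed, so the request itself is probably fine.
        return "Server-side error: check prismix.dev/service/fireworks or status.fireworks.ai"
    return FIRST_CHECK.get(status, f"HTTP {status}: read the JSON error body for details")
```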
Fireworks AI API quick reference
Python (OpenAI SDK drop-in)

```python
from openai import OpenAI

# Point the OpenAI client at Fireworks instead of api.openai.com
client = OpenAI(
    api_key="fw_YOUR_KEY",
    base_url="https://api.fireworks.ai/inference/v1",
)
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # full path, v3p3 not v3.3
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

curl (with streaming)

```shell
curl -X POST "https://api.fireworks.ai/inference/v1/chat/completions" \
  -H "Authorization: Bearer fw_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

List available models

```shell
curl "https://api.fireworks.ai/inference/v1/models" \
  -H "Authorization: Bearer fw_YOUR_KEY"
# Returns the full list, with model IDs in accounts/fireworks/models/ format
```
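The same model listing can be fetched and filtered from Python. The sketch below assumes the endpoint returns an OpenAI-style list object (`{"data": [{"id": ...}, ...]}`), as OpenAI-compatible endpoints typically do. `fetch_models` and `model_ids` are hypothetical helpers.

```python
import json
import urllib.request

MODELS_URL = "https://api.fireworks.ai/inference/v1/models"


def model_ids(listing: dict, contains: str = "") -> list[str]:
    """Pull sorted model IDs out of an OpenAI-style {"data": [{"id": ...}]} listing."""
    return sorted(m["id"] for m in listing.get("data", []) if contains in m["id"])


def fetch_models(api_key: str, timeout: float = 30.0) -> dict:
    """GET the model list (needs network access and a valid fw_ key)."""
    req = urllib.request.Request(MODELS_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

# Usage: model_ids(fetch_models("fw_YOUR_KEY"), contains="llama")
```

Copying an ID from this output, rather than typing it by hand, avoids the `v3.3` vs `v3p3` mistake entirely.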
Popular Fireworks AI model IDs
| Model ID | Context | Notes |
|---|---|---|
| accounts/fireworks/models/llama-v3p3-70b-instruct | 128K | Top general purpose (note v3p3) |
| accounts/fireworks/models/llama-v3p1-405b-instruct | 128K | Largest Llama, highest quality |
| accounts/fireworks/models/llama-v3p2-11b-vision-instruct | 128K | Vision model, multimodal |
| accounts/fireworks/models/mixtral-8x7b-instruct | 32K | Fast, cost-efficient |
| accounts/fireworks/models/deepseek-r1 | 128K | Reasoning model (slow, thorough) |
| accounts/fireworks/models/qwen2p5-72b-instruct | 32K | Strong multilingual |
| accounts/fireworks/models/phi-3-vision-128k-instruct | 128K | Small vision model |
Step-by-step fix
1. Check live Fireworks AI status
   Visit prismix.dev/service/fireworks for the current status and full incident history.
2. Fix API 401 errors
   Your key must start with fw_. Generate one at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1".
3. Fix model not found (404)
   Use the full model path: accounts/fireworks/models/llama-v3p3-70b-instruct. Note v3p3, not v3.3 (dots are not valid). Get current model IDs from the table above or GET /v1/models.
4. Fix rate limit 429
   Free tier = 10 RPM, 100k tokens/day. Add exponential backoff: catch 429 and retry after 2^n seconds (max 32s). For sustained throughput, upgrade at fireworks.ai/pricing or use a Fireworks Deployment (dedicated GPU, bypasses shared limits).
5. Fix streaming and function calling
   Streaming cut off: set the HTTP timeout to 120s+. Function calling: use the tools parameter (not the deprecated functions parameter). Not all models support function calling, so check the model card at fireworks.ai/models for the "Function Calling" badge before using it.
Frequently asked questions
Why is Fireworks AI not working?
Fireworks AI issues: (1) 401 (fw_ key, Authorization: Bearer fw_KEY, base_url https://api.fireworks.ai/inference/v1); (2) model 404 (use the full path accounts/fireworks/models/MODEL_NAME; browse at fireworks.ai/models); (3) rate limit 429 (free tier: 10 RPM and 100k tokens/day; back off or upgrade); (4) streaming cut off (set HTTP timeout to 120s+); (5) outage (check prismix.dev/service/fireworks).
Is Fireworks AI down right now?
Check prismix.dev/service/fireworks for live Fireworks AI status. Also see status.fireworks.ai.
Fireworks AI API 401 — how to fix?
Fireworks AI 401: key must start with fw_. Generate at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1". Common mistake: using the OpenAI base URL (api.openai.com/v1) instead of Fireworks'.
Fireworks AI model IDs — correct format?
Fireworks model IDs use full path: accounts/fireworks/models/llama-v3p3-70b-instruct. "v3p3" means Llama 3.3 (dots replaced with p). Common IDs: llama-v3p1-405b-instruct, mixtral-8x7b-instruct, deepseek-r1, qwen2p5-72b-instruct. Full list at fireworks.ai/models or GET https://api.fireworks.ai/inference/v1/models.
Fireworks AI vs Groq vs Together AI — which is best?
Fireworks AI: best for speculative decoding, vision models, large model selection (Llama 405B, DeepSeek R1), and custom fine-tuned model hosting via Deployments. Groq: best raw speed (750-1500 tok/s on LPU), small model catalog. Together AI: broadest model catalog, including fine-tuning. All three are OpenAI-SDK compatible: switch by changing base_url and api_key, plus the model ID, since each provider uses its own model names.
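Switching providers as described above comes down to swapping a base URL, a key, and the model ID. The sketch below keeps those in one table. The Groq and Together base URLs are the OpenAI-compatible endpoints those providers publish. Verify them against each provider's docs before relying on them. The environment-variable names are hypothetical conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    base_url: str
    key_env: str  # hypothetical environment variable holding the API key


# OpenAI-compatible base URLs; verify against each provider's docs before use.
PROVIDERS = {
    "fireworks": Provider("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"),
    "groq": Provider("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
    "together": Provider("https://api.together.xyz/v1", "TOGETHER_API_KEY"),
}


def chat_url(provider: str) -> str:
    """Chat-completions endpoint for a provider key such as "groq"."""
    return PROVIDERS[provider].base_url.rstrip("/") + "/chat/completions"
```

With the OpenAI SDK, the same table feeds `OpenAI(api_key=os.environ[p.key_env], base_url=p.base_url)`. The model ID still has to change per provider, because `accounts/fireworks/models/...` paths exist only on Fireworks.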