Fireworks AI model IDs — what is the correct format?

Fireworks model IDs use the full path accounts/fireworks/models/MODEL_NAME. Examples: accounts/fireworks/models/llama-v3p3-70b-instruct, accounts/fireworks/models/mixtral-8x7b-instruct, accounts/fireworks/models/deepseek-r1, accounts/fireworks/models/phi-3-vision-128k-instruct. Note that Llama names use 'v3p3', not 'v3.3'. Get the full list at fireworks.ai/models or via GET https://api.fireworks.ai/inference/v1/models.

Fireworks AI vs Groq vs Together AI — which should I use?

Fireworks AI: best for production deployments, speculative decoding, vision models, and custom fine-tuned model hosting. Groq: fastest raw throughput on LPU hardware (750-1500 tok/s) for Llama/Mixtral, but limited model selection. Together AI: widest model catalog including fine-tuning support. All three are OpenAI-compatible — switch by changing base_url and api_key.
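As a rough illustration of "switch by changing base_url and api_key", the sketch below keeps one settings table per provider. The Fireworks base URL comes from this page; the Groq and Together AI base URLs and all three environment variable names are assumptions, so confirm them against each provider's documentation before relying on them.

```python
# Hypothetical provider table. Only the Fireworks base URL is taken from this page;
# the other URLs and every environment variable name are assumptions.
PROVIDERS = {
    "fireworks": {
        "base_url": "https://api.fireworks.ai/inference/v1",
        "key_env": "FIREWORKS_API_KEY",
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "key_env": "GROQ_API_KEY",
    },
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "key_env": "TOGETHER_API_KEY",
    },
}


def client_kwargs(provider, env):
    """Return the api_key/base_url pair for one provider.

    `env` is any mapping of variable names to values, for example os.environ.
    """
    cfg = PROVIDERS[provider]
    api_key = env.get(cfg["key_env"])
    if not api_key:
        raise KeyError(f"missing environment variable {cfg['key_env']}")
    return {"api_key": api_key, "base_url": cfg["base_url"]}


# Usage with the OpenAI SDK (not executed here):
#   import os
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("fireworks", os.environ))
```

Model IDs are not portable between providers, so the model name has to change along with the base URL.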


Fireworks AI Not Working?

API 401, model not found (accounts/fireworks/models/ format), rate limit 429, streaming cut off, or speculative decoding errors? Check live status and fix it fast.

Fireworks AI — live status

Updated every 5 minutes. Full history at prismix.dev/service/fireworks.


What's wrong? Diagnose fast

API 401 — authentication failed

Fireworks AI keys start with fw_. Generate at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1". Do NOT use https://api.openai.com/v1 as the base URL.
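The checks above can be turned into a small pre-flight helper that fails before any request is sent. This is a minimal sketch: the function names are hypothetical, and the only facts it relies on are the fw_ key prefix, the Bearer header, and the base URL stated above.

```python
FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"


def build_auth_headers(api_key):
    """Build request headers, rejecting keys that do not look like Fireworks keys."""
    key = api_key.strip()  # stray whitespace or newlines from copy/paste also cause 401s
    if not key.startswith("fw_"):
        raise ValueError("Fireworks AI keys are expected to start with 'fw_'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


def check_base_url(base_url):
    """Reject the OpenAI default URL, which does not accept a Fireworks key."""
    if "api.openai.com" in base_url:
        raise ValueError(f"use {FIREWORKS_BASE_URL}, not the OpenAI base URL")
    return base_url
```

Calling both helpers once at startup tends to surface a misconfigured key or URL earlier than waiting for the first 401.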

Model not found (404)

Model IDs require the full path: accounts/fireworks/models/llama-v3p3-70b-instruct. Note "v3p3" not "v3.3" (dots not allowed). Get exact IDs from fireworks.ai/models or GET /v1/models. Custom fine-tuned models use accounts/YOUR_ORG/models/YOUR_MODEL.
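A short helper can catch the two mistakes described above (a missing path prefix and dots in version numbers) before the request is sent. This is a sketch under one assumption: the rule "replace a dot between two digits with p" is inferred from the IDs listed on this page, not from an official specification, so treat GET /v1/models as the authority. The function name is hypothetical.

```python
import re

# Full path with no dots and no extra slashes in the model segment.
MODEL_ID_PATTERN = re.compile(r"^accounts/[^/\s]+/models/[^/\s.]+$")


def normalize_model_id(name, account="fireworks"):
    """Expand a short model name to the full path and rewrite '3.3' as '3p3'."""
    name = name.strip()
    full = name if name.startswith("accounts/") else f"accounts/{account}/models/{name}"
    prefix, _, model = full.rpartition("/")
    # Assumption inferred from this page's examples: a dot between digits becomes 'p'.
    model = re.sub(r"(?<=\d)\.(?=\d)", "p", model)
    full = f"{prefix}/{model}"
    if not MODEL_ID_PATTERN.match(full):
        raise ValueError(f"not a valid Fireworks model path: {full!r}")
    return full
```

For a custom fine-tuned model, pass the organization as `account`, for example `normalize_model_id("YOUR_MODEL", account="YOUR_ORG")`.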

Rate limit 429

Free tier: 10 RPM, 100k tokens/day. Paid: scales with spend. Implement exponential backoff: retry after 2^attempt seconds on 429. For sustained throughput, use a Fireworks Deployment (dedicated GPU allocation) — it bypasses shared rate limits.
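The backoff rule above can be sketched as a small retry wrapper. The `RateLimited` exception and the `with_backoff` name are hypothetical: the caller is assumed to raise `RateLimited` whenever the API answers 429. The 32-second cap and the attempt count are illustrative choices, not documented limits.

```python
import time


class RateLimited(Exception):
    """Hypothetical exception the caller raises when the API returns HTTP 429."""


def with_backoff(call, max_attempts=6, cap_seconds=32.0, sleep=time.sleep):
    """Run call(); on RateLimited wait 2**attempt seconds (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(min(2 ** attempt, cap_seconds))
```

With the OpenAI SDK, the wrapped function could catch `openai.RateLimitError` and re-raise it as `RateLimited`. The `sleep` parameter exists so the delay can be replaced in tests.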

Streaming cut off mid-response

Set stream=True in the OpenAI SDK call (or "stream": true in the JSON request body). If the stream cuts off, the usual cause is an HTTP client timeout that is too short (30s is the default in many libraries); raise it to 60-120s. Also check for proxy or CDN intermediaries that buffer SSE streams, and bypass them with a direct connection.
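To tell a truncated stream from a complete one, a client can check for the end-of-stream marker. The sketch below assumes the OpenAI-style SSE framing, where each event is a `data: {json}` line and the stream ends with `data: [DONE]`; this page does not confirm that Fireworks sends that exact marker, so verify it against a real response. The function name is hypothetical.

```python
import json


def read_sse_chunks(lines):
    """Yield parsed JSON chunks from SSE lines; raise if the stream ends early."""
    finished = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            finished = True
            break
        yield json.loads(payload)
    if not finished:
        raise ConnectionError("stream ended before the [DONE] marker")
```

When using the OpenAI SDK instead of raw HTTP, the client constructor accepts a `timeout` argument in seconds, for example `OpenAI(..., timeout=120.0)`.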

Function calling / tools not working

Fireworks uses the OpenAI tools format (not the deprecated functions format). Pass tools=[{"type": "function", "function": {"name": ..., "parameters": ...}}] and tool_choice="auto". Not all models support function calling, so check the model card at fireworks.ai/models for the "Function Calling" badge.
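A filled-in version of the tools payload might look like the sketch below. The `get_weather` tool, its schema, and the helper names are hypothetical, and whether the model used here carries the "Function Calling" badge is not confirmed by this page.

```python
def make_tool(name, description, parameters):
    """Wrap a JSON Schema for the arguments in the OpenAI tools envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def build_tool_request(model, messages, tools):
    """Assemble keyword arguments for client.chat.completions.create(...)."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,          # the current format, not the deprecated `functions`
        "tool_choice": "auto",   # let the model decide whether to call a tool
    }


# Hypothetical example tool.
weather_tool = make_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

request = build_tool_request(
    "accounts/fireworks/models/llama-v3p3-70b-instruct",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    [weather_tool],
)
```

The resulting dictionary can be passed as `client.chat.completions.create(**request)`. If the model decides to call a tool, the call appears under `response.choices[0].message.tool_calls`.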

Account quota / billing issues

Check usage at fireworks.ai/account/billing. The free tier has daily token limits. Paid usage is pay-as-you-go per token, billed monthly. If you get a 402, add a payment method and credits. Deployments (dedicated GPU) are billed per hour whether idle or active.

Fireworks AI API quick reference

Python (OpenAI SDK drop-in)

from openai import OpenAI

client = OpenAI(
    api_key="fw_YOUR_KEY",
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

curl (with streaming)

curl -X POST "https://api.fireworks.ai/inference/v1/chat/completions" \
  -H "Authorization: Bearer fw_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

List available models

curl "https://api.fireworks.ai/inference/v1/models" \
  -H "Authorization: Bearer fw_YOUR_KEY"
# Returns full list with model IDs in accounts/fireworks/models/ format
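Once the list response is parsed as JSON, the IDs can be pulled out with a few lines of Python. This sketch assumes the OpenAI-style list shape, `{"data": [{"id": ...}, ...]}`; this page only says the endpoint returns model IDs, so check the actual response before depending on the field names. The function name is hypothetical.

```python
def extract_model_ids(response_json, prefix="accounts/"):
    """Return the sorted model IDs from a parsed /v1/models response."""
    items = response_json.get("data", [])  # assumed OpenAI-style list envelope
    return sorted(
        item["id"]
        for item in items
        if isinstance(item, dict) and str(item.get("id", "")).startswith(prefix)
    )
```

Checking a configured model name against this list at startup is one way to catch a 404 before the first chat request.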

Popular Fireworks AI model IDs

Model ID	Context	Notes
accounts/fireworks/models/llama-v3p3-70b-instruct	128K	Top general purpose (note v3p3)
accounts/fireworks/models/llama-v3p1-405b-instruct	128K	Largest Llama, highest quality
accounts/fireworks/models/llama-v3p2-11b-vision-instruct	128K	Vision model, multimodal
accounts/fireworks/models/mixtral-8x7b-instruct	32K	Fast, cost-efficient
accounts/fireworks/models/deepseek-r1	128K	Reasoning model (slow, thorough)
accounts/fireworks/models/qwen2p5-72b-instruct	32K	Strong multilingual
accounts/fireworks/models/phi-3-vision-128k-instruct	128K	Small vision model

Step-by-step fix

1. Check live Fireworks AI status

Visit prismix.dev/service/fireworks for the current status and the full incident history.

2. Fix API 401 errors

Your key must start with fw_. Generate one at fireworks.ai/account/api-keys. Header: Authorization: Bearer fw_YOUR_KEY. OpenAI SDK: api_key="fw_YOUR_KEY", base_url="https://api.fireworks.ai/inference/v1".

3. Fix model not found (404)

Use the full model path: accounts/fireworks/models/llama-v3p3-70b-instruct. Note v3p3, not v3.3: dots are not valid in model IDs. Get current model IDs from the table above or GET /v1/models.

4. Fix rate limit 429

Free tier: 10 RPM, 100k tokens/day. Add exponential backoff: catch 429 and retry after 2^n seconds, where n is the attempt number (cap the wait at 32s). For sustained throughput, upgrade at fireworks.ai/pricing or use a Fireworks Deployment (dedicated GPU, bypasses shared limits).

5. Fix streaming and function calling

Streaming cut off: set the HTTP client timeout to 120s or more. Function calling: use the tools parameter, not the deprecated functions parameter. Not all models support function calling, so check the model card at fireworks.ai/models for the "Function Calling" badge first.
