Together AI Not Working?
Together AI API returning 401, 429 rate limit, model not found (owner/model format), fine-tuning job failing, or inference slow? Check live status and fix it fast.
Together AI — live status
Updated every 5 minutes. Full history at prismix.dev/service/together.
What's wrong? Diagnose fast
API 401 — unauthorized
Key from api.together.xyz/settings/api-keys. Header: Authorization: Bearer YOUR_KEY. OpenAI SDK: set base_url="https://api.together.xyz/v1". Together keys have no fixed prefix (unlike gsk_ for Groq). No expiry by default — if 401 persists, regenerate your key.
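A minimal sketch of building the header described above while guarding against two common copy/paste mistakes (stray whitespace and a doubled "Bearer " prefix). The helper name `build_auth_headers` and the `TOGETHER_API_KEY` environment variable are assumptions for illustration, not something this page defines.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers for a Bearer-token API (hypothetical helper)."""
    key = api_key.strip()  # stray newlines from copy/paste commonly break auth
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()  # avoid sending "Bearer Bearer ..."
    if not key:
        raise ValueError("API key is empty; set it before making requests")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # TOGETHER_API_KEY is an assumed variable name; YOUR_KEY is a placeholder.
    headers = build_auth_headers(os.environ.get("TOGETHER_API_KEY", "YOUR_KEY"))
    print(sorted(headers))
```

If the header looks right and the API still returns 401, the key itself is the likely culprit and regenerating it is the next step.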
429 — rate limit
Free tier: 60 requests/min, 1M tokens/day shared across models. Paid plans unlock higher limits. Implement exponential backoff on 429. Check current usage at api.together.xyz/settings/api-keys.
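One way to sketch the exponential backoff mentioned above, using the schedule the step-by-step section of this page suggests (start at 1s, double each retry, cap at 32s). `RateLimitError` is a hypothetical stand-in for whatever exception your HTTP client raises on a 429, and the small random jitter is an addition of this sketch rather than something the page prescribes.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for whatever your HTTP client raises on a 429 (hypothetical)."""


def backoff_delays(base: float = 1.0, cap: float = 32.0, retries: int = 6):
    """Yield delays that double each attempt: 1, 2, 4, ... capped at `cap`."""
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(fn, *, base=1.0, cap=32.0, retries=6, sleep=time.sleep):
    """Call fn(); on RateLimitError wait and retry, then make one final attempt."""
    for delay in backoff_delays(base, cap, retries):
        try:
            return fn()
        except RateLimitError:
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    return fn()  # final attempt; any error propagates to the caller
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.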
Model not found (404)
Model IDs are owner/model-name format: meta-llama/Llama-3.3-70B-Instruct-Turbo. Names are case-sensitive. Get current list from api.together.xyz/models or GET /v1/models. Outdated model IDs (from tutorials/docs) may have been deprecated.
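Because IDs are case-sensitive, a wrong-case ID fails the same way as a missing one. The sketch below shows one way to tell those cases apart against a list of IDs you have already fetched. `resolve_model_id` is a hypothetical helper, not part of any Together SDK.

```python
def resolve_model_id(requested: str, available: list[str]) -> str:
    """Return the exact ID from `available`, tolerating case differences in `requested`."""
    if requested in available:
        return requested
    by_lower = {model_id.lower(): model_id for model_id in available}
    match = by_lower.get(requested.lower())
    if match is not None:
        return match  # same ID, wrong letter case
    if "/" not in requested:
        raise ValueError(f"{requested!r} is not in owner/model-name format")
    raise ValueError(f"{requested!r} not found; it may have been deprecated")
```

For example, passing `meta-llama/llama-3.3-70b-instruct-turbo` with the list from GET /v1/models would return the correctly cased ID, while `llama3` raises an error naming the format problem.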
Fine-tuning job failing
JSONL format required: {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}. Minimum 10 examples, 100+ recommended. Check that the base model supports fine-tuning at api.together.xyz/playground. Poll job status via GET /v1/fine-tunes/{id}.
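A rough pre-upload check for the format above can catch most failures before a job is submitted. This sketch only verifies what this page states (one JSON object per line, a `messages` list whose entries carry `role` and `content`, and at least 10 examples). The function name and the decision to skip blank lines are assumptions, and the service may enforce further rules.

```python
import json


def validate_chat_jsonl(text: str, min_examples: int = 10) -> list[str]:
    """Return a list of problems found in chat-format JSONL; empty means none found."""
    problems = []
    examples = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing non-empty 'messages' list")
            continue
        for msg in messages:
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                problems.append(f"line {lineno}: each message needs 'role' and 'content'")
                break
        else:
            examples += 1  # counted only when every message passed the check
    if examples < min_examples:
        problems.append(f"only {examples} valid examples; at least {min_examples} required")
    return problems
```

Running it on the file contents, e.g. `validate_chat_jsonl(open("train.jsonl").read())` (the filename is a placeholder), returns an empty list when no problems are found.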
Slow inference
Together AI speed varies by model: Turbo variants (e.g. Llama-3.3-70B-Instruct-Turbo) are faster than standard variants. Cold starts can add 1-3s to the first request. For latency-critical workloads, Groq is typically faster on the same models where it offers them.
Context length exceeded
Check the model's context window: LLaMA 3.3 70B = 128K, Mixtral 8x7B = 32K, older LLaMA 2 = 4K. Limits differ sharply between model versions (LLaMA 2 vs 3), so use the exact value for the version you call. GET /v1/models returns context_length per model.
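As a small worked example of the arithmetic involved, the sketch below assumes the prompt and the requested completion share one context window, so `prompt_tokens + max_tokens` must not exceed `context_length`. Both function names are hypothetical, and the token counts must come from the model's own tokenizer, which this sketch does not attempt.

```python
def fits_context(prompt_tokens: int, max_tokens: int, context_length: int) -> bool:
    """True if the prompt plus the requested completion fits the model's window."""
    return prompt_tokens + max_tokens <= context_length


def max_completion_budget(prompt_tokens: int, context_length: int) -> int:
    """Largest max_tokens value that still fits, or 0 if the prompt alone is too long."""
    return max(0, context_length - prompt_tokens)
```

With a 4,096-token window, a 3,900-token prompt leaves room for at most 196 completion tokens, so a request with max_tokens set to 512 would not fit under this assumption.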
Together AI API quick reference
curl (OpenAI-compatible)
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Python (OpenAI SDK drop-in)
from openai import OpenAI

# Point the OpenAI client at Together's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",
    base_url="https://api.together.xyz/v1",
)
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
List available models
curl "https://api.together.xyz/v1/models" \
  -H "Authorization: Bearer YOUR_TOGETHER_API_KEY" | jq '.[].id'
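The same lookup can be sketched in Python with only the standard library. This assumes the endpoint returns a top-level JSON array of objects with an `id` field, as the `jq '.[].id'` filter above implies, and a `context_length` field as described earlier on this page. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

MODELS_URL = "https://api.together.xyz/v1/models"


def extract_context_lengths(payload: list) -> dict[str, Optional[int]]:
    """Map each model's id to its context_length (None when the field is absent)."""
    return {item["id"]: item.get("context_length") for item in payload if "id" in item}


def fetch_models(api_key: str) -> list:
    """GET /v1/models with a Bearer token and decode the JSON body."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `extract_context_lengths(fetch_models("YOUR_TOGETHER_API_KEY"))` would then give a dictionary to check model IDs and context limits against. If the response shape differs from the assumption above, adjust the extraction accordingly.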
Popular Together AI model IDs
| Model | Model ID | Context |
|---|---|---|
| LLaMA 3.3 70B Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 128K |
| LLaMA 3.1 8B Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 128K |
| LLaMA 3.1 70B Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 128K |
| Mixtral 8x7B | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32K |
| Qwen 2.5 72B Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 128K |
| Gemma 2 9B | google/gemma-2-9b-it | 8K |
200+ models available. Model IDs are case-sensitive. Get the full current list: GET /v1/models. Models are added and deprecated regularly.
Step-by-step fix
- 1
Check live Together AI status
Visit prismix.dev/service/together and status.together.ai. If operational, the issue is local configuration.
- 2
Fix API 401 errors
Generate your key at api.together.xyz/settings/api-keys. Header: Authorization: Bearer YOUR_KEY. OpenAI SDK: set base_url="https://api.together.xyz/v1". Together keys have no fixed prefix. If 401 persists with a seemingly valid key, regenerate it (keys can become invalid if regenerated from another session).
- 3
Fix 429 rate limit errors
Free tier: 60 requests/min, 1M tokens/day. Add exponential backoff: start at 1s, double on each retry, max 32s. To permanently increase: visit api.together.xyz/settings/billing and add a payment method for pay-as-you-go higher limits.
- 4
Fix model not found errors
Model IDs are case-sensitive owner/model-name strings. Get the current list: GET https://api.together.xyz/v1/models. Common mistake: using short names (llama3) or OpenAI-style names (gpt-4). Use exact IDs like meta-llama/Llama-3.3-70B-Instruct-Turbo.
- 5
Fix fine-tuning job failures
Training data must be valid JSONL: one JSON object per line. Format: {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. Minimum 10 training examples required. Check job status via GET /v1/fine-tunes/{id}; error details are in the status response.
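For the last step, polling the job status can be sketched as a small loop. The status strings in `terminal`, the 30-second interval, and the `fetch_status` callable (which would wrap a GET to /v1/fine-tunes/{id} and return the decoded JSON) are all assumptions for illustration. Check the actual status values in your own job responses.

```python
import time


def poll_job(fetch_status, *, terminal=("completed", "error", "cancelled"),
             interval=30.0, max_polls=120, sleep=time.sleep) -> dict:
    """Call fetch_status() until its 'status' field is terminal or polls run out."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") in terminal:  # assumed status names, see lead-in
            return job
        sleep(interval)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

The returned dictionary is the final status response, which is where this page says the error details appear when a job fails.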
Frequently asked questions
Why is Together AI not working?
Together AI issues: (1) 401 (key from api.together.xyz/settings/api-keys, header: Authorization: Bearer YOUR_KEY); (2) 429 (free: 60 RPM, 1M tokens/day); (3) model 404 (case-sensitive owner/model-name format — get list from /v1/models); (4) fine-tuning failure (JSONL format, min 10 examples); (5) outage (check prismix.dev/service/together).
Is Together AI down right now?
Check prismix.dev/service/together for live Together AI status. Also see status.together.ai for official incident reports.
How to use Together AI with the OpenAI Python SDK?
Together AI is OpenAI-compatible. Python: from openai import OpenAI; client = OpenAI(api_key="YOUR_KEY", base_url="https://api.together.xyz/v1"). Use Together model IDs (meta-llama/Llama-3.3-70B-Instruct-Turbo) not OpenAI names (gpt-4). The Together SDK (pip install together) provides the same functionality with Together-specific defaults.
Together AI model IDs — what format?
Together AI model IDs follow owner/model-name format. Examples: meta-llama/Llama-3.3-70B-Instruct-Turbo, mistralai/Mixtral-8x7B-Instruct-v0.1, Qwen/Qwen2.5-72B-Instruct-Turbo. Names are case-sensitive. Get the full current list with: GET https://api.together.xyz/v1/models (requires auth). The model ID in the API response is the exact string to use.
Together AI vs Groq vs Fireworks AI — which is best?
Groq is fastest (LPU hardware, 750-1500 tok/s) but has the smallest model catalog and stricter rate limits. Together AI has 200+ models including LLaMA, Mixtral, Qwen, Falcon, Code Llama, and supports fine-tuning. Fireworks AI has strong production SLAs and competitive speeds. For model breadth and fine-tuning, Together AI is strongest. For raw inference speed on open-source models, Groq wins. For production reliability, Fireworks is solid.