Together AI Not Working?

Is the Together AI API returning 401 or 429 errors, reporting model not found (IDs use the owner/model format), failing fine-tuning jobs, or responding slowly? Check live status and fix it fast.

Together AI — live status

Updated every 5 minutes. Full history at prismix.dev/service/together.

What's wrong? Diagnose fast

🔑

API 401 — unauthorized

Create a key at api.together.xyz/settings/api-keys and send it in the header Authorization: Bearer YOUR_KEY. With the OpenAI SDK, also set base_url="https://api.together.xyz/v1". Together keys have no fixed prefix (unlike gsk_ for Groq) and do not expire by default. If the 401 persists, regenerate your key.
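
Before regenerating a key, it may be worth ruling out a malformed header. The sketch below normalizes a pasted key (stray whitespace, wrapping quotes, a duplicated "Bearer " prefix) before building the header described above. build_auth_headers is a hypothetical helper written for illustration; it is not part of any Together or OpenAI SDK.

```python
def build_auth_headers(raw_key: str) -> dict[str, str]:
    """Build bearer-token headers from a possibly messy key (hypothetical helper)."""
    # Strip whitespace and wrapping quotes that can sneak in via copy-paste or .env files.
    key = raw_key.strip().strip("\"'")
    # Drop an accidental "Bearer " prefix so it is not sent twice.
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()
    if not key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Pass the returned dict as the headers of any HTTP client. If the request still returns 401 with a clean header, the key itself is the more likely culprit.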

🚫

429 — rate limit

Free tier: 60 requests/min, 1M tokens/day shared across models. Paid plans unlock higher limits. Implement exponential backoff on 429. Check current usage at api.together.xyz/settings/api-keys.

🔍

Model not found (404)

Model IDs use the owner/model-name format, for example meta-llama/Llama-3.3-70B-Instruct-Turbo. Names are case-sensitive. Get the current list from api.together.xyz/models or GET /v1/models. Model IDs copied from older tutorials or docs may have been deprecated.
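
Because IDs are case-sensitive, a 404 is often just a near miss. The following sketch checks a requested ID against a list of valid IDs and suggests a correction. It assumes you have already collected the ID strings from GET /v1/models into a plain Python list; resolve_model_id is a hypothetical helper, and the 0.6 similarity cutoff is an arbitrary choice.

```python
import difflib
from typing import Optional


def resolve_model_id(requested: str, available: list[str]) -> Optional[str]:
    """Return an exact, case-corrected, or closest-spelled model ID, or None."""
    if requested in available:
        return requested
    # A case-insensitive match catches IDs typed in the wrong case.
    by_lower = {model_id.lower(): model_id for model_id in available}
    if requested.lower() in by_lower:
        return by_lower[requested.lower()]
    # Otherwise fall back to the closest spelling, if one is reasonably close.
    close = difflib.get_close_matches(requested, available, n=1, cutoff=0.6)
    return close[0] if close else None
```

A return value of None suggests the name is not a Together model ID at all, for example a short alias or another provider's model name.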

🔧

Fine-tuning job failing

Training data must be JSONL in the format {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}. A minimum of 10 examples is required; 100+ is recommended. Check that the base model supports fine-tuning at api.together.xyz/playground. Poll job status via GET /v1/fine-tunes/{id}.

⏱

Slow inference

Together AI speeds vary by model: Turbo variants (Llama-3.3-70B-Instruct-Turbo) are faster than standard variants. Cold starts add 1-3s to the first request. For latency-critical use, Groq may be faster on the same models where it offers them.

📏

Context length exceeded

Check the model's context window: LLaMA 3.3 70B = 128K, Mixtral 8x7B = 32K, older LLaMA 2 = 4K. Use the exact context window for your model version, since LLaMA 2 and LLaMA 3 limits differ widely. GET /v1/models returns context_length per model.
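
The check itself is simple arithmetic once you have the numbers. This sketch assumes the context window has to hold both the prompt and the requested completion, and that you obtain prompt_tokens from the model's tokenizer and context_length from GET /v1/models. Both function names are hypothetical.

```python
def fits_context(prompt_tokens: int, max_tokens: int, context_length: int) -> bool:
    """True if the prompt plus the requested completion fits the window."""
    return prompt_tokens + max_tokens <= context_length


def max_completion_budget(prompt_tokens: int, context_length: int) -> int:
    """Largest max_tokens value that still fits, never below zero."""
    return max(0, context_length - prompt_tokens)
```

For a 4K model (4096 tokens), a 4000-token prompt leaves room for at most 96 completion tokens, so a request with max_tokens of 512 would not fit.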

Together AI API quick reference

curl (OpenAI-compatible)

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Python (OpenAI SDK drop-in)

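from openai import OpenAI

# Point the OpenAI SDK at Together's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Print the assistant's reply.
print(response.choices[0].message.content)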

List available models

curl "https://api.together.xyz/v1/models" \
  -H "Authorization: Bearer YOUR_TOGETHER_API_KEY" | jq '.[].id'

Popular Together AI model IDs

Model	Model ID	Context
LLaMA 3.3 70B Turbo	meta-llama/Llama-3.3-70B-Instruct-Turbo	128K
LLaMA 3.1 8B Turbo	meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo	128K
LLaMA 3.1 70B Turbo	meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	128K
Mixtral 8x7B	mistralai/Mixtral-8x7B-Instruct-v0.1	32K
Qwen 2.5 72B Turbo	Qwen/Qwen2.5-72B-Instruct-Turbo	128K
Gemma 2 9B	google/gemma-2-9b-it	8K

200+ models available. Model IDs are case-sensitive. Get the full current list: GET /v1/models. Models are added and deprecated regularly.

Step-by-step fix

1

Check live Together AI status

Visit prismix.dev/service/together and status.together.ai. If both report the service as operational, the issue is most likely in your local configuration.

2

Fix API 401 errors

Generate your key at api.together.xyz/settings/api-keys and send it in the header Authorization: Bearer YOUR_KEY. With the OpenAI SDK, set base_url="https://api.together.xyz/v1". Together keys have no fixed prefix. If a 401 persists with a seemingly valid key, regenerate it: a key stops working once it has been regenerated from another session.

3

Fix 429 rate limit errors

Free tier: 60 requests/min, 1M tokens/day. Add exponential backoff: start at 1s, double the delay on each retry, and cap it at 32s. To raise the limits permanently, add a payment method at api.together.xyz/settings/billing to unlock the higher pay-as-you-go limits.
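
A minimal sketch of that backoff schedule (1s, doubling, capped at 32s) follows. RateLimited is a hypothetical exception standing in for whatever your client raises on HTTP 429; with the OpenAI SDK that would typically be openai.RateLimitError. The sleep parameter exists only so the delays can be observed without actually waiting.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal that a call received HTTP 429."""


def backoff_delays(start: float = 1.0, cap: float = 32.0) -> Iterator[float]:
    """Yield start, 2*start, 4*start, ... until the cap would be exceeded."""
    delay = start
    while delay <= cap:
        yield delay
        delay *= 2


def call_with_backoff(
    call: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), waiting and retrying each time it raises RateLimited."""
    for delay in backoff_delays():
        try:
            return call()
        except RateLimited:
            sleep(delay)
    # Final attempt after the longest wait; a RateLimited here propagates.
    return call()
```

With the defaults this makes up to seven attempts, waiting 1, 2, 4, 8, 16, and 32 seconds between them. Adding random jitter to each delay is a common refinement when many clients retry at once.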

4

Fix model not found errors

Model IDs are case-sensitive owner/model-name strings. Get the current list with GET https://api.together.xyz/v1/models. A common mistake is using short names (llama3) or OpenAI-style names (gpt-4); use exact IDs such as meta-llama/Llama-3.3-70B-Instruct-Turbo.

5

Fix fine-tuning job failures

Training data must be valid JSONL — one JSON object per line. Format: {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. Minimum 10 training examples required. Check job status via GET /v1/fine-tunes/{id} — error details are in the status response.
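
Many format failures can be caught locally before uploading. The sketch below checks the rules stated above: one JSON object per line, a messages list whose entries carry role and content, and at least 10 examples. validate_jsonl_lines is a hypothetical helper; treating blank lines as errors is an assumption based on a strict reading of "one JSON object per line", and the service may apply further checks not covered here.

```python
import json


def validate_jsonl_lines(lines: list[str], min_examples: int = 10) -> list[str]:
    """Return a list of problems found; an empty list means no problems were detected."""
    problems = []
    examples = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing non-empty 'messages' list")
            continue
        for message in messages:
            if not (isinstance(message, dict) and "role" in message and "content" in message):
                problems.append(f"line {number}: each message needs 'role' and 'content'")
                break
        else:
            # Only count the line when every message passed the check.
            examples += 1
    if examples < min_examples:
        problems.append(f"only {examples} valid examples; at least {min_examples} required")
    return problems
```

To use it, read the training file with open(path, encoding="utf-8").read().splitlines() and print each returned problem.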
