Together AI Not Working?

Is the Together AI API returning 401 or 429 (rate limit) errors, or model-not-found errors (IDs use owner/model format)? Fine-tuning jobs failing or inference slow? Check live status and fix it fast.

Together AI live status

Updated every 5 minutes. Full history at prismix.dev/service/together.

What's wrong? Diagnose fast

API 401 — unauthorized

Get your key from api.together.xyz/settings/api-keys and send it as the header Authorization: Bearer YOUR_KEY. With the OpenAI SDK, set base_url="https://api.together.xyz/v1". Together keys have no fixed prefix (unlike gsk_ for Groq). Keys do not expire by default; if the 401 persists, regenerate your key.
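Many 401s come from a malformed header rather than a bad key. The sketch below builds the headers for a raw REST call and guards against two common mistakes: stray whitespace copied along with the key, and a key string that already contains "Bearer ". The helper name `together_headers` and the `TOGETHER_API_KEY` environment variable are assumptions for illustration, not part of any official client.

```python
import os
from typing import Dict, Optional


def together_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build headers for a Together AI REST call, catching common 401 causes.

    Hypothetical helper. Falls back to the TOGETHER_API_KEY environment
    variable (an assumed name; use whatever your deployment sets).
    """
    key = api_key if api_key is not None else os.environ.get("TOGETHER_API_KEY", "")
    # A trailing newline or space copied from a .env file or the dashboard
    # makes the key invalid, so strip surrounding whitespace.
    key = key.strip()
    # Send the bare key: "Bearer " belongs in the header, not in the key string.
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()
    if not key:
        raise ValueError("No Together API key found; set TOGETHER_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Pass the returned dict as the headers argument of whatever HTTP client you use against https://api.together.xyz/v1.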

429 — rate limit

Free tier: 60 requests/min, 1M tokens/day shared across models. Paid plans unlock higher limits. Implement exponential backoff on 429. Check current usage at api.together.xyz/settings/api-keys.
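A minimal sketch of exponential backoff, assuming the schedule this page recommends (start at 1s, double on each retry, cap at 32s). It is client-agnostic: you supply the call and a predicate that recognizes a rate-limit error. The function names and the optional `jitter` parameter are illustrative choices, not part of any Together or OpenAI API.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 32.0) -> List[float]:
    """Delay before each retry: 1s, 2s, 4s, ... capped at 32s."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    is_rate_limited: Callable[[Exception], bool],
    retries: int = 6,
    sleep: Callable[[float], None] = time.sleep,
    jitter: float = 0.0,
) -> T:
    """Call fn, retrying with exponential backoff while it raises a 429-type error."""
    for delay in backoff_delays(retries):
        try:
            return fn()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise  # not a rate limit: surface it immediately
            # Optional random jitter spreads out retries from many clients.
            sleep(delay + random.uniform(0, jitter))
    return fn()  # final attempt; any error now propagates to the caller
```

With the OpenAI SDK you could pass `is_rate_limited=lambda e: isinstance(e, openai.RateLimitError)`. Note that the SDK client also retries some errors on its own (configurable via its `max_retries` argument), so decide which layer owns retries to avoid multiplying them.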

Model not found (404)

Model IDs are owner/model-name format: meta-llama/Llama-3.3-70B-Instruct-Turbo. Names are case-sensitive. Get current list from api.together.xyz/models or GET /v1/models. Outdated model IDs (from tutorials/docs) may have been deprecated.
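Because IDs are case-sensitive, a useful check is to compare the ID you send against the list from GET /v1/models (for example `[m["id"] for m in response_json]`, mirroring the `jq '.[].id'` filter in the quick reference). The sketch below is a hypothetical helper that rejects short or OpenAI-style names and spots casing-only mismatches; its format regex is a loose sanity check, not an official grammar for Together model IDs.

```python
import re
from typing import Iterable

# Loose owner/model-name check: an owner, a slash, then no whitespace.
MODEL_ID_PATTERN = re.compile(r"^[^/\s]+/\S+$")


def resolve_model_id(requested: str, available: Iterable[str]) -> str:
    """Return the exact model ID to send, or raise with a helpful hint."""
    if not MODEL_ID_PATTERN.match(requested):
        raise ValueError(
            f"{requested!r} is not in owner/model-name format, "
            "e.g. meta-llama/Llama-3.3-70B-Instruct-Turbo"
        )
    ids = list(available)
    if requested in ids:
        return requested
    # IDs are case-sensitive, so look for a casing-only mismatch.
    by_lower = {m.lower(): m for m in ids}
    match = by_lower.get(requested.lower())
    if match is not None:
        raise ValueError(f"Model IDs are case-sensitive; did you mean {match!r}?")
    raise ValueError(f"{requested!r} not found; it may have been deprecated")
```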

Fine-tuning job failing

JSONL format required, one object per line: {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. Minimum 10 examples, 100+ recommended. Check that the base model supports fine-tuning at api.together.xyz/playground. Poll job status via GET /v1/fine-tunes/{id}.
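Validating the file locally before uploading catches most format failures. This sketch checks the chat format described above: each non-blank line must parse as a JSON object with a non-empty `messages` list whose entries have `role` and `content`, and the file must contain at least 10 examples. The "at least one assistant message" rule and the decision to flag blank lines are conservative assumptions, not documented Together requirements.

```python
import json
from typing import List

MIN_EXAMPLES = 10  # stated minimum; 100+ recommended


def validate_chat_jsonl(text: str) -> List[str]:
    """Return a list of problems found in chat-format fine-tuning JSONL."""
    problems: List[str] = []
    count = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Conservative: JSONL is one object per line, so flag blank lines.
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing non-empty 'messages' list")
            continue
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                problems.append(f"line {lineno}: message {i} needs 'role' and 'content'")
        # Assumed sanity check: an example with no assistant turn teaches nothing.
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {lineno}: no assistant message")
        count += 1
    if count < MIN_EXAMPLES:
        problems.append(f"found {count} example(s); at least {MIN_EXAMPLES} required")
    return problems
```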

Slow inference

Together AI speeds vary by model: Turbo variants (e.g. Llama-3.3-70B-Instruct-Turbo) are faster than standard variants. Cold starts can add 1-3s to the first request. For latency-critical use, Groq is typically faster on the same models, if Groq hosts them.
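To tell a cold start from a generally slow model, measure time to first token (TTFT) separately from total time, and compare a first request with a repeat. The sketch below is a hypothetical timing helper; it takes a zero-argument callable that opens the stream, so the timer starts before the HTTP request is sent rather than after the response headers arrive.

```python
import time
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def time_stream(start_stream: Callable[[], Iterable[T]]) -> Tuple[float, float, int]:
    """Open and consume a stream; return (time to first chunk, total time, chunk count)."""
    start = time.perf_counter()
    first: Optional[float] = None
    n = 0
    for _ in start_stream():
        if first is None:
            first = time.perf_counter() - start  # time to first token
        n += 1
    total = time.perf_counter() - start
    return (first if first is not None else total, total, n)
```

With the OpenAI SDK client from the quick reference, call it as `time_stream(lambda: client.chat.completions.create(model=..., messages=..., stream=True))` twice in a row. A much larger TTFT on the first call than on the second points to a cold start; similar, consistently high numbers point to the model itself.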

Context length exceeded

Check the model's context window: LLaMA 3.3 70B = 128K, Mixtral 8x7B = 32K, older LLaMA 2 = 4K. Use the exact window for the specific model version, because LLaMA generations differ sharply (LLaMA 2 = 4K, the original LLaMA 3 = 8K, LLaMA 3.1 and 3.3 = 128K). GET /v1/models returns context_length for each model.
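A minimal pre-flight check, assuming (as with many OpenAI-compatible servers) that the prompt tokens plus `max_tokens` must fit inside the context window. The character-based token estimate is a rough heuristic, and the hard-coded windows only mirror the figures on this page; in practice read `context_length` from GET /v1/models. All function names here are hypothetical.

```python
import math

# Windows quoted on this page (128K = 131,072; 32K = 32,768; 8K = 8,192 tokens).
# Defaults only: context_length from GET /v1/models is what Together serves.
CONTEXT_WINDOWS = {
    "meta-llama/Llama-3.3-70B-Instruct-Turbo": 131_072,
    "mistralai/Mixtral-8x7B-Instruct-v0.1": 32_768,
    "google/gemma-2-9b-it": 8_192,
}


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token for English text).
    Use the model's real tokenizer when you are close to the limit."""
    return math.ceil(len(text) / 4)


def fits_context(prompt_tokens: int, max_tokens: int, context_length: int) -> bool:
    """True if the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_tokens <= context_length


def clamp_max_tokens(prompt_tokens: int, requested: int, context_length: int) -> int:
    """Shrink max_tokens so prompt + completion stays inside the window."""
    room = context_length - prompt_tokens
    if room <= 0:
        raise ValueError(
            f"prompt ({prompt_tokens} tokens) already fills the "
            f"{context_length}-token window; truncate or summarize it"
        )
    return min(requested, room)
```

For example, a 4,000-token prompt with max_tokens=512 does not fit a 4K (4,096-token) window, and `clamp_max_tokens` reduces the completion budget to 96 tokens.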

Together AI API quick reference

curl (OpenAI-compatible)

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Python (OpenAI SDK drop-in)

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)

# The generated reply is in the first choice's message.
print(response.choices[0].message.content)

List available models

curl "https://api.together.xyz/v1/models" \
  -H "Authorization: Bearer YOUR_TOGETHER_API_KEY" | jq '.[].id'

Popular Together AI model IDs

Model                  Model ID                                        Context window
LLaMA 3.3 70B Turbo    meta-llama/Llama-3.3-70B-Instruct-Turbo         128K
LLaMA 3.1 8B Turbo     meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo     128K
LLaMA 3.1 70B Turbo    meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo    128K
Mixtral 8x7B           mistralai/Mixtral-8x7B-Instruct-v0.1            32K
Qwen 2.5 72B Turbo     Qwen/Qwen2.5-72B-Instruct-Turbo                 128K
Gemma 2 9B             google/gemma-2-9b-it                            8K

200+ models available. Model IDs are case-sensitive. Get the full current list: GET /v1/models. Models are added and deprecated regularly.

Step-by-step fix

  1. Check live Together AI status

    Visit prismix.dev/service/together and status.together.ai. If both show the service as operational, the problem is most likely in your local configuration.

  2. Fix API 401 errors

    Generate your key at api.together.xyz/settings/api-keys. Header: Authorization: Bearer YOUR_KEY. OpenAI SDK: set base_url="https://api.together.xyz/v1". Together keys have no fixed prefix. If a seemingly valid key still returns 401, regenerate it (a key you copied earlier becomes invalid if it was regenerated from another session).

  3. Fix 429 rate limit errors

    Free tier: 60 requests/min, 1M tokens/day. Add exponential backoff: start at 1s, double on each retry, and cap the delay at 32s. To raise your limits permanently, add a payment method at api.together.xyz/settings/billing to move to higher pay-as-you-go limits.

  4. Fix model not found errors

    Model IDs are case-sensitive owner/model-name strings. Get the current list: GET https://api.together.xyz/v1/models. Common mistake: using short names (llama3) or OpenAI-style names (gpt-4). Use exact IDs like meta-llama/Llama-3.3-70B-Instruct-Turbo.

  5. Fix fine-tuning job failures

    Training data must be valid JSONL: one JSON object per line. Format: {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. At least 10 training examples are required. Check job status via GET /v1/fine-tunes/{id}; error details are included in the status response.
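Polling GET /v1/fine-tunes/{id} until the job finishes can be sketched as below. The terminal status strings (`completed`, `error`, `cancelled`) are assumptions: compare them with the `status` field your responses actually return. `fetch_fine_tune` reads the key from a `TOGETHER_API_KEY` environment variable (an assumed name), and the fetcher is injectable so the loop can be exercised without network access. The interval and timeout defaults are arbitrary.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumed terminal status values; check against real responses.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def fetch_fine_tune(job_id: str) -> Dict[str, Any]:
    """GET /v1/fine-tunes/{id} and return the parsed JSON body."""
    req = urllib.request.Request(
        f"https://api.together.xyz/v1/fine-tunes/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_fine_tune(
    job_id: str,
    fetch_job: Callable[[str], Dict[str, Any]] = fetch_fine_tune,
    interval: float = 30.0,
    timeout: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status, then return its record."""
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        status = str(job.get("status", "")).lower()
        if status in TERMINAL_STATUSES:
            return job  # on failure, inspect this record for the error details
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```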

Frequently asked questions

Why is Together AI not working?

The most common causes are: (1) 401: get a key from api.together.xyz/settings/api-keys and send the header Authorization: Bearer YOUR_KEY; (2) 429: the free tier allows 60 RPM and 1M tokens/day; (3) model 404: IDs are case-sensitive owner/model-name strings, so get the current list from /v1/models; (4) fine-tuning failure: check the JSONL format and the 10-example minimum; (5) an outage: check prismix.dev/service/together.

Is Together AI down right now?

Check prismix.dev/service/together for live Together AI status. Also see status.together.ai for official incident reports.

How to use Together AI with the OpenAI Python SDK?

Together AI is OpenAI-compatible. Python: from openai import OpenAI; client = OpenAI(api_key="YOUR_KEY", base_url="https://api.together.xyz/v1"). Use Together model IDs (meta-llama/Llama-3.3-70B-Instruct-Turbo) not OpenAI names (gpt-4). The Together SDK (pip install together) provides the same functionality with Together-specific defaults.

Together AI model IDs — what format?

Together AI model IDs follow owner/model-name format. Examples: meta-llama/Llama-3.3-70B-Instruct-Turbo, mistralai/Mixtral-8x7B-Instruct-v0.1, Qwen/Qwen2.5-72B-Instruct-Turbo. Names are case-sensitive. Get the full current list with: GET https://api.together.xyz/v1/models (requires auth). The model ID in the API response is the exact string to use.

Together AI vs Groq vs Fireworks AI — which is best?

Groq is fastest (LPU hardware, 750-1500 tok/s) but has the smallest model catalog and stricter rate limits. Together AI has 200+ models including LLaMA, Mixtral, Qwen, Falcon, Code Llama, and supports fine-tuning. Fireworks AI has strong production SLAs and competitive speeds. For model breadth and fine-tuning, Together AI is strongest. For raw inference speed on open-source models, Groq wins. For production reliability, Fireworks is solid.