
Anthropic Claude API Not Working? Fix Auth, Rate Limits & SDK Errors

Troubleshoot Anthropic Claude API errors — 401 invalid API key, 429 rate limits (RPM, TPM, concurrent), 529 overloaded, context window exceeded, and streaming issues when calling the API with the Python or TypeScript SDK.


Common errors and fixes

401 Unauthorized / invalid API key

The most common cause is a missing, mistyped, disabled, or deleted API key. Use the official SDK and load the key from an environment variable (if api_key is omitted, the SDK reads ANTHROPIC_API_KEY on its own):

import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY")  # sk-ant-api03-...
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)
print(message.content[0].text)
// TypeScript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const message = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});
  • Key format: current Anthropic API keys start with sk-ant-api03-. Check for a truncated key, stray quotes, or trailing whitespace or newlines picked up when pasting the key into a .env file.
  • Get or rotate keys: go to console.anthropic.com/settings/keys. A key that has been disabled or deleted there returns 401 even though the string still looks valid.
  • Claude.ai ≠ API: a Claude.ai subscription does NOT include API access. You need a separate API account at console.anthropic.com with billing enabled.
  • Workspace keys: if using workspace-scoped keys, ensure the workspace has API credits and the key has not been restricted.
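Before digging into the console, it can help to rule out local formatting problems. The sketch below is a hypothetical helper (diagnose_api_key is not part of the SDK) that inspects ANTHROPIC_API_KEY for the issues listed above. It assumes the sk-ant-api03- prefix from the key-format note and cannot tell whether a key is active, so a clean result can still produce a 401.

```python
import os

# Prefix taken from the key-format note; other key types (e.g. admin keys) differ.
EXPECTED_PREFIX = "sk-ant-api03-"


def diagnose_api_key(env=None):
    """Return likely problems with ANTHROPIC_API_KEY (empty list = format looks OK).

    Only checks the string locally; it cannot verify the key with Anthropic.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY")
    if key is None:
        return ["ANTHROPIC_API_KEY is not set"]

    problems = []
    if key != key.strip():
        problems.append("key has leading/trailing whitespace")
    stripped = key.strip()
    if stripped[:1] in ("'", '"'):
        problems.append("key is wrapped in quotes")
    # Check the prefix on the key with whitespace and quotes removed
    if not stripped.strip("'\"").startswith(EXPECTED_PREFIX):
        problems.append(f"key does not start with {EXPECTED_PREFIX}")
    return problems
```

Call diagnose_api_key() with no arguments to check the real environment, or pass a dict to test a specific value.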

429 Rate limit — RPM, TPM, concurrent

Anthropic enforces several rate limits at the same time: requests per minute (RPM), input and output tokens per minute (ITPM and OTPM, tracked separately rather than combined), and concurrent requests. Exceeding any one of them returns 429. The Anthropic SDK has built-in retry logic you can configure:

# The Anthropic SDK has built-in retry for 429 and 529
client = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    max_retries=5,  # default is 2
)

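# Or manual retry with backoff (consider max_retries=0 on the client so the
# SDK's own retries don't multiply with this loop)
import time
import random

def create_with_retry(client, **kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            if attempt == 4:
                raise
            # Prefer the server's retry-after hint (seconds); else exponential backoff + jitter
            retry_after = e.response.headers.get("retry-after")
            wait = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            print(f"Rate limited. Retry {attempt + 1}/5 in {wait:.1f}s")
            time.sleep(wait)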
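To find out which limit a 429 actually hit, you can read Anthropic's rate-limit response headers. The sketch below assumes the anthropic-ratelimit-{requests,tokens,input-tokens,output-tokens}-remaining header family and a retry-after value in seconds; exhausted_limits and retry_after_seconds are hypothetical helper names. It accepts any mapping with lowercase keys, or a case-insensitive one such as httpx.Headers.

```python
# Header families Anthropic reports on responses, including 429 errors.
LIMIT_FAMILIES = ("requests", "tokens", "input-tokens", "output-tokens")


def exhausted_limits(headers):
    """Return the limit families whose *-remaining header is 0."""
    exhausted = []
    for family in LIMIT_FAMILIES:
        remaining = headers.get(f"anthropic-ratelimit-{family}-remaining")
        if remaining is not None and int(remaining) == 0:
            exhausted.append(family)
    return exhausted


def retry_after_seconds(headers, default=1.0):
    """Seconds to wait before retrying, taken from retry-after when present."""
    value = headers.get("retry-after")
    if value is None:
        return default
    try:
        return float(value)
    except ValueError:
        # retry-after may also be an HTTP date; fall back instead of parsing it here
        return default


# Usage sketch (assumes `client` and `anthropic` from the examples above):
# try:
#     client.messages.create(...)
# except anthropic.RateLimitError as e:
#     print(exhausted_limits(e.response.headers), retry_after_seconds(e.response.headers))
```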
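  • Check rate limit headers: inspect anthropic-ratelimit-requests-remaining, anthropic-ratelimit-input-tokens-remaining, and anthropic-ratelimit-output-tokens-remaining (each has matching -limit and -reset headers), plus retry-after, to see which limit you hit. In Python they are on e.response.headers for a caught error, or available via client.messages.with_raw_response.create(...) for successful calls.
  • Tier progression: Anthropic assigns tiers by cumulative credit purchases. Tier 1 unlocks after a $5 purchase and Tier 4 after $400. Check your tier at console.anthropic.com/settings/limits.
  • Concurrent limit: easy to hit in parallel code. Use asyncio.Semaphore or a queue to cap simultaneous in-flight requests.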
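A minimal sketch of capping in-flight requests with asyncio.Semaphore follows. The call_model coroutine is a stand-in so the example runs without a network; in real code it would await a call on anthropic.AsyncAnthropic. MAX_IN_FLIGHT = 4 is an arbitrary assumption to tune against your tier.

```python
import asyncio

MAX_IN_FLIGHT = 4  # assumption: tune to your tier's limits


async def call_model(prompt: str) -> str:
    """Stand-in for an API call such as
    (await client.messages.create(...)).content[0].text with AsyncAnthropic."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def run_bounded(prompts, limit=MAX_IN_FLIGHT, worker=call_model):
    """Run worker(prompt) for every prompt with at most `limit` running at once."""
    sem = asyncio.Semaphore(limit)

    async def one(prompt):
        async with sem:  # blocks here while `limit` calls are already in flight
            return await worker(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))
```

Run it with asyncio.run(run_bounded(prompts)). A semaphore keeps the code flat; a worker pool reading from an asyncio.Queue is the alternative when requests arrive over time rather than as one batch.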

529 Overloaded — Anthropic server issue

Status 529 (overloaded_error) means Anthropic's servers are temporarily overloaded. It is a server-side issue, not a bug in your code, so retry it with exponential backoff:

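# 529 should always be retried; it's a temporary server issue
import random
import time

from anthropic import APIStatusError

# Tip: create the client with max_retries=0 if you rely on this loop,
# otherwise the SDK's built-in retries also run inside each attempt.
def robust_create(client, **kwargs):
    for attempt in range(6):
        try:
            return client.messages.create(**kwargs)
        except APIStatusError as e:
            if e.status_code in (429, 529) and attempt < 5:
                # Exponential backoff with jitter, capped at 60 s
                wait = min(60, (2 ** attempt) + random.random())
                time.sleep(wait)
            else:
                raise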
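Note: Check prismix.dev/service/anthropic (or Anthropic's own status page, status.anthropic.com) for active incidents. During outages, 529s may persist for minutes to hours, so production workloads should add a circuit breaker that stops sending requests for a cooldown period after repeated failures. If the overload is limited to one model, falling back to a lighter model (for example a Haiku model) may reduce 529s during partial outages, though this is not guaranteed.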
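The circuit-breaker idea can be sketched in a few lines. This is a minimal, hypothetical CircuitBreaker class (not part of the Anthropic SDK): after `threshold` consecutive failures it rejects calls for `cooldown` seconds, then lets traffic through again and reopens on the next failure. The threshold of 5 and cooldown of 30 s are assumed defaults, and the clock is injectable so the logic can be tested without sleeping.

```python
import time


class CircuitBreaker:
    """Stop calling an overloaded API after repeated failures, then retry later."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (calls allowed)

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: allow calls again, but one more failure reopens the circuit
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


# Usage sketch (assumes `client` and `anthropic` from the examples above):
# breaker = CircuitBreaker()
# if breaker.allow():
#     try:
#         msg = client.messages.create(...)
#         breaker.record_success()
#     except anthropic.APIStatusError as e:
#         if e.status_code == 529:
#             breaker.record_failure()
#         raise
```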

Wrong model name

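Claude model IDs are exact strings, so any typo or wrong format returns a 404 not_found_error. Model IDs referenced in this guide (as of June 2026; Anthropic deprecates and retires older models on a published schedule, so confirm a model is still available before depending on it):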

  • claude-opus-4-8 — most capable
  • claude-sonnet-4-6 — balanced performance
  • claude-3-5-sonnet-20241022 — previous gen, stable
  • claude-3-5-haiku-20241022 — fastest, cheapest
  • claude-3-opus-20240229 — previous Opus
  • Common wrong names that fail: claude-3.5-sonnet (dot instead of dash), claude-opus-4 (missing version suffix), claude-3-5-sonnet (missing date suffix for older models).
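  • Always check: the models overview at docs.anthropic.com/en/docs/about-claude/models for the current canonical model IDs, or list them programmatically with client.models.list() in the Python SDK.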
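The common wrong names above can be caught before a request is sent. This is a sketch using the standard library's difflib; suggest_model_id is a hypothetical helper, and it assumes you fetch available_ids from the live model list rather than hard-coding them.

```python
import difflib


def suggest_model_id(name, available_ids):
    """Return [name] if it is a known model ID, else up to three close matches.

    `available_ids` should come from the live list, for example
    [m.id for m in anthropic.Anthropic().models.list()].
    """
    if name in available_ids:
        return [name]
    # Fix the most common typo (dots instead of dashes) before fuzzy matching
    candidate = name.replace(".", "-")
    if candidate in available_ids:
        return [candidate]
    return difflib.get_close_matches(candidate, available_ids, n=3, cutoff=0.6)
```

An empty result means nothing similar exists, which usually indicates a retired model rather than a typo.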

Streaming with the SDK

Use the stream() context manager and iterate over stream.text_stream; the complete message is not available until the stream has finished:

# Python streaming
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Get the final message after streaming
final_message = stream.get_final_message()
print(f"\nUsage: {final_message.usage}")
// TypeScript streaming
const stream = client.messages.stream({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a poem' }],
});

for await (const chunk of stream) {
  if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
    process.stdout.write(chunk.delta.text);
  }
}
  • Common mistake: reading message.content from a stream object before it completes. Iterate the stream, or call stream.get_final_message() (Python) / await stream.finalMessage() (TypeScript) to get the assembled message.
  • Usage stats: token counts are available via stream.get_final_message().usage (Python) or (await stream.finalMessage()).usage (TypeScript) after the stream completes.
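To see what the SDK assembles for you, the sketch below rebuilds text and usage from raw stream events represented as dicts, using the same event types the TypeScript loop checks. It is illustrative only: get_final_message() and finalMessage() already do this, and accumulate_stream is a hypothetical name. It handles text blocks only (tool_use input streams as input_json_delta and is skipped) and relies on output_tokens in message_delta being cumulative.

```python
def accumulate_stream(events):
    """Rebuild the response text and token usage from raw Messages API stream events."""
    text_parts = []
    usage = {"input_tokens": 0, "output_tokens": 0}
    for event in events:
        etype = event["type"]
        if etype == "message_start":
            # Input token count arrives once, at the start of the stream
            usage["input_tokens"] = event["message"]["usage"]["input_tokens"]
        elif etype == "content_block_delta" and event["delta"]["type"] == "text_delta":
            text_parts.append(event["delta"]["text"])
        elif etype == "message_delta":
            # Cumulative output token count for the whole message so far
            usage["output_tokens"] = event["usage"]["output_tokens"]
        # ping, content_block_start/stop, and message_stop carry no text here
    return "".join(text_parts), usage
```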

FAQ

Does Claude API support function calling / tool use?

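Yes. Pass a tools list to messages.create(), defining each tool with name, description, and input_schema (JSON Schema). When Claude wants to call a tool, the response has stop_reason "tool_use" and contains tool_use content blocks. You run the tool yourself and send the output back in a user message as tool_result blocks, each referencing the matching tool_use_id. The SDK includes helpers for tool use.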
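A minimal sketch of the tool_use to tool_result round trip follows. The get_weather tool, its schema, and the run_tool_calls helper are hypothetical examples, not SDK APIs. The helper accepts SDK content-block objects or plain dicts, runs each requested tool from a local registry, and marks failures with is_error so Claude can see the tool did not succeed.

```python
import json


# Hypothetical local tool; the name, schema, and behaviour are illustrative only.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny"})


TOOLS = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
REGISTRY = {"get_weather": get_weather}


def _field(block, key):
    # SDK content blocks are objects; raw JSON blocks are dicts. Accept both.
    return block[key] if isinstance(block, dict) else getattr(block, key)


def run_tool_calls(content, registry=REGISTRY):
    """Turn the tool_use blocks of an assistant turn into tool_result blocks."""
    results = []
    for block in content:
        if _field(block, "type") != "tool_use":
            continue
        name = _field(block, "name")
        try:
            fn = registry.get(name)
            if fn is None:
                raise ValueError(f"unknown tool {name}")
            output, is_error = fn(**_field(block, "input")), False
        except Exception as exc:
            output, is_error = str(exc), True
        results.append({
            "type": "tool_result",
            "tool_use_id": _field(block, "id"),
            "content": output,
            "is_error": is_error,
        })
    return results


# Loop sketch (assumes `client` from above; fill in a valid model ID):
# messages = [{"role": "user", "content": "Weather in Paris?"}]
# resp = client.messages.create(model=..., max_tokens=1024, tools=TOOLS, messages=messages)
# while resp.stop_reason == "tool_use":
#     messages.append({"role": "assistant", "content": resp.content})
#     messages.append({"role": "user", "content": run_tool_calls(resp.content)})
#     resp = client.messages.create(model=..., max_tokens=1024, tools=TOOLS, messages=messages)
```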

Anthropic API pricing — most cost-effective model?

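Among the models listed above, claude-3-5-haiku-20241022 is the cheapest at $0.80/1M input tokens, and claude-3-5-sonnet-20241022 is the best value for quality at $3/1M input. Claude 3 Opus (claude-3-opus-20240229) is $15/1M input for the hardest tasks. The older Claude 3 Haiku is cheaper still at $0.25/1M input. Output tokens are billed at a higher rate than input for every model. Prompt caching cuts the cost of cached input tokens by 90% on cache hits (system prompts, documents), although writing to the cache costs 25% more than a normal input token.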
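A quick worked example shows where the caching saving comes from. The sketch assumes Anthropic's 5-minute cache pricing (cache writes at 1.25× the base input price, cache reads at 0.10×) and the $3/1M Sonnet input price quoted above; input_cost_usd is a hypothetical helper, and output tokens are left out because they are billed separately.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # 5-minute cache writes cost 25% more than base input
CACHE_READ_MULTIPLIER = 0.10   # cache hits cost 10% of base input


def input_cost_usd(price_per_mtok, uncached=0, cache_write=0, cache_read=0):
    """Input-side cost in USD for one request (output tokens not included)."""
    per_token = price_per_mtok / 1_000_000
    return per_token * (
        uncached
        + cache_write * CACHE_WRITE_MULTIPLIER
        + cache_read * CACHE_READ_MULTIPLIER
    )


# 50k-token cached system prompt plus a 1k-token question at $3/1M input.
first = input_cost_usd(3.0, uncached=1_000, cache_write=50_000)   # writes the cache
later = input_cost_usd(3.0, uncached=1_000, cache_read=50_000)    # hits the cache
no_cache = input_cost_usd(3.0, uncached=51_000)                   # caching disabled
```

That works out to about $0.19 for the first call, $0.018 for each later cache hit, and $0.153 per call without caching, so caching pays off from the second request within the cache lifetime.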

Prompt caching — how to enable?

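Add cache_control: {type: 'ephemeral'} to the last content block of the prefix you want cached; everything up to and including that block (in the order tools, system, messages) is cached. The default cache lifetime is 5 minutes, refreshed each time the cache is hit. Cached tokens cost 10% of the normal input price on re-use, while the initial cache write costs 25% more than normal input. Prefixes shorter than the model's minimum (for example 1,024 tokens for Claude 3.5 Sonnet) are not cached. Best for: long system prompts, reference documents, few-shot examples that stay constant across requests.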
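Here is a sketch of a request that caches a long system prompt. build_cached_request is a hypothetical helper that only builds the keyword arguments for client.messages.create(); the model ID is passed in rather than hard-coded because available models change. The response's usage.cache_creation_input_tokens and usage.cache_read_input_tokens fields show whether the cache was written or hit.

```python
def build_cached_request(model, system_prompt, question, max_tokens=1024):
    """Keyword arguments for client.messages.create() with the system prompt cached."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # system as a list of content blocks so cache_control can be attached
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks the end of the cached prefix (default 5-minute lifetime)
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


# Usage sketch (assumes `client` from above and a long LONG_DOC string):
# resp = client.messages.create(**build_cached_request("claude-sonnet-4-6", LONG_DOC, "Summarise section 2"))
# resp.usage.cache_creation_input_tokens  # tokens written to the cache on this call
# resp.usage.cache_read_input_tokens      # tokens served from the cache
```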
