Why is the OpenAI API not working?

Common causes: (1) 401 invalid API key — key may have been deleted or rotated; (2) 429 rate limit — too many requests or exceeded monthly spend limit; (3) 500/503 server error — OpenAI infrastructure issue, check prismix.dev/service/openai; (4) model not found — model name typo or model was deprecated; (5) context length exceeded — total tokens (prompt + response) exceed model limit.

OpenAI API 401 Unauthorized — how to fix?

(1) check your API key at platform.openai.com/api-keys; (2) ensure the key is not expired or deleted; (3) verify you're passing it correctly — Authorization: Bearer sk-... header or OPENAI_API_KEY env var; (4) if using an organization, check the org ID matches your key's organization; (5) free trial keys expire after 3 months of inactivity or when trial credit runs out.

OpenAI API 429 rate limit exceeded?

Two types of 429 errors: (a) RPM/TPM limit, meaning too many requests or tokens per minute: add retry with exponential backoff; (b) billing limit, meaning the monthly spend limit was reached: go to platform.openai.com/settings/billing to increase it. Check your tier: Tier 1 unlocks after a $5 payment, and each higher tier unlocks higher rate limits.

OpenAI API timeout or slow response?

(1) set a reasonable timeout on your client (30-60 seconds for complex prompts); (2) use streaming to receive tokens as they are generated instead of waiting for the full response; (3) reduce max_tokens if you only need a short response; (4) check status.openai.com for ongoing incidents; (5) gpt-4o is faster than gpt-4-turbo for most tasks.

OpenAI API model deprecated or not found?

Deprecated models still accept requests for a period after deprecation but are eventually removed. Current valid models: gpt-4o (recommended, multimodal), gpt-4o-mini (fast+cheap), gpt-4-turbo, o1, o3-mini, text-embedding-3-small/large, dall-e-3, whisper-1. Deprecated: gpt-4-0613, gpt-3.5-turbo (being retired). Use client.models.list() to see all available.

OpenAI API Not Working? Fix Authentication, Rate Limits & SDK Errors

Troubleshoot OpenAI API errors — 401 invalid API key, 429 rate limit exceeded, 500/503 server errors, model not found, context length exceeded, and streaming issues when calling the API with Python or Node.js.

Common errors and fixes

401 Unauthorized / Invalid API key

The most common cause is a missing, expired, or incorrectly passed API key. Use the official SDK with an environment variable:

# Python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

// Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

Key format: standard keys start with sk-; project-scoped keys start with sk-proj- — both are valid.
Check expiry: go to platform.openai.com/api-keys — deleted or expired keys show as inactive.
Organization ID: if your account belongs to an organization, verify the org ID matches the key's organization in the API dashboard.
Trial credit: free trial API keys expire after 3 months of inactivity or when the $5 trial credit runs out — add a payment method to continue.
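
Before sending a request, a quick local check can rule out the most basic 401 causes in the list above. The following is a minimal sketch, not part of the OpenAI SDK: `diagnose_api_key` is a hypothetical helper, and it only checks that the key is set, has no stray whitespace, and starts with the `sk-` prefix described above. It cannot tell whether a key has been deleted or belongs to the wrong organization; only a real API call can.

```python
import os


def diagnose_api_key(key):
    """Return a list of likely problems with an API key string (empty if none found).

    Hypothetical helper: these are local checks only, based on the key
    prefixes described above. Nothing here contacts the API.
    """
    problems = []
    if not key:
        problems.append("key is missing or empty")
        return problems
    if key != key.strip():
        problems.append("key has leading or trailing whitespace")
    if not key.strip().startswith("sk-"):
        problems.append("key does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    issues = diagnose_api_key(os.environ.get("OPENAI_API_KEY"))
    print(issues or "No obvious local problems with OPENAI_API_KEY")
```

Stray whitespace is worth checking because keys pasted into .env files or shell profiles sometimes pick up a trailing newline.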

429 Rate Limit — RPM/TPM vs billing limit

There are two distinct 429 errors. RPM/TPM limit: too many requests or tokens per minute — back off and retry. Billing limit: monthly spend cap reached — raise the limit at platform.openai.com/settings/billing. Add exponential backoff for RPM errors:

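import time
import random
from openai import RateLimitError

def completion_with_retry(client, **kwargs):
    """Call chat.completions.create, retrying on 429 with exponential backoff."""
    for attempt in range(5):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Fifth failure: give up and surface the error to the caller.
            if attempt == 4:
                raise
            # Wait 1s, 2s, 4s, 8s, each plus up to 1s of random jitter.
            wait = (2 ** attempt) + random.random()
            print(f"Rate limited. Waiting {wait:.1f}s...")
            time.sleep(wait)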

Use tenacity in production: the tenacity library provides battle-tested retry decorators with jitter and max-wait configuration.
Tier progression: Tier 1 unlocks after a $5 payment; Tier 2 unlocks after $50 paid and at least 7 days since the first successful payment; higher tiers unlock higher RPM/TPM limits.
Check billing limit: platform.openai.com/settings/billing → Usage limits — raise or remove the monthly cap.
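
Because the two kinds of 429 call for different responses, it can help to tell them apart before retrying. The sketch below rests on assumptions that should be verified against a real error response from your account: it assumes the error body is a dictionary with a `code` or `type` field, and that billing-related 429s use the value `insufficient_quota`. The names `classify_429` and `should_retry_429` are hypothetical helpers, not SDK functions.

```python
def classify_429(error_body):
    """Classify a 429 error body as 'billing' or 'rate_limit'.

    Assumption: billing-limit errors carry code or type 'insufficient_quota';
    anything else is treated as an RPM/TPM limit that is worth retrying.
    """
    body = error_body if isinstance(error_body, dict) else {}
    # Accept both {"error": {...}} and an already-unwrapped {...} body.
    error = body.get("error", body)
    if not isinstance(error, dict):
        error = {}
    markers = {error.get("code"), error.get("type")}
    return "billing" if "insufficient_quota" in markers else "rate_limit"


def should_retry_429(error_body):
    """Retrying only helps for rate limits; a billing cap needs a settings change."""
    return classify_429(error_body) == "rate_limit"
```

In a retry loop, this check would sit in the `except RateLimitError` branch, so that a billing-limit error is raised immediately instead of being retried five times.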

500 / 503 Server Errors

These are OpenAI infrastructure errors, not caused by your code. Check prismix.dev/service/openai or status.openai.com for active incidents, and always retry 5xx errors with backoff.

Note: 502 Bad Gateway often means an upstream timeout rather than a server crash — try reducing prompt length or lowering max_tokens before retrying.

500 Internal Server Error: transient OpenAI-side failure — retry with backoff, usually resolves within seconds.
503 Service Unavailable: OpenAI is under heavy load or in maintenance — check status page and retry.
502 Bad Gateway: request timed out upstream — reduce prompt length, lower max_tokens, or switch to a faster model like gpt-4o-mini.
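
The retry advice for 5xx errors above can be sketched as a small wrapper. This is an illustration under stated assumptions rather than a definitive implementation: `call_with_5xx_retry` is a hypothetical helper, and it assumes the raised exception exposes an integer `status_code` attribute. Anything else, including 4xx errors, is re-raised immediately.

```python
import random
import time


def call_with_5xx_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when it raises a 5xx error.

    Assumption: the raised exception exposes an integer `status_code`
    attribute. Exceptions without one, or with a non-5xx status, are re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            is_server_error = isinstance(status, int) and 500 <= status < 600
            if not is_server_error or attempt == max_attempts - 1:
                raise
            # Backoff: base_delay * 1, 2, 4, ... plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.random())
```

A call might look like `call_with_5xx_retry(lambda: client.chat.completions.create(model="gpt-4o", messages=messages))`. The official Python client also accepts a `max_retries` argument when it is constructed, which may be enough on its own for simple cases.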

Context length exceeded

The error message includes the exact limit: "maximum context length is X tokens but you requested Y tokens". Solutions:

Switch to gpt-4o: it supports a 128K-token context window, large enough for most documents and long conversations.
Truncate conversation history: keep only the last N messages in the messages array instead of the full history.

Count tokens before sending with tiktoken:

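import tiktoken

your_text = "Text you plan to send in the prompt"  # placeholder input
# encoding_for_model looks up the tokenizer that matches the model name
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = len(enc.encode(your_text))
print(f"Token count: {tokens}")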
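
The history truncation suggested above can be sketched as follows. `truncate_history` is a hypothetical helper, and keeping every system message regardless of age is an assumption added here (the original advice only says to keep the last N messages); drop that part if your application does not use a system prompt.

```python
def truncate_history(messages, keep_last=10):
    """Return all system messages plus the last `keep_last` other messages.

    Assumption: system messages should always survive truncation.
    """
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    # rest[-0:] would return the whole list, so handle keep_last <= 0 explicitly.
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent
```

Counting messages is a rough proxy for counting tokens. A stricter variant would drop the oldest messages until the tiktoken count falls under a chosen budget.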

Streaming issues

Use the context manager form and iterate chunks — do not call response.content on a streaming response:

# Python streaming
with client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks (such as a final usage-only chunk) carry no choices.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

Do not access response.content on a stream: iterate chunk.choices[0].delta.content inside the loop — accessing the full response at once raises an error on a streaming object.
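Usage stats in streaming: token counts (prompt_tokens, completion_tokens) are only included in the final chunk, and only when you pass stream_options={"include_usage": True}. That final chunk has an empty choices list, so check chunk.choices before indexing into it.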
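
To show how the text deltas and the usage statistics fit together, here is a minimal sketch that consumes a stream and returns both. `collect_stream` is a hypothetical helper; it assumes each chunk has a `choices` list and an optional `usage` attribute, as in the loop above, and it tolerates chunks whose `choices` list is empty.

```python
def collect_stream(chunks):
    """Join streamed text deltas and return (text, usage).

    `usage` stays None unless some chunk carried a usage object.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # A usage-only chunk may have no choices, so guard before indexing.
        if chunk.choices:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage
```

Usage would look like `text, usage = collect_stream(client.chat.completions.create(model="gpt-4o", messages=messages, stream=True, stream_options={"include_usage": True}))`, after which `usage.prompt_tokens` and `usage.completion_tokens` should be available if the request included the usage option.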

FAQ

OpenAI API vs ChatGPT — same outage?

Not always. ChatGPT and the API use the same underlying models but different infrastructure. ChatGPT could be down while the API works normally (or vice versa). Check prismix.dev/service/openai for API-specific status.

How do I check my API usage and costs?

Visit platform.openai.com/usage for token usage and costs broken down by model and day. Set a monthly spend limit at platform.openai.com/settings/billing/limits to avoid unexpected charges.

OpenAI API pricing — cheapest option

gpt-4o-mini is the cheapest GPT-4-quality model at $0.15 per 1M input tokens. For embeddings, text-embedding-3-small costs $0.02 per 1M tokens. For the highest-quality multimodal tasks, gpt-4o costs $2.50 per 1M input tokens.
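
The per-million-token prices above translate into a request cost with simple arithmetic: tokens divided by 1,000,000, multiplied by the price. The sketch below uses only the input prices quoted in this answer; `estimate_input_cost` and the price table are illustrative assumptions, prices change over time, and output tokens (billed separately) are not covered.

```python
# USD per 1M input tokens, copied from the figures quoted above.
# Prices change, so treat this table as an assumption to re-check.
INPUT_PRICE_PER_1M = {
    "gpt-4o-mini": 0.15,
    "gpt-4o": 2.50,
    "text-embedding-3-small": 0.02,
}


def estimate_input_cost(model, input_tokens):
    """Estimate the input-token cost in USD for one model."""
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_1M[model]
```

For example, a 200,000-token batch sent to gpt-4o-mini works out to 0.2 × $0.15 = $0.03 of input cost, while the same batch on gpt-4o is 0.2 × $2.50 = $0.50.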
