OpenAI API Not Working? Fix Authentication, Rate Limits & SDK Errors
Troubleshoot OpenAI API errors — 401 invalid API key, 429 rate limit exceeded, 500/503 server errors, model not found, context length exceeded, and streaming issues when calling the API with Python or Node.js.
Common errors and fixes
401 Unauthorized / Invalid API key
The most common cause is a missing, expired, or incorrectly passed API key. Use the official SDK with an environment variable:
# Python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

// Node.js
import OpenAI from 'openai';
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

- Key format: standard keys start with sk-; project-scoped keys start with sk-proj-. Both are valid.
- Check expiry: go to platform.openai.com/api-keys. Deleted or expired keys show as inactive.
- Organization ID: if your account belongs to an organization, verify the org ID matches the key's organization in the API dashboard.
- Trial credit: free trial API keys expire after 3 months of inactivity or when the $5 trial credit runs out — add a payment method to continue.
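Many 401s come from a key that is set but mangled, for example by stray whitespace or by quotes copied from a .env file. The sketch below is one way to catch that before any request is made. The helper name check_api_key is hypothetical, and the only format rule it applies is the sk- prefix described above; it cannot tell whether the key is actually active.

```python
import os


def check_api_key(env_var: str = "OPENAI_API_KEY") -> list[str]:
    """Return a list of problems found with the key in the environment (empty if none)."""
    raw = os.environ.get(env_var)
    if raw is None:
        return [f"{env_var} is not set"]

    problems = []
    if raw != raw.strip():
        problems.append("key has leading or trailing whitespace")

    key = raw.strip()
    # Quotes copied from a .env file end up as part of the key value.
    if key[:1] in ('"', "'") or key[-1:] in ('"', "'"):
        problems.append("key is wrapped in quotes")

    # Assumption: valid keys start with sk- (including sk-proj-).
    if not key.strip("\"'").startswith("sk-"):
        problems.append("key does not start with sk-")
    return problems


if __name__ == "__main__":
    for problem in check_api_key():
        print(f"API key problem: {problem}")
```

An empty list only means the key looks plausible. A deleted or revoked key still passes this check and still returns 401.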
429 Rate Limit — RPM/TPM vs billing limit
There are two distinct 429 errors. RPM/TPM limit: too many requests or tokens per minute — back off and retry. Billing limit: monthly spend cap reached — raise the limit at platform.openai.com/settings/billing. Add exponential backoff for RPM errors:
import time
import random
from openai import RateLimitError

def completion_with_retry(client, **kwargs):
    for attempt in range(5):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Give up after the fifth attempt
            if attempt == 4:
                raise
            # Exponential backoff (1s, 2s, 4s, 8s) plus up to 1s of jitter
            wait = (2 ** attempt) + random.random()
            print(f"Rate limited. Waiting {wait:.1f}s...")
            time.sleep(wait)

- Use tenacity in production: the tenacity library provides battle-tested retry decorators with jitter and max-wait configuration.
- Tier progression: Tier 1 unlocks after a $5 payment; Tier 2 after $50 in cumulative payments and at least 7 days since the first successful payment; higher tiers unlock higher RPM/TPM limits.
- Check billing limit: platform.openai.com/settings/billing → Usage limits — raise or remove the monthly cap.
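Backing off only helps with the RPM/TPM kind of 429. Retrying a billing-limit 429 just burns time. A minimal sketch of telling the two apart follows, assuming the error body carries the code string insufficient_quota for billing or quota failures (and something like rate_limit_exceeded otherwise). The helper name should_retry_429 is hypothetical; confirm the code strings against a real error response from your account.

```python
from typing import Optional


def should_retry_429(error_code: Optional[str]) -> bool:
    """Decide whether a 429 is worth retrying, based on its error code string."""
    # Assumption: billing/quota 429s are reported with this code.
    if error_code == "insufficient_quota":
        return False
    # Anything else (including a missing code) is treated as a rate limit.
    return True


if __name__ == "__main__":
    for code in ("rate_limit_exceeded", "insufficient_quota", None):
        print(code, "->", "retry" if should_retry_429(code) else "fix billing")
```

With the official Python SDK, the code is expected to be readable from the caught exception (for example via its code attribute), so a retry loop can check it and re-raise immediately when retrying cannot help.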
500 / 503 Server Errors
These are OpenAI infrastructure errors, not caused by your code. Check prismix.dev/service/openai or status.openai.com for active incidents, and always retry 5xx errors with backoff.
502 Bad Gateway often means an upstream timeout rather than a server crash — try reducing prompt length or lowering max_tokens before retrying.
- 500 Internal Server Error: transient OpenAI-side failure — retry with backoff, usually resolves within seconds.
- 503 Service Unavailable: OpenAI is under heavy load or in maintenance — check status page and retry.
- 502 Bad Gateway: request timed out upstream — reduce prompt length, lower max_tokens, or switch to a faster model like gpt-4o-mini.
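The retry advice for 5xx errors can be sketched as a small wrapper. It assumes the raised exception exposes an integer status_code attribute, which is how the official Python SDK's status errors are understood to behave; the helper name retry_on_5xx and its parameters are hypothetical.

```python
import random
import time


def retry_on_5xx(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry with exponential backoff when it raises a 5xx error.

    Assumption: the exception has an integer `status_code` attribute.
    Errors without one, or with a non-5xx status, are re-raised immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            retryable = isinstance(status, int) and 500 <= status < 600
            if not retryable or attempt == max_attempts - 1:
                raise
            # Delay grows as base_delay * 1, 2, 4, ... plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.random())
```

Usage would look like retry_on_5xx(lambda: client.chat.completions.create(model="gpt-4o", messages=msgs)), where client and msgs are your own objects. The sleep parameter exists only so the delay can be replaced in tests.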
Context length exceeded
The error message includes the exact limit: "maximum context length is X tokens but you requested Y tokens". Solutions:
- Switch to gpt-4o: it supports a 128K-token context window, large enough for most documents and long conversations.
- Truncate conversation history: keep only the last N messages in the messages array instead of the full history.
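Truncating history can be as simple as the sketch below. It assumes each message is a dict with a "role" key, as in the examples above, and it keeps system messages regardless of age so the model does not lose its instructions. The helper name truncate_history is hypothetical.

```python
def truncate_history(messages, keep_last=10):
    """Keep all system messages plus the last `keep_last` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if keep_last <= 0:
        return system
    # System messages are moved to the front; the rest keep their order.
    return system + rest[-keep_last:]


if __name__ == "__main__":
    history = [{"role": "system", "content": "Be brief."}]
    history += [{"role": "user", "content": f"message {i}"} for i in range(20)]
    print(len(truncate_history(history, keep_last=5)))
```

This trims by message count, not by tokens, so a few very long messages can still exceed the limit. Combine it with a token count when messages vary a lot in size.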
Count tokens before sending with tiktoken:
import tiktoken

your_text = "Text you plan to send"  # placeholder: replace with your prompt
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = len(enc.encode(your_text))
print(f"Token count: {tokens}")

Streaming issues
Use the context manager form and iterate chunks — do not call response.content on a streaming response:
# Python streaming
with client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
) as stream:
    for chunk in stream:
        # Some chunks (such as the final usage-only chunk) have an empty choices list
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

- Do not access response.content on a stream: iterate chunk.choices[0].delta.content inside the loop. Accessing the full response at once raises an error on a streaming object.
- Usage stats in streaming: token counts (prompt_tokens, completion_tokens) are only included in the final chunk when you pass stream_options={"include_usage": True}. That final chunk has an empty choices list, so guard against indexing choices[0] on it.
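If you need the full text and the usage stats after streaming finishes, you can collect them while iterating. The sketch below assumes chunks shaped like the ones in the loop above (choices[0].delta.content, plus an optional usage attribute). The names collect_stream and fake_chunk are hypothetical, and the stand-in chunks are not real API output.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed text deltas and pick up usage stats if a chunk carries them."""
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # Skip chunks with no choices or with an empty/None delta.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    """Build a stand-in chunk with the same attribute layout (for illustration only)."""
    choices = [] if text is None else [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)


text, usage = collect_stream([
    fake_chunk("Once "),
    fake_chunk("upon a time"),
    fake_chunk(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=4)),
])
print(text)
```

In real code you would pass the stream object itself to collect_stream in place of the list of stand-in chunks.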
FAQ
OpenAI API vs ChatGPT — same outage?
Not always. ChatGPT and the API use the same underlying models but different infrastructure. ChatGPT could be down while the API works normally (or vice versa). Check prismix.dev/service/openai for API-specific status.
How do I check my API usage and costs?
Visit platform.openai.com/usage for token usage and costs broken down by model and day. Set a monthly spend limit at platform.openai.com/settings/billing/limits to avoid unexpected charges.
OpenAI API pricing — cheapest option
gpt-4o-mini is the cheapest GPT-4 quality model at $0.15/1M input tokens. For embeddings, text-embedding-3-small is $0.02/1M tokens. For highest quality multimodal tasks, gpt-4o is $2.50/1M input tokens.
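To compare models for your own workload, the per-million prices translate into a one-line estimate. The sketch below hard-codes the input prices quoted in this answer, which may be out of date, and covers input tokens only. The names INPUT_PRICE_PER_MILLION and estimate_input_cost are hypothetical.

```python
# Input prices in USD per 1M tokens, copied from the figures quoted in this FAQ.
# Assumption: these are still current; check the pricing page before relying on them.
INPUT_PRICE_PER_MILLION = {
    "gpt-4o-mini": 0.15,
    "gpt-4o": 2.50,
    "text-embedding-3-small": 0.02,
}


def estimate_input_cost(model: str, input_tokens: int) -> float:
    """Estimated input-side cost in USD; output tokens are billed separately."""
    return INPUT_PRICE_PER_MILLION[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    for model in INPUT_PRICE_PER_MILLION:
        print(f"{model}: ${estimate_input_cost(model, 200_000):.4f} per 200K input tokens")
```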