OpenRouter Guide 2025: Access 100+ AI Models with One API Key
OpenRouter is a unified API gateway for 100+ AI models — Claude, GPT-4o, Gemini 2.0, Llama 3.3, Mistral, DeepSeek, and more — all through one API key and one billing account. This guide covers setup, pricing, Python code examples, model selection, free tier, rate limits, and when to use OpenRouter vs direct provider APIs.
1. What is OpenRouter?
OpenRouter is an API aggregator that sits between your application and multiple AI providers. Instead of managing separate API keys for Anthropic, OpenAI, Google, and Meta, you get a single OpenRouter key and access everything through one unified endpoint.
Single API key: one key (sk-or-...) replaces separate keys from Anthropic, OpenAI, Google AI, Groq, Mistral, and 50+ other providers.
OpenAI-compatible: uses the same request and response format as OpenAI. Change base_url and api_key, and switch model names to OpenRouter's provider/model-name IDs (for example, openai/gpt-4o). Works with the OpenAI Python/Node SDK out of the box.
Unified billing: add credits once ($5 minimum) and spend across all models. No separate billing accounts per provider.
Provider routing: for models available on multiple providers (e.g., Claude is on Anthropic, AWS Bedrock, and GCP Vertex), OpenRouter auto-routes to the cheapest or fastest available provider. You can also pin to a specific provider.
Fallback routing: specify a list of models in priority order — if the first model's provider is down, automatically fall back to the next one. Great for high-availability production apps.
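To make provider pinning concrete, here is a minimal sketch that builds a chat-completions request body restricted to a single provider. The "provider": {"only": [...]} field is the one this guide uses in the model selection section; the helper name build_pinned_body is hypothetical, and the exact provider label should be checked against the OpenRouter docs before relying on it.

```python
def build_pinned_body(model, prompt, provider):
    """Build a chat-completions body that pins the request to one provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Restrict routing to a single provider instead of auto-routing.
        "provider": {"only": [provider]},
    }


body = build_pinned_body("meta-llama/llama-3.1-8b-instruct", "Hello", "Groq")
print(body["provider"])
```

The returned dictionary can be passed as the json= argument of a POST to the chat completions endpoint.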
2. Setup & API key
Step 1: Go to openrouter.ai and sign up with Google or email. No credit card required to start exploring.
Step 2: Go to openrouter.ai/keys, click "Create Key", name it (e.g., "prod-app"), and copy the key. Keys start with sk-or-.
Step 3: Store the key in an environment variable rather than in source code: OPENROUTER_API_KEY=sk-or-YOUR_KEY
Step 4: Add credits at openrouter.ai/credits (minimum $5 via Stripe or crypto) to access paid models. Free tier requires no credit card — use :free model suffixes.
API endpoint: https://openrouter.ai/api/v1/chat/completions — identical to OpenAI except the base URL.
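The setup steps can be sketched in Python as follows. This is a minimal illustration, assuming the key was exported as OPENROUTER_API_KEY in step 3; the helper name build_headers is hypothetical, and the demo call passes a placeholder mapping so the snippet runs without a real key.

```python
import os

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_headers(env=os.environ):
    """Return request headers using the key stored in OPENROUTER_API_KEY."""
    key = env.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("Set OPENROUTER_API_KEY before calling the API")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# Demo with a placeholder mapping; real code would call build_headers().
headers = build_headers({"OPENROUTER_API_KEY": "sk-or-YOUR_KEY"})
print(headers["Authorization"])
```

Failing early when the variable is missing tends to give a clearer error than an authentication failure returned by the API.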
3. How pricing works — per-token + provider routing
OpenRouter charges the provider's listed price plus a 5% service fee. There's no subscription — you pay only for what you use from your pre-loaded credit balance.
| Model (via OpenRouter) | Input /1M | Output /1M | Context |
|---|---|---|---|
| anthropic/claude-sonnet-4-6 | $3.15 | $15.75 | 200k |
| openai/gpt-4o | $2.63 | $10.50 | 128k |
| google/gemini-2.0-flash | $0.079 | $0.315 | 1M |
| meta-llama/llama-3.3-70b-instruct | $0.63 | $0.63 | 128k |
| mistralai/mistral-large | $2.10 | $6.30 | 128k |
| deepseek/deepseek-v3 | $0.28 | $1.16 | 64k |
| meta-llama/llama-3.1-8b-instruct:free | $0.00 | $0.00 | 128k |
Prices are approximate (provider listed price + ~5% OpenRouter fee). Check openrouter.ai/models for current exact pricing. Free models (:free suffix) have lower rate limits and may have queuing during peak hours.
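The arithmetic behind the table can be checked with a short sketch. It assumes the flat 5% fee described above applies equally to input and output tokens; the function names are hypothetical, and the $3 / $15 figures are the provider prices implied by the Claude Sonnet row.

```python
FEE_RATE = 0.05  # the ~5% service fee described in this section


def price_with_fee(provider_price_per_million):
    """Provider price per 1M tokens plus the OpenRouter fee."""
    return provider_price_per_million * (1 + FEE_RATE)


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request, given per-1M-token provider prices."""
    total = (input_tokens * price_with_fee(input_price)
             + output_tokens * price_with_fee(output_price))
    return total / 1_000_000


# $3 input / $15 output per 1M tokens reproduces the Claude Sonnet row.
print(round(price_with_fee(3.00), 2), round(price_with_fee(15.00), 2))

# A request with 2,000 input tokens and 500 output tokens.
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.6f}")
```

At these prices the example request costs about 1.4 cents, most of it from output tokens.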
4. Top models available on OpenRouter
Claude (Anthropic): anthropic/claude-sonnet-4-6, anthropic/claude-opus-4-5. Best coding and writing quality, 200k context, prompt caching.
GPT-4o (OpenAI): openai/gpt-4o, openai/gpt-4o-mini. Multimodal (vision), function calling, best ecosystem compatibility.
Gemini (Google): google/gemini-2.0-flash. Cheapest frontier model ($0.075/1M input tokens at the provider price, before the OpenRouter fee), 1M context window, best for long-document tasks.
Llama 3.3 (Meta): meta-llama/llama-3.3-70b-instruct. Open weights, strong performance, free via the :free suffix (rate limited), or cheap at $0.59/1M input tokens (provider price) via the Groq backend.
Mistral: mistralai/mistral-large, mistralai/codestral-latest. EU-hosted, GDPR-friendly, best for European languages and code generation.
DeepSeek: deepseek/deepseek-v3. Cheapest capable model at $0.27/1M input tokens (provider price), good for bulk tasks where cost matters more than quality.
5. Python code examples
Two approaches: raw requests library, or using the OpenAI Python SDK with a changed base_url. Both work identically.
```python
# Approach 1: raw HTTP with the requests library
import os
import requests

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        # Key is read from the environment variable set in step 3
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
        "HTTP-Referer": "https://yoursite.com",  # optional
        "X-Title": "Your App Name",  # optional
    },
    json={
        "model": "anthropic/claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is OpenRouter?"}
        ],
    },
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["choices"][0]["message"]["content"])
```

```python
# Approach 2: the OpenAI Python SDK with a changed base_url
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
completion = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[
        {"role": "user", "content": "Explain RAG in one paragraph."}
    ],
    extra_headers={
        "HTTP-Referer": "https://yoursite.com",  # optional
    },
)
print(completion.choices[0].message.content)
```

```python
# Use the :nitro suffix for lowest-latency routing
# Use the :free suffix for free model variants
# Use an ordered "models" list for fallback
import os
import requests

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "route": "fallback",  # fall back to the next model if the primary is down
        "models": [  # ordered fallback list
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-6",
            "meta-llama/llama-3.3-70b-instruct",
        ],
        "messages": [
            {"role": "user", "content": "Hello"}
        ],
    },
)
```

6. Model selection strategies: cost vs speed vs quality
Optimize for cost: Use google/gemini-2.0-flash ($0.075/1M input tokens) or deepseek/deepseek-v3 ($0.27/1M input tokens) for bulk classification, summarization, or simple tasks. On input price, that is roughly 10-40x cheaper than Claude Sonnet or GPT-4o.
Optimize for speed: Use meta-llama/llama-3.1-8b-instruct via Groq backend (add "provider": {"only": ["Groq"]} to your request body) for 800+ tokens/sec inference — best for real-time chat applications.
Optimize for quality: Use anthropic/claude-sonnet-4-6 for coding, analysis, and long-context tasks. Use openai/gpt-4o for tasks requiring multimodal input or function calling with complex schemas.
Model suffix options: append :free for zero-cost (rate limited), :nitro for lowest latency routing, :online to add web search capability to any model.
Two-tier routing pattern: Send each request to a cheap, fast model first (Gemini Flash, Llama 8B). If confidence is low or the task requires deeper reasoning, escalate to Claude Sonnet or GPT-4o. When most requests are resolved by the cheap tier, this can reduce average cost by 80% or more while maintaining quality for complex queries.
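The two-tier pattern can be sketched as below. This is an illustration under stated assumptions, not a recommended confidence measure: looks_uncertain is a hypothetical heuristic, call_model is a caller-supplied function wrapping the chat-completions request, and fake_call is a stub so the example runs offline.

```python
CHEAP_MODEL = "google/gemini-2.0-flash"
STRONG_MODEL = "anthropic/claude-sonnet-4-6"


def looks_uncertain(answer):
    """Hypothetical confidence check: flag empty or hedging answers."""
    text = answer.strip().lower()
    return not text or "i'm not sure" in text or "i am not sure" in text


def two_tier(prompt, call_model):
    """Try the cheap model first; escalate only if the answer looks weak.

    call_model(model, prompt) -> str is supplied by the caller.
    Returns a (model_used, answer) tuple.
    """
    answer = call_model(CHEAP_MODEL, prompt)
    if looks_uncertain(answer):
        return STRONG_MODEL, call_model(STRONG_MODEL, prompt)
    return CHEAP_MODEL, answer


def fake_call(model, prompt):
    """Stub standing in for a real API call."""
    if model == CHEAP_MODEL and "prove" in prompt:
        return "I'm not sure."
    return f"answer from {model}"


print(two_tier("Classify this ticket", fake_call))
print(two_tier("prove this theorem", fake_call))
```

In practice the escalation test matters more than the plumbing. A real check might use a classifier, a self-rated score from the cheap model, or task-specific validation of the output.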
7. Free tier & rate limits
Free models: append :free to any supported model (e.g., meta-llama/llama-3.1-8b-instruct:free). These are served by volunteer compute providers and have lower rate limits. No credits required.
Rate limits on free tier: 20 requests/minute and 200 requests/day for free models. Paid accounts (with credits) get 60-600 RPM depending on the model provider.
Credit-based rate limits: Adding credits unlocks higher rate limits automatically. Anthropic Claude via OpenRouter has the same RPM/TPM limits as going direct — OpenRouter doesn't throttle paid traffic beyond provider limits.
x-ratelimit headers: OpenRouter returns standard rate limit headers in responses: x-ratelimit-limit-requests, x-ratelimit-remaining-requests, x-ratelimit-reset-requests. Use these for adaptive rate limiting in your code.
Credit minimum: $5 minimum top-up. Credits don't expire. Auto-reload available — set a minimum balance threshold at openrouter.ai/credits.
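A minimal sketch of header-based throttling follows. It assumes the header names listed above are present in the response and that the remaining-requests value is an integer string; both should be verified against a live response. The function name pause_seconds and the 3-second default (60 seconds divided by the 20 requests/minute free-tier limit) are illustrative choices.

```python
def pause_seconds(headers, min_remaining=1, pause=3.0):
    """Return how long to sleep before the next request (0.0 if no wait)."""
    # Normalize header names so plain dicts work regardless of casing.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining-requests")
    try:
        left = int(remaining)
    except (TypeError, ValueError):
        return 0.0  # header missing or unparseable: do not throttle
    return pause if left < min_remaining else 0.0


print(pause_seconds({"x-ratelimit-remaining-requests": "0"}))
print(pause_seconds({"X-RateLimit-Remaining-Requests": "12"}))
print(pause_seconds({}))
```

A caller would typically run time.sleep(pause_seconds(response.headers)) after each response, and still handle HTTP 429 responses with a retry.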
8. OpenRouter vs direct API — when to use which
| Scenario | Use OpenRouter | Use direct API |
|---|---|---|
| Testing multiple models | ✓ Best choice | ✗ Multiple accounts |
| Production app with one model | Optional (5% markup) | ✓ Cheaper, direct SLAs |
| High-availability with fallback | ✓ Built-in fallback routing | Requires custom logic |
| Prompt caching (Anthropic) | Supported (check docs) | ✓ Full native support |
| EU data residency required | ✗ US-based | ✓ Mistral EU, Anthropic EU |
| Cost optimization across models | ✓ Auto-cheapest routing | Manual per-provider setup |
| Unified billing / startups | ✓ One account, all models | Multiple billing accounts |
Summary: Use OpenRouter for prototyping, multi-model apps, and startups. Use direct APIs in production when you commit to one model, need the cheapest price, or have specific compliance/data residency requirements.
Monitor OpenRouter and provider status
OpenRouter can be degraded even when individual providers are up — and providers can be down even when OpenRouter is up. Monitor OpenRouter, Anthropic, OpenAI, and Google AI status at prismix.dev with free email alerts.
FAQ
What is OpenRouter?
OpenRouter is a unified API gateway that gives you access to 100+ AI models (Claude, GPT-4o, Gemini, Llama, Mistral, DeepSeek) through a single API key and endpoint. You pay per token based on each model's pricing plus a 5% service fee. Works with the OpenAI Python SDK out of the box — just change base_url.
Is OpenRouter free?
OpenRouter has free model variants (append :free to a model name). Paid models require pre-loaded credits ($5 minimum via Stripe or crypto). There's no monthly subscription — you only pay per token for what you use. Credits don't expire.
How does OpenRouter pricing work?
OpenRouter charges the provider's listed API price plus a 5% service fee. For example, Claude Sonnet at $3/1M input tokens becomes ~$3.15/1M via OpenRouter. You pre-load credits and pay as you go. OpenRouter auto-routes to the cheapest provider for multi-provider models, or you can pin to a specific provider.
Can I use OpenRouter with the OpenAI Python SDK?
Yes. OpenRouter is fully OpenAI API-compatible. Change base_url to "https://openrouter.ai/api/v1" and api_key to your OpenRouter key (starts sk-or-). All other code stays identical. Model names use provider/model-name format: "anthropic/claude-sonnet-4-6", "openai/gpt-4o".