OpenRouter Guide 2025: Access 100+ AI Models with One API Key

OpenRouter is a unified API gateway for 100+ AI models — Claude, GPT-4o, Gemini 2.0, Llama 3.3, Mistral, DeepSeek, and more — all through one API key and one billing account. This guide covers setup, pricing, Python code examples, model selection, free tier, rate limits, and when to use OpenRouter vs direct provider APIs.

1. What is OpenRouter?

OpenRouter is an API aggregator that sits between your application and multiple AI providers. Instead of managing separate API keys for Anthropic, OpenAI, Google, and Meta, you get a single OpenRouter key and access everything through one unified endpoint.

Single API key: one key (sk-or-...) replaces separate keys from Anthropic, OpenAI, Google AI, Groq, Mistral, and 50+ other providers.

OpenAI-compatible: uses the exact same API format as OpenAI — change base_url and api_key, nothing else. Works with the OpenAI Python/Node SDK out of the box.

Unified billing: add credits once ($5 minimum) and spend across all models. No separate billing accounts per provider.

Provider routing: for models available on multiple providers (e.g., Claude is on Anthropic, AWS Bedrock, and GCP Vertex), OpenRouter auto-routes to the cheapest or fastest available provider. You can also pin to a specific provider.

Fallback routing: specify a list of models in priority order — if the first model's provider is down, automatically fall back to the next one. Great for high-availability production apps.
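To make the "change base_url and api_key, nothing else" point concrete, here is a minimal sketch of switching between the two endpoints from one place. The helper name client_kwargs is hypothetical, and OPENAI_API_KEY is assumed to be where a direct OpenAI key is stored; only the base URL and the OPENROUTER_API_KEY variable come from this guide.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def client_kwargs(use_openrouter: bool, env=os.environ) -> dict:
    """Return keyword arguments for openai.OpenAI(...) (hypothetical helper)."""
    if use_openrouter:
        return {
            "base_url": OPENROUTER_BASE_URL,
            "api_key": env.get("OPENROUTER_API_KEY", ""),
        }
    # No base_url here: the SDK then uses its default OpenAI endpoint.
    return {"api_key": env.get("OPENAI_API_KEY", "")}
```

With the openai package installed, the result could be passed straight through as OpenAI(**client_kwargs(True)), leaving the rest of the application code unchanged.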

2. Setup & API key

Step 1: Go to openrouter.ai and sign up with Google or email. No credit card required to start exploring.

Step 2: Go to openrouter.ai/keys, click "Create Key", name it (e.g., "prod-app"), and copy the key. Keys start with sk-or-.

Step 3: Store the key in an environment variable: OPENROUTER_API_KEY=sk-or-YOUR_KEY

Step 4: Add credits at openrouter.ai/credits (minimum $5 via Stripe or crypto) to access paid models. Free tier requires no credit card — use :free model suffixes.

API endpoint: https://openrouter.ai/api/v1/chat/completions. The request format is identical to OpenAI's; only the base URL differs.
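The steps above can be checked at startup with a small fail-fast helper. This is a sketch, not part of any OpenRouter SDK: the function names are hypothetical, and the only facts it relies on are the OPENROUTER_API_KEY variable from Step 3 and the sk-or- key prefix from Step 2.

```python
import os


def load_openrouter_key(env=os.environ) -> str:
    """Read the key from the environment and fail early if it looks wrong."""
    key = env.get("OPENROUTER_API_KEY", "")
    if not key.startswith("sk-or-"):
        raise RuntimeError(
            "OPENROUTER_API_KEY is missing or does not start with 'sk-or-'"
        )
    return key


def auth_headers(key: str) -> dict:
    """Build the headers used for chat-completions requests."""
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing at startup tends to be easier to debug than a 401 response on the first real request.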

3. How pricing works — per-token + provider routing

OpenRouter charges the provider's listed price plus a 5% service fee. There's no subscription — you pay only for what you use from your pre-loaded credit balance.

Model (via OpenRouter)	Input /1M	Output /1M	Context
anthropic/claude-sonnet-4-6	$3.15	$15.75	200k
openai/gpt-4o	$2.63	$10.50	128k
google/gemini-2.0-flash	$0.079	$0.315	1M
meta-llama/llama-3.3-70b-instruct	$0.63	$0.63	128k
mistralai/mistral-large	$2.10	$6.30	128k
deepseek/deepseek-v3	$0.28	$1.16	64k
meta-llama/llama-3.1-8b-instruct:free	$0.00	$0.00	128k

Prices are approximate (provider listed price + ~5% OpenRouter fee). Check openrouter.ai/models for current exact pricing. Free models (:free suffix) have lower rate limits and may have queuing during peak hours.
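The arithmetic behind the table can be sketched as follows. It assumes the pricing model as described in this section (provider list price plus a flat 5% fee); the function names are illustrative, and the prices are the approximate figures from the table, not live values.

```python
FEE = 0.05  # service fee as described in this guide (assumption: flat rate)


def with_fee(list_price_per_million: float, fee: float = FEE) -> float:
    """Provider list price per 1M tokens plus the service fee."""
    return list_price_per_million * (1 + fee)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Claude Sonnet row: $3.00 list price -> about $3.15 per 1M input tokens.
sonnet_input = with_fee(3.00)

# 10,000 input tokens and 2,000 output tokens at $3.15 / $15.75 per 1M
# comes to about $0.063.
example = request_cost(10_000, 2_000, 3.15, 15.75)
```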

4. Top models available on OpenRouter

Claude (Anthropic): anthropic/claude-sonnet-4-6, anthropic/claude-opus-4-5 — best coding and writing quality, 200k context, prompt caching.

GPT-4o (OpenAI): openai/gpt-4o, openai/gpt-4o-mini — multimodal (vision), function calling, best ecosystem compatibility.

Gemini (Google): google/gemini-2.0-flash is the cheapest frontier model listed here ($0.075/1M input tokens at provider list price, before the OpenRouter fee), with a 1M context window; best for long-document tasks.

Llama 3.3 (Meta): meta-llama/llama-3.3-70b-instruct has open weights and strong performance. It is free via the :free suffix (rate limited), or about $0.59/1M tokens at provider list price via the Groq backend.

Mistral: mistralai/mistral-large, mistralai/codestral-latest — EU-hosted, GDPR-friendly, best for European languages and code generation.

DeepSeek: deepseek/deepseek-v3 is a low-cost capable model at $0.27/1M input tokens (provider list price), good for bulk tasks where cost matters more than quality.

5. Python code examples

There are two approaches: the raw requests library, or the OpenAI Python SDK with a changed base_url. Both send the same request and return the same response format.

requests library (no extra dependencies)

import os

import requests

# Read the key from the environment (see Step 3) instead of hardcoding it.
api_key = os.environ["OPENROUTER_API_KEY"]

response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "HTTP-Referer": "https://yoursite.com",  # optional
        "X-Title": "Your App Name",              # optional
    },
    json={
        "model": "anthropic/claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is OpenRouter?"}
        ],
    },
    timeout=60,
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["choices"][0]["message"]["content"])
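Indexing straight into choices raises an unhelpful KeyError when the API returns an error body instead. A defensive parser might look like the sketch below. It assumes errors arrive as JSON under an "error" key, typically an object with a "message" field; that shape is an assumption to verify against the API reference, and extract_text is a hypothetical helper name.

```python
def extract_text(payload: dict) -> str:
    """Return the assistant text from a chat-completions payload."""
    if "error" in payload:
        err = payload["error"]
        # Assumed shape: {"error": {"code": ..., "message": ...}}
        detail = err.get("message", err) if isinstance(err, dict) else err
        raise RuntimeError(f"API error: {detail}")
    choices = payload.get("choices") or []
    if not choices:
        raise RuntimeError("response contained no choices")
    return choices[0]["message"]["content"]
```

In the requests example, print(extract_text(response.json())) would replace the final line.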

OpenAI Python SDK (pip install openai)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # set in Step 3
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[
        {"role": "user", "content": "Explain RAG in one paragraph."}
    ],
    extra_headers={
        "HTTP-Referer": "https://yoursite.com",
    }
)
print(completion.choices[0].message.content)

Fallback routing — high availability

import os

import requests

# "models" is the ordered fallback list: if the first model's provider
# is down, OpenRouter tries the next entry.
response = requests.post(
    url="https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "route": "fallback",   # fall back to the next model if the primary is unavailable
        "models": [            # ordered fallback list
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-6",
            "meta-llama/llama-3.3-70b-instruct"
        ],
        "messages": [
            {"role": "user", "content": "Hello"}
        ],
    },
    timeout=60,
)
response.raise_for_status()
data = response.json()
print(data["model"])  # the model that actually served the request
print(data["choices"][0]["message"]["content"])
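The same priority-order idea can be approximated on the client, which makes the ordering logic explicit and easy to test. This is a sketch of the general technique, not OpenRouter's implementation: first_available and the call parameter are hypothetical, with call standing in for whatever function sends one request to one model.

```python
from typing import Callable, Sequence, Tuple


def first_available(models: Sequence[str],
                    call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in priority order; return (model, reply) for the first success."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Server-side fallback avoids the extra round trips, so a client-side loop like this is mainly useful when the fallback target is a different API altogether.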

6. Model selection strategies — cost vs speed vs quality

Optimize for cost: Use google/gemini-2.0-flash ($0.075/1M input tokens) or deepseek/deepseek-v3 ($0.27/1M input tokens) for bulk classification, summarization, or simple tasks. These are roughly 10-40x cheaper on input tokens than Claude Sonnet or GPT-4o.

Optimize for speed: Use meta-llama/llama-3.1-8b-instruct via Groq backend (add "provider": {"only": ["Groq"]} to your request body) for 800+ tokens/sec inference — best for real-time chat applications.

Optimize for quality: Use anthropic/claude-sonnet-4-6 for coding, analysis, and long-context tasks. Use openai/gpt-4o for tasks requiring multimodal input or function calling with complex schemas.

Model suffix options: append :free for zero-cost (rate limited), :nitro for lowest latency routing, :online to add web search capability to any model.

Two-tier routing pattern: Use a cheap, fast model (Gemini Flash, Llama 8B) for the first pass. If confidence is low or the task requires deeper reasoning, escalate to Claude Sonnet or GPT-4o. This can reduce average cost by 80% or more, depending on how often requests escalate, while maintaining quality for complex queries.
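The strategies in this section can be sketched as one request-body builder plus a two-tier wrapper. The model names, the suffixes, and the "provider": {"only": [...]} field come from the text above; build_body, two_tier, send, and needs_escalation are hypothetical names, and the escalation check is whatever heuristic suits your application.

```python
from typing import Callable, Optional, Sequence, Tuple

CHEAP_MODEL = "google/gemini-2.0-flash"
STRONG_MODEL = "anthropic/claude-sonnet-4-6"


def build_body(model: str, prompt: str,
               variant: Optional[str] = None,
               only_providers: Optional[Sequence[str]] = None) -> dict:
    """Build a chat-completions request body.

    variant is a model suffix such as "free", "nitro" or "online";
    only_providers pins the request to the named providers.
    """
    body = {
        "model": f"{model}:{variant}" if variant else model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if only_providers:
        body["provider"] = {"only": list(only_providers)}
    return body


def two_tier(prompt: str,
             send: Callable[[dict], str],
             needs_escalation: Callable[[str], bool]) -> Tuple[str, str]:
    """Answer with the cheap model first; escalate if the draft is rejected."""
    draft = send(build_body(CHEAP_MODEL, prompt))
    if not needs_escalation(draft):
        return CHEAP_MODEL, draft
    return STRONG_MODEL, send(build_body(STRONG_MODEL, prompt))
```

Here send would wrap the requests.post call from section 5, and needs_escalation could be as simple as checking for a refusal phrase or an output that fails validation.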

7. Free tier & rate limits

Free models: append :free to any supported model (e.g., meta-llama/llama-3.1-8b-instruct:free). These are served by volunteer compute providers and have lower rate limits. No credits required.

Rate limits on free tier: 20 requests/minute and 200 requests/day for free models. Paid accounts (with credits) get 60-600 RPM depending on the model provider.

Credit-based rate limits: Adding credits unlocks higher rate limits automatically. Anthropic Claude via OpenRouter has the same RPM/TPM limits as going direct — OpenRouter doesn't throttle paid traffic beyond provider limits.

x-ratelimit headers: OpenRouter returns standard rate limit headers in responses: x-ratelimit-limit-requests, x-ratelimit-remaining-requests, x-ratelimit-reset-requests. Use these for adaptive rate limiting in your code.

Credit minimum: $5 minimum top-up. Credits don't expire. Auto-reload available — set a minimum balance threshold at openrouter.ai/credits.
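A client can stay under these limits with a small sliding-window limiter and a check on the remaining-requests header. This is a sketch under assumptions: the 20 requests/minute default and the x-ratelimit-remaining-requests header name are taken from the text above and should be confirmed against live responses, and the class and function names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls: int = 20, period: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self.calls.append(self.clock())


def nearly_exhausted(headers, threshold: int = 1) -> bool:
    """True when the server reports threshold or fewer requests remaining."""
    remaining = headers.get("x-ratelimit-remaining-requests")
    if remaining is None:
        return False
    try:
        return int(remaining) <= threshold
    except ValueError:
        return False
```

Before each request, sleep for limiter.wait_time() seconds, call limiter.record() after sending, and back off further whenever nearly_exhausted(response.headers) returns True.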

8. OpenRouter vs direct API — when to use which

Scenario	Use OpenRouter	Use direct API
Testing multiple models	✓ Best choice	✗ Multiple accounts
Production app with one model	Optional (5% markup)	✓ Cheaper, direct SLAs
High-availability with fallback	✓ Built-in fallback routing	Requires custom logic
Prompt caching (Anthropic)	Supported (check docs)	✓ Full native support
EU data residency required	✗ US-based	✓ Mistral EU, Anthropic EU
Cost optimization across models	✓ Auto-cheapest routing	Manual per-provider setup
Unified billing / startups	✓ One account, all models	Multiple billing accounts

Summary: Use OpenRouter for prototyping, multi-model apps, and startups. Use direct APIs in production when you commit to one model, need the cheapest price, or have specific compliance/data residency requirements.

Monitor OpenRouter and provider status

OpenRouter can be degraded even when individual providers are up — and providers can be down even when OpenRouter is up. Monitor OpenRouter, Anthropic, OpenAI, and Google AI status at prismix.dev with free email alerts.

FAQ

What is OpenRouter?

OpenRouter is a unified API gateway that gives you access to 100+ AI models (Claude, GPT-4o, Gemini, Llama, Mistral, DeepSeek) through a single API key and endpoint. You pay per token based on each model's pricing plus a 5% service fee. Works with the OpenAI Python SDK out of the box — just change base_url.

Is OpenRouter free?

OpenRouter has free model variants (append :free to a model name). Paid models require pre-loaded credits ($5 minimum via Stripe or crypto). There's no monthly subscription — you only pay per token for what you use. Credits don't expire.

How does OpenRouter pricing work?

OpenRouter charges the provider's listed API price plus a 5% service fee. For example, Claude Sonnet at $3/1M input tokens becomes ~$3.15/1M via OpenRouter. You pre-load credits and pay as you go. OpenRouter auto-routes to the cheapest provider for multi-provider models, or you can pin to a specific provider.

Can I use OpenRouter with the OpenAI Python SDK?

Yes. OpenRouter is fully OpenAI API-compatible. Change base_url to "https://openrouter.ai/api/v1" and api_key to your OpenRouter key (starts sk-or-). All other code stays identical. Model names use provider/model-name format: "anthropic/claude-sonnet-4-6", "openai/gpt-4o".
