Google Gemini API Not Working? Fix Authentication, Quota & Model Errors
Troubleshoot Gemini API errors — 403 invalid API key, 429 quota exceeded, SAFETY-blocked responses, model not found, and streaming issues when calling the API programmatically.
Common errors and fixes
403 API key not valid / permission denied
The most common cause of 403 errors is an API key created in the wrong place, or a project where the required API is not enabled. Use this setup:
import os
import google.generativeai as genai
# Read the key from the environment rather than hard-coding it.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello")
- Create the key at aistudio.google.com: keys created in the Google Cloud Console do not work with the Generative Language API by default.
- Enable "Generative Language API" in the Google Cloud project linked to your key.
- Check for IP or referrer restrictions — if you restricted the key to specific IPs, requests from other IPs will be denied.
- Vertex AI uses different auth: if you are using vertexai, you need Application Default Credentials (ADC), not an API key.
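Some 403s come down to the key never reaching the client intact (unset variable, stray quotes, a trailing newline from copy and paste). The following is a minimal sketch of a pre-flight check, assuming the key lives in GOOGLE_API_KEY as in the setup above. `load_api_key` is a hypothetical helper, not part of the SDK, and it cannot tell whether Google will accept the key, only that something plausible was provided.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, failing early with a clear message.

    Hypothetical helper (not part of google-generativeai). It only checks
    that the value is present and not obviously damaged.
    """
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if raw is None:
        raise RuntimeError(f"{var_name} is not set; create a key at aistudio.google.com")
    # Strip surrounding whitespace and quotes left over from shell or .env files.
    key = raw.strip().strip('"').strip("'")
    if not key:
        raise RuntimeError(f"{var_name} is set but empty")
    if any(ch.isspace() for ch in key):
        raise RuntimeError(f"{var_name} contains whitespace; it was probably copied incorrectly")
    return key
```

With the SDK installed, this would be used as `genai.configure(api_key=load_api_key())`.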
429 Too Many Requests / quota exceeded
Free tier limits: 15 RPM, 1,500 RPD, 1M tokens/day. Add exponential backoff to handle transient quota errors:
import random
import time

def call_with_retry(model, prompt, max_retries=5):
    """Call generate_content, retrying 429 errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt)
        except Exception as e:
            # Retry only rate-limit errors, and only while attempts remain.
            if "429" in str(e) and attempt < max_retries - 1:
                wait = (2 ** attempt) + random.random()  # 1s, 2s, 4s, ... plus up to 1s of jitter
                time.sleep(wait)
            else:
                raise
- Switch to gemini-1.5-flash: the Flash model has a higher free quota than gemini-1.5-pro.
- Cache repeated prompts — if you call the API with identical prompts, cache the response locally.
- Enable billing — upgrading to pay-as-you-go removes most free tier rate limits.
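The "cache repeated prompts" suggestion above could be sketched as a small in-memory cache. This is an illustration under stated assumptions: `PromptCache` is a hypothetical class, and `generate` is any callable you supply that takes a model name and a prompt and returns the response text. It is only appropriate when returning the same answer for the same prompt is acceptable.

```python
import hashlib


class PromptCache:
    """In-memory cache keyed on (model name, prompt). Hypothetical helper."""

    def __init__(self, generate):
        # `generate` is any callable: (model_name, prompt) -> response text.
        self._generate = generate
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model_name, prompt):
        digest = hashlib.sha256()
        digest.update(model_name.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
        digest.update(prompt.encode("utf-8"))
        return digest.hexdigest()

    def get(self, model_name, prompt):
        key = self._key(model_name, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: call the API once and remember the result.
        self.misses += 1
        result = self._generate(model_name, prompt)
        self._store[key] = result
        return result
```

With the SDK, `generate` might be `lambda name, prompt: genai.GenerativeModel(name).generate_content(prompt).text`. The cache lives in process memory, so it is lost on restart; a persistent store would need a different backing dictionary.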
SAFETY finish_reason — blocked response
response.text raises a ValueError when the response is blocked. Always check finish_reason first.
from google.generativeai.types import HarmBlockThreshold, HarmCategory
response = model.generate_content(
    prompt,
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)
# Inspect finish_reason before touching response.text.
print(response.candidates[0].finish_reason)  # STOP = success, SAFETY = blocked
- Adjustable categories: HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_DANGEROUS_CONTENT.
- Available thresholds: BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH, BLOCK_NONE.
- Free tier restriction: some categories cannot be fully disabled (set to BLOCK_NONE) on the free tier — enable billing first.
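The advice to check finish_reason before reading response.text can be wrapped in a helper. The sketch below is duck-typed, so it runs without the SDK. It relies on the `candidates` and `finish_reason` attributes used above; the `content.parts` layout and `prompt_feedback.block_reason` are assumptions about the response shape, and `extract_text` is a hypothetical name.

```python
def _name(value):
    # Enum members expose .name; plain strings pass through unchanged.
    if value is None:
        return None
    return getattr(value, "name", str(value))


def extract_text(response):
    """Return (text, reason) without touching response.text.

    text is None when nothing usable came back; reason explains why.
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        # No candidates at all usually means the prompt itself was blocked.
        feedback = getattr(response, "prompt_feedback", None)
        block = getattr(feedback, "block_reason", None)
        return None, f"NO_CANDIDATES (block_reason={_name(block)})"
    candidate = candidates[0]
    reason = _name(candidate.finish_reason)
    if reason != "STOP":
        return None, reason
    parts = getattr(getattr(candidate, "content", None), "parts", None) or []
    text = "".join(getattr(part, "text", "") or "" for part in parts)
    return text, reason
```

This sketch treats every reason other than STOP as a failure, which is stricter than necessary for cases such as a response truncated by a token limit; relax the check if partial text is useful to you.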
Model not found / model name errors
Model IDs must be exact. Use genai.list_models() to see all available models for your account. Current valid model IDs:
| Model ID | Notes |
|---|---|
| gemini-2.0-flash | Fastest, recommended for production |
| gemini-2.0-flash-lite | Ultra-fast, low cost |
| gemini-1.5-pro | Most capable, 2M context window |
| gemini-1.5-flash | Balanced speed and quality |
| gemini-1.0-pro | Deprecated, being retired |
Vertex AI: uses a different model path format — publishers/google/models/gemini-1.5-pro instead of the plain ID.
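Because model IDs must match exactly, it can help to validate an ID against the listing before making a request. The following is a sketch, assuming you pass in the names returned by genai.list_models() (for example `[m.name for m in genai.list_models()]`), which may carry a "models/" prefix. `check_model_id` and `vertex_model_path` are hypothetical helpers; the Vertex path follows the format quoted above.

```python
import difflib


def check_model_id(requested, available):
    """Return the bare model ID if it is available, else raise with suggestions.

    `available` is an iterable of model names; any "models/" style prefix
    is stripped before comparing.
    """
    def short(name):
        return name.split("/")[-1]

    known = sorted({short(name) for name in available})
    wanted = short(requested.strip())
    if wanted in known:
        return wanted
    # Offer near misses so typos are easy to spot.
    hints = difflib.get_close_matches(wanted, known, n=3, cutoff=0.6)
    suggestion = f" Did you mean: {', '.join(hints)}?" if hints else ""
    raise ValueError(f"Unknown model ID {requested!r}.{suggestion}")


def vertex_model_path(model_id):
    # Build the Vertex AI style path from a plain model ID.
    return f"publishers/google/models/{model_id}"
```

A typo such as "gemini-1.5-por" then fails locally with a suggestion instead of producing a model-not-found error from the API.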
Streaming not working
Use stream=True and iterate over chunks — do not call response.text on the streaming response object:
for chunk in model.generate_content("Write a poem", stream=True):
    print(chunk.text, end="")
- Do not call response.text on a stream: calling response.text on a streaming response raises ValueError; use chunk.text inside the loop instead.
- Function calling + streaming: if you use tool/function calling with streaming, call response.resolve() after the loop before accessing tool_calls.
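If you need the full text as well as incremental output, the chunk loop can be wrapped in a helper that tolerates chunks without text. This is a sketch, assuming each chunk exposes a `.text` attribute that may raise ValueError when the chunk carries no usable text (for example, a blocked chunk). `collect_stream` is a hypothetical name.

```python
def collect_stream(chunks, on_text=None):
    """Join the text of streamed chunks, skipping chunks that carry no text.

    `chunks` is any iterable of objects with a .text attribute, such as the
    result of generate_content(..., stream=True). Returns (full_text, skipped).
    """
    pieces = []
    skipped = 0
    for chunk in chunks:
        try:
            text = chunk.text
        except ValueError:
            # Chunk has no readable text; count it and move on.
            skipped += 1
            continue
        pieces.append(text)
        if on_text is not None:
            on_text(text)  # e.g. print each piece as it arrives
    return "".join(pieces), skipped
```

For example, `full, skipped = collect_stream(model.generate_content("Write a poem", stream=True), on_text=lambda t: print(t, end=""))` prints as it goes and still returns the complete string.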
FAQ
Google AI Studio vs Vertex AI Gemini — which to use?
AI Studio is simpler: free tier available, API key authentication, no billing required to start. Vertex AI is enterprise-grade: billing required, uses Application Default Credentials (ADC) instead of an API key, higher quotas, data residency controls, and SLA guarantees. For prototyping and side projects, use AI Studio. For production workloads or enterprise compliance requirements, use Vertex AI.
Gemini API vs OpenAI API — compatibility
There is no direct drop-in compatibility with the native API. Code written against the OpenAI Python SDK does not work against the standard Gemini endpoints unchanged, so use the google-generativeai SDK or the REST API directly. However, as of late 2024, Vertex AI introduced an OpenAI-compatible endpoint that accepts the OpenAI request format, which can ease migration for some use cases.
Gemini API context window sizes
Context window by model: gemini-1.5-pro — 2M tokens; gemini-1.5-flash — 1M tokens; gemini-2.0-flash — 1M tokens. These are among the largest context windows available in any commercial LLM API.
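A pre-flight size check against these windows might look like the sketch below. It treats "2M" and "1M" as round decimal figures taken from the list above, and `reserve_for_output` is an illustrative default rather than a documented limit. `fits_context` is a hypothetical helper; the token count itself would come from the SDK, for example `model.count_tokens(prompt).total_tokens`.

```python
# Context windows as listed above, in tokens (rounded figures).
CONTEXT_WINDOW_TOKENS = {
    "gemini-1.5-pro": 2_000_000,
    "gemini-1.5-flash": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
}


def fits_context(model_id, prompt_tokens, reserve_for_output=8_192):
    """Return True if the prompt plus an output reserve fits the model's window."""
    if model_id not in CONTEXT_WINDOW_TOKENS:
        raise KeyError(f"No context window recorded for {model_id!r}")
    return prompt_tokens + reserve_for_output <= CONTEXT_WINDOW_TOKENS[model_id]
```

A 1.5M-token prompt would pass for gemini-1.5-pro and fail for gemini-1.5-flash, which makes the check a cheap way to pick a model before sending a large request.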