Google Gemini API Not Working? Fix Authentication, Quota & Model Errors

Troubleshoot Gemini API errors — 403 invalid API key, 429 quota exceeded, SAFETY-blocked responses, model not found, and streaming issues when calling the API programmatically.

Common errors and fixes

403 API key not valid / permission denied

The most common cause of a 403 error is an API key created in the wrong place, or a project where the required API is not enabled. A minimal working setup:

import os

import google.generativeai as genai

# Read the key from the environment rather than hard-coding it.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello")
  • Create the key at aistudio.google.com — keys created in the Google Cloud Console do not work with the Generative Language API by default.
  • Enable "Generative Language API" in the Google Cloud project linked to your key.
  • Check for IP or referrer restrictions — if you restricted the key to specific IPs, requests from other IPs will be denied.
  • Vertex AI uses different auth — if you are using vertexai, you need Application Default Credentials (ADC), not an API key.
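
Before digging into project settings, it can help to rule out a key that never reached the SDK (an unset variable, or stray whitespace and quotes from a copied .env line). A minimal pre-flight sketch, assuming the key lives in GOOGLE_API_KEY as in the setup above; the helper name load_api_key is hypothetical and not part of the SDK:

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper, not part of google-generativeai.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set; export it before calling the API")
    # Strip whitespace and surrounding quotes that often sneak in via copy/paste.
    key = raw.strip().strip("'\"").strip()
    if not key:
        raise RuntimeError(f"{env_var} is set but empty")
    return key
```

It would be used as genai.configure(api_key=load_api_key()). This only catches local mistakes; a key that is present but restricted or attached to the wrong project still needs the checks in the list above.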

429 Too Many Requests / quota exceeded

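Free tier limits depend on the model and change over time. For gemini-1.5-flash they have been 15 requests per minute (RPM), 1,500 requests per day (RPD), and 1M tokens per minute (TPM), with lower limits for gemini-1.5-pro. Add exponential backoff with jitter to handle transient quota errors: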

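import random
import time

from google.api_core import exceptions as gexc

def call_with_retry(model, prompt, max_retries=5):
    """Call generate_content, retrying HTTP 429 errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt)
        except gexc.ResourceExhausted:
            # ResourceExhausted is the exception raised for HTTP 429.
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error to the caller
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            time.sleep((2 ** attempt) + random.random())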
  • Switch to gemini-1.5-flash: the Flash model has a higher free quota than gemini-1.5-pro.
  • Cache repeated prompts: if you call the API with identical prompts, cache the response locally.
  • Enable billing: pay-as-you-go tiers have much higher rate limits than the free tier, although limits still apply.
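
The caching suggestion above can be sketched as a small in-memory wrapper. This is an illustration only: PromptCache is a hypothetical helper, it assumes responses for identical prompts are interchangeable (which may not hold at non-zero temperature), and it keeps everything in process memory with no expiry.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """In-memory cache keyed by (model name, prompt). Hypothetical helper."""

    def __init__(self, generate: Callable[[str], str], model_name: str = "gemini-1.5-flash"):
        self._generate = generate      # function that actually calls the API
        self._model_name = model_name  # part of the key so models do not share entries
        self._store: Dict[str, str] = {}
        self.hits = 0

    def _key(self, prompt: str) -> str:
        data = f"{self._model_name}\n{prompt}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        text = self._generate(prompt)  # only reached on a cache miss
        self._store[key] = text
        return text
```

With the SDK it might be wired up as cache = PromptCache(lambda p: model.generate_content(p).text), so that a repeated cache.get(prompt) spends no quota.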

SAFETY finish_reason — blocked response

Note: Accessing response.text raises a ValueError when the response is blocked. Always check finish_reason first.

from google.generativeai.types import HarmCategory, HarmBlockThreshold

# Relax one category so that only high-probability content is blocked.
response = model.generate_content(
    prompt,
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)
# candidates can be empty when the prompt itself was blocked.
if response.candidates:
    print(response.candidates[0].finish_reason)  # STOP = success, SAFETY = blocked
else:
    print(response.prompt_feedback)
  • Adjustable categories: HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_DANGEROUS_CONTENT.
  • Available thresholds: BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH, BLOCK_NONE.
  • Free tier restriction: some categories cannot be fully disabled (set to BLOCK_NONE) on the free tier — enable billing first.
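
The "check finish_reason first" advice can be wrapped in a small guard. The sketch below is duck-typed and relies only on response.candidates, candidate.finish_reason, and candidate.content.parts; safe_text is a hypothetical name, and treating every reason other than STOP as unusable is a simplifying assumption (a response cut off at the token limit may still carry partial text). The stand-in objects at the bottom exist only so the example runs offline.

```python
from types import SimpleNamespace


def safe_text(response):
    """Return (text, reason); text is None unless the first candidate ended with STOP."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        # No candidates usually means the prompt itself was blocked.
        return None, "NO_CANDIDATES"
    candidate = candidates[0]
    reason = candidate.finish_reason
    # The SDK exposes an enum with .name; fall back to str() for plain values.
    reason_name = getattr(reason, "name", str(reason))
    if reason_name != "STOP":
        return None, reason_name
    text = "".join(getattr(part, "text", "") for part in candidate.content.parts)
    return text, reason_name


# Stand-in objects shaped like SDK responses (not real API output).
blocked = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason="SAFETY", content=None)]
)
ok = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            finish_reason="STOP",
            content=SimpleNamespace(parts=[SimpleNamespace(text="Hello")]),
        )
    ]
)
print(safe_text(blocked))
print(safe_text(ok))
```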

Model not found / model name errors

Model IDs must be exact. Use genai.list_models() to see all available models for your account. Current valid model IDs:

Model ID               Notes
gemini-2.0-flash       Fastest, recommended for production
gemini-2.0-flash-lite  Ultra-fast, low cost
gemini-1.5-pro         Most capable, 2M context window
gemini-1.5-flash       Balanced speed and quality
gemini-1.0-pro         Deprecated, being retired

Vertex AI: uses a different model path format — publishers/google/models/gemini-1.5-pro instead of the plain ID.
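
Since model IDs must match exactly, a typo check against the names returned by genai.list_models() can save a round trip. A minimal sketch using the standard library's difflib; suggest_model_ids is a hypothetical helper, and it assumes names arrive with a path prefix such as models/ or publishers/google/models/ that can be dropped for comparison:

```python
import difflib
from typing import Iterable, List


def suggest_model_ids(requested: str, available: Iterable[str], limit: int = 3) -> List[str]:
    """Return [requested] if it is available, otherwise the closest available IDs."""
    # Compare on the bare ID, ignoring any "models/" style path prefix.
    ids = [name.rsplit("/", 1)[-1] for name in available]
    bare = requested.rsplit("/", 1)[-1]
    if bare in ids:
        return [bare]
    # Best matches first; an empty list means nothing was similar enough.
    return difflib.get_close_matches(bare, ids, n=limit, cutoff=0.6)
```

A possible call is suggest_model_ids("gemini-1.5-pr0", (m.name for m in genai.list_models())), which would surface the nearest valid IDs for an error message.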

Streaming not working

Use stream=True and iterate over chunks — do not call response.text on the streaming response object:

for chunk in model.generate_content("Write a poem", stream=True):
    print(chunk.text, end="")
  • Do not call response.text on a stream: accessing response.text before the stream has been fully consumed raises an error (IncompleteIterationError in the google-generativeai SDK). Use chunk.text inside the loop instead.
  • Function calling + streaming: if you use tool/function calling with streaming, call response.resolve() after the loop before accessing tool_calls.
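
When the full text is needed after streaming, the chunks can be accumulated inside the loop rather than read from the response object. A minimal sketch; collect_stream is a hypothetical helper, and it assumes that a chunk with no usable text (for example a blocked one) raises ValueError on chunk.text, as described for blocked responses earlier:

```python
def collect_stream(chunks, on_text=None):
    """Concatenate chunk.text across a stream, skipping chunks without text."""
    pieces = []
    for chunk in chunks:
        try:
            text = chunk.text
        except ValueError:
            # Assumption: chunks without usable text raise ValueError; skip them.
            continue
        if text:
            pieces.append(text)
            if on_text is not None:
                on_text(text)  # e.g. print the piece as it arrives
    return "".join(pieces)
```

It might be called as collect_stream(model.generate_content("Write a poem", stream=True), on_text=lambda t: print(t, end="")), which prints incrementally and still returns the complete string.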

FAQ

Google AI Studio vs Vertex AI Gemini — which to use?

AI Studio is simpler: free tier available, API key authentication, no billing required to start. Vertex AI is enterprise-grade: billing required, uses Application Default Credentials (ADC) instead of an API key, higher quotas, data residency controls, and SLA guarantees. For prototyping and side projects, use AI Studio. For production workloads or enterprise compliance requirements, use Vertex AI.

Gemini API vs OpenAI API — compatibility

There is no full drop-in compatibility, and the native route is the google-generativeai SDK or the REST API. However, since late 2024 Google has offered OpenAI-compatible endpoints for both the Gemini API and Vertex AI that accept the OpenAI request format, so the OpenAI Python SDK can be pointed at Gemini by changing the base URL and API key. Only a subset of features is covered, so this eases migration for some use cases rather than all of them.

Gemini API context window sizes

Context window by model: gemini-1.5-pro — 2M tokens; gemini-1.5-flash — 1M tokens; gemini-2.0-flash — 1M tokens. These are among the largest context windows available in any commercial LLM API.
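
These figures can be used for a quick pre-flight size check. The sketch below hard-codes the rounded numbers from the paragraph above (they are not exact API limits), and fits_context and its margin parameter are hypothetical names introduced for illustration:

```python
# Rounded figures taken from the text above, not exact API limits.
CONTEXT_WINDOW_TOKENS = {
    "gemini-1.5-pro": 2_000_000,
    "gemini-1.5-flash": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
}


def fits_context(model_id: str, prompt_tokens: int, margin: int = 0) -> bool:
    """True if prompt_tokens plus an optional safety margin fits the model's window."""
    limit = CONTEXT_WINDOW_TOKENS.get(model_id)
    if limit is None:
        raise KeyError(f"no context window recorded for {model_id!r}")
    return prompt_tokens + margin <= limit
```

The token count would typically come from model.count_tokens(prompt).total_tokens, so the check costs one lightweight call instead of a failed generation request.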
