How to Fix Gemini 429 Too Many Requests (Rate Limit) — Practical Checklist (2026)

By Joe @ SimpleMetrics
Published 21 February, 2026

429 Too Many Requests from Gemini usually means your app is sending requests faster than the service allows (rate limits) or that you have hit a quota for your project or account. This guide is a practical checklist for getting back to stable responses.

What Gemini 429 means (and what it does NOT mean)

  • 429 = rate limit / quota: slow down, batch work, and retry with backoff.
  • 503 = capacity: the service is overloaded; see Gemini 503 fixes.
  • 401/403 = auth or policy: API key/project issues; see Gemini API key setup.

Quick fix checklist (do these first)

  1. Stop parallel bursts: cap concurrency (e.g. 1–5 in-flight requests) instead of firing 50 at once.
  2. Add exponential backoff + jitter: retry after 1s → 2s → 4s → 8s (with randomness).
  3. Cache results: if the input prompt + context is the same, reuse the output.
  4. Reduce token usage: shorter prompts + smaller context reduce cost and pressure on quotas.
  5. Batch work: combine multiple small tasks into one request when it’s safe.
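Steps 1 and 2 above can be sketched together. The example below caps in-flight requests with a semaphore so bursts never exceed a fixed limit; `call` stands in for your actual Gemini client call (a hypothetical placeholder, not a real SDK function):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # step 1: cap concurrency (tune to your quota)
_semaphore = threading.Semaphore(MAX_IN_FLIGHT)

def guarded_call(task, call):
    # The semaphore guarantees at most MAX_IN_FLIGHT calls run at once,
    # regardless of how many worker threads the pool spawns.
    with _semaphore:
        return call(task)

def run_all(tasks, call):
    # The pool may be larger than the cap; the semaphore is the real limit.
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(lambda t: guarded_call(t, call), tasks))
```

Keeping the cap in a semaphore (rather than the pool size) means you can reuse one shared limit across several pools or jobs that hit the same project quota.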

Common causes of Gemini 429

1) Too many requests per minute (RPM)

Typical trigger: a script loops through rows and calls the API once per row. If you have 1,000 rows, you can hit limits quickly.

2) Too much concurrency

Even if your average rate is OK, sending requests in parallel spikes instantaneous load. Many environments (serverless, queue workers) accidentally do this by default.

3) Shared project usage (multiple apps/users)

If multiple services share the same Google project, they also share quotas. One noisy job can push everyone into 429.

A safe retry strategy (exponential backoff + jitter)

If you retry immediately, you usually make the 429 worse. Use exponential backoff with random jitter.

// Pseudocode (language-agnostic)
maxRetries = 6
baseDelayMs = 1000
maxDelayMs = 30000

for attempt in 0 .. maxRetries-1:
  res = callGemini()
  if res.ok:
    return res

  if res.status != 429:
    throw res.error

  // exponential backoff with jitter, capped at maxDelayMs
  // (if the response includes a Retry-After header, prefer that delay)
  delay = min(baseDelayMs * (2 ** attempt), maxDelayMs)
  jitter = random(0, 250)
  sleep(delay + jitter)

throw "429 persisted after retries"
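A minimal Python sketch of the pseudocode above. `call` is a hypothetical stand-in for your Gemini client call; it is assumed to return an object with `.ok` and `.status` attributes, which you would adapt to whatever your client library actually returns:

```python
import random
import time

def call_with_backoff(call, max_retries=6, base_delay=1.0, max_delay=30.0):
    """Retry `call` on HTTP 429 with capped exponential backoff + jitter."""
    for attempt in range(max_retries):
        res = call()
        if res.ok:
            return res
        if res.status != 429:
            # Non-retryable errors (401, 403, 400, ...) should surface immediately.
            raise RuntimeError(f"non-retryable status: {res.status}")
        # 1s, 2s, 4s, ... capped at max_delay, plus up to 250 ms of jitter.
        delay = min(base_delay * (2 ** attempt), max_delay)
        time.sleep(delay + random.uniform(0, 0.25))
    raise RuntimeError("429 persisted after retries")
```

The jitter matters when many workers hit 429 at the same moment: without it, they all retry in lockstep and collide again.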

Batching: fix the “one row = one request” trap

If you’re classifying or summarizing rows, batch them. For example, send 20–50 rows in one prompt, return JSON, then split results back into the sheet.
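A sketch of that pattern, assuming a classification task: build one numbered prompt for a batch of rows, ask the model for a JSON array, and split the array back onto the rows. The prompt wording and the call to the model are illustrative, not a fixed API:

```python
import json

def batch_prompt(rows):
    # One request for many rows: number each row so order is unambiguous.
    numbered = "\n".join(f"{i}: {row}" for i, row in enumerate(rows))
    return (
        "Classify each row below as POSITIVE or NEGATIVE.\n"
        "Return ONLY a JSON array of strings, one per row, in order.\n\n"
        + numbered
    )

def split_results(response_text, rows):
    # Map the model's JSON array back onto the original rows.
    results = json.loads(response_text)
    if len(results) != len(rows):
        raise ValueError("model returned wrong number of results")
    return dict(zip(rows, results))
```

With 20–50 rows per prompt, a 1,000-row sheet becomes 20–50 requests instead of 1,000, which is usually enough to stay under RPM limits. Always validate the returned array length before writing results back.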

If you prefer not to manage API keys at all, you can use Gemini in Google Sheets with AI for Sheets formulas (no coding and no API key required).

Monitoring and prevention

  • Log 429 rate: track how often it happens and which job causes it.
  • Queue work: a simple queue smooths bursts and keeps you under limits.
  • Separate projects: isolate quotas per app if you run multiple workloads.
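The "queue work" point can be as simple as spacing out requests client-side. A minimal sketch of such a limiter, assuming you know your target requests-per-minute budget:

```python
import time

class RateLimiter:
    """Space requests so at most `rpm` are issued per minute."""

    def __init__(self, rpm):
        self.interval = 60.0 / rpm  # minimum gap between requests
        self.next_slot = 0.0

    def wait(self):
        # Block until the next request slot, then reserve the one after it.
        now = time.monotonic()
        delay = self.next_slot - now
        if delay > 0:
            time.sleep(delay)
        self.next_slot = max(self.next_slot, now) + self.interval
```

Call `limiter.wait()` before every request. This smooths bursts into an even stream, which is exactly what per-minute limits reward.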

FAQ

Is 429 a permanent ban?

Usually no. It is a temporary rate-limit response. When you slow down and retry with backoff, requests typically succeed.

Why do I see 429 in bursts even with a low average rate?

Concurrency spikes are the most common reason. Cap in-flight requests and batch rows.

Should I switch models when I hit 429?

Switching models can help if different models have different limits in your environment, but the reliable fix is controlling request rate and adding backoff.

Next step

If you’re unsure whether your error is rate limiting or service capacity, compare this guide with Gemini 503 overload and Gemini API key troubleshooting.

Found this useful? Share it!

If this helped you, I'd appreciate you sharing it with colleagues.
