A 429 Too Many Requests response from Gemini usually means your app is sending requests faster than the service allows (a rate limit) or that you have exhausted a quota for your project or account. This guide gives you a practical checklist for getting back to stable responses.
What Gemini 429 means (and what it does NOT mean)
- 429 = rate limit / quota: slow down, batch work, and retry with backoff.
- 503 = capacity: the service is overloaded; see Gemini 503 fixes.
- 401/403 = auth or policy: API key/project issues; see Gemini API key setup.
Quick fix checklist (do these first)
- Stop parallel bursts: cap concurrency (e.g. 1–5 in-flight requests) instead of firing 50 at once.
- Add exponential backoff + jitter: retry after 1s → 2s → 4s → 8s (with randomness).
- Cache results: if the input prompt + context is the same, reuse the output.
- Reduce token usage: shorter prompts and smaller context lower cost and ease pressure on token-based quotas.
- Batch work: combine multiple small tasks into one request when it’s safe.
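The "cache results" item above can be sketched in Python as follows. This is a minimal in-memory sketch, not a definitive implementation: `call_gemini` is a hypothetical function you supply (it takes a prompt and context and returns text), and the cache key is assumed to be a SHA-256 hash of the exact prompt plus context, so only identical inputs reuse an output.

```python
import hashlib
import json

_cache = {}  # in-memory only; use a persistent store if results must survive restarts


def cache_key(prompt, context=""):
    """Derive a stable key from the exact prompt and context."""
    payload = json.dumps({"prompt": prompt, "context": context}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(call_gemini, prompt, context=""):
    """Return a cached output when available; otherwise call once and store it.

    `call_gemini` is a hypothetical callable (prompt, context) -> str.
    """
    key = cache_key(prompt, context)
    if key not in _cache:
        _cache[key] = call_gemini(prompt, context)
    return _cache[key]
```

Every cache hit is one request that never counts against your limits, which is why this helps most when the same rows or prompts are reprocessed.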
Common causes of Gemini 429
1) Too many requests per minute (RPM)
Typical trigger: a script loops through rows and calls the API once per row. With 1,000 rows, that is 1,000 requests sent as fast as the loop can run, which can exceed a per-minute limit quickly.
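One way to keep a per-row loop under a requests-per-minute limit is to space the calls out. The sketch below assumes a hypothetical `call_gemini(row)` function and an `rpm` value you choose from your own project's limit; the `Pacer` class is illustrative, single-threaded only, and not part of any Gemini client.

```python
import time


class Pacer:
    """Spaces calls so that at most `rpm` of them start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


def process_rows(rows, call_gemini, pacer):
    """Call the API once per row, but never faster than the pacer allows."""
    results = []
    for row in rows:
        pacer.wait()
        results.append(call_gemini(row))
    return results
```

For example, `Pacer(rpm=60)` lets one call start per second, so 1,000 rows take roughly 17 minutes instead of arriving as one burst.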
2) Too much concurrency
Even if your average rate is OK, sending requests in parallel spikes instantaneous load. Many environments (serverless, queue workers) accidentally do this by default.
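A simple way to cap in-flight requests in Python is a bounded thread pool. This is a sketch under assumptions: `call_gemini` is a hypothetical function taking one prompt, and the default of 3 workers is only an illustrative value.

```python
from concurrent.futures import ThreadPoolExecutor


def run_capped(call_gemini, prompts, max_in_flight=3):
    """Run call_gemini over prompts with at most max_in_flight concurrent calls.

    Results come back in the same order as prompts.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(call_gemini, prompts))
```

Because the pool never runs more than `max_in_flight` workers, 50 queued prompts are worked through a few at a time instead of all at once.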
3) Shared project usage (multiple apps/users)
If multiple services share the same Google Cloud project, they also share its quotas. One noisy job can push every other service into 429.
A safe retry strategy (exponential backoff + jitter)
If you retry immediately, you usually make 429 worse. Use exponential backoff with a random jitter.
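For Python users, the same idea might look like the sketch below. The names are assumptions, not part of any Gemini SDK: `request` is a hypothetical zero-argument function that performs one API call, and it is expected to raise the illustrative `RateLimited` exception when the response status is 429. The `max_delay` cap is an addition for safety and can be removed.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function when the API answers 429."""


def call_with_backoff(request, max_retries=6, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request(); on RateLimited, wait base_delay * 2**attempt plus jitter, then retry.

    Any other exception propagates immediately without a retry.
    """
    for attempt in range(max_retries + 1):  # first call plus up to max_retries retries
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 250 ms of jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, 0.25))
```

The `sleep` parameter exists only so the function can be exercised in tests without real waiting.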
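// Pseudocode (language-agnostic)
maxRetries = 6
baseDelayMs = 1000
for attempt in 0..maxRetries:              // inclusive: first call plus up to maxRetries retries
    res = callGemini()
    if res.ok:
        return res
    if res.status != 429:
        throw res.error                    // only 429 is retried here
    if attempt == maxRetries:
        break                              // out of retries, so do not sleep again
    // backoff with jitter
    delay = baseDelayMs * (2 ** attempt)   // 1s, 2s, 4s, 8s, ...
    jitter = random(0, 250)                // milliseconds
    sleep(delay + jitter)
throw "429 persisted after retries"

Batching: fix the “one row = one request” trap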
If you’re classifying or summarizing rows, batch them. For example, send 20–50 rows in one prompt, return JSON, then split results back into the sheet.
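A minimal Python sketch of that batching flow follows. It assumes a hypothetical `call_gemini(prompt)` function that returns the model's text, rows that contain no line breaks, and a prompt wording that is only an example. The length check matters because a model may return fewer or more items than requested.

```python
import json


def chunk(rows, size=20):
    """Split rows into consecutive batches of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def build_prompt(batch):
    """Ask for one JSON array with exactly one label per input row, in order."""
    numbered = "\n".join(f"{i + 1}. {row}" for i, row in enumerate(batch))
    return (
        "Classify each row below. Reply with only a JSON array of "
        f"{len(batch)} strings, one label per row, in the same order.\n{numbered}"
    )


def classify_rows(rows, call_gemini, size=20):
    """Send one request per batch instead of one per row."""
    labels = []
    for batch in chunk(rows, size):
        parsed = json.loads(call_gemini(build_prompt(batch)))
        if not isinstance(parsed, list) or len(parsed) != len(batch):
            raise ValueError("model output does not match the batch size")
        labels.extend(parsed)
    return labels
```

With `size=20`, 1,000 rows become 50 requests rather than 1,000, and `labels[i]` lines up with `rows[i]` when you write results back to the sheet.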
If you prefer not to manage API keys at all, you can use Gemini in Google Sheets with AI for Sheets formulas (no coding and no API key required).
Monitoring and prevention
- Log 429 rate: track how often it happens and which job causes it.
- Queue work: a simple queue smooths bursts and keeps you under limits.
- Separate projects: isolate quotas per app if you run multiple workloads.
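To make the "log 429 rate" item concrete, here is a small Python sketch that counts responses per job. The `RateLimitStats` class and the job names are hypothetical; in practice you would likely feed the same numbers into whatever logging or metrics system you already use.

```python
from collections import Counter


class RateLimitStats:
    """Counts total responses and 429s per job so noisy jobs stand out."""

    def __init__(self):
        self.total = Counter()
        self.limited = Counter()

    def record(self, job, status):
        """Record one response status code for the given job name."""
        self.total[job] += 1
        if status == 429:
            self.limited[job] += 1

    def rate(self, job):
        """Fraction of this job's responses that were 429 (0.0 if none recorded)."""
        if self.total[job] == 0:
            return 0.0
        return self.limited[job] / self.total[job]

    def worst(self):
        """Job with the highest 429 rate, or None if nothing was recorded."""
        return max(self.total, key=self.rate, default=None)
```

Call `record(job, status)` after each response; `worst()` then points at the job to throttle, queue, or move to its own project first.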
FAQ
Is 429 a permanent ban?
Usually no. It is a temporary rate-limit response. When you slow down and retry with backoff, requests typically succeed.
Why do I see 429 in bursts even with a low average rate?
Concurrency spikes are the most common reason. Cap in-flight requests and batch rows.
Should I switch models when I hit 429?
Switching models can help if different models have different limits in your environment, but the reliable fix is controlling request rate and adding backoff.
Next step
If you’re unsure whether your error is rate limiting or service capacity, compare this guide with Gemini 503 overload and Gemini API key troubleshooting.