API Security · Performance

Rate Limiting & Throttling Patterns

Token buckets, sliding windows, and distributed rate limiting. Protect your APIs without frustrating legitimate users.

📖 13 min read · January 24, 2026

Rate limiting is the difference between a stable API and a crashed one. But bad rate limiting frustrates users, breaks integrations, and costs you customers. The goal isn't just protection: it's invisible protection.

Here's how we rate limit across 28 Workers handling millions of requests.

Algorithm      | Best For       | Pros              | Cons
Fixed Window   | Simple APIs    | Easy to implement | Bursts at window boundaries
Sliding Window | Most APIs      | Smooth limiting   | More storage
Token Bucket   | Bursty traffic | Allows bursts     | Complex tuning
Leaky Bucket   | Steady output  | Constant rate     | No bursts allowed
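
For reference, a fixed window counter is the simplest of the four: one counter per key per window, reset at every boundary. The sketch below is illustrative rather than something we run in production, and it assumes the same Env/KV binding used by the other patterns in this post. It also shows why "bursts at window boundaries" is the catch.

fixed-window.ts (illustrative sketch)
// Assumes the same Env with a KV namespace binding used by the other patterns.
async function fixedWindowLimit(
  key: string,
  limit: number,
  windowMs: number,
  env: Env
): Promise<boolean> {
  // Every request in the same window shares one counter key
  const window = Math.floor(Date.now() / windowMs);
  const counterKey = `fixed:${key}:${window}`;

  const count = parseInt((await env.KV.get(counterKey)) || '0');
  if (count >= limit) return false;

  await env.KV.put(counterKey, (count + 1).toString(), {
    expirationTtl: Math.ceil(windowMs / 1000) * 2
  });

  // Weakness: a client can spend `limit` requests at 11:59:59 and another
  // `limit` at 12:00:00 -- up to twice the intended rate across the boundary.
  return true;
}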

Pattern 1: Sliding Window Counter

The best balance of accuracy and simplicity for most APIs:

sliding-window.ts
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
}

async function checkRateLimit(
  key: string,
  limit: number,
  windowMs: number,
  env: Env
): Promise<RateLimitResult> {
  const now = Date.now();
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;

  // Get counts for the current and previous windows
  const [current, previous] = await Promise.all([
    env.KV.get(`ratelimit:${key}:${currentWindow}`),
    env.KV.get(`ratelimit:${key}:${previousWindow}`)
  ]);

  const currentCount = parseInt(current || '0');
  const previousCount = parseInt(previous || '0');

  // Weighted count: the previous window contributes in proportion to
  // how much of it still overlaps the sliding window (approximation)
  const windowProgress = (now % windowMs) / windowMs;
  const weightedCount = currentCount + previousCount * (1 - windowProgress);

  const allowed = weightedCount < limit;

  if (allowed) {
    // Increment the counter; keep it around for two windows
    await env.KV.put(
      `ratelimit:${key}:${currentWindow}`,
      (currentCount + 1).toString(),
      { expirationTtl: Math.ceil(windowMs / 1000) * 2 }
    );
  }

  return {
    allowed,
    remaining: Math.max(0, Math.floor(limit - weightedCount)),
    resetAt: (currentWindow + 1) * windowMs
  };
}
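
To make the weighting concrete (the numbers here are made up for illustration): with limit = 100 and windowMs = 60000, if the previous window recorded 80 requests and we are 25% of the way into the current window with 30 requests so far, the weighted count is 30 + 80 × (1 − 0.25) = 90, so the request is allowed with roughly 10 remaining. The approximation assumes the previous window's requests were spread evenly. Calling it directly looks something like this (clientIP is a placeholder for however you resolve the caller's IP):

// Hypothetical direct call inside a Worker's fetch handler
const result = await checkRateLimit(`ip:${clientIP}`, 100, 60_000, env);
if (!result.allowed) {
  return new Response('Too many requests', {
    status: 429,
    headers: {
      'Retry-After': Math.ceil((result.resetAt - Date.now()) / 1000).toString()
    }
  });
}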

Pattern 2: Token Bucket

Allow bursts while maintaining an average rate:

token-bucket.ts
interface TokenBucket {
  tokens: number;
  lastRefill: number;
}

async function tokenBucketLimit(
  key: string,
  maxTokens: number,
  refillRate: number, // tokens per second
  tokensNeeded: number,
  env: Env
): Promise<RateLimitResult> {
  const now = Date.now();

  // Get current bucket state
  const bucketKey = `bucket:${key}`;
  let bucket = await env.KV.get(bucketKey, 'json') as TokenBucket | null;

  if (!bucket) {
    bucket = { tokens: maxTokens, lastRefill: now };
  }

  // Refill tokens based on the time elapsed since the last request
  const elapsed = (now - bucket.lastRefill) / 1000;
  const tokensToAdd = elapsed * refillRate;
  bucket.tokens = Math.min(maxTokens, bucket.tokens + tokensToAdd);
  bucket.lastRefill = now;

  // Check if we have enough tokens
  const allowed = bucket.tokens >= tokensNeeded;
  if (allowed) {
    bucket.tokens -= tokensNeeded;
  }

  // Save bucket state
  await env.KV.put(bucketKey, JSON.stringify(bucket), {
    expirationTtl: 3600 // 1 hour
  });

  return {
    allowed,
    remaining: Math.floor(bucket.tokens),
    resetAt: now + ((maxTokens - bucket.tokens) / refillRate) * 1000
  };
}
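
Tuning is the hard part of token buckets: maxTokens sets the burst ceiling and refillRate sets the sustained rate. As a rough sketch with made-up numbers (apiKey and env are assumed to be in scope), a bucket of 20 tokens refilling at 2 per second allows a 20-request burst but only about 120 requests per minute sustained:

// Hypothetical tuning: 20-request bursts, 2 req/s (about 120/min) sustained
const result = await tokenBucketLimit(`key:${apiKey}`, 20, 2, 1, env);

// Expensive endpoints can charge more than one token per call,
// e.g. a bulk export might cost 5 tokens:
const bulkResult = await tokenBucketLimit(`key:${apiKey}`, 20, 2, 5, env);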

Pattern 3: Rate Limit Middleware

rate-limit-middleware.ts
type Handler = (request: Request, env: Env, ctx: ExecutionContext) => Promise<Response>;

interface RateLimitConfig {
  limit: number;
  windowMs: number;
  keyGenerator: (request: Request) => string;
}

// Cloudflare sets CF-Connecting-IP on incoming requests
function getIP(request: Request): string {
  return request.headers.get('CF-Connecting-IP') || 'unknown';
}

function rateLimit(config: RateLimitConfig) {
  return (handler: Handler): Handler => {
    return async (request, env, ctx) => {
      const key = config.keyGenerator(request);
      const result = await checkRateLimit(key, config.limit, config.windowMs, env);

      // Add rate limit headers to all responses
      const headers = {
        'X-RateLimit-Limit': config.limit.toString(),
        'X-RateLimit-Remaining': result.remaining.toString(),
        'X-RateLimit-Reset': Math.ceil(result.resetAt / 1000).toString()
      };

      if (!result.allowed) {
        const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
        return new Response(JSON.stringify({
          error: 'RATE_LIMITED',
          message: 'Too many requests',
          retryAfter
        }), {
          status: 429,
          headers: {
            ...headers,
            'Retry-After': retryAfter.toString(),
            'Content-Type': 'application/json'
          }
        });
      }

      const upstream = await handler(request, env, ctx);

      // Re-wrap so the headers are mutable (pass-through responses can be immutable)
      const response = new Response(upstream.body, upstream);
      Object.entries(headers).forEach(([k, v]) => {
        response.headers.set(k, v);
      });

      return response;
    };
  };
}

// Usage with different strategies
const apiLimiter = rateLimit({
  limit: 100,
  windowMs: 60000, // 100 requests per minute
  keyGenerator: (req) => req.headers.get('X-API-Key') || getIP(req)
});

const authLimiter = rateLimit({
  limit: 5,
  windowMs: 300000, // 5 attempts per 5 minutes
  keyGenerator: (req) => `auth:${getIP(req)}`
});
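
One way to wire these limiters into a Worker entry point looks like this. It's a sketch only: the route matching and the handleAuth/handleApi handlers are assumptions, not part of the patterns above.

// Hypothetical Worker entry point composing the limiters above.
// handleAuth and handleApi are assumed route handlers of type Handler.
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.startsWith('/auth/')) {
      // Stricter limits on login endpoints
      return authLimiter(handleAuth)(request, env, ctx);
    }

    // Default API limits everywhere else
    return apiLimiter(handleApi)(request, env, ctx);
  }
};
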
User Experience Tip
Always return helpful 429 responses with Retry-After headers. Tell users exactly when they can try again. Include remaining quota in every response so clients can throttle themselves.
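
That extra information only helps if callers act on it. Here's a minimal client-side sketch that backs off on 429 and slows down when quota runs low; the retry cap and one-second pause are arbitrary assumptions, not values from this post:

// Hypothetical client that respects Retry-After and X-RateLimit-Remaining
async function fetchWithBackoff(url: string, init?: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, init);

    if (res.status !== 429) {
      // Slow down proactively when the quota is nearly exhausted
      const remaining = parseInt(res.headers.get('X-RateLimit-Remaining') || '1');
      if (remaining === 0) {
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
      return res;
    }

    // Honor the server's Retry-After (seconds) before retrying
    const retryAfter = parseInt(res.headers.get('Retry-After') || '1');
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error(`Rate limited after ${maxRetries} retries: ${url}`);
}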

Rate Limiting Checklist

  • Use sliding window for most APIs (best accuracy)
  • Use token bucket if you need to allow controlled bursts
  • Always return X-RateLimit-* headers on every response
  • Include Retry-After header on 429 responses
  • Rate limit by API key first, IP address as fallback
  • Set different limits for different endpoints (see the sketch after this list)
  • Use higher limits for authenticated users
  • Monitor rate limit hits to tune thresholds
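
As a rough illustration of the last few items (the paths, tiers, and numbers here are made up), per-endpoint and per-tier limits can live in a small config map that feeds the rateLimit middleware from Pattern 3:

// Hypothetical per-endpoint, per-tier limits; all names and numbers are illustrative
const LIMITS: Record<string, { anonymous: number; authenticated: number; windowMs: number }> = {
  '/search': { anonymous: 30, authenticated: 120, windowMs: 60_000 },
  '/export': { anonymous: 2,  authenticated: 10,  windowMs: 60_000 },
  'default': { anonymous: 60, authenticated: 300, windowMs: 60_000 }
};

function limiterFor(pathname: string, isAuthenticated: boolean) {
  const cfg = LIMITS[pathname] ?? LIMITS['default'];
  return rateLimit({
    limit: isAuthenticated ? cfg.authenticated : cfg.anonymous,
    windowMs: cfg.windowMs,
    keyGenerator: (req) => req.headers.get('X-API-Key') || getIP(req)
  });
}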

Good rate limiting is invisible to legitimate users. They should never notice it exists until it saves your API from a traffic spike or attack.

Related Articles

API Gateway Patterns
Authentication at the Edge
Caching Strategies

Need API Protection?

We build APIs that scale without breaking.

→ Get Started