API Security · Performance

Rate Limiting & Throttling Patterns

Token buckets, sliding windows, and distributed rate limiting. Protect your APIs without frustrating legitimate users.

📖 13 min read · January 24, 2026

Rate limiting is the difference between a stable API and a crashed one. But bad rate limiting frustrates users, breaks integrations, and costs you customers. The goal isn't just protection: it's invisible protection.

Here's how we rate limit across 28 Workers handling millions of requests.

Algorithm      | Best For       | Pros              | Cons
Fixed Window   | Simple APIs    | Easy to implement | Bursts at window boundaries
Sliding Window | Most APIs      | Smooth limiting   | More storage
Token Bucket   | Bursty traffic | Allows bursts     | Complex tuning
Leaky Bucket   | Steady output  | Constant rate     | No bursts allowed
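
For reference, a fixed window counter is the simplest of the four: one counter per key per window, reset at every boundary. The sketch below is illustrative rather than something we run in production, and it assumes the same Env/KV binding used by the other patterns in this post. It also shows why "bursts at window boundaries" is the catch.

fixed-window.ts (illustrative sketch)
// Assumes the same Env with a KV namespace binding used by the other patterns.
async function fixedWindowLimit(
  key: string,
  limit: number,
  windowMs: number,
  env: Env
): Promise<boolean> {
  // Every request in the same window shares one counter key
  const window = Math.floor(Date.now() / windowMs);
  const counterKey = `fixed:${key}:${window}`;

  const count = parseInt((await env.KV.get(counterKey)) || '0');
  if (count >= limit) return false;

  await env.KV.put(counterKey, (count + 1).toString(), {
    expirationTtl: Math.ceil(windowMs / 1000) * 2
  });

  // Weakness: a client can spend `limit` requests at 11:59:59 and another
  // `limit` at 12:00:00 -- up to twice the intended rate across the boundary.
  return true;
}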

Pattern 1: Sliding Window Counter

The best balance of accuracy and simplicity for most APIs:

sliding-window.ts
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
}

async function checkRateLimit(
  key: string,
  limit: number,
  windowMs: number,
  env: Env
): Promise<RateLimitResult> {
  const now = Date.now();
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;

  // Get counts for the current and previous windows
  const [current, previous] = await Promise.all([
    env.KV.get(`ratelimit:${key}:${currentWindow}`),
    env.KV.get(`ratelimit:${key}:${previousWindow}`)
  ]);

  const currentCount = parseInt(current || '0');
  const previousCount = parseInt(previous || '0');

  // Weighted count: the previous window contributes in proportion to
  // how much of it still overlaps the sliding window (approximation)
  const windowProgress = (now % windowMs) / windowMs;
  const weightedCount = currentCount + previousCount * (1 - windowProgress);

  const allowed = weightedCount < limit;

  if (allowed) {
    // Increment the counter; keep it around for two windows
    await env.KV.put(
      `ratelimit:${key}:${currentWindow}`,
      (currentCount + 1).toString(),
      { expirationTtl: Math.ceil(windowMs / 1000) * 2 }
    );
  }

  return {
    allowed,
    remaining: Math.max(0, Math.floor(limit - weightedCount)),
    resetAt: (currentWindow + 1) * windowMs
  };
}
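
To make the weighting concrete (the numbers here are made up for illustration): with limit = 100 and windowMs = 60000, if the previous window recorded 80 requests and we are 25% of the way into the current window with 30 requests so far, the weighted count is 30 + 80 × (1 − 0.25) = 90, so the request is allowed with roughly 10 remaining. The approximation assumes the previous window's requests were spread evenly. Calling it directly looks something like this (clientIP is a placeholder for however you resolve the caller's IP):

// Hypothetical direct call inside a Worker's fetch handler
const result = await checkRateLimit(`ip:${clientIP}`, 100, 60_000, env);
if (!result.allowed) {
  return new Response('Too many requests', {
    status: 429,
    headers: {
      'Retry-After': Math.ceil((result.resetAt - Date.now()) / 1000).toString()
    }
  });
}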

Pattern 2: Token Bucket

Allow bursts while maintaining an average rate:

token-bucket.ts
interface TokenBucket {
  tokens: number;
  lastRefill: number;
}

async function tokenBucketLimit(
  key: string,
  maxTokens: number,
  refillRate: number, // tokens per second
  tokensNeeded: number,
  env: Env
): Promise<RateLimitResult> {
  const now = Date.now();

  // Get current bucket state
  const bucketKey = `bucket:${key}`;
  let bucket = await env.KV.get(bucketKey, 'json') as TokenBucket | null;

  if (!bucket) {
    bucket = { tokens: maxTokens, lastRefill: now };
  }

  // Refill tokens based on the time elapsed since the last request
  const elapsed = (now - bucket.lastRefill) / 1000;
  const tokensToAdd = elapsed * refillRate;
  bucket.tokens = Math.min(maxTokens, bucket.tokens + tokensToAdd);
  bucket.lastRefill = now;

  // Check if we have enough tokens
  const allowed = bucket.tokens >= tokensNeeded;
  if (allowed) {
    bucket.tokens -= tokensNeeded;
  }

  // Save bucket state
  await env.KV.put(bucketKey, JSON.stringify(bucket), {
    expirationTtl: 3600 // 1 hour
  });

  return {
    allowed,
    remaining: Math.floor(bucket.tokens),
    resetAt: now + ((maxTokens - bucket.tokens) / refillRate) * 1000
  };
}
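
Tuning is the hard part of token buckets: maxTokens sets the burst ceiling and refillRate sets the sustained rate. As a rough sketch with made-up numbers (apiKey and env are assumed to be in scope), a bucket of 20 tokens refilling at 2 per second allows a 20-request burst but only about 120 requests per minute sustained:

// Hypothetical tuning: 20-request bursts, 2 req/s (about 120/min) sustained
const result = await tokenBucketLimit(`key:${apiKey}`, 20, 2, 1, env);

// Expensive endpoints can charge more than one token per call,
// e.g. a bulk export might cost 5 tokens:
const bulkResult = await tokenBucketLimit(`key:${apiKey}`, 20, 2, 5, env);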

Pattern 3: Rate Limit Middleware

rate-limit-middleware.ts
type Handler = (request: Request, env: Env, ctx: ExecutionContext) => Promise<Response>;

interface RateLimitConfig {
  limit: number;
  windowMs: number;
  keyGenerator: (request: Request) => string;
}

// Cloudflare sets CF-Connecting-IP on incoming requests
function getIP(request: Request): string {
  return request.headers.get('CF-Connecting-IP') || 'unknown';
}

function rateLimit(config: RateLimitConfig) {
  return (handler: Handler): Handler => {
    return async (request, env, ctx) => {
      const key = config.keyGenerator(request);
      const result = await checkRateLimit(key, config.limit, config.windowMs, env);

      // Add rate limit headers to all responses
      const headers = {
        'X-RateLimit-Limit': config.limit.toString(),
        'X-RateLimit-Remaining': result.remaining.toString(),
        'X-RateLimit-Reset': Math.ceil(result.resetAt / 1000).toString()
      };

      if (!result.allowed) {
        const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
        return new Response(JSON.stringify({
          error: 'RATE_LIMITED',
          message: 'Too many requests',
          retryAfter
        }), {
          status: 429,
          headers: {
            ...headers,
            'Retry-After': retryAfter.toString(),
            'Content-Type': 'application/json'
          }
        });
      }

      const upstream = await handler(request, env, ctx);

      // Re-wrap so the headers are mutable (pass-through responses can be immutable)
      const response = new Response(upstream.body, upstream);
      Object.entries(headers).forEach(([k, v]) => {
        response.headers.set(k, v);
      });

      return response;
    };
  };
}

// Usage with different strategies
const apiLimiter = rateLimit({
  limit: 100,
  windowMs: 60000, // 100 requests per minute
  keyGenerator: (req) => req.headers.get('X-API-Key') || getIP(req)
});

const authLimiter = rateLimit({
  limit: 5,
  windowMs: 300000, // 5 attempts per 5 minutes
  keyGenerator: (req) => `auth:${getIP(req)}`
});
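
One way to wire these limiters into a Worker entry point looks like this. It's a sketch only: the route matching and the handleAuth/handleApi handlers are assumptions, not part of the patterns above.

// Hypothetical Worker entry point composing the limiters above.
// handleAuth and handleApi are assumed route handlers of type Handler.
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.startsWith('/auth/')) {
      // Stricter limits on login endpoints
      return authLimiter(handleAuth)(request, env, ctx);
    }

    // Default API limits everywhere else
    return apiLimiter(handleApi)(request, env, ctx);
  }
};
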
User Experience Tip
Always return helpful 429 responses with Retry-After headers. Tell users exactly when they can try again. Include remaining quota in every response so clients can throttle themselves.
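
That extra information only helps if callers act on it. Here's a minimal client-side sketch that backs off on 429 and slows down when quota runs low; the retry cap and one-second pause are arbitrary assumptions, not values from this post:

// Hypothetical client that respects Retry-After and X-RateLimit-Remaining
async function fetchWithBackoff(url: string, init?: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, init);

    if (res.status !== 429) {
      // Slow down proactively when the quota is nearly exhausted
      const remaining = parseInt(res.headers.get('X-RateLimit-Remaining') || '1');
      if (remaining === 0) {
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
      return res;
    }

    // Honor the server's Retry-After (seconds) before retrying
    const retryAfter = parseInt(res.headers.get('Retry-After') || '1');
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error(`Rate limited after ${maxRetries} retries: ${url}`);
}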

Rate Limiting Checklist

  • Use sliding window for most APIs (best accuracy)
  • Use token bucket if you need to allow controlled bursts
  • Always return X-RateLimit-* headers on every response
  • Include Retry-After header on 429 responses
  • Rate limit by API key first, IP address as fallback
  • Set different limits for different endpoints (see the sketch after this list)
  • Use higher limits for authenticated users
  • Monitor rate limit hits to tune thresholds
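
As a rough illustration of the last few items (the paths, tiers, and numbers here are made up), per-endpoint and per-tier limits can live in a small config map that feeds the rateLimit middleware from Pattern 3:

// Hypothetical per-endpoint, per-tier limits; all names and numbers are illustrative
const LIMITS: Record<string, { anonymous: number; authenticated: number; windowMs: number }> = {
  '/search': { anonymous: 30, authenticated: 120, windowMs: 60_000 },
  '/export': { anonymous: 2,  authenticated: 10,  windowMs: 60_000 },
  'default': { anonymous: 60, authenticated: 300, windowMs: 60_000 }
};

function limiterFor(pathname: string, isAuthenticated: boolean) {
  const cfg = LIMITS[pathname] ?? LIMITS['default'];
  return rateLimit({
    limit: isAuthenticated ? cfg.authenticated : cfg.anonymous,
    windowMs: cfg.windowMs,
    keyGenerator: (req) => req.headers.get('X-API-Key') || getIP(req)
  });
}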

Good rate limiting is invisible to legitimate users. They should never notice it exists until it saves your API from a traffic spike or attack.

Related Articles

API Gateway Patterns
Authentication at the Edge
Caching Strategies

Need API Protection?

We build APIs that scale without breaking.

→ Get Started