
API Gateway Rate Limiting: Redis vs In-Memory Strategies

Master API rate limiting strategies with Redis and in-memory solutions. Compare performance, scalability, and implementation patterns for optimal API gateway design.

📖 11 min read 📅 June 2, 2026 ✍ By PropTechUSA AI

When your [API](/workers) gateway starts handling thousands of requests per second, rate limiting becomes the difference between a resilient system and a catastrophic failure. The choice between Redis-based and in-memory rate limiting strategies can make or break your application's performance under load.

In the PropTech industry, where [real estate](/offer-check) platforms process millions of property searches, listing updates, and user interactions daily, implementing the right rate limiting strategy is crucial for maintaining service quality while protecting backend resources.

Understanding Rate Limiting Fundamentals

The Critical Role of API Rate Limiting

API rate limiting serves as your first line of defense against service degradation, protecting your infrastructure from both malicious attacks and legitimate traffic spikes. In modern distributed systems, rate limiting operates at multiple layers, with the API gateway serving as the primary enforcement point.

Effective rate limiting prevents resource exhaustion, ensures fair usage across clients, and maintains consistent response times even during peak traffic periods. For PropTech platforms handling real-time property data feeds and user interactions, this protection is essential for delivering reliable service.

Common Rate Limiting Algorithms

Before diving into storage strategies, understanding the core algorithms helps inform architectural decisions:

Token Bucket Algorithm provides burst capacity while maintaining average rate limits, ideal for APIs that need to handle occasional traffic spikes while enforcing long-term limits.

Fixed Window offers simple implementation with predictable resource usage but can allow traffic bursts at window boundaries that may overwhelm downstream services.

Sliding Window delivers smoother traffic distribution by maintaining granular request history, though at the cost of increased memory usage and computational overhead.

Sliding Window Log provides the most accurate rate limiting by tracking individual request timestamps, but requires significant storage and processing resources.
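
To make the burst-versus-average trade-off concrete, here is a minimal token bucket sketch. It is illustrative only: the class name, the parameter names, and the injectable clock are assumptions made for this example, not part of any particular gateway.

```typescript
// Minimal token bucket (illustrative). The clock is injectable so the
// behaviour can be exercised without waiting in real time.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,        // maximum burst size
    private readonly refillPerSecond: number, // long-term average rate
    private readonly now: () => number = () => Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefillMs = now();
  }

  tryConsume(cost: number = 1): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    // Refill in proportion to elapsed time, never above capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = current;

    if (this.tokens < cost) {
      return false; // not enough tokens: reject
    }
    this.tokens -= cost;
    return true;
  }
}
```

A bucket created with `new TokenBucket(2, 1)` admits two requests back to back, rejects a third, and admits one more after a second has passed. Capacity controls the burst; the refill rate controls the long-term average.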

State Management Challenges

The fundamental challenge in API gateway rate limiting lies in maintaining accurate request counters across distributed systems. Traditional single-server applications can rely on local memory, but modern microservices architectures require shared state management.

This shared state requirement introduces latency, consistency, and availability trade-offs that directly impact your rate limiting effectiveness. Understanding these trade-offs guides the choice between centralized Redis storage and distributed in-memory approaches.

Redis-Based Rate Limiting Architecture

Centralized State Management Benefits

Redis excels as a centralized rate limiting store due to its atomic operations, built-in expiration handling, and high-performance networking. By maintaining global request counters in Redis, all API gateway instances share consistent rate limiting state.

This centralized approach avoids the over-admission problem of per-instance counters, where each gateway instance enforces the limit independently and aggregate traffic exceeds what was intended. For PropTech platforms with multiple gateway instances serving global traffic, Redis ensures uniform rate limiting enforcement regardless of request routing.
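
A small simulation shows how far independent counters can drift from the intended limit. It is a sketch under simple assumptions: round-robin load balancing, one client, one window, and a hypothetical `LocalCounter` class standing in for a gateway's in-process counter.

```typescript
// Each gateway instance keeps its own counter and knows nothing about the others.
class LocalCounter {
  private count = 0;
  constructor(private readonly limit: number) {}

  allow(): boolean {
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

// Sends `requests` requests round-robin across `instances` gateways and
// returns how many were admitted in total.
function admittedAcrossInstances(instances: number, limit: number, requests: number): number {
  const gateways = Array.from({ length: instances }, () => new LocalCounter(limit));
  let admitted = 0;
  for (let i = 0; i < requests; i++) {
    if (gateways[i % instances].allow()) admitted++;
  }
  return admitted;
}
```

With a limit of 100, `admittedAcrossInstances(1, 100, 1000)` admits 100 requests, while `admittedAcrossInstances(3, 100, 1000)` admits 300. A shared counter keeps the total at 100 no matter how many instances sit behind the load balancer.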

Implementation Patterns with Redis

Here's a robust Redis-based rate limiting implementation using the sliding window approach:

typescript
import Redis from 'ioredis';

class RedisRateLimiter {
  private redis: Redis;
  private windowSizeMs: number;
  private maxRequests: number;

  constructor(redis: Redis, windowSizeMs: number, maxRequests: number) {
    this.redis = redis;
    this.windowSizeMs = windowSizeMs;
    this.maxRequests = maxRequests;
  }

  async isAllowed(clientId: string): Promise<{ allowed: boolean; remaining: number; resetTime: number }> {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;
    const key = `rate_limit:${clientId}`;
    // Unique member per request, kept in a variable so the same entry
    // can be removed again if the request is rejected
    const member = `${now}-${Math.random()}`;

    const pipeline = this.redis.pipeline();
    // Remove expired entries
    pipeline.zremrangebyscore(key, 0, windowStart);
    // Count current requests in window (before this one is added)
    pipeline.zcard(key);
    // Add current request
    pipeline.zadd(key, now, member);
    // Set key expiration
    pipeline.expire(key, Math.ceil(this.windowSizeMs / 1000));

    // exec() resolves to one [error, value] pair per queued command
    const results = await pipeline.exec();
    if (!results) {
      throw new Error('Redis pipeline returned no results');
    }
    const [countError, count] = results[1];
    if (countError) {
      throw countError;
    }
    const currentCount = count as number;

    if (currentCount >= this.maxRequests) {
      // Remove the request we just added since it's not allowed
      await this.redis.zrem(key, member);
      return {
        allowed: false,
        remaining: 0,
        resetTime: now + this.windowSizeMs
      };
    }

    return {
      allowed: true,
      remaining: this.maxRequests - currentCount - 1,
      resetTime: now + this.windowSizeMs
    };
  }
}

Redis Lua Scripts for Atomic Operations

For production systems requiring absolute consistency, Lua scripts ensure atomic rate limit checks:

lua
local key = KEYS[1]
local window_size = tonumber(ARGV[1])
local max_requests = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_start = now - window_size

-- Clean expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

-- Get current count
local current_count = redis.call('ZCARD', key)

-- Check if request is allowed
if current_count >= max_requests then
  local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
  local reset_time = oldest[2] and (oldest[2] + window_size) or (now + window_size)
  return {0, 0, reset_time}
end

-- Add current request
redis.call('ZADD', key, now, now .. '-' .. math.random())
redis.call('EXPIRE', key, math.ceil(window_size / 1000))

return {1, max_requests - current_count - 1, now + window_size}
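
One way to call the script above from TypeScript is sketched below. The `ScriptClient` interface is an assumption modeled on the `eval(script, numKeys, ...args)` call that ioredis clients expose; `checkSlidingWindow` and its parameter names are hypothetical. The argument order matches the script: `KEYS[1]` is the rate limit key, followed by window size, maximum requests, and the current time in milliseconds.

```typescript
// Minimal structural type (assumption): anything with an ioredis-style eval().
interface ScriptClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetTime: number;
}

async function checkSlidingWindow(
  client: ScriptClient,
  script: string,       // the Lua source shown above
  clientId: string,
  windowSizeMs: number,
  maxRequests: number,
): Promise<RateLimitResult> {
  const reply = await client.eval(
    script,
    1,                          // number of KEYS passed to the script
    `rate_limit:${clientId}`,   // KEYS[1]
    windowSizeMs,               // ARGV[1]
    maxRequests,                // ARGV[2]
    Date.now(),                 // ARGV[3]
  );
  // The script returns a three-element array: {allowed, remaining, reset_time}
  const [allowed, remaining, resetTime] = reply as [number, number, number];
  return { allowed: allowed === 1, remaining, resetTime };
}
```

Sending the full script text on every call costs bandwidth. For hot paths it may be worth registering the script once (ioredis offers `defineCommand` for this) so that later calls only send its hash.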

Performance Optimization Strategies

Redis performance optimization requires careful attention to connection pooling, pipeline usage, and data structure selection. Connection pooling prevents the overhead of establishing new Redis connections for each rate limit check.

Pipelining multiple Redis commands reduces network round trips, which is crucial in high-throughput scenarios. The sliding window log built on sorted sets counts accurately, but it stores one entry per request. For extremely high-volume APIs where a slight loss of accuracy is acceptable, consider a simpler fixed window counter.
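
A fixed window counter needs only one integer per client per window. The sketch below assumes a client exposing `incr` and `expire` with the same shape as the ioredis methods of those names; `CounterClient` and `fixedWindowAllow` are hypothetical names introduced for this example.

```typescript
// Minimal structural type (assumption): ioredis-style incr() and expire().
interface CounterClient {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

async function fixedWindowAllow(
  client: CounterClient,
  clientId: string,
  maxRequests: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  // Every request in the same window maps to the same key,
  // because the window index is part of the key name.
  const windowIndex = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `rate_limit:${clientId}:${windowIndex}`;

  const count = await client.incr(key);
  if (count === 1) {
    // First hit in this window: schedule cleanup once the window is over.
    // The TTL is only garbage collection, since a new window uses a new key.
    await client.expire(key, windowSeconds * 2);
  }
  return count <= maxRequests;
}
```

Because the TTL is set only on the first increment, most requests cost a single `INCR`. The trade-off is the boundary burst described earlier: a client can spend a full quota at the end of one window and another at the start of the next.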

💡
Pro Tip: Use Redis EXPIRE commands judiciously. Setting expiration on every operation can impact performance. Instead, use background cleanup processes for high-volume scenarios.

In-Memory Rate Limiting Approaches

Local Cache Advantages

In-memory rate limiting offers microsecond response times and eliminates external dependencies. For applications with predictable traffic patterns or those requiring ultra-low latency, local memory stores provide optimal performance.

This approach works particularly well for single-instance applications, or where some enforcement inaccuracy across instances is acceptable in exchange for performance gains. PropTech APIs serving real-time property price updates benefit from in-memory rate limiting when response speed is critical.

Distributed In-Memory Solutions

Modern distributed caching solutions bridge the gap between pure local memory and centralized Redis storage:

typescript
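interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetTime: number;
}

interface RequestWindow {
  requests: number[]; // request timestamps in milliseconds
}

type LimiterState = Record<string, number[]>;

// Hypothetical transport: sends local timestamps to peers and returns theirs
interface GossipProtocol {
  exchange(localState: LimiterState): Promise<LimiterState>;
}

class DistributedMemoryRateLimiter {
  private localCache = new Map<string, RequestWindow>();

  constructor(private gossipNetwork: GossipProtocol, private syncInterval: number = 1000) {
    this.setupGossipSync();
  }

  async checkRate(clientId: string, limit: number, windowMs: number): Promise<RateLimitResult> {
    const now = Date.now();
    const window = this.getOrCreateWindow(clientId);

    // Clean expired requests
    window.requests = window.requests.filter(timestamp => timestamp > now - windowMs);

    if (window.requests.length >= limit) {
      return {
        allowed: false,
        remaining: 0,
        resetTime: Math.min(...window.requests) + windowMs
      };
    }

    window.requests.push(now);
    return {
      allowed: true,
      remaining: limit - window.requests.length,
      resetTime: now + windowMs
    };
  }

  private getOrCreateWindow(clientId: string): RequestWindow {
    let window = this.localCache.get(clientId);
    if (!window) {
      window = { requests: [] };
      this.localCache.set(clientId, window);
    }
    return window;
  }

  private setupGossipSync(): void {
    setInterval(() => {
      // A failed sync round is skipped; the next interval retries
      this.syncCountersWithPeers().catch(() => undefined);
    }, this.syncInterval);
  }

  private async syncCountersWithPeers(): Promise<void> {
    const localState = this.getLocalState();
    const peerUpdates = await this.gossipNetwork.exchange(localState);
    this.mergeRemoteUpdates(peerUpdates);
  }

  private getLocalState(): LimiterState {
    const state: LimiterState = {};
    for (const [clientId, window] of this.localCache) {
      state[clientId] = [...window.requests];
    }
    return state;
  }

  // Set union of timestamps: simple, but requests from different
  // instances that share a millisecond collapse into one entry
  private mergeRemoteUpdates(peerUpdates: LimiterState): void {
    for (const [clientId, timestamps] of Object.entries(peerUpdates)) {
      const window = this.getOrCreateWindow(clientId);
      window.requests = [...new Set([...window.requests, ...timestamps])];
    }
  }
}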

Hybrid Approaches

Hybrid architectures combine local caching with periodic synchronization, balancing performance with accuracy:

typescript
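import Redis from 'ioredis';

interface LocalCounter {
  count: number;       // local requests not yet flushed to Redis
  globalCount: number; // shared total as of the last Redis round trip
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

class HybridRateLimiter {
  private localCache = new Map<string, LocalCounter>();

  constructor(private redisClient: Redis, private syncThreshold: number) {}

  async checkRateLimit(clientId: string, limit: number): Promise<RateLimitResult> {
    const localCounter = this.getOrCreateLocalCounter(clientId);
    const estimate = localCounter.globalCount + localCounter.count;

    // Fast path: trust the local estimate while it is below 80% of the limit
    if (estimate < Math.floor(limit * 0.8)) {
      localCounter.count++;
      await this.syncToRedis(clientId, localCounter);
      return { allowed: true, remaining: limit - estimate - 1 };
    }

    // Slow path: check Redis for accurate count
    return this.checkRedisCounter(clientId, limit, localCounter);
  }

  private getOrCreateLocalCounter(clientId: string): LocalCounter {
    let counter = this.localCache.get(clientId);
    if (!counter) {
      counter = { count: 0, globalCount: 0 };
      this.localCache.set(clientId, counter);
    }
    return counter;
  }

  // Flush local increments once syncThreshold of them have accumulated
  private async syncToRedis(clientId: string, counter: LocalCounter): Promise<void> {
    if (counter.count >= this.syncThreshold) {
      const pending = counter.count;
      counter.count = 0;
      counter.globalCount = await this.redisClient.incrby(`counter:${clientId}`, pending);
    }
  }

  // Counts this request plus any unflushed local ones in one atomic INCRBY.
  // Window reset (key expiry and clearing localCache) is omitted here.
  private async checkRedisCounter(clientId: string, limit: number, counter: LocalCounter): Promise<RateLimitResult> {
    const pending = counter.count + 1;
    counter.count = 0;
    counter.globalCount = await this.redisClient.incrby(`counter:${clientId}`, pending);
    return {
      allowed: counter.globalCount <= limit,
      remaining: Math.max(0, limit - counter.globalCount)
    };
  }
}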

Performance Analysis and Best Practices

Latency Characteristics

Redis-based rate limiting typically introduces 1-5ms latency depending on network conditions and Redis server performance. This latency is acceptable for most API scenarios but can become significant for ultra-high-frequency trading or real-time gaming applications.

In-memory solutions operate in microseconds but require careful coordination in distributed environments. The choice depends on your specific latency requirements and consistency needs.

Scalability Considerations

Redis scaling follows different patterns than in-memory approaches. Vertical scaling (larger Redis instances) works well for most scenarios, while horizontal scaling requires sharding or clustering strategies.

In-memory solutions scale naturally with application instances but require sophisticated synchronization mechanisms to maintain accuracy. Consider your growth projections and operational complexity when choosing approaches.

Monitoring and Observability

Effective rate limiting requires comprehensive monitoring:

typescript
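interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

interface RateLimiter {
  checkRate(clientId: string, limit: number): Promise<RateLimitResult>;
}

// Hypothetical metrics facade; adapt it to your metrics client
interface MetricsCollector {
  recordLatency(name: string, milliseconds: number): void;
  increment(name: string, tags?: Record<string, string>): void;
}

class MonitoredRateLimiter {
  constructor(
    private metrics: MetricsCollector,
    private rateLimiter: RateLimiter,
    private failOpen: boolean = true
  ) {}

  async checkRate(clientId: string, limit: number): Promise<RateLimitResult> {
    const startTime = performance.now();

    try {
      const result = await this.rateLimiter.checkRate(clientId, limit);
      this.metrics.recordLatency('rate_limit_check', performance.now() - startTime);

      // Count each decision exactly once; tag blocked requests with the client
      if (result.allowed) {
        this.metrics.increment('rate_limit.allowed');
      } else {
        this.metrics.increment('rate_limit.blocked', { client: clientId });
      }

      return result;
    } catch (error) {
      this.metrics.increment('rate_limit.error');
      // Fail open or closed based on your requirements
      return this.handleRateLimitError(error);
    }
  }

  private handleRateLimitError(_error: unknown): RateLimitResult {
    return { allowed: this.failOpen, remaining: 0 };
  }
}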

Error Handling Strategies

Rate limiting systems must gracefully handle failures. "Fail open" approaches allow requests through when rate limiting is unavailable, prioritizing availability over protection. "Fail closed" approaches block requests, prioritizing security over availability.
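
The two failure modes can be expressed as a thin wrapper around any rate limit check. This is a minimal sketch; `checkWithFallback` and `FailureMode` are names invented for this example.

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

type FailureMode = 'open' | 'closed';

async function checkWithFallback(
  check: () => Promise<RateLimitResult>,
  mode: FailureMode,
): Promise<RateLimitResult> {
  try {
    return await check();
  } catch {
    // The rate limiting store is unavailable, so the configured mode decides:
    // 'open' lets the request through, 'closed' rejects it.
    return { allowed: mode === 'open', remaining: 0 };
  }
}
```

The mode does not have to be global. A property search endpoint might reasonably fail open, while an authentication endpoint, where abuse is costlier than a brief outage, might fail closed.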

⚠️
Warning: Always implement circuit breakers for external rate limiting stores like Redis. A failed rate limiter should not bring down your entire API gateway.
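
A circuit breaker for the rate limiting store can be quite small. The sketch below is a simplified version under stated assumptions: it counts consecutive failures, opens after `failureThreshold` of them, and lets calls through again as trials once `cooldownMs` has passed. All names are hypothetical, and the clock is injectable for testing.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAtMs: number | null = null;

  constructor(
    private readonly failureThreshold: number,
    private readonly cooldownMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async call<T>(operation: () => Promise<T>, fallback: () => T): Promise<T> {
    // Open and still cooling down: skip the store entirely
    if (this.openedAtMs !== null && this.now() - this.openedAtMs < this.cooldownMs) {
      return fallback();
    }

    try {
      const result = await operation();
      // Success closes the breaker and clears the failure count
      this.failures = 0;
      this.openedAtMs = null;
      return result;
    } catch {
      this.failures++;
      if (this.failures >= this.failureThreshold) {
        // Open (or re-open after a failed trial) and restart the cooldown
        this.openedAtMs = this.now();
      }
      return fallback();
    }
  }
}
```

While the breaker is open, the gateway answers from the fallback without waiting on a timeout, so a Redis outage adds no latency to every request. The fallback itself encodes the fail-open or fail-closed decision.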

Strategic Implementation Guidelines

Choosing the Right Strategy

Your rate limiting strategy should align with specific requirements:

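Choose Redis when: several gateway instances must enforce one shared limit, accuracy matters more than the 1-5ms each check adds, and you can protect the gateway against Redis failures.

Choose in-memory when: you run a single instance, or ultra-low latency matters more than exact enforcement across instances.

Choose hybrid when: you need local-speed checks for most requests but still want counts reconciled against a shared store as clients approach their limits.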

Integration with Modern API Gateways

At PropTechUSA.ai, we've implemented flexible rate limiting that adapts to different property data APIs' unique requirements. Real estate listing APIs need burst capacity for market updates, while user authentication APIs require strict limiting to prevent abuse.

Modern API gateway solutions should support pluggable rate limiting strategies, allowing different endpoints to use optimal approaches based on their specific needs.
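
One way to make strategies pluggable is to resolve a limiter per route prefix. The following is a sketch, not a description of any specific gateway: `RateLimiter`, `RateLimiterRegistry`, and the example paths are assumptions.

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

// Any strategy (Redis, in-memory, hybrid) can sit behind this interface
interface RateLimiter {
  checkRate(clientId: string): Promise<RateLimitResult>;
}

class RateLimiterRegistry {
  private routes: Array<{ prefix: string; limiter: RateLimiter }> = [];

  constructor(private readonly fallback: RateLimiter) {}

  register(prefix: string, limiter: RateLimiter): void {
    this.routes.push({ prefix, limiter });
    // Longest prefix first, so the most specific route wins
    this.routes.sort((a, b) => b.prefix.length - a.prefix.length);
  }

  resolve(path: string): RateLimiter {
    const match = this.routes.find(route => path.startsWith(route.prefix));
    return match ? match.limiter : this.fallback;
  }
}
```

A gateway could then register a strict, Redis-backed limiter under a prefix such as `/auth` and a burst-tolerant one under `/listings`, and call `registry.resolve(request.path).checkRate(clientId)` in its middleware.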

Future-Proofing Your Implementation

Design rate limiting systems with evolution in mind. Abstract rate limiting logic behind interfaces that can swap implementations as requirements change. Consider emerging patterns like adaptive rate limiting that adjusts limits based on system health and traffic patterns.
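
As a rough illustration of adaptive rate limiting, the function below shrinks a base limit as backend health degrades. The health signals, the 200ms latency target, and the floor of 10% are all assumptions chosen for the example; real systems would tune these against their own traffic.

```typescript
interface HealthSnapshot {
  errorRate: number;    // fraction of failed backend calls, between 0 and 1
  p95LatencyMs: number; // recent 95th percentile backend latency
}

function adaptiveLimit(
  baseLimit: number,
  health: HealthSnapshot,
  targetLatencyMs: number = 200,
  minFraction: number = 0.1,
): number {
  // Scale down in proportion to how far latency exceeds the target
  const latencyFactor = Math.min(1, targetLatencyMs / Math.max(health.p95LatencyMs, 1));
  // Scale down further as the error rate rises
  const errorFactor = 1 - Math.min(1, Math.max(0, health.errorRate));
  // Never drop below the configured floor, and always admit at least one request
  const factor = Math.max(minFraction, latencyFactor * errorFactor);
  return Math.max(1, Math.floor(baseLimit * factor));
}
```

With a base limit of 100, a healthy backend keeps the full limit, while a p95 latency of twice the target halves it. The result can be passed as the `limit` argument to whichever limiter implementation sits behind the interface.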

Implement comprehensive testing strategies that validate rate limiting behavior under various failure scenarios. Your rate limiting system is only as reliable as its weakest failure mode.

The choice between Redis and in-memory rate limiting strategies ultimately depends on your specific requirements for accuracy, latency, and operational complexity. By understanding the trade-offs and implementing robust monitoring and error handling, you can build rate limiting systems that scale with your growing API ecosystem.

Ready to implement enterprise-grade rate limiting for your API gateway? Explore how PropTechUSA.ai's [platform](/saas-platform) provides battle-tested rate limiting strategies optimized for high-performance property technology applications.
