# Rate Limiting in Go for High-Traffic APIs
Rate limiting protects your API from abuse, prevents cascade failures, and ensures fair resource allocation. Here’s how I implement it in Go services.
## Token Bucket: The Standard
The token bucket algorithm is the most common approach. Tokens are added at a fixed rate. Each request consumes a token. No tokens = rejected.
Go’s golang.org/x/time/rate implements this:
```go
import (
	"net/http"

	"golang.org/x/time/rate"
)

func RateLimitMiddleware(rps float64, burst int) func(http.Handler) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !limiter.Allow() {
				w.Header().Set("Retry-After", "1")
				http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}
```
This is a global rate limit — all users share the same bucket. Good for protecting your service, but unfair: one heavy user can starve everyone else.
## Per-User Rate Limiting
Each user gets their own bucket:
```go
type UserRateLimiter struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
	rps      float64
	burst    int
}

func NewUserRateLimiter(rps float64, burst int) *UserRateLimiter {
	return &UserRateLimiter{
		limiters: make(map[string]*rate.Limiter),
		rps:      rps,
		burst:    burst,
	}
}

func (l *UserRateLimiter) GetLimiter(userID string) *rate.Limiter {
	l.mu.Lock()
	defer l.mu.Unlock()
	limiter, exists := l.limiters[userID]
	if !exists {
		limiter = rate.NewLimiter(rate.Limit(l.rps), l.burst)
		l.limiters[userID] = limiter
	}
	return limiter
}

func (l *UserRateLimiter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID := getUserID(r) // however you identify callers: API key, token subject, or client IP
		if !l.GetLimiter(userID).Allow() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```
Problem: limiters accumulate in memory forever — one per user ID you've ever seen. Clean up periodically:
```go
func (l *UserRateLimiter) cleanup() {
	ticker := time.NewTicker(10 * time.Minute)
	for range ticker.C {
		l.mu.Lock()
		// Remove limiters that haven't been used recently.
		// In practice, track last access time per limiter; this blunt
		// version just resets everything once the map grows too large.
		if len(l.limiters) > 100_000 {
			l.limiters = make(map[string]*rate.Limiter)
		}
		l.mu.Unlock()
	}
}
```
## Distributed Rate Limiting with Redis
When you run multiple service instances, in-memory rate limiting breaks down — each instance keeps its own counters, so a user can get N× the limit across N instances. Use Redis for shared state.
A sliding window log, backed by a sorted set of request timestamps:
```go
import (
	"context"
	"math/rand"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

type RedisRateLimiter struct {
	redis  *redis.Client
	limit  int
	window time.Duration
}

func (l *RedisRateLimiter) Allow(ctx context.Context, key string) (bool, error) {
	now := time.Now().UnixMilli()
	windowStart := now - l.window.Milliseconds()

	pipe := l.redis.Pipeline()
	// Remove entries that have fallen out of the window.
	pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(windowStart, 10))
	// Count requests still inside the window.
	countCmd := pipe.ZCard(ctx, key)
	// Record this request. Use a unique member so two requests in the
	// same millisecond don't collapse into one sorted-set entry.
	member := strconv.FormatInt(now, 10) + "-" + strconv.FormatInt(rand.Int63(), 10)
	pipe.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: member})
	// Expire the key once the window passes so idle keys don't linger.
	pipe.Expire(ctx, key, l.window)

	if _, err := pipe.Exec(ctx); err != nil {
		return false, err
	}
	// Note: the request is recorded even when denied, so a client that
	// keeps hammering keeps its own window full.
	return countCmd.Val() < int64(l.limit), nil
}
```
Fixed window with Lua for atomicity:
```go
var rateLimitScript = redis.NewScript(`
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('INCR', key)
if current == 1 then
    redis.call('EXPIRE', key, window)
end
if current > limit then
    return 0
end
return 1
`)

func (l *RedisRateLimiter) AllowFixed(ctx context.Context, key string) (bool, error) {
	// Bucket requests by window number so each window gets its own counter.
	windowKey := fmt.Sprintf("rl:%s:%d", key, time.Now().Unix()/int64(l.window.Seconds()))
	result, err := rateLimitScript.Run(ctx, l.redis, []string{windowKey}, l.limit, int(l.window.Seconds())).Int()
	if err != nil {
		return false, err
	}
	return result == 1, nil
}
```
## Response Headers
Always tell clients about their rate limit status:
```go
func (l *RateLimiter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID := getUserID(r)
		// Check is whichever backend you chose above; it reports the
		// remaining quota, when the window resets, and the verdict.
		remaining, resetAt, allowed := l.Check(r.Context(), userID)
		w.Header().Set("X-RateLimit-Limit", strconv.Itoa(l.limit))
		w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
		w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(resetAt.Unix(), 10))
		if !allowed {
			w.Header().Set("Retry-After", strconv.Itoa(int(time.Until(resetAt).Seconds())))
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```
## Tiered Rate Limits
Different limits for different users:
```go
type Tier struct {
	RPS   float64
	Burst int
}

var tiers = map[string]Tier{
	"free":       {RPS: 10, Burst: 20},
	"pro":        {RPS: 100, Burst: 200},
	"enterprise": {RPS: 1000, Burst: 2000},
}

func getTier(ctx context.Context, userID string) Tier {
	plan := getUserPlan(ctx, userID)
	if tier, ok := tiers[plan]; ok {
		return tier
	}
	return tiers["free"] // unknown plans fall back to the tightest limit
}
```
## Which Strategy When?
| Strategy | Use When |
|---|---|
| Global token bucket | Simple API protection |
| Per-user in-memory | Single instance, moderate users |
| Redis sliding window | Multi-instance, needs accuracy |
| Redis fixed window | Multi-instance, simpler, slight inaccuracy at window boundaries |
Rate limiting is about protecting your system while being fair to legitimate users. Always return helpful headers so clients can back off intelligently instead of hammering your API.