Rate Limiting in Go for High-Traffic APIs

Rate limiting protects your API from abuse, prevents cascade failures, and ensures fair resource allocation. Here’s how I implement it in Go services.

Token Bucket: The Standard

The token bucket algorithm is the most common approach. Tokens are added at a fixed rate. Each request consumes a token. No tokens = rejected.

Go’s golang.org/x/time/rate implements this:

import "golang.org/x/time/rate"

func RateLimitMiddleware(rps float64, burst int) func(http.Handler) http.Handler {
    limiter := rate.NewLimiter(rate.Limit(rps), burst)

    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !limiter.Allow() {
                w.Header().Set("Retry-After", "1")
                http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }
}

This is a global rate limit — all users share the same bucket. Good for protecting your service, but unfair: one heavy user can starve everyone else.

Per-User Rate Limiting

Each user gets their own bucket:

type UserRateLimiter struct {
    mu       sync.Mutex
    limiters map[string]*rate.Limiter
    rps      float64
    burst    int
}

func NewUserRateLimiter(rps float64, burst int) *UserRateLimiter {
    return &UserRateLimiter{
        limiters: make(map[string]*rate.Limiter),
        rps:      rps,
        burst:    burst,
    }
}

func (l *UserRateLimiter) GetLimiter(userID string) *rate.Limiter {
    l.mu.Lock()
    defer l.mu.Unlock()

    limiter, exists := l.limiters[userID]
    if !exists {
        limiter = rate.NewLimiter(rate.Limit(l.rps), l.burst)
        l.limiters[userID] = limiter
    }
    return limiter
}

func (l *UserRateLimiter) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        userID := getUserID(r)
        if !l.GetLimiter(userID).Allow() {
            w.Header().Set("Retry-After", "1")
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

Problem: limiters accumulate in memory. Clean up periodically:

func (l *UserRateLimiter) cleanup() {
    ticker := time.NewTicker(10 * time.Minute)
    for range ticker.C {
        l.mu.Lock()
        // Remove limiters that haven't been used recently
        // In practice, track last access time per limiter
        if len(l.limiters) > 100_000 {
            l.limiters = make(map[string]*rate.Limiter)
        }
        l.mu.Unlock()
    }
}

Distributed Rate Limiting with Redis

When you have multiple service instances, in-memory rate limiting doesn’t work — each instance has its own counters. Use Redis for shared state.

Sliding window counter:

type RedisRateLimiter struct {
    redis  *redis.Client
    limit  int
    window time.Duration
}

func (l *RedisRateLimiter) Allow(ctx context.Context, key string) (bool, error) {
    now := time.Now().UnixMilli()
    windowStart := now - l.window.Milliseconds()

    pipe := l.redis.Pipeline()

    // Remove expired entries
    pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(windowStart, 10))

    // Count current window
    countCmd := pipe.ZCard(ctx, key)

    // Add the current request. Use a unique member: with the raw
    // timestamp, two requests in the same millisecond collide and
    // ZADD silently overwrites one, undercounting the window.
    pipe.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: fmt.Sprintf("%d-%d", now, rand.Int63())})

    // Set expiry on the key
    pipe.Expire(ctx, key, l.window)

    _, err := pipe.Exec(ctx)
    if err != nil {
        return false, err
    }

    return countCmd.Val() < int64(l.limit), nil
}

Fixed window with Lua for atomicity:

var rateLimitScript = redis.NewScript(`
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])

    local current = redis.call('INCR', key)
    if current == 1 then
        redis.call('EXPIRE', key, window)
    end

    if current > limit then
        return 0
    end
    return 1
`)

func (l *RedisRateLimiter) AllowFixed(ctx context.Context, key string) (bool, error) {
    windowKey := fmt.Sprintf("rl:%s:%d", key, time.Now().Unix()/int64(l.window.Seconds()))
    result, err := rateLimitScript.Run(ctx, l.redis, []string{windowKey}, l.limit, int(l.window.Seconds())).Int()
    if err != nil {
        return false, err
    }
    return result == 1, nil
}
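The fixed window's weakness is the boundary: a client can spend its full limit at the end of one window and again at the start of the next, so worst-case throughput briefly doubles. A toy simulation with plain counters (no Redis) makes this concrete:

```go
package main

import "fmt"

// fixedWindow counts requests per window index (timestamp / windowSize).
type fixedWindow struct {
	limit      int
	windowSize int64 // in arbitrary time units
	counts     map[int64]int
}

func (f *fixedWindow) allow(ts int64) bool {
	w := ts / f.windowSize
	if f.counts[w] >= f.limit {
		return false
	}
	f.counts[w]++
	return true
}

func main() {
	f := &fixedWindow{limit: 10, windowSize: 60, counts: map[int64]int{}}
	allowed := 0
	// 20 requests straddling the boundary between windows 0 and 1.
	for i := 0; i < 10; i++ {
		if f.allow(59) {
			allowed++
		}
		if f.allow(60) {
			allowed++
		}
	}
	fmt.Println(allowed) // prints 20: double the per-window limit in 2 time units
}
```

The sliding-window version above avoids this by expiring entries continuously instead of resetting a counter all at once.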

Response Headers

Always tell clients about their rate limit status:

func (l *RateLimiter) Middleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        userID := getUserID(r)
        remaining, resetAt, allowed := l.Check(r.Context(), userID)

        w.Header().Set("X-RateLimit-Limit", strconv.Itoa(l.limit))
        w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
        w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(resetAt.Unix(), 10))

        if !allowed {
            w.Header().Set("Retry-After", strconv.Itoa(int(time.Until(resetAt).Seconds())))
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }

        next.ServeHTTP(w, r)
    })
}

Tiered Rate Limits

Different limits for different users:

type Tier struct {
    RPS   float64
    Burst int
}

var tiers = map[string]Tier{
    "free":       {RPS: 10, Burst: 20},
    "pro":        {RPS: 100, Burst: 200},
    "enterprise": {RPS: 1000, Burst: 2000},
}

func getTier(ctx context.Context, userID string) Tier {
    plan := getUserPlan(ctx, userID)
    if tier, ok := tiers[plan]; ok {
        return tier
    }
    return tiers["free"]
}

Which Strategy When?

| Strategy | Use when |
| --- | --- |
| Global token bucket | Simple API protection |
| Per-user in-memory | Single instance, moderate number of users |
| Redis sliding window | Multiple instances, accuracy matters |
| Redis fixed window | Multiple instances, simpler; slight inaccuracy at window boundaries |

Rate limiting is about protecting your system while being fair to legitimate users. Always return helpful headers so clients can back off intelligently instead of hammering your API.