Designing Go Services That Scale Horizontally

Horizontal scaling means adding more instances of your service to handle more load. It sounds simple — just run more copies. But your service needs to be designed for it.

The Stateless Requirement

A horizontally scalable service must be stateless. Any instance can handle any request. No local state that other instances need.

Things that break horizontal scaling:

  • In-memory sessions
  • Local file storage
  • In-process caches that must be consistent
  • Goroutines that hold state across requests

Externalize Everything

Sessions → Redis:

type SessionStore struct {
    redis *redis.Client
}

func (s *SessionStore) Get(ctx context.Context, token string) (*Session, error) {
    data, err := s.redis.Get(ctx, "session:"+token).Bytes()
    if err != nil {
        return nil, ErrSessionNotFound
    }
    var session Session
    if err := json.Unmarshal(data, &session); err != nil {
        return nil, fmt.Errorf("decode session: %w", err)
    }
    return &session, nil
}

func (s *SessionStore) Set(ctx context.Context, session *Session) error {
    data, err := json.Marshal(session)
    if err != nil {
        return fmt.Errorf("encode session: %w", err)
    }
    return s.redis.Set(ctx, "session:"+session.Token, data, 24*time.Hour).Err()
}

File uploads → Object storage (S3):

func (s *UploadService) Upload(ctx context.Context, file io.Reader, name string) (string, error) {
    key := fmt.Sprintf("uploads/%s/%s", time.Now().Format("2006/01/02"), name)
    _, err := s.s3.PutObject(ctx, &s3.PutObjectInput{
        Bucket: &s.bucket,
        Key:    &key,
        Body:   file,
    })
    return key, err
}

Scheduled tasks → Distributed scheduler:

Don’t use in-process cron. Use a distributed lock so only one instance runs each job:

func (s *Scheduler) RunExclusive(ctx context.Context, name string, fn func() error) error {
    lockKey := "lock:job:" + name
    acquired, err := s.redis.SetNX(ctx, lockKey, s.instanceID, 5*time.Minute).Result()
    if err != nil {
        return fmt.Errorf("acquire lock: %w", err)
    }
    if !acquired {
        return nil // Another instance has it
    }
    // Caveat: if fn runs past the 5-minute TTL, the lock expires and another
    // instance may acquire it — this Del would then release *their* lock.
    // A compare-and-delete that checks instanceID before deleting avoids that.
    defer s.redis.Del(ctx, lockKey)

    return fn()
}

Health Checks for Load Balancers

Load balancers need to know which instances are healthy:

func (s *Server) readinessHandler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()

    if err := s.db.Ping(ctx); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        json.NewEncoder(w).Encode(map[string]string{"db": err.Error()})
        return
    }

    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
}

func (s *Server) livenessHandler(w http.ResponseWriter, r *http.Request) {
    // Liveness: is the process alive? Don't check dependencies.
    w.WriteHeader(http.StatusOK)
}

Separate liveness from readiness. A service can be alive but not ready (warming up, loading cache). A service that fails liveness gets restarted. A service that fails readiness stops receiving traffic.

Graceful Shutdown

When scaling down, instances must drain connections before exiting:

func main() {
    server := &http.Server{Addr: ":8080", Handler: mux}

    go func() {
        if err := server.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
            slog.Error("server error", "error", err)
        }
    }()

    // Wait for shutdown signal
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
    defer stop()
    <-ctx.Done()

    // Give in-flight requests time to complete
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    slog.Info("shutting down gracefully")
    if err := server.Shutdown(shutdownCtx); err != nil {
        slog.Error("shutdown did not complete", "error", err)
    }
}

Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then sends SIGKILL. Your graceful shutdown must complete within that window.

Cache Strategies

In-process caches are fine for read-heavy, stale-tolerant data. But be aware:

// Each instance has its own cache — writes are not immediately visible
// across instances. This is OK if you accept eventual consistency.
type LocalCache struct {
    data sync.Map
    ttl  time.Duration
}

For data that must be consistent across instances, use Redis:

// Shared cache — all instances see the same data. Values are stored as
// JSON so any serializable type works; callers decode the returned bytes.
func (c *RedisCache) GetOrSet(ctx context.Context, key string, fn func() (interface{}, error)) ([]byte, error) {
    cached, err := c.redis.Get(ctx, key).Bytes()
    if err == nil {
        return cached, nil
    }

    result, err := fn()
    if err != nil {
        return nil, err
    }

    data, err := json.Marshal(result)
    if err != nil {
        return nil, err
    }
    c.redis.Set(ctx, key, data, c.ttl) // best-effort: a failed write just means a re-fetch
    return data, nil
}

Instance Identity

Sometimes you need to know which instance is handling a request (for debugging):

var instanceID = os.Getenv("HOSTNAME") // Set by Kubernetes

func LoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        slog.Info("request",
            "instance", instanceID,
            "method", r.Method,
            "path", r.URL.Path,
        )
        w.Header().Set("X-Instance-ID", instanceID)
        next.ServeHTTP(w, r)
    })
}

Horizontal scaling isn’t a feature you bolt on. It’s a constraint you design for from the start: no local state, externalized storage, health checks, and graceful shutdown.