Designing Go Services That Scale Horizontally
Horizontal scaling means adding more instances of your service to handle more load. It sounds simple — just run more copies. But your service needs to be designed for it.
The Stateless Requirement
A horizontally scalable service must be stateless. Any instance can handle any request. No local state that other instances need.
Things that break horizontal scaling:
- In-memory sessions
- Local file storage
- In-process caches that must be consistent
- Goroutines that hold state across requests
Externalize Everything
Sessions → Redis:
type SessionStore struct {
	redis *redis.Client
}
func (s *SessionStore) Get(ctx context.Context, token string) (*Session, error) {
	data, err := s.redis.Get(ctx, "session:"+token).Bytes()
	if errors.Is(err, redis.Nil) {
		return nil, ErrSessionNotFound
	}
	if err != nil {
		return nil, err // Redis unreachable is not the same as "not found"
	}
	var session Session
	if err := json.Unmarshal(data, &session); err != nil {
		return nil, err
	}
	return &session, nil
}
func (s *SessionStore) Set(ctx context.Context, session *Session) error {
	data, err := json.Marshal(session)
	if err != nil {
		return err
	}
	return s.redis.Set(ctx, "session:"+session.Token, data, 24*time.Hour).Err()
}
File uploads → Object storage (S3):
func (s *UploadService) Upload(ctx context.Context, file io.Reader, name string) (string, error) {
	key := fmt.Sprintf("uploads/%s/%s", time.Now().Format("2006/01/02"), name)
	_, err := s.s3.PutObject(ctx, &s3.PutObjectInput{
		Bucket: &s.bucket,
		Key:    &key,
		Body:   file,
	})
	if err != nil {
		return "", err
	}
	return key, nil
}
Scheduled tasks → Distributed scheduler:
Don’t use in-process cron. Use a distributed lock so only one instance runs each job:
func (s *Scheduler) RunExclusive(ctx context.Context, name string, fn func() error) error {
	lockKey := "lock:job:" + name
	acquired, err := s.redis.SetNX(ctx, lockKey, s.instanceID, 5*time.Minute).Result()
	if err != nil {
		return err
	}
	if !acquired {
		return nil // Another instance has it
	}
	// Caveat: if the job outlives the 5-minute TTL, this unconditional Del can
	// release a lock that another instance has since acquired. A production
	// version should compare the stored instanceID before deleting (e.g. via
	// a Lua script) so an instance only releases its own lock.
	defer s.redis.Del(ctx, lockKey)
	return fn()
}
Health Checks for Load Balancers
Load balancers need to know which instances are healthy:
func (s *Server) readinessHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()
	if err := s.db.PingContext(ctx); err != nil {
		w.WriteHeader(http.StatusServiceUnavailable)
		json.NewEncoder(w).Encode(map[string]string{"db": err.Error()})
		return
	}
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
}
func (s *Server) livenessHandler(w http.ResponseWriter, r *http.Request) {
	// Liveness: is the process alive? Don't check dependencies.
	w.WriteHeader(http.StatusOK)
}
Separate liveness from readiness. A service can be alive but not ready (warming up, loading cache). A service that fails liveness gets restarted. A service that fails readiness stops receiving traffic.
Graceful Shutdown
When scaling down, instances must drain connections before exiting:
func main() {
	server := &http.Server{Addr: ":8080", Handler: mux}
	go func() {
		if err := server.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			slog.Error("server error", "error", err)
		}
	}()
	// Wait for shutdown signal
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()
	<-ctx.Done()
	// Give in-flight requests time to complete
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	slog.Info("shutting down gracefully")
	if err := server.Shutdown(shutdownCtx); err != nil {
		slog.Error("graceful shutdown failed", "error", err)
	}
}
Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then sends SIGKILL. Your graceful shutdown must complete within that window.
Cache Strategies
In-process caches are fine for read-heavy, stale-tolerant data. But be aware:
// Each instance has its own cache — writes are not immediately visible
// across instances. This is OK if you accept eventual consistency.
type LocalCache struct {
	data sync.Map
	ttl  time.Duration
}
For data that must be consistent across instances, use Redis:
// Shared cache — all instances see the same data
func (c *RedisCache) GetOrSet(ctx context.Context, key string, fn func() (interface{}, error)) ([]byte, error) {
	// Cache hit: return the stored JSON.
	if data, err := c.redis.Get(ctx, key).Bytes(); err == nil {
		return data, nil
	}
	// Cache miss: compute, then store as JSON so any value type round-trips.
	result, err := fn()
	if err != nil {
		return nil, err
	}
	data, err := json.Marshal(result)
	if err != nil {
		return nil, err
	}
	if err := c.redis.Set(ctx, key, data, c.ttl).Err(); err != nil {
		return nil, err
	}
	return data, nil
}
Instance Identity
Sometimes you need to know which instance is handling a request (for debugging):
var instanceID = os.Getenv("HOSTNAME") // Set by Kubernetes

func LoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		slog.Info("request",
			"instance", instanceID,
			"method", r.Method,
			"path", r.URL.Path,
		)
		w.Header().Set("X-Instance-ID", instanceID)
		next.ServeHTTP(w, r)
	})
}
Horizontal scaling isn’t a feature you bolt on. It’s a constraint you design for from the start: no local state, externalized storage, health checks, and graceful shutdown.