Graceful Shutdown in Go Microservices Done Right
A service that dies cleanly is a service you can deploy with confidence. Graceful shutdown means: stop accepting new work, finish what you’re doing, clean up resources, then exit. Here’s how to get it right.
The Complete Shutdown Sequence
```go
func main() {
	ctx, cancel := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	// Initialize resources
	db := mustInitDB()
	defer db.Close()
	cache := mustInitRedis()
	defer cache.Close()
	publisher := mustInitKafka()
	defer publisher.Close()

	// Start components
	server := newHTTPServer(db, cache, publisher)
	worker := newWorker(db, publisher)

	// Run everything
	g, gCtx := errgroup.WithContext(ctx)
	g.Go(func() error {
		slog.Info("http server starting", "addr", ":8080")
		return server.ListenAndServe()
	})
	g.Go(func() error {
		return worker.Run(gCtx)
	})
	g.Go(func() error {
		<-gCtx.Done()
		slog.Info("initiating graceful shutdown")
		shutdownCtx, shutdownCancel := context.WithTimeout(
			context.Background(), 30*time.Second)
		defer shutdownCancel()
		return server.Shutdown(shutdownCtx)
	})

	if err := g.Wait(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		slog.Error("shutdown error", "error", err)
		os.Exit(1)
	}
	slog.Info("shutdown complete")
}
```
The flow:
- SIGTERM arrives → context cancels
- HTTP server stops accepting new connections
- In-flight requests complete (up to 30s)
- Workers stop polling for new jobs, finish current jobs
- Database and Redis connections close
- Process exits
HTTP Server Shutdown
http.Server.Shutdown does the heavy lifting:
- Closes listeners (no new connections)
- Waits for active requests to complete
- Returns when all handlers finish or context expires
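That behavior is easy to see in a self-contained sketch. This uses httptest as a stand-in for the real server; the handler delay and the timings are arbitrary demo values, not values from the article:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// demoShutdown starts a test server with a slow handler, begins a request,
// then calls Shutdown while the request is in flight. It returns the
// response body, showing that Shutdown waits for active handlers.
func demoShutdown() string {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // simulate in-flight work
		fmt.Fprint(w, "done")
	}))

	body := make(chan string, 1)
	go func() {
		resp, err := http.Get(ts.URL)
		if err != nil {
			body <- "error: " + err.Error()
			return
		}
		b, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		body <- string(b)
	}()

	time.Sleep(50 * time.Millisecond) // let the request reach the handler
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	ts.Config.Shutdown(ctx) // closes the listener, waits for the handler
	return <-body
}

func main() {
	fmt.Println(demoShutdown()) // prints "done": the in-flight request completed
}
```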
```go
func newHTTPServer(deps ...any) *http.Server {
	mux := http.NewServeMux()
	// Register handlers that close over deps here.
	return &http.Server{
		Addr:         ":8080",
		Handler:      mux,
		ReadTimeout:  5 * time.Second,
		WriteTimeout: 10 * time.Second,
		IdleTimeout:  60 * time.Second,
	}
}
```
Keep WriteTimeout below your graceful shutdown deadline. Otherwise a slow handler can outlive the Shutdown context and have its connection forcibly closed.
Database Connection Draining
Close the pool after all handlers finish, not before:

```go
// The defer order matters!
db := mustInitDB()
defer db.Close() // closes AFTER server.Shutdown returns
// server.Shutdown waits for handlers to finish;
// handlers use db connections;
// db.Close after all handlers are done = no connection errors
```
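Defers run last-in, first-out, which is exactly what makes this ordering work: resources acquired first are released last. A minimal sketch:

```go
package main

import "fmt"

// shutdownOrder mimics main's defer stack and records the order in which
// the deferred "close" steps actually run.
func shutdownOrder() []string {
	var order []string
	run := func() {
		defer func() { order = append(order, "db closed") }()      // deferred first, runs last
		defer func() { order = append(order, "cache closed") }()
		defer func() { order = append(order, "server stopped") }() // deferred last, runs first
		order = append(order, "serving")
	}
	run()
	return order
}

func main() {
	fmt.Println(shutdownOrder()) // [serving server stopped cache closed db closed]
}
```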
Worker Shutdown
Workers need to finish their current job before exiting:
```go
func (w *Worker) Run(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			slog.Info("worker received shutdown signal")
			return nil
		default:
		}

		job, err := w.queue.Dequeue(ctx, time.Second)
		if err != nil {
			if ctx.Err() != nil {
				return nil // shutting down
			}
			continue
		}

		// Process with a separate context so the job can finish
		// even if the parent context is cancelled.
		jobCtx, jobCancel := context.WithTimeout(
			context.Background(), job.Timeout)
		if err := w.process(jobCtx, job); err != nil {
			// Use jobCtx here too: during shutdown the parent ctx is
			// already cancelled, which would make the failure handling
			// (retry/nack) fail as well.
			w.handleFailure(jobCtx, job, err)
		}
		jobCancel()
	}
}
```
Key detail: the job runs with context.Background(), not the cancelled parent context. This lets in-flight jobs finish. The for loop checks ctx.Done() before dequeuing — so no new jobs are started after shutdown signal.
Kubernetes Integration
Kubernetes shutdown sequence:
- Pod marked for termination
- Pod removed from Service endpoints (no new traffic)
- preStop hook runs (if configured)
- SIGTERM sent to container
- Wait up to terminationGracePeriodSeconds (default 30s)
- SIGKILL if still running
Endpoint removal and container termination are not strictly ordered in practice: SIGTERM can arrive before every load balancer has seen the endpoint update. Add a preStop hook to cover that window:
```yaml
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
```
This 5-second sleep gives load balancers time to stop routing traffic before your app starts shutting down.
Health Check During Shutdown
During shutdown, your readiness probe should fail so the load balancer stops sending traffic:
```go
var shuttingDown atomic.Bool

func readinessHandler(w http.ResponseWriter, r *http.Request) {
	if shuttingDown.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// In the shutdown sequence:
func gracefulShutdown(server *http.Server) {
	shuttingDown.Store(true)
	time.Sleep(5 * time.Second) // let the LB observe the failing probe

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	server.Shutdown(ctx)
}
```
Testing Shutdown
```go
func TestGracefulShutdown(t *testing.T) {
	server := setupTestServer()

	// Start a long-running request.
	var requestCompleted atomic.Bool
	done := make(chan struct{})
	go func() {
		defer close(done)
		resp, err := http.Get("http://localhost:8080/slow")
		if err != nil {
			// require.NoError would call FailNow off the test
			// goroutine, which is not allowed; use t.Error instead.
			t.Error(err)
			return
		}
		defer resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			requestCompleted.Store(true)
		}
	}()
	time.Sleep(100 * time.Millisecond) // let the request start

	// Trigger shutdown.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	err := server.Shutdown(ctx)
	require.NoError(t, err)

	// Wait for the client goroutine, then check the in-flight
	// request completed.
	<-done
	assert.True(t, requestCompleted.Load())
}
```
Graceful shutdown is a deploy-time safety net. Get it right once, and every deploy becomes a non-event. Get it wrong, and every deploy risks dropped requests and data loss.