Graceful Shutdown in Go Microservices Done Right

A service that dies cleanly is a service you can deploy with confidence. Graceful shutdown means: stop accepting new work, finish what you’re doing, clean up resources, then exit. Here’s how to get it right.

The Complete Shutdown Sequence

func main() {
    ctx, cancel := signal.NotifyContext(context.Background(),
        syscall.SIGINT, syscall.SIGTERM)
    defer cancel()

    // Initialize resources
    db := mustInitDB()
    defer db.Close()

    cache := mustInitRedis()
    defer cache.Close()

    publisher := mustInitKafka()
    defer publisher.Close()

    // Start components
    server := newHTTPServer(db, cache, publisher)
    worker := newWorker(db, publisher)

    // Run everything
    g, gCtx := errgroup.WithContext(ctx)

    g.Go(func() error {
        slog.Info("http server starting", "addr", ":8080")
        return server.ListenAndServe()
    })

    g.Go(func() error {
        return worker.Run(gCtx)
    })

    g.Go(func() error {
        <-gCtx.Done()
        slog.Info("initiating graceful shutdown")

        shutdownCtx, shutdownCancel := context.WithTimeout(
            context.Background(), 30*time.Second)
        defer shutdownCancel()

        return server.Shutdown(shutdownCtx)
    })

    if err := g.Wait(); err != nil && !errors.Is(err, http.ErrServerClosed) {
        slog.Error("shutdown error", "error", err)
        os.Exit(1)
    }

    slog.Info("shutdown complete")
}

The flow:

  1. SIGTERM arrives → context cancels
  2. HTTP server stops accepting new connections
  3. In-flight requests complete (up to 30s)
  4. Workers stop polling for new jobs, finish current jobs
  5. Database and Redis connections close
  6. Process exits

HTTP Server Shutdown

http.Server.Shutdown does the heavy lifting:

  • Closes listeners (no new connections)
  • Waits for active requests to complete
  • Returns when all handlers finish or context expires

func newHTTPServer(deps ...any) *http.Server {
    mux := http.NewServeMux()
    // Register routes on mux using the injected dependencies.

    return &http.Server{
        Addr:         ":8080",
        Handler:      mux,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        IdleTimeout:  60 * time.Second,
    }
}

Keep WriteTimeout below your graceful shutdown deadline. Otherwise a slow handler can outlive the deadline and force a hard kill.

Database Connection Draining

Close the pool after all handlers finish — not before:

// The defer order matters!
db := mustInitDB()
defer db.Close() // Closes AFTER server.Shutdown returns

// server.Shutdown waits for handlers to finish
// handlers use db connections
// db.Close after all handlers done = no connection errors

Worker Shutdown

Workers need to finish their current job before exiting:

func (w *Worker) Run(ctx context.Context) error {
    for {
        select {
        case <-ctx.Done():
            slog.Info("worker received shutdown signal")
            return nil
        default:
        }

        job, err := w.queue.Dequeue(ctx, time.Second)
        if err != nil {
            if ctx.Err() != nil {
                return nil // Shutting down
            }
            continue
        }

        // Process with a separate context so the job can finish
        // even if the parent context is cancelled
        jobCtx, jobCancel := context.WithTimeout(
            context.Background(), job.Timeout)
        err = w.process(jobCtx, job)
        jobCancel()

        if err != nil {
            w.handleFailure(ctx, job, err)
        }
    }
}

Key detail: the job runs with context.Background(), not the cancelled parent context. This lets in-flight jobs finish. The for loop checks ctx.Done() before dequeuing — so no new jobs are started after shutdown signal.

Kubernetes Integration

Kubernetes shutdown sequence:

  1. Pod marked for termination
  2. Pod removed from Service endpoints (no new traffic)
  3. preStop hook runs (if configured)
  4. SIGTERM sent to container
  5. Wait terminationGracePeriodSeconds (default 30s)
  6. SIGKILL if still running

In practice, endpoint removal propagates asynchronously and can lag behind SIGTERM, so the pod may still receive traffic after the signal arrives. A preStop hook covers that window:

lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]

This 5-second sleep gives load balancers time to stop routing traffic before your app starts shutting down.
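The grace period has to cover the preStop sleep plus the app's 30-second shutdown timeout. A sketch of the relevant pod spec, with placeholder names:

```yaml
spec:
  terminationGracePeriodSeconds: 40  # 5s preStop + 30s app shutdown + margin
  containers:
    - name: my-service  # placeholder
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]
```

If terminationGracePeriodSeconds is shorter than preStop plus your shutdown deadline, Kubernetes will SIGKILL the process mid-drain.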

Health Check During Shutdown

During shutdown, your readiness probe should fail so the load balancer stops sending traffic:

var shuttingDown atomic.Bool

func readinessHandler(w http.ResponseWriter, r *http.Request) {
    if shuttingDown.Load() {
        w.WriteHeader(http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
}

// In shutdown sequence:
func gracefulShutdown(server *http.Server) {
    shuttingDown.Store(true)
    time.Sleep(5 * time.Second) // Let the LB observe the failing probe

    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        slog.Error("shutdown error", "error", err)
    }
}

Testing Shutdown

func TestGracefulShutdown(t *testing.T) {
    server := setupTestServer()

    // Start a long-running request. Report the result over a channel:
    // require/assert must not be called from a non-test goroutine.
    result := make(chan error, 1)
    go func() {
        resp, err := http.Get("http://localhost:8080/slow")
        if err != nil {
            result <- err
            return
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            result <- fmt.Errorf("unexpected status: %d", resp.StatusCode)
            return
        }
        result <- nil
    }()

    time.Sleep(100 * time.Millisecond) // Let the request start

    // Trigger shutdown; it must wait for the in-flight request
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    require.NoError(t, server.Shutdown(ctx))

    // The in-flight request should have completed successfully
    select {
    case err := <-result:
        require.NoError(t, err)
    case <-time.After(time.Second):
        t.Fatal("in-flight request never completed")
    }
}

Graceful shutdown is a deploy-time safety net. Get it right once, and every deploy becomes a non-event. Get it wrong, and every deploy risks dropped requests and data loss.