
Production-Ready Logging and Observability in Go

You can’t debug what you can’t see. Observability isn’t a nice-to-have — it’s the difference between a 5-minute fix and a 5-hour investigation. Here’s the stack I use for every Go service.

Structured Logging with slog

Go 1.21’s slog package is now my default. No more third-party logging libraries.

func setupLogger(env string) *slog.Logger {
    var handler slog.Handler
    if env == "production" {
        handler = slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
            Level: slog.LevelInfo,
        })
    } else {
        handler = slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
            Level: slog.LevelDebug,
        })
    }
    return slog.New(handler)
}

JSON in production (machine-parseable), text in development (human-readable).

Request Context in Every Log

The most important pattern: every log line includes the request context.

func RequestLogger(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        correlationID := r.Header.Get("X-Correlation-ID")
        if correlationID == "" {
            correlationID = uuid.New().String()
        }

        logger := slog.With(
            "correlation_id", correlationID,
            "method", r.Method,
            "path", r.URL.Path,
        )

        ctx := withLogger(r.Context(), logger)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

// Anywhere in the codebase:
func processOrder(ctx context.Context, order Order) error {
    log := loggerFromContext(ctx)
    log.Info("processing order", "order_id", order.ID)
    // ...
}

Every log line automatically includes correlation_id, method, and path. When debugging, filter by correlation_id to see the complete request flow.

Metrics with Prometheus

Four golden signals: latency, traffic, errors, saturation.

var (
    httpRequestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
        []string{"method", "path", "status"},
    )

    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{Name: "http_requests_total"},
        []string{"method", "path", "status"},
    )

    dbQueryDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "db_query_duration_seconds",
            Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1},
        },
        []string{"query"},
    )

    dbConnectionsInUse = prometheus.NewGauge(
        prometheus.GaugeOpts{Name: "db_connections_in_use"},
    )
)

func MetricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        wrapped := &statusRecorder{ResponseWriter: w, statusCode: 200}

        next.ServeHTTP(wrapped, r)

        status := strconv.Itoa(wrapped.statusCode)
        duration := time.Since(start).Seconds()

        httpRequestDuration.WithLabelValues(r.Method, r.URL.Path, status).Observe(duration)
        httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path, status).Inc()
    })
}

Distributed Tracing

For microservices, traces show the complete journey of a request:

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.24.0" // match the semconv version your SDK release expects
)

func initTracer(ctx context.Context, serviceName string) (func(), error) {
    exporter, err := otlptracehttp.New(ctx)
    if err != nil {
        return nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(serviceName),
        )),
    )
    otel.SetTracerProvider(tp)

    return func() { _ = tp.Shutdown(context.Background()) }, nil
}

Add spans to your service calls:

var tracer = otel.Tracer("order-service")

func (s *OrderService) Create(ctx context.Context, order Order) (*Order, error) {
    ctx, span := tracer.Start(ctx, "OrderService.Create")
    defer span.End()

    span.SetAttributes(
        attribute.String("customer_id", order.CustomerID),
        attribute.Int("item_count", len(order.Items)),
    )

    // Downstream calls automatically create child spans
    if err := s.inventory.Reserve(ctx, order.Items); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    return s.repo.Create(ctx, order)
}

Alerting Rules

Metrics are useless without alerts. The essentials:

# High error rate
- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 5m

# High latency
- alert: HighLatency
  expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
  for: 5m

# Database connection pool exhaustion
# (assumes a db_connections_max gauge is exported alongside db_connections_in_use)
- alert: DBPoolExhausted
  expr: db_connections_in_use / db_connections_max > 0.9
  for: 2m
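In an actual Prometheus rules file these entries sit under a named group, and each alert typically carries labels for routing and annotations for the on-call message. A sketch for the first alert (group and severity names are placeholders):

```yaml
groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 5% for the last 5 minutes"
```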

The Three Pillars Together

A single request should be traceable through all three:

  1. Logs: detailed event-by-event record, filterable by correlation_id
  2. Metrics: aggregate trends — is error rate increasing? Is latency growing?
  3. Traces: visual timeline of a request across services
User request → API Gateway → Order Service → Payment Service → DB
     │              │              │               │           │
     └── trace_id: abc-123 links all spans together
     └── correlation_id: abc-123 in every log line
     └── http_request_duration_seconds metric updated at each service

When an alert fires (metrics), you look at traces to identify which service is slow, then look at logs for that service to find the specific error. The three pillars work together.

Observability is the most important infrastructure you’ll build. Everything else — debugging, performance optimization, capacity planning — depends on it.