
Profiling Go Services in Production: CPU, Memory, and Goroutine Leaks

“It’s slow” is the most common and least useful bug report. Profiling turns vague complaints into precise diagnoses. Here’s how I profile Go services in production without disrupting traffic.

Always-On pprof Endpoint

Every Go service I deploy exposes pprof:

import (
    "log/slog"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Serve pprof on a separate port — don't expose it publicly
    go func() {
        slog.Info("pprof listening", "addr", ":6060")
        if err := http.ListenAndServe(":6060", nil); err != nil {
            slog.Error("pprof server exited", "err", err)
        }
    }()

    // Your actual server
    runServer()
}

In Kubernetes, this port isn’t exposed externally. I access it via kubectl port-forward:

kubectl port-forward deploy/my-service 6060:6060

CPU Profiling

When latency spikes, grab a 30-second CPU profile:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

Inside pprof:

(pprof) top 20
(pprof) web          # Visualize call graph
(pprof) list funcName # Source-level view

Real example: we found 40% of CPU was spent in json.Marshal. The fix: switch to encoding/json/v2 for hot paths and cache serialized responses where possible.

// Before: marshal on every request
func handler(w http.ResponseWriter, r *http.Request) {
    data := getExpensiveData()
    json.NewEncoder(w).Encode(data)
}

// After: cache serialized response
var cache sync.Map

func handler(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Path
    if cached, ok := cache.Load(key); ok {
        w.Header().Set("Content-Type", "application/json")
        w.Write(cached.([]byte))
        return
    }

    data := getExpensiveData()
    body, err := json.Marshal(data)
    if err != nil {
        http.Error(w, "internal error", http.StatusInternalServerError)
        return
    }
    cache.Store(key, body)
    w.Header().Set("Content-Type", "application/json")
    w.Write(body)
}

Memory Profiling

For memory issues, grab a heap profile:

go tool pprof http://localhost:6060/debug/pprof/heap

Two views matter:

(pprof) top -inuse_space    # What's currently allocated
(pprof) top -alloc_space    # What's been allocated total (GC pressure)

inuse_space shows leaks. alloc_space is cumulative since process start: compare two heap profiles taken some interval apart to get the allocation rate, and a high rate means GC pressure and latency spikes.

Common fixes:

1. Reuse buffers with sync.Pool:

var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func process(data []byte) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)

    buf.Write(data)
    // ... use buf
}

2. Pre-allocate slices when you know the size:

// Before: grows and reallocates
var results []Item
for _, row := range rows {
    results = append(results, transform(row))
}

// After: single allocation
results := make([]Item, 0, len(rows))
for _, row := range rows {
    results = append(results, transform(row))
}

Goroutine Leaks

The sneakiest problem. A leaked goroutine is parked, so it costs little CPU, but it pins its stack and everything it references, and the count grows slowly until the process OOMs.

Check goroutine count:

curl "http://localhost:6060/debug/pprof/goroutine?debug=1"

If goroutine count grows over time, you have a leak. Get a full dump:

go tool pprof http://localhost:6060/debug/pprof/goroutine
(pprof) top
(pprof) traces  # See full stack traces

Most common cause: goroutines blocked on a channel that nothing ever sends to:

// LEAK: if nothing ever sends on ch, this goroutine blocks forever
go func() {
    val := <-ch
    process(val)
}()

The fix: always ensure goroutines have an exit path. Create the context with context.WithTimeout or context.WithCancel and select on ctx.Done():

go func() {
    select {
    case val := <-ch:
        process(val)
    case <-ctx.Done():
        return
    }
}()

Runtime Metrics

Export runtime stats to your metrics system:

func collectRuntimeMetrics() {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)

        goroutineCount.Set(float64(runtime.NumGoroutine()))
        heapAlloc.Set(float64(m.HeapAlloc))
        heapInuse.Set(float64(m.HeapInuse))
        gcPauseMs.Observe(float64(m.PauseNs[(m.NumGC+255)%256]) / 1e6)
        gcRuns.Set(float64(m.NumGC))
    }
}

Alert on:

  • Goroutine count trending upward (leak)
  • Heap allocation trending upward (memory leak)
  • GC pause duration spikes (allocation pressure)

Continuous Profiling

For persistent issues, use continuous profiling with tools like Pyroscope or Datadog’s profiler:

import "github.com/grafana/pyroscope-go"

func main() {
    pyroscope.Start(pyroscope.Config{
        ApplicationName: "order-service",
        ServerAddress:   "http://pyroscope:4040",
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileGoroutines,
        },
    })

    runServer()
}

This records profiles continuously and lets you compare “now” vs “last week” to find regressions. For ad-hoc comparisons, pprof can also diff two saved profiles:

go tool pprof -diff_base old.prof new.prof

The Profiling Workflow

  1. Observe: metrics show latency spike or memory growth
  2. Profile: grab the relevant profile (CPU, heap, goroutine)
  3. Identify: find the hot function or allocation site
  4. Fix: optimize the specific bottleneck
  5. Verify: check metrics after deploy

Don’t guess. Profile. The data will surprise you — the bottleneck is rarely where you think it is.