
Profiling Go Services in Production: CPU, Memory, and Goroutine Leaks

“It’s slow” is the most common and least useful bug report. Profiling turns vague complaints into precise diagnoses. Here’s how I profile Go services in production without disrupting traffic.

Always-On pprof Endpoint

Every Go service I deploy exposes pprof:

import (
    "log/slog"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Serve pprof on a separate port — don't expose it publicly
    go func() {
        slog.Info("pprof listening", "addr", ":6060")
        if err := http.ListenAndServe(":6060", nil); err != nil {
            slog.Error("pprof server exited", "err", err)
        }
    }()

    // Your actual server
    runServer()
}

In Kubernetes, this port isn’t exposed externally. I access it via kubectl port-forward:

kubectl port-forward deploy/my-service 6060:6060

CPU Profiling

When latency spikes, grab a 30-second CPU profile:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

Inside pprof:

(pprof) top 20
(pprof) web          # Visualize call graph
(pprof) list funcName # Source-level view

Real example: we found 40% of CPU was spent in json.Marshal. The fix: switch to encoding/json/v2 for hot paths and cache serialized responses where possible.

// Before: marshal on every request
func handler(w http.ResponseWriter, r *http.Request) {
    data := getExpensiveData()
    json.NewEncoder(w).Encode(data)
}

// After: cache serialized response
var cache sync.Map

func handler(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Path
    if cached, ok := cache.Load(key); ok {
        w.Header().Set("Content-Type", "application/json")
        w.Write(cached.([]byte))
        return
    }

    data := getExpensiveData()
    body, err := json.Marshal(data)
    if err != nil {
        http.Error(w, "internal error", http.StatusInternalServerError)
        return
    }
    cache.Store(key, body)
    w.Header().Set("Content-Type", "application/json")
    w.Write(body)
}

Memory Profiling

For memory issues, grab a heap profile:

go tool pprof http://localhost:6060/debug/pprof/heap

Two views matter:

(pprof) top -inuse_space    # What's currently allocated
(pprof) top -alloc_space    # What's been allocated total (GC pressure)

inuse_space shows leaks. alloc_space is cumulative since process start: compare two heap profiles taken some interval apart to get the allocation rate, and a high rate means GC pressure and latency spikes.

Common fixes:

1. Reuse buffers with sync.Pool:

var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func process(data []byte) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)

    buf.Write(data)
    // ... use buf
}

2. Pre-allocate slices when you know the size:

// Before: grows and reallocates
var results []Item
for _, row := range rows {
    results = append(results, transform(row))
}

// After: single allocation
results := make([]Item, 0, len(rows))
for _, row := range rows {
    results = append(results, transform(row))
}

Goroutine Leaks

The sneakiest problem. A leaked goroutine is parked, so it costs little CPU, but it pins its stack and everything it references, and the count grows slowly until the process OOMs.

Check goroutine count:

curl "http://localhost:6060/debug/pprof/goroutine?debug=1"

If goroutine count grows over time, you have a leak. Get a full dump:

go tool pprof http://localhost:6060/debug/pprof/goroutine
(pprof) top
(pprof) traces  # See full stack traces

Most common cause: goroutines blocked on a channel that nothing ever sends to:

// LEAK: if nothing ever sends on ch, this goroutine blocks forever
go func() {
    val := <-ch
    process(val)
}()

The fix: always ensure goroutines have an exit path. Create the context with context.WithTimeout or context.WithCancel and select on ctx.Done():

go func() {
    select {
    case val := <-ch:
        process(val)
    case <-ctx.Done():
        return
    }
}()

Runtime Metrics

Export runtime stats to your metrics system:

func collectRuntimeMetrics() {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)

        goroutineCount.Set(float64(runtime.NumGoroutine()))
        heapAlloc.Set(float64(m.HeapAlloc))
        heapInuse.Set(float64(m.HeapInuse))
        gcPauseMs.Observe(float64(m.PauseNs[(m.NumGC+255)%256]) / 1e6)
        gcRuns.Set(float64(m.NumGC))
    }
}

Alert on:

  • Goroutine count trending upward (leak)
  • Heap allocation trending upward (memory leak)
  • GC pause duration spikes (allocation pressure)

Continuous Profiling

For persistent issues, use continuous profiling with tools like Pyroscope or Datadog’s profiler:

import "github.com/grafana/pyroscope-go"

func main() {
    pyroscope.Start(pyroscope.Config{
        ApplicationName: "order-service",
        ServerAddress:   "http://pyroscope:4040",
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileGoroutines,
        },
    })

    runServer()
}

This records profiles continuously and lets you compare “now” vs “last week” to find regressions. For ad-hoc comparisons, pprof can also diff two saved profiles:

go tool pprof -diff_base old.prof new.prof

The Profiling Workflow

  1. Observe: metrics show latency spike or memory growth
  2. Profile: grab the relevant profile (CPU, heap, goroutine)
  3. Identify: find the hot function or allocation site
  4. Fix: optimize the specific bottleneck
  5. Verify: check metrics after deploy

Don’t guess. Profile. The data will surprise you — the bottleneck is rarely where you think it is.