Profiling Go Services in Production: CPU, Memory, and Goroutine Leaks
“It’s slow” is the most common and least useful bug report. Profiling turns vague complaints into precise diagnoses. Here’s how I profile Go services in production without disrupting traffic.
Always-On pprof Endpoint
Every Go service I deploy exposes pprof:
```go
import (
	"log/slog"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof handlers on http.DefaultServeMux
)

func main() {
	// Serve pprof on a separate port; don't expose it publicly
	go func() {
		slog.Info("pprof listening", "addr", ":6060")
		if err := http.ListenAndServe(":6060", nil); err != nil {
			slog.Error("pprof server exited", "err", err)
		}
	}()

	// Your actual server
	runServer()
}
```
In Kubernetes, this port isn’t exposed externally. I access it via kubectl port-forward.
CPU Profiling
When latency spikes, grab a 30-second CPU profile:
```
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
```

(Quote the URL; the `?` is a glob character in some shells.)
Inside pprof:
```
(pprof) top 20            # Hottest functions by CPU time
(pprof) web               # Visualize the call graph (requires graphviz)
(pprof) list funcName     # Source-level view of one function
```
Real example: we found 40% of CPU was spent in json.Marshal. The fix: switch to encoding/json/v2 for hot paths and cache serialized responses where possible.
```go
// Before: marshal on every request
func handler(w http.ResponseWriter, r *http.Request) {
	data := getExpensiveData()
	json.NewEncoder(w).Encode(data)
}
```

```go
// After: cache the serialized response
var cache sync.Map

func handler(w http.ResponseWriter, r *http.Request) {
	key := r.URL.Path
	if cached, ok := cache.Load(key); ok {
		w.Header().Set("Content-Type", "application/json")
		w.Write(cached.([]byte))
		return
	}
	data := getExpensiveData()
	b, err := json.Marshal(data)
	if err != nil {
		http.Error(w, "encoding failed", http.StatusInternalServerError)
		return
	}
	cache.Store(key, b)
	w.Header().Set("Content-Type", "application/json")
	w.Write(b)
}
```
Memory Profiling
For memory issues, grab a heap profile:
```
go tool pprof http://localhost:6060/debug/pprof/heap
```
Two views matter:
```
(pprof) sample_index=inuse_space   # What's currently allocated
(pprof) top
(pprof) sample_index=alloc_space   # Total allocated since start (GC pressure)
(pprof) top
```

(You can also pass `-inuse_space` or `-alloc_space` as flags to `go tool pprof` itself.)
inuse_space shows leaks. alloc_space shows allocation rate — high alloc rate means GC pressure and latency spikes.
Common fixes:
1. Reuse buffers with sync.Pool:
```go
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func process(data []byte) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	buf.Write(data)
	// ... use buf
}
```
2. Pre-allocate slices when you know the size:
```go
// Before: grows and reallocates as it fills
var results []Item
for _, row := range rows {
	results = append(results, transform(row))
}

// After: single up-front allocation
results := make([]Item, 0, len(rows))
for _, row := range rows {
	results = append(results, transform(row))
}
```
Goroutine Leaks
The sneakiest problem. Goroutines that never exit pin their stacks and everything they reference, so memory grows slowly until the process OOMs.
Check goroutine count:
```
curl "http://localhost:6060/debug/pprof/goroutine?debug=1"
```
If goroutine count grows over time, you have a leak. Get a full dump:
```
go tool pprof http://localhost:6060/debug/pprof/goroutine
(pprof) top
(pprof) traces    # See full stack traces
```
Most common cause: goroutines blocked on channel operations that will never complete:

```go
// LEAK: if ch never receives a value, this goroutine blocks forever
go func() {
	val := <-ch
	process(val)
}()

// FIX: give the goroutine an exit path
go func() {
	select {
	case val := <-ch:
		process(val)
	case <-ctx.Done():
		return
	}
}()
```
The fix: always ensure goroutines have an exit path. Use context.WithTimeout or context.WithCancel.
Runtime Metrics
Export runtime stats to your metrics system:
```go
// goroutineCount, heapAlloc, heapInuse, and gcRuns are assumed to be gauges,
// and gcPauseMs a histogram, registered with your metrics client (e.g. Prometheus).
func collectRuntimeMetrics() {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		goroutineCount.Set(float64(runtime.NumGoroutine()))
		heapAlloc.Set(float64(m.HeapAlloc))
		heapInuse.Set(float64(m.HeapInuse))
		// PauseNs is a circular buffer of 256 entries; this indexes the most
		// recent pause. Note it re-observes that pause if no GC ran this tick.
		gcPauseMs.Observe(float64(m.PauseNs[(m.NumGC+255)%256]) / 1e6)
		gcRuns.Set(float64(m.NumGC))
	}
}
```
Alert on:
- Goroutine count trending upward (leak)
- Heap allocation trending upward (memory leak)
- GC pause duration spikes (allocation pressure)
Continuous Profiling
For persistent issues, use continuous profiling with tools like Pyroscope or Datadog’s profiler:
```go
import (
	"log"

	"github.com/grafana/pyroscope-go"
)

func main() {
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "order-service",
		ServerAddress:   "http://pyroscope:4040",
		ProfileTypes: []pyroscope.ProfileType{
			pyroscope.ProfileCPU,
			pyroscope.ProfileAllocObjects,
			pyroscope.ProfileGoroutines,
		},
	})
	if err != nil {
		log.Printf("pyroscope start failed: %v", err)
	}
	runServer()
}
```
This records profiles continuously and lets you compare “now” vs “last week” to find regressions.
The Profiling Workflow
- Observe: metrics show latency spike or memory growth
- Profile: grab the relevant profile (CPU, heap, goroutine)
- Identify: find the hot function or allocation site
- Fix: optimize the specific bottleneck
- Verify: check metrics after deploy
Don’t guess. Profile. The data will surprise you: the bottleneck is rarely where you think it is.