Memory Optimization Techniques for High-Traffic Go Services
When your Go service handles thousands of requests per second, memory allocation patterns directly impact latency. Every allocation is future GC work. Here are the techniques that made the biggest difference in our services.
Measure First
Before optimizing, know where allocations happen:
go test -bench=. -benchmem -memprofile=mem.out ./...
go tool pprof -alloc_space mem.out
In pprof:
(pprof) top20 -cum
This shows cumulative allocations — where your code creates the most garbage.
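Benchmarks give you the same numbers per function. A minimal sketch (the benchmark name and inputs here are made up for illustration): `b.ReportAllocs()` makes the benchmark report B/op and allocs/op even when `-benchmem` isn't passed.

```go
import (
    "strings"
    "testing"
)

// Hypothetical micro-benchmark; run with
// `go test -bench=Join -benchmem` to see B/op and allocs/op.
func BenchmarkJoin(b *testing.B) {
    parts := []string{"alpha", "beta", "gamma"}
    b.ReportAllocs() // report allocation stats even without -benchmem
    for i := 0; i < b.N; i++ {
        _ = strings.Join(parts, ",")
    }
}
```

Tracking allocs/op in CI turns allocation regressions into visible diffs instead of mystery latency.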
sync.Pool for Temporary Buffers
The most impactful optimization for request-scoped objects:
var bufferPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 0, 4096)
        return &buf
    },
}
func handleRequest(w http.ResponseWriter, r *http.Request) {
    bufPtr := bufferPool.Get().(*[]byte)
    buf := (*bufPtr)[:0] // Reset length, keep capacity
    defer func() {
        *bufPtr = buf
        bufferPool.Put(bufPtr)
    }()
    // Use buf for request processing
    buf = append(buf, readBody(r)...)
    process(buf)
}
Key rules:
- Pool objects that are allocated and freed in the same request cycle
- Always reset state before returning to pool
- Don’t pool objects that vary wildly in size (the pool keeps the largest ones)
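The "reset state" rule applies to structs too, not just byte slices. A sketch with a hypothetical request-scoped scratch object (the `scratch` type and `withScratch` helper are illustrative, not from the post above):

```go
import "sync"

// scratch is a hypothetical request-scoped parse buffer.
type scratch struct {
    fields []string
}

// reset drops contents but keeps the backing array's capacity,
// so a pooled object never leaks data between requests.
func (s *scratch) reset() {
    s.fields = s.fields[:0]
}

var scratchPool = sync.Pool{
    New: func() interface{} {
        return &scratch{fields: make([]string, 0, 16)}
    },
}

// withScratch lends a pooled scratch object to f and guarantees
// it is reset before going back to the pool.
func withScratch(f func(*scratch)) {
    s := scratchPool.Get().(*scratch)
    defer func() {
        s.reset() // rule: always reset before Put
        scratchPool.Put(s)
    }()
    f(s)
}
```

Wrapping Get/Put in a helper like this makes it impossible to forget the reset at any individual call site.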
Pre-Allocate Slices
When you know the size, tell Go:
// BAD: repeated reallocations and copies as the slice grows
func getUsers(ids []string) []User {
    var users []User
    for _, id := range ids {
        users = append(users, fetchUser(id))
    }
    return users
}

// GOOD: a single up-front allocation
func getUsers(ids []string) []User {
    users := make([]User, 0, len(ids))
    for _, id := range ids {
        users = append(users, fetchUser(id))
    }
    return users
}
For maps too:
m := make(map[string]int, expectedSize)
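You can verify the difference with testing.AllocsPerRun, which reports average allocations per call. A toy sketch (the `fill` helper is a made-up stand-in for getUsers):

```go
import "testing"

// fill is a toy stand-in for getUsers; prealloc toggles the make().
func fill(ids []string, prealloc bool) []string {
    var out []string
    if prealloc {
        out = make([]string, 0, len(ids))
    }
    for _, id := range ids {
        out = append(out, id)
    }
    return out
}

// allocsFor reports average heap allocations per fill() call.
func allocsFor(prealloc bool) float64 {
    ids := make([]string, 1000)
    return testing.AllocsPerRun(100, func() { fill(ids, prealloc) })
}
```

For 1000 elements the growing path reallocates roughly a dozen times; the preallocated path allocates exactly once.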
Avoid String Concatenation in Loops
Strings in Go are immutable. Every concatenation allocates:
// BAD: each += copies the whole result so far; O(n²) bytes copied
func buildCSV(rows [][]string) string {
    result := ""
    for _, row := range rows {
        result += strings.Join(row, ",") + "\n"
    }
    return result
}

// GOOD: one growing buffer; a single allocation if the estimate holds
func buildCSV(rows [][]string) string {
    var b strings.Builder
    b.Grow(len(rows) * 100) // Estimate total size up front
    for _, row := range rows {
        for i, col := range row {
            if i > 0 {
                b.WriteByte(',')
            }
            b.WriteString(col)
        }
        b.WriteByte('\n')
    }
    return b.String()
}
Reuse byte Slices for JSON
JSON encoding allocates heavily. Reuse the output buffer:
var jsonBufPool = sync.Pool{
    New: func() interface{} {
        return bytes.NewBuffer(make([]byte, 0, 1024))
    },
}

func marshalJSON(v interface{}) ([]byte, error) {
    buf := jsonBufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer jsonBufPool.Put(buf)
    enc := json.NewEncoder(buf)
    enc.SetEscapeHTML(false)
    // Note: Encoder.Encode appends a trailing newline to the output
    if err := enc.Encode(v); err != nil {
        return nil, err
    }
    // Copy out: the pooled buffer will be reused
    result := make([]byte, buf.Len())
    copy(result, buf.Bytes())
    return result, nil
}
Reduce Pointer Chasing
Slices of structs are more cache-friendly than slices of pointers:
// Cache-unfriendly: each pointer dereference may cache-miss
users := []*User{&u1, &u2, &u3}
// Cache-friendly: contiguous memory
users := []User{u1, u2, u3}
This matters when iterating over large collections. Contiguous memory means the CPU prefetcher can work effectively.
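Both layouts compute the same answers; only the memory access pattern differs. A sketch (the `User` shape and sum helpers are illustrative):

```go
// User is a hypothetical record type.
type User struct {
    ID      int
    Balance float64
}

// sumByValue walks contiguous memory; the prefetcher can stream it.
func sumByValue(users []User) float64 {
    var total float64
    for i := range users {
        total += users[i].Balance
    }
    return total
}

// sumByPointer dereferences a pointer per element; each hop can miss cache.
func sumByPointer(users []*User) float64 {
    var total float64
    for _, u := range users {
        total += u.Balance
    }
    return total
}
```

The value-slice version also means one allocation for the whole collection instead of one per element, which cuts GC scan work.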
String Interning for Repeated Values
If the same strings appear repeatedly (status codes, enum values), intern them:
var statusStrings = map[string]string{
    "active":    "active",
    "inactive":  "inactive",
    "pending":   "pending",
    "completed": "completed",
}

func internStatus(s string) string {
    if interned, ok := statusStrings[s]; ok {
        return interned
    }
    return s
}
When deserializing 100K records that all have status: "active", this avoids 100K separate string allocations.
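When the value set isn't known at compile time, the same idea works with a mutex-guarded table. A sketch (the `Interner` type is illustrative, not from a library):

```go
import "sync"

// Interner is a sketch of a dynamic intern table for values that
// aren't known ahead of time; safe for concurrent use.
type Interner struct {
    mu sync.Mutex
    m  map[string]string
}

func NewInterner() *Interner {
    return &Interner{m: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing it on first sight.
// Later duplicates share the first copy's backing memory, so the
// duplicate allocations become collectable garbage immediately.
func (in *Interner) Intern(s string) string {
    in.mu.Lock()
    defer in.mu.Unlock()
    if v, ok := in.m[s]; ok {
        return v
    }
    in.m[s] = s
    return s
}
```

Note the table grows without bound unless you cap or reset it. On Go 1.23+, the standard library's unique package provides canonical handles for comparable values and may fit this use case.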
Monitor Memory in Production
func reportMemStats() {
    ticker := time.NewTicker(10 * time.Second)
    for range ticker.C {
        var m runtime.MemStats
        // ReadMemStats briefly stops the world; a 10s cadence keeps that cheap
        runtime.ReadMemStats(&m)
        heapAllocGauge.Set(float64(m.HeapAlloc))
        heapObjectsGauge.Set(float64(m.HeapObjects))
        // Most recent GC pause, converted to milliseconds
        gcPauseHist.Observe(float64(m.PauseNs[(m.NumGC+255)%256]) / 1e6)
        // TotalAlloc is cumulative; let the monitoring system derive the rate
        allocRateGauge.Set(float64(m.TotalAlloc))
    }
}
Alert on:
- HeapAlloc trending up: memory leak
- GC pause > 10ms: too much garbage
- HeapObjects trending up: object leak
Memory optimization in Go is about reducing GC pressure. Fewer allocations mean fewer GC pauses mean lower P99 latency. The techniques are simple — the discipline to apply them consistently in hot paths is what matters.