Memory Optimization Techniques for High-Traffic Go Services
When your Go service handles thousands of requests per second, memory allocation patterns directly impact latency. Every allocation is future GC work. Here are the techniques that made the biggest difference in our services.
Measure First
Before optimizing, know where allocations happen:
go test -bench=. -benchmem -memprofile=mem.out ./...
go tool pprof -alloc_space mem.out
In pprof:
(pprof) top20 -cum
This shows cumulative allocations — where your code creates the most garbage.
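Benchmarks give you the same numbers per function. A minimal sketch (the benchmark name and inputs here are made up for illustration): `b.ReportAllocs()` makes the benchmark report B/op and allocs/op even when `-benchmem` isn't passed.

```go
import (
    "strings"
    "testing"
)

// Hypothetical micro-benchmark; run with
// `go test -bench=Join -benchmem` to see B/op and allocs/op.
func BenchmarkJoin(b *testing.B) {
    parts := []string{"alpha", "beta", "gamma"}
    b.ReportAllocs() // report allocation stats even without -benchmem
    for i := 0; i < b.N; i++ {
        _ = strings.Join(parts, ",")
    }
}
```

Tracking allocs/op in CI turns allocation regressions into visible diffs instead of mystery latency.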
sync.Pool for Temporary Buffers
The most impactful optimization for request-scoped objects:
var bufferPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 0, 4096)
        return &buf
    },
}
func handleRequest(w http.ResponseWriter, r *http.Request) {
    bufPtr := bufferPool.Get().(*[]byte)
    buf := (*bufPtr)[:0] // Reset length, keep capacity
    defer func() {
        *bufPtr = buf
        bufferPool.Put(bufPtr)
    }()
    // Use buf for request processing
    buf = append(buf, readBody(r)...)
    process(buf)
}
Key rules:
- Pool objects that are allocated and freed in the same request cycle
- Always reset state before returning to pool
- Don’t pool objects that vary wildly in size (the pool keeps the largest ones)
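The "reset state" rule applies to structs too, not just byte slices. A sketch with a hypothetical request-scoped scratch object (the `scratch` type and `withScratch` helper are illustrative, not from the post above):

```go
import "sync"

// scratch is a hypothetical request-scoped parse buffer.
type scratch struct {
    fields []string
}

// reset drops contents but keeps the backing array's capacity,
// so a pooled object never leaks data between requests.
func (s *scratch) reset() {
    s.fields = s.fields[:0]
}

var scratchPool = sync.Pool{
    New: func() interface{} {
        return &scratch{fields: make([]string, 0, 16)}
    },
}

// withScratch lends a pooled scratch object to f and guarantees
// it is reset before going back to the pool.
func withScratch(f func(*scratch)) {
    s := scratchPool.Get().(*scratch)
    defer func() {
        s.reset() // rule: always reset before Put
        scratchPool.Put(s)
    }()
    f(s)
}
```

Wrapping Get/Put in a helper like this makes it impossible to forget the reset at any individual call site.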
Pre-Allocate Slices
When you know the size, tell Go:
// BAD: repeated reallocations and copies as the slice grows
func getUsers(ids []string) []User {
    var users []User
    for _, id := range ids {
        users = append(users, fetchUser(id))
    }
    return users
}

// GOOD: a single up-front allocation
func getUsers(ids []string) []User {
    users := make([]User, 0, len(ids))
    for _, id := range ids {
        users = append(users, fetchUser(id))
    }
    return users
}
For maps too:
m := make(map[string]int, expectedSize)
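You can verify the difference with testing.AllocsPerRun, which reports average allocations per call. A toy sketch (the `fill` helper is a made-up stand-in for getUsers):

```go
import "testing"

// fill is a toy stand-in for getUsers; prealloc toggles the make().
func fill(ids []string, prealloc bool) []string {
    var out []string
    if prealloc {
        out = make([]string, 0, len(ids))
    }
    for _, id := range ids {
        out = append(out, id)
    }
    return out
}

// allocsFor reports average heap allocations per fill() call.
func allocsFor(prealloc bool) float64 {
    ids := make([]string, 1000)
    return testing.AllocsPerRun(100, func() { fill(ids, prealloc) })
}
```

For 1000 elements the growing path reallocates roughly a dozen times; the preallocated path allocates exactly once.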
Avoid String Concatenation in Loops
Strings in Go are immutable. Every concatenation allocates:
// BAD: each += copies the whole result so far; O(n²) bytes copied
func buildCSV(rows [][]string) string {
    result := ""
    for _, row := range rows {
        result += strings.Join(row, ",") + "\n"
    }
    return result
}

// GOOD: one growing buffer; a single allocation if the estimate holds
func buildCSV(rows [][]string) string {
    var b strings.Builder
    b.Grow(len(rows) * 100) // Estimate total size up front
    for _, row := range rows {
        for i, col := range row {
            if i > 0 {
                b.WriteByte(',')
            }
            b.WriteString(col)
        }
        b.WriteByte('\n')
    }
    return b.String()
}
Reuse byte Slices for JSON
JSON encoding allocates heavily. Reuse the output buffer:
var jsonBufPool = sync.Pool{
    New: func() interface{} {
        return bytes.NewBuffer(make([]byte, 0, 1024))
    },
}

func marshalJSON(v interface{}) ([]byte, error) {
    buf := jsonBufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer jsonBufPool.Put(buf)
    enc := json.NewEncoder(buf)
    enc.SetEscapeHTML(false)
    // Note: Encoder.Encode appends a trailing newline to the output
    if err := enc.Encode(v); err != nil {
        return nil, err
    }
    // Copy out: the pooled buffer will be reused
    result := make([]byte, buf.Len())
    copy(result, buf.Bytes())
    return result, nil
}
Reduce Pointer Chasing
Slices of structs are more cache-friendly than slices of pointers:
// Cache-unfriendly: each pointer dereference may cache-miss
users := []*User{&u1, &u2, &u3}
// Cache-friendly: contiguous memory
users := []User{u1, u2, u3}
This matters when iterating over large collections. Contiguous memory means the CPU prefetcher can work effectively.
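Both layouts compute the same answers; only the memory access pattern differs. A sketch (the `User` shape and sum helpers are illustrative):

```go
// User is a hypothetical record type.
type User struct {
    ID      int
    Balance float64
}

// sumByValue walks contiguous memory; the prefetcher can stream it.
func sumByValue(users []User) float64 {
    var total float64
    for i := range users {
        total += users[i].Balance
    }
    return total
}

// sumByPointer dereferences a pointer per element; each hop can miss cache.
func sumByPointer(users []*User) float64 {
    var total float64
    for _, u := range users {
        total += u.Balance
    }
    return total
}
```

The value-slice version also means one allocation for the whole collection instead of one per element, which cuts GC scan work.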
String Interning for Repeated Values
If the same strings appear repeatedly (status codes, enum values), intern them:
var statusStrings = map[string]string{
    "active":    "active",
    "inactive":  "inactive",
    "pending":   "pending",
    "completed": "completed",
}

func internStatus(s string) string {
    if interned, ok := statusStrings[s]; ok {
        return interned
    }
    return s
}
When deserializing 100K records that all have status: "active", this avoids 100K separate string allocations.
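When the value set isn't known at compile time, the same idea works with a mutex-guarded table. A sketch (the `Interner` type is illustrative, not from a library):

```go
import "sync"

// Interner is a sketch of a dynamic intern table for values that
// aren't known ahead of time; safe for concurrent use.
type Interner struct {
    mu sync.Mutex
    m  map[string]string
}

func NewInterner() *Interner {
    return &Interner{m: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing it on first sight.
// Later duplicates share the first copy's backing memory, so the
// duplicate allocations become collectable garbage immediately.
func (in *Interner) Intern(s string) string {
    in.mu.Lock()
    defer in.mu.Unlock()
    if v, ok := in.m[s]; ok {
        return v
    }
    in.m[s] = s
    return s
}
```

Note the table grows without bound unless you cap or reset it. On Go 1.23+, the standard library's unique package provides canonical handles for comparable values and may fit this use case.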
Monitor Memory in Production
func reportMemStats() {
    ticker := time.NewTicker(10 * time.Second)
    for range ticker.C {
        var m runtime.MemStats
        // ReadMemStats briefly stops the world; a 10s cadence keeps that cheap
        runtime.ReadMemStats(&m)
        heapAllocGauge.Set(float64(m.HeapAlloc))
        heapObjectsGauge.Set(float64(m.HeapObjects))
        // Most recent GC pause, converted to milliseconds
        gcPauseHist.Observe(float64(m.PauseNs[(m.NumGC+255)%256]) / 1e6)
        // TotalAlloc is cumulative; let the monitoring system derive the rate
        allocRateGauge.Set(float64(m.TotalAlloc))
    }
}
Alert on:
- HeapAlloc trending up: memory leak
- GC pause > 10ms: too much garbage
- HeapObjects trending up: object leak
Memory optimization in Go is about reducing GC pressure. Fewer allocations mean fewer GC pauses mean lower P99 latency. The techniques are simple — the discipline to apply them consistently in hot paths is what matters.