Optimizing Go Services Handling Millions of Requests


When your Go service handles millions of requests per day, small inefficiencies compound. A 1ms overhead per request becomes 16 minutes of wasted compute daily at 1M requests. Here’s where I look for gains.

HTTP Client Reuse

The number one mistake in Go HTTP performance: creating new clients per request.

// BAD: new transport per request, no connection reuse
func callService(url string) (*http.Response, error) {
    client := &http.Client{Timeout: 5 * time.Second}
    return client.Get(url)
}

// GOOD: shared client with connection pooling
var client = &http.Client{
    Timeout: 10 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
        IdleConnTimeout:     90 * time.Second,
        DisableCompression:  true, // Skip if you handle compression yourself
    },
}

This alone can improve latency by 10-50ms per downstream call by reusing TCP connections and avoiding TLS handshakes.

Response Body: Always Close, Always Drain

A subtle connection pool killer:

resp, err := client.Get(url)
if err != nil {
    return err
}
defer resp.Body.Close()

// Drain before the deferred Close: an unread body prevents reuse
io.Copy(io.Discard, resp.Body)

If you don’t drain the body, Go can’t return the connection to the pool. Under load, you exhaust connections and start creating new ones — killing throughput.

Zero-Allocation JSON Handling

Standard encoding/json allocates heavily. For hot paths:

// Use a buffer pool for encoding
var encoderPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func writeJSON(w http.ResponseWriter, v interface{}) error {
    buf := encoderPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer encoderPool.Put(buf)

    if err := json.NewEncoder(buf).Encode(v); err != nil {
        return err
    }

    w.Header().Set("Content-Type", "application/json")
    _, err := w.Write(buf.Bytes())
    return err
}

For read-heavy endpoints, consider caching the serialized bytes.
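A minimal sketch of that caching, assuming a single-process service where writes invalidate entries (invalidation not shown); responseCache and the key scheme are illustrative, not from any library:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// responseCache memoizes serialized JSON so hot read endpoints skip
// re-encoding identical payloads.
type responseCache struct {
	mu    sync.RWMutex
	items map[string][]byte
}

func newResponseCache() *responseCache {
	return &responseCache{items: make(map[string][]byte)}
}

// get returns cached bytes for key, running build and encoding its
// result only on a miss.
func (c *responseCache) get(key string, build func() (interface{}, error)) ([]byte, error) {
	c.mu.RLock()
	b, ok := c.items[key]
	c.mu.RUnlock()
	if ok {
		return b, nil
	}
	v, err := build()
	if err != nil {
		return nil, err
	}
	b, err = json.Marshal(v)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.items[key] = b
	c.mu.Unlock()
	return b, nil
}

func main() {
	cache := newResponseCache()
	b, _ := cache.get("orders:pending", func() (interface{}, error) {
		return map[string]int{"pending": 3}, nil
	})
	fmt.Println(string(b)) // {"pending":3}
}
```

Under concurrent misses this can run build more than once for the same key; `golang.org/x/sync/singleflight` fixes that if it matters.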

Database Query Optimization

The biggest wins are usually here. Patterns I use:

Batch reads instead of N+1:

// BAD: N+1 queries
for _, orderID := range orderIDs {
    order, _ := db.GetOrder(ctx, orderID)
    orders = append(orders, order)
}

// GOOD: single query
orders, _ := db.GetOrdersByIDs(ctx, orderIDs)
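GetOrdersByIDs is a placeholder, not a driver method. With pgx you can bind the slice directly via `= ANY($1)`; with plain database/sql you expand an IN list yourself. A sketch of that expansion (column names are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchQuery expands the id list into numbered placeholders for
// database/sql, which lacks pgx's array binding. The returned args
// slice is passed straight to db.QueryContext.
func buildBatchQuery(ids []int64) (string, []interface{}) {
	ph := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		ph[i] = fmt.Sprintf("$%d", i+1)
		args[i] = id
	}
	query := "SELECT id, customer_id, status FROM orders WHERE id IN (" +
		strings.Join(ph, ", ") + ")"
	return query, args
}

func main() {
	q, args := buildBatchQuery([]int64{101, 102, 103})
	fmt.Println(q)
	fmt.Println(len(args), "args")
}
```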

Use prepared statements for repeated queries:

type OrderRepo struct {
    db *pgxpool.Pool
}

func NewOrderRepo(db *pgxpool.Pool) *OrderRepo {
    // pgx prepares and caches statements per connection automatically,
    // so repeated queries skip the parse/plan round trip.
    return &OrderRepo{db: db}
}

func (r *OrderRepo) GetByID(ctx context.Context, id int64) pgx.Row {
    return r.db.QueryRow(ctx, "SELECT * FROM orders WHERE id = $1", id)
}

Index your WHERE clauses. Sounds obvious, but I’ve found missing indexes on production tables more times than I can count:

-- Check for sequential scans
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 'abc' AND status = 'pending';

-- Add composite index
CREATE INDEX CONCURRENTLY idx_orders_customer_status ON orders (customer_id, status);

Reduce Allocations in Hot Paths

Use strings.Builder instead of concatenation:

// BAD: allocates on every concatenation
func buildKey(parts ...string) string {
    result := ""
    for _, p := range parts {
        result += p + ":"
    }
    return result
}

// GOOD: single allocation
func buildKey(parts ...string) string {
    var b strings.Builder
    for i, p := range parts {
        if i > 0 {
            b.WriteByte(':')
        }
        b.WriteString(p)
    }
    return b.String()
}

Use strconv instead of fmt.Sprintf for simple conversions:

// Slow: uses reflection
key := fmt.Sprintf("user:%d", userID)

// Fast: direct conversion
key := "user:" + strconv.FormatInt(userID, 10)

Compression

For APIs returning large payloads, gzip reduces bandwidth and often improves perceived latency:

// gzipResponseWriter routes the body through the gzip writer while
// headers still go to the underlying ResponseWriter.
type gzipResponseWriter struct {
    http.ResponseWriter
    Writer io.Writer
}

func (g *gzipResponseWriter) Write(b []byte) (int, error) {
    return g.Writer.Write(b)
}

func gzipMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            next.ServeHTTP(w, r)
            return
        }

        gz := gzip.NewWriter(w)
        defer gz.Close()

        w.Header().Set("Content-Encoding", "gzip")
        w.Header().Add("Vary", "Accept-Encoding")
        next.ServeHTTP(&gzipResponseWriter{ResponseWriter: w, Writer: gz}, r)
    })
}

Benchmarking

Always benchmark before and after:

func BenchmarkHandler(b *testing.B) {
    handler := setupHandler()
    req := httptest.NewRequest("GET", "/api/orders", nil)

    b.ResetTimer()
    b.ReportAllocs()

    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        handler.ServeHTTP(w, req)
    }
}

Run with:

go test -bench=. -benchmem -count=5 ./...

The -benchmem flag shows allocations per operation — often more important than raw speed.

Performance optimization is iterative. Profile, identify the bottleneck, fix it, measure again. The temptation to optimize everything at once is strong — resist it. Fix the biggest bottleneck first.