Optimizing Go Services Handling Millions of Requests
When your Go service handles millions of requests per day, small inefficiencies compound. A 1ms overhead per request adds up to over 16 minutes of wasted compute daily at 1M requests. Here’s where I look for gains.
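That figure is plain arithmetic, easy to sanity-check:

```go
package main

import "fmt"

// wastedMinutes converts per-request overhead into daily wall-clock waste.
func wastedMinutes(requests int, overheadMs float64) float64 {
	return float64(requests) * overheadMs / 1000 / 60
}

func main() {
	fmt.Printf("%.1f minutes/day\n", wastedMinutes(1_000_000, 1))
}
```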
HTTP Client Reuse
The number one mistake in Go HTTP performance: creating new clients per request.
// BAD: new transport per request, no connection reuse
func callService(url string) (*http.Response, error) {
	client := &http.Client{
		Timeout:   5 * time.Second,
		Transport: &http.Transport{}, // fresh connection pool every call, never reused
	}
	return client.Get(url)
}
// GOOD: shared client with connection pooling
var client = &http.Client{
	Timeout: 10 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
		DisableCompression:  true, // only disable this if you handle gzip yourself
	},
}
This alone can improve latency by 10-50ms per downstream call by reusing TCP connections and avoiding TLS handshakes.
Response Body: Always Close, Always Drain
A subtle connection pool killer:
resp, err := client.Get(url)
if err != nil {
	return err
}
defer resp.Body.Close()
// If you don't read the full body, the connection can't be reused
io.Copy(io.Discard, resp.Body)
If you don’t drain the body, Go can’t return the connection to the pool. Under load, you exhaust connections and start creating new ones — killing throughput.
Zero-Allocation JSON Handling
Standard encoding/json allocates heavily. For hot paths:
// Use a buffer pool for encoding
var encoderPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func writeJSON(w http.ResponseWriter, v interface{}) error {
	buf := encoderPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer encoderPool.Put(buf)
	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return err
	}
	w.Header().Set("Content-Type", "application/json")
	_, err := w.Write(buf.Bytes())
	return err
}
For read-heavy endpoints, consider caching the serialized bytes.
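For a value that rarely changes, that can be as simple as serializing once and serving the bytes from memory. A sketch (cachedJSON is my name, not a library type):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// cachedJSON memoizes the serialized form of a value that rarely changes,
// so hot read endpoints skip re-encoding on every request.
type cachedJSON struct {
	once sync.Once
	data []byte
	err  error
}

func (c *cachedJSON) Bytes(v interface{}) ([]byte, error) {
	c.once.Do(func() {
		c.data, c.err = json.Marshal(v)
	})
	return c.data, c.err
}

func main() {
	cfg := map[string]string{"region": "us-east-1"}
	var cache cachedJSON
	b1, _ := cache.Bytes(cfg)
	b2, _ := cache.Bytes(cfg) // second call returns the cached bytes
	fmt.Println(string(b1), &b1[0] == &b2[0])
}
```

Invalidation is the usual catch: this pattern fits config-style data, not per-user responses.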
Database Query Optimization
The biggest wins are usually here. Patterns I use:
Batch reads instead of N+1:
// BAD: N+1 queries
for _, orderID := range orderIDs {
	order, _ := db.GetOrder(ctx, orderID)
	orders = append(orders, order)
}

// GOOD: single query
orders, _ := db.GetOrdersByIDs(ctx, orderIDs)
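GetOrdersByIDs is whatever your data layer provides. With database/sql, one common sketch builds the placeholder list for an IN clause (with pgx you can instead pass the slice straight to `WHERE id = ANY($1)`):

```go
package main

import (
	"fmt"
	"strings"
)

// inClause builds a "($1,$2,...)" placeholder list for a batched lookup.
func inClause(n int) string {
	ps := make([]string, n)
	for i := range ps {
		ps[i] = fmt.Sprintf("$%d", i+1)
	}
	return "(" + strings.Join(ps, ",") + ")"
}

func main() {
	ids := []string{"a1", "b2", "c3"}
	query := "SELECT id, status FROM orders WHERE id IN " + inClause(len(ids))
	fmt.Println(query)
}
```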
Use prepared statements for repeated queries:
type OrderRepo struct {
	getByID *sql.Stmt
}

func NewOrderRepo(ctx context.Context, db *sql.DB) (*OrderRepo, error) {
	// Prepare once at startup; the database reuses the query plan
	stmt, err := db.PrepareContext(ctx, "SELECT id, customer_id, status FROM orders WHERE id = $1")
	if err != nil {
		return nil, err
	}
	return &OrderRepo{getByID: stmt}, nil
}

(If you use pgx, it caches prepared statements per connection automatically, so this pattern matters mostly with database/sql.)
Index your WHERE clauses. Sounds obvious, but I’ve found missing indexes on production tables more times than I can count:
-- Check for sequential scans
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 'abc' AND status = 'pending';
-- Add composite index
CREATE INDEX CONCURRENTLY idx_orders_customer_status ON orders (customer_id, status);
Reduce Allocations in Hot Paths
Use strings.Builder instead of concatenation:
// BAD: allocates on every concatenation
func buildKey(parts ...string) string {
	result := ""
	for i, p := range parts {
		if i > 0 {
			result += ":"
		}
		result += p
	}
	return result
}
// GOOD: single allocation
func buildKey(parts ...string) string {
	var b strings.Builder
	for i, p := range parts {
		if i > 0 {
			b.WriteByte(':')
		}
		b.WriteString(p)
	}
	return b.String()
}
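When the final size is predictable, strings.Builder's Grow can reserve capacity up front so the loop never reallocates. A variant sketch (buildKeyPrealloc is my name):

```go
package main

import (
	"fmt"
	"strings"
)

// buildKeyPrealloc sizes the Builder before writing, so even many parts
// trigger only the one up-front allocation.
func buildKeyPrealloc(parts ...string) string {
	if len(parts) == 0 {
		return ""
	}
	n := len(parts) - 1 // one ':' between each pair
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n)
	for i, p := range parts {
		if i > 0 {
			b.WriteByte(':')
		}
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(buildKeyPrealloc("user", "42", "profile"))
}
```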
Use strconv instead of fmt.Sprintf for simple conversions:
// Slow: uses reflection
key := fmt.Sprintf("user:%d", userID)
// Fast: direct conversion
key := "user:" + strconv.FormatInt(userID, 10)
Compression
For APIs returning large payloads, gzip reduces bandwidth and often improves perceived latency:
type gzipResponseWriter struct {
	http.ResponseWriter
	gz *gzip.Writer
}

func (w *gzipResponseWriter) Write(b []byte) (int, error) {
	return w.gz.Write(b)
}

func gzipMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Del("Content-Length") // length changes after compression
		gz := gzip.NewWriter(w)
		defer gz.Close()
		next.ServeHTTP(&gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}
Benchmarking
Always benchmark before and after:
func BenchmarkHandler(b *testing.B) {
	handler := setupHandler()
	req := httptest.NewRequest("GET", "/api/orders", nil)
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		w := httptest.NewRecorder()
		handler.ServeHTTP(w, req)
	}
}
Run with:
go test -bench=. -benchmem -count=5 ./...
The -benchmem flag shows allocations per operation — often more important than raw speed.
Performance optimization is iterative. Profile, identify the bottleneck, fix it, measure again. The temptation to optimize everything at once is strong — resist it. Fix the biggest bottleneck first.