
Reducing Latency in Go APIs: Lessons from Production

Our API’s P99 latency was 800ms. Users were complaining. After two weeks of profiling and optimization, we got it under 100ms. Here’s exactly what we did.

Step 1: Measure Everything

Before touching code, we instrumented every layer:

func (s *Service) GetOrder(ctx context.Context, id string) (*Order, error) {
    defer trackLatency("GetOrder", time.Now())

    order, err := s.cache.Get(ctx, id)
    if err == nil {
        cacheHits.Inc()
        return order, nil
    }
    cacheMisses.Inc()

    order, err = s.repo.GetOrder(ctx, id)
    if err != nil {
        return nil, err
    }

    s.cache.Set(ctx, id, order, 5*time.Minute)
    return order, nil
}

The breakdown revealed:

  • 60% of time: database queries
  • 25% of time: downstream HTTP calls
  • 10% of time: JSON serialization
  • 5% of time: application logic

Step 2: Fix the Database

The biggest offender was a missing composite index. One query was doing a sequential scan on a 50M-row table:

-- Before: 400ms (sequential scan)
SELECT * FROM orders WHERE customer_id = $1 AND status = 'active' ORDER BY created_at DESC LIMIT 10;

-- After adding index: 2ms
CREATE INDEX CONCURRENTLY idx_orders_customer_active
ON orders (customer_id, created_at DESC) WHERE status = 'active';

Partial indexes are criminally underused. If you always filter on status = 'active', index only those rows.

We also found N+1 queries hiding in a loop:

// Before: 50 queries for 50 orders
for _, order := range orders {
    items, _ := repo.GetItemsByOrderID(ctx, order.ID)
    order.Items = items
}

// After: 1 query
itemsByOrder, _ := repo.GetItemsByOrderIDs(ctx, orderIDs)
for _, order := range orders {
    order.Items = itemsByOrder[order.ID]
}
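The batched version can be sketched like this: assume GetItemsByOrderIDs runs a single query (e.g. `WHERE order_id = ANY($1)` in Postgres) and groups the flat rows in memory. The Item type and field names here are illustrative:

```go
package main

import "fmt"

// Item is illustrative; the real type lives in the repo layer.
type Item struct {
	OrderID string
	SKU     string
}

// groupItemsByOrder turns the flat result of one batched query
// (e.g. SELECT * FROM order_items WHERE order_id = ANY($1))
// into the per-order map used in the loop above.
func groupItemsByOrder(items []Item) map[string][]Item {
	byOrder := make(map[string][]Item)
	for _, it := range items {
		byOrder[it.OrderID] = append(byOrder[it.OrderID], it)
	}
	return byOrder
}

func main() {
	rows := []Item{{"o1", "sku-a"}, {"o1", "sku-b"}, {"o2", "sku-c"}}
	byOrder := groupItemsByOrder(rows)
	fmt.Println(len(byOrder["o1"]), len(byOrder["o2"])) // 2 1
}
```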

This alone cut 300ms off the P99.

Step 3: Fix Downstream Calls

Our service called three downstream services sequentially, even though none of them depended on another's result:

// Before: sequential, ~200ms total
user, _ := userService.Get(ctx, userID)        // 80ms
preferences, _ := prefService.Get(ctx, userID) // 70ms
history, _ := historyService.Get(ctx, userID)  // 50ms

// After: parallel independent calls, ~80ms total
g, ctx := errgroup.WithContext(ctx)

var user *User
var preferences *Preferences
var history *History

g.Go(func() error {
    var err error
    user, err = userService.Get(ctx, userID)
    return err
})
g.Go(func() error {
    var err error
    preferences, err = prefService.Get(ctx, userID)
    return err
})
g.Go(func() error {
    var err error
    history, err = historyService.Get(ctx, userID)
    return err
})

if err := g.Wait(); err != nil {
    return nil, err
}

Running the calls concurrently cut the downstream time from ~200ms to ~80ms, the duration of the slowest single call. As a bonus, errgroup.WithContext cancels the shared context on the first error, so the remaining calls can bail out early instead of running to completion.

Step 4: Add Caching

For data that changes infrequently, cache aggressively:

type TieredCache struct {
    local  *lru.Cache    // L1: in-process, ~1ms
    redis  *redis.Client // L2: network, ~5ms
}

func (c *TieredCache) Get(ctx context.Context, key string) ([]byte, error) {
    // L1
    if val, ok := c.local.Get(key); ok {
        return val.([]byte), nil
    }

    // L2
    val, err := c.redis.Get(ctx, key).Bytes()
    if err == redis.Nil {
        return nil, ErrCacheMiss
    }
    if err != nil {
        return nil, err // a real Redis error, not a miss
    }

    c.local.Add(key, val) // backfill L1 so the next read stays in-process
    return val, nil
}
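The read-through-and-backfill pattern generalizes to any number of tiers. A stdlib-only sketch of the same idea, where Layer and mapLayer are illustrative stand-ins for the lru and redis clients above:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrCacheMiss = errors.New("cache miss")

// Layer is any cache tier: an in-process LRU, Redis, etc.
type Layer interface {
	Get(key string) ([]byte, error)
	Set(key string, val []byte)
}

// Tiered reads from the fastest layer first and backfills
// faster layers when a slower one hits.
type Tiered struct {
	layers []Layer // ordered fastest to slowest
}

func (t *Tiered) Get(key string) ([]byte, error) {
	for i, l := range t.layers {
		val, err := l.Get(key)
		if err != nil {
			continue // miss in this tier; try the next one
		}
		for j := 0; j < i; j++ {
			t.layers[j].Set(key, val) // backfill so the next read hits L1
		}
		return val, nil
	}
	return nil, ErrCacheMiss
}

// mapLayer is a toy in-memory tier for demonstration.
type mapLayer map[string][]byte

func (m mapLayer) Get(key string) ([]byte, error) {
	if v, ok := m[key]; ok {
		return v, nil
	}
	return nil, ErrCacheMiss
}

func (m mapLayer) Set(key string, val []byte) { m[key] = val }

func main() {
	l1, l2 := mapLayer{}, mapLayer{"k": []byte("v")}
	c := &Tiered{layers: []Layer{l1, l2}}
	val, _ := c.Get("k")
	fmt.Println(string(val), len(l1)) // v 1 (backfilled into l1)
}
```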

In-process caching eliminated 40% of Redis calls. For our most-hit endpoints, cache hit rate was 85%.

Step 5: Async What You Can

Some work doesn’t need to happen in the request path:

// Before: send email synchronously (adds 100-500ms)
func (s *Service) CreateOrder(ctx context.Context, order Order) error {
    if err := s.repo.Create(ctx, order); err != nil {
        return err
    }
    return s.emailService.SendConfirmation(ctx, order) // Slow!
}

// After: publish event, handle email asynchronously
func (s *Service) CreateOrder(ctx context.Context, order Order) error {
    if err := s.repo.Create(ctx, order); err != nil {
        return err
    }
    s.events.Publish(ctx, OrderCreatedEvent{OrderID: order.ID})
    return nil // Return immediately
}

If the user doesn’t need to see the result in this response, don’t make them wait.
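What handles the published event? A sketch of the consumer side, using a buffered channel as a stand-in for the event bus (production code would want a durable queue and retries so a crash can't drop a confirmation email):

```go
package main

import (
	"fmt"
	"sync"
)

type OrderCreatedEvent struct{ OrderID string }

// startEmailWorker consumes events off the request path. send is the
// slow work (e.g. the email call) that no longer blocks CreateOrder.
func startEmailWorker(events <-chan OrderCreatedEvent, send func(OrderCreatedEvent)) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for ev := range events {
			send(ev) // retries and dead-lettering would go here
		}
	}()
	return &wg
}

func main() {
	events := make(chan OrderCreatedEvent, 16)
	var sent []string
	wg := startEmailWorker(events, func(ev OrderCreatedEvent) {
		sent = append(sent, ev.OrderID)
	})

	// Request path: publish and return immediately.
	events <- OrderCreatedEvent{OrderID: "order-123"}
	close(events)
	wg.Wait()
	fmt.Println(sent) // [order-123]
}
```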

Results

Metric   Before   After
P50      200ms    25ms
P99      800ms    95ms
P99.9    2.5s     200ms

The fixes weren’t exotic. Missing index, N+1 queries, sequential calls that should be parallel, missing cache, synchronous work that should be async. Boring fundamentals — dramatic results.