
How I Design Go Microservices for High Throughput Systems

When your service needs to handle 5K+ requests per second, you can’t afford to be sloppy with resource management. Here’s how I structure Go microservices for high throughput.

Project Structure

I follow a clean architecture that separates concerns without over-abstracting:

service/
├── cmd/server/main.go      # Entry point, wiring
├── internal/
│   ├── handler/             # HTTP/gRPC handlers
│   ├── service/             # Business logic
│   ├── repository/          # Data access
│   ├── queue/               # Message queue producers/consumers
│   ├── middleware/          # Auth, logging, rate limiting
├── pkg/                     # Shared libraries
└── config/                  # Configuration

The rule: handlers call services, services call repositories. Never skip a layer.

Connection Pooling Done Right

The first thing that breaks at scale is connections — database, Redis, HTTP clients.

func newDBPool(cfg DatabaseConfig) (*pgxpool.Pool, error) {
    config, err := pgxpool.ParseConfig(cfg.URL)
    if err != nil {
        return nil, err
    }

    config.MaxConns = 50
    config.MinConns = 10
    config.MaxConnLifetime = 30 * time.Minute
    config.MaxConnIdleTime = 5 * time.Minute
    config.HealthCheckPeriod = 30 * time.Second

    pool, err := pgxpool.NewWithConfig(context.Background(), config)
    if err != nil {
        return nil, err
    }

    return pool, nil
}

Key numbers:

  • MaxConns: Match your expected concurrency. Too high and you overwhelm the database. Too low and goroutines block waiting.
  • MaxConnLifetime: Rotate connections to handle DNS changes and database failovers.
  • HealthCheckPeriod: Detect dead connections before they cause request failures.

Same principles apply to Redis and HTTP clients:

var httpClient = &http.Client{
    Timeout: 10 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 20,
        IdleConnTimeout:     90 * time.Second,
    },
}

Never use http.DefaultClient in production. It has no timeout.

Request Batching

When a handler triggers N downstream calls, batch them:

func (s *Service) GetUsers(ctx context.Context, ids []string) ([]User, error) {
    // Don't do N queries
    // Do one query with IN clause
    rows, err := s.db.Query(ctx,
        "SELECT id, name, email FROM users WHERE id = ANY($1)",
        ids,
    )
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var users []User
    for rows.Next() {
        var u User
        if err := rows.Scan(&u.ID, &u.Name, &u.Email); err != nil {
            return nil, err
        }
        users = append(users, u)
    }
    return users, rows.Err()
}

For downstream HTTP calls, use errgroup for bounded concurrency:

func (s *Service) EnrichOrders(ctx context.Context, orders []Order) error {
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(10) // Max 10 concurrent calls

    for i := range orders {
        order := &orders[i]
        g.Go(func() error {
            user, err := s.userClient.GetUser(ctx, order.UserID)
            if err != nil {
                return err
            }
            order.UserName = user.Name
            return nil
        })
    }

    return g.Wait()
}

Backpressure

When your service is overwhelmed, it should push back rather than fall over.

func newServer(handler http.Handler, maxConcurrent int) *http.Server {
    sem := make(chan struct{}, maxConcurrent)

    wrapped := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sem <- struct{}{}:
            defer func() { <-sem }()
            handler.ServeHTTP(w, r)
        default:
            http.Error(w, "service overloaded", http.StatusServiceUnavailable)
        }
    })

    return &http.Server{
        Handler:      wrapped,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        IdleTimeout:  120 * time.Second,
    }
}

When all semaphore slots are taken, new requests get 503 immediately instead of queuing and eventually timing out.

Caching Hot Paths

For data that’s read far more than written, in-process caching eliminates network round trips entirely:

type CachedRepo struct {
    db    *pgxpool.Pool
    cache *sync.Map
    ttl   time.Duration
}

type cacheEntry struct {
    value     interface{}
    expiresAt time.Time
}

func (r *CachedRepo) GetConfig(ctx context.Context, key string) (string, error) {
    if entry, ok := r.cache.Load(key); ok {
        e := entry.(*cacheEntry)
        if time.Now().Before(e.expiresAt) {
            return e.value.(string), nil
        }
        r.cache.Delete(key)
    }

    var val string
    err := r.db.QueryRow(ctx,
        "SELECT value FROM config WHERE key = $1", key,
    ).Scan(&val)
    if err != nil {
        return "", err
    }

    r.cache.Store(key, &cacheEntry{
        value:     val,
        expiresAt: time.Now().Add(r.ttl),
    })
    return val, nil
}

Use this sparingly — only for data that can be stale for your TTL without causing issues.

Structured Logging

Every log line must be machine-parseable and include request context:

func LoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        wrapped := &responseWriter{ResponseWriter: w, statusCode: 200}

        next.ServeHTTP(wrapped, r)

        slog.Info("request",
            "method", r.Method,
            "path", r.URL.Path,
            "status", wrapped.statusCode,
            "duration_ms", time.Since(start).Milliseconds(),
            "correlation_id", r.Header.Get("X-Correlation-ID"),
        )
    })
}

Health Checks

Load balancers need to know if your service is healthy:

func (s *Server) healthHandler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()

    checks := map[string]error{
        "database": s.db.Ping(ctx),
        "redis":    s.redis.Ping(ctx).Err(),
    }

    healthy := true
    status := make(map[string]string)
    for name, err := range checks {
        if err != nil {
            status[name] = err.Error()
            healthy = false
        } else {
            status[name] = "ok"
        }
    }

    code := http.StatusOK
    if !healthy {
        code = http.StatusServiceUnavailable
    }

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(code)
    json.NewEncoder(w).Encode(status)
}

High throughput isn’t just about fast code. It’s about managing resources correctly, failing gracefully, and having visibility into what’s happening.