How I Design Go Microservices for High Throughput Systems
When your service needs to handle 5K+ requests per second, you can’t afford to be sloppy with resource management. Here’s how I structure Go microservices for high throughput.
Project Structure
I follow a clean architecture that separates concerns without over-abstracting:
service/
├── cmd/server/main.go # Entry point, wiring
├── internal/
│ ├── handler/ # HTTP/gRPC handlers
│ ├── service/ # Business logic
│ ├── repository/ # Data access
│ ├── queue/ # Message queue producers/consumers
│ └── middleware/ # Auth, logging, rate limiting
├── pkg/ # Shared libraries
└── config/ # Configuration
The rule: handlers call services, services call repositories. Never skip a layer.
Connection Pooling Done Right
The first thing that breaks at scale is connections — database, Redis, HTTP clients.
func newDBPool(cfg DatabaseConfig) (*pgxpool.Pool, error) {
    config, err := pgxpool.ParseConfig(cfg.URL)
    if err != nil {
        return nil, err
    }
    config.MaxConns = 50
    config.MinConns = 10
    config.MaxConnLifetime = 30 * time.Minute
    config.MaxConnIdleTime = 5 * time.Minute
    config.HealthCheckPeriod = 30 * time.Second
    pool, err := pgxpool.NewWithConfig(context.Background(), config)
    if err != nil {
        return nil, err
    }
    return pool, nil
}
Key numbers:
- MaxConns: Match your expected concurrency. Too high and you overwhelm the database. Too low and goroutines block waiting.
- MaxConnLifetime: Rotate connections to handle DNS changes and database failovers.
- HealthCheckPeriod: Detect dead connections before they cause request failures.
Same principles apply to Redis and HTTP clients:
var httpClient = &http.Client{
    Timeout: 10 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 20,
        IdleConnTimeout:     90 * time.Second,
    },
}
Never use http.DefaultClient in production: it has no timeout, so a single hung upstream can pin goroutines indefinitely.
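For Redis, the equivalent knobs look like this — a config sketch assuming the go-redis v9 client (github.com/redis/go-redis/v9), whose Options struct mirrors the pool settings above:

```go
// Pool settings roughly corresponding to the pgx config:
// PoolSize ~ MaxConns, MinIdleConns ~ MinConns, and the lifetime
// fields rotate and reap connections the same way.
rdb := redis.NewClient(&redis.Options{
    Addr:            "localhost:6379",
    PoolSize:        50,
    MinIdleConns:    10,
    ConnMaxLifetime: 30 * time.Minute,
    ConnMaxIdleTime: 5 * time.Minute,
})
```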
Request Batching
When a handler triggers N downstream calls, batch them:
func (s *Service) GetUsers(ctx context.Context, ids []string) ([]User, error) {
    // One query with an IN clause instead of N individual queries
    rows, err := s.db.Query(ctx,
        "SELECT id, name, email FROM users WHERE id = ANY($1)",
        ids,
    )
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var users []User
    for rows.Next() {
        var u User
        if err := rows.Scan(&u.ID, &u.Name, &u.Email); err != nil {
            return nil, err
        }
        users = append(users, u)
    }
    return users, rows.Err()
}
For downstream HTTP calls, use errgroup for bounded concurrency:
func (s *Service) EnrichOrders(ctx context.Context, orders []Order) error {
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(10) // Max 10 concurrent calls
    for i := range orders {
        order := &orders[i]
        g.Go(func() error {
            user, err := s.userClient.GetUser(ctx, order.UserID)
            if err != nil {
                return err
            }
            order.UserName = user.Name
            return nil
        })
    }
    return g.Wait()
}
Backpressure
When your service is overwhelmed, it should push back rather than fall over.
func newServer(handler http.Handler, maxConcurrent int) *http.Server {
    sem := make(chan struct{}, maxConcurrent)
    wrapped := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        select {
        case sem <- struct{}{}:
            defer func() { <-sem }()
            handler.ServeHTTP(w, r)
        default:
            http.Error(w, "service overloaded", http.StatusServiceUnavailable)
        }
    })
    return &http.Server{
        Handler:      wrapped,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        IdleTimeout:  120 * time.Second,
    }
}
When all semaphore slots are taken, new requests get 503 immediately instead of queuing and eventually timing out.
Caching Hot Paths
For data that’s read far more than written, in-process caching eliminates network round trips entirely:
type CachedRepo struct {
    db    *pgxpool.Pool
    cache *sync.Map
    ttl   time.Duration
}

type cacheEntry struct {
    value     interface{}
    expiresAt time.Time
}

func (r *CachedRepo) GetConfig(ctx context.Context, key string) (string, error) {
    if entry, ok := r.cache.Load(key); ok {
        e := entry.(*cacheEntry)
        if time.Now().Before(e.expiresAt) {
            return e.value.(string), nil
        }
        r.cache.Delete(key)
    }
    var val string
    err := r.db.QueryRow(ctx,
        "SELECT value FROM config WHERE key = $1", key,
    ).Scan(&val)
    if err != nil {
        return "", err
    }
    r.cache.Store(key, &cacheEntry{
        value:     val,
        expiresAt: time.Now().Add(r.ttl),
    })
    return val, nil
}
Use this sparingly — only for data that can be stale for your TTL without causing issues.
Structured Logging
Every log line must be machine-parseable and include request context:
// responseWriter wraps http.ResponseWriter to capture the status code.
type responseWriter struct {
    http.ResponseWriter
    statusCode int
}

func (w *responseWriter) WriteHeader(code int) {
    w.statusCode = code
    w.ResponseWriter.WriteHeader(code)
}

func LoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        wrapped := &responseWriter{ResponseWriter: w, statusCode: 200}
        next.ServeHTTP(wrapped, r)
        slog.Info("request",
            "method", r.Method,
            "path", r.URL.Path,
            "status", wrapped.statusCode,
            "duration_ms", time.Since(start).Milliseconds(),
            "correlation_id", r.Header.Get("X-Correlation-ID"),
        )
    })
}
Health Checks
Load balancers need to know if your service is healthy:
func (s *Server) healthHandler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()
    checks := map[string]error{
        "database": s.db.Ping(ctx),
        "redis":    s.redis.Ping(ctx).Err(),
    }
    healthy := true
    status := make(map[string]string)
    for name, err := range checks {
        if err != nil {
            status[name] = err.Error()
            healthy = false
        } else {
            status[name] = "ok"
        }
    }
    code := http.StatusOK
    if !healthy {
        code = http.StatusServiceUnavailable
    }
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(code)
    json.NewEncoder(w).Encode(status)
}
High throughput isn’t just about fast code. It’s about managing resources correctly, failing gracefully, and having visibility into what’s happening.