Lessons from Building Event-Driven Microservices in Go
Event-driven architecture sounds elegant on whiteboards. In production, it’s a minefield of ordering issues, duplicate events, and debugging nightmares. Here’s what I learned building event-driven microservices in Go.
Why Events?
We moved to events because synchronous REST calls between services were creating tight coupling. Service A needed to know about services B, C, and D. Deploying A meant coordinating with three other teams.
Events invert this dependency. A publishes “OrderCreated.” Whoever cares, subscribes. A doesn’t know or care who’s listening.
Event Schema Design
The most important decision you’ll make. Get this wrong and you’ll be fighting schema evolution for years.
type Event struct {
    ID      string          `json:"id"`
    Type    string          `json:"type"`
    Source  string          `json:"source"`
    Time    time.Time       `json:"time"`
    Data    json.RawMessage `json:"data"`
    Version int             `json:"version"`
}

type OrderCreated struct {
    OrderID    string `json:"order_id"`
    CustomerID string `json:"customer_id"`
    TotalCents int64  `json:"total_cents"`
    Currency   string `json:"currency"`
}
Rules I follow:
- Always version events. You will change the schema. Consumers need to know which version they’re handling.
- Use past tense for event names. OrderCreated, not CreateOrder. Events describe facts that happened, not commands.
- Include enough data. “Fat events” (with full entity state) are almost always better than “thin events” (just IDs). Thin events force consumers to make synchronous calls back to the source, defeating the purpose.
- Never remove fields. Only add. Treat events like a public API with backward compatibility guarantees.
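To make the versioning rule concrete, here is a minimal sketch of decoding by version. The V1/V2 payload types and the USD default for old events are assumptions for illustration, not our actual schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrderCreatedV1 is the original payload.
type OrderCreatedV1 struct {
	OrderID    string `json:"order_id"`
	TotalCents int64  `json:"total_cents"`
}

// OrderCreatedV2 only adds a field; V1 fields are never removed.
type OrderCreatedV2 struct {
	OrderCreatedV1
	Currency string `json:"currency"`
}

// decodeOrderCreated decodes into the newest struct and backfills
// defaults for older envelope versions.
func decodeOrderCreated(version int, data []byte) (OrderCreatedV2, error) {
	var out OrderCreatedV2
	if err := json.Unmarshal(data, &out); err != nil {
		return out, err
	}
	if version < 2 && out.Currency == "" {
		out.Currency = "USD" // assumed default for pre-currency events
	}
	return out, nil
}

func main() {
	v1 := []byte(`{"order_id":"o1","total_cents":500}`)
	got, err := decodeOrderCreated(1, v1)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.OrderID, got.TotalCents, got.Currency) // o1 500 USD
}
```

Because consumers always decode into the newest struct, adding a field never breaks an old producer, and the version number tells you which defaults to backfill.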
The Consumer Pattern
Every consumer follows the same structure:
type Consumer struct {
    handler    func(ctx context.Context, event Event) error
    store      IdempotencyStore
    subscriber Subscriber
}

func (c *Consumer) Start(ctx context.Context) error {
    return c.subscriber.Subscribe(ctx, func(msg Message) error {
        var event Event
        if err := json.Unmarshal(msg.Body, &event); err != nil {
            // Malformed JSON will never parse on retry; nack and let the
            // broker's retry limit route it to the DLQ.
            slog.Error("unmarshal event", "error", err)
            msg.Nack()
            return nil
        }

        // Idempotency check
        processed, err := c.store.IsProcessed(ctx, event.ID)
        if err != nil {
            return fmt.Errorf("idempotency check: %w", err)
        }
        if processed {
            slog.Debug("duplicate event, skipping", "event_id", event.ID)
            msg.Ack()
            return nil
        }

        // Process
        if err := c.handler(ctx, event); err != nil {
            slog.Error("handle event", "event_id", event.ID, "error", err)
            msg.Nack()
            return nil
        }

        // Mark processed. If this fails, the event may be redelivered and
        // reprocessed, so log it rather than silently dropping the error.
        if err := c.store.MarkProcessed(ctx, event.ID); err != nil {
            slog.Error("mark processed", "event_id", event.ID, "error", err)
        }
        msg.Ack()
        return nil
    })
}
Three guarantees:
- Idempotent processing: duplicate events are silently skipped
- Explicit ack/nack: failed events are redelivered by the broker
- Structured error handling: both parse and processing failures nack, and the broker's retry limit plus the DLQ (below) stop poison messages from looping forever
Eventual Consistency Is Real
The hardest mental shift: data across services will be inconsistent for some window of time. A user might see their order as “confirmed” in the orders service but “pending” in the shipping service.
You deal with this by:
- Accepting it in your UI. Show “processing” states. Don’t pretend things are instant.
- Designing for convergence. Every consumer must eventually reach the correct state, even if events arrive out of order.
- Monitoring lag. Measure the time between event publish and consumer processing. Alert when it exceeds your SLA.
func publishWithMetrics(ctx context.Context, event Event) error {
    start := time.Now()
    err := publisher.Publish(ctx, event)
    publishLatency.Observe(time.Since(start).Seconds())
    if err != nil {
        publishErrors.Inc()
    }
    return err
}
Handling Out-of-Order Events
Events can arrive out of order. OrderShipped might arrive before OrderCreated if they’re published to different partitions or through different paths.
Our approach: store the event and reconcile later.
func handleOrderShipped(ctx context.Context, event Event) error {
    var data OrderShipped
    if err := json.Unmarshal(event.Data, &data); err != nil {
        return fmt.Errorf("decode OrderShipped: %w", err)
    }
    order, err := repo.GetOrder(ctx, data.OrderID)
    if errors.Is(err, ErrNotFound) {
        // Order hasn't been created in our view yet.
        // Store the event and process it when OrderCreated arrives.
        return repo.StoreUnprocessedEvent(ctx, event)
    }
    if err != nil {
        return fmt.Errorf("load order: %w", err)
    }
    order.Status = "shipped"
    order.ShippedAt = data.ShippedAt
    return repo.UpdateOrder(ctx, order)
}
When OrderCreated arrives, we check for any stored events and replay them in order.
Dead Letter Queues Save Lives
Some events will always fail. Invalid data, bugs in consumers, missing dependencies. Without a DLQ, these poison events block your entire queue.
func (c *Consumer) handleWithDLQ(ctx context.Context, msg Message, maxRetries int) {
    retryCount := msg.Headers.GetInt("x-retry-count")
    if retryCount >= maxRetries {
        slog.Error("max retries exceeded, moving to DLQ",
            "event_id", msg.ID,
            "retry_count", retryCount,
        )
        if err := c.dlqPublisher.Publish(ctx, msg); err != nil {
            // Don't ack if the DLQ publish failed, or the message is lost.
            slog.Error("publish to DLQ", "event_id", msg.ID, "error", err)
            msg.Nack()
            return
        }
        msg.Ack()
        return
    }
    var event Event
    if err := json.Unmarshal(msg.Body, &event); err != nil {
        // Malformed messages burn through retries and land in the DLQ.
        msg.Headers.Set("x-retry-count", retryCount+1)
        msg.Nack()
        return
    }
    if err := c.handler(ctx, event); err != nil {
        msg.Headers.Set("x-retry-count", retryCount+1)
        msg.Nack()
    } else {
        msg.Ack()
    }
}
We review DLQ messages daily. Sometimes it’s a bug we need to fix and replay. Sometimes it’s genuinely invalid data that should be discarded.
Testing Event-Driven Systems
Unit testing individual handlers is straightforward. Integration testing the full event flow is hard.
What works for us:
- In-memory event bus for tests. Same interface as production, synchronous execution.
- Contract tests for event schemas. Producer and consumer agree on the schema. Tests verify both sides are compatible.
- End-to-end tests with real brokers in CI. Docker Compose with RabbitMQ. Slow but catches real issues.
func TestOrderFlow(t *testing.T) {
    ctx := context.Background()
    bus := NewInMemoryBus()
    orderService := NewOrderService(bus)
    shippingService := NewShippingService(bus)
    bus.Subscribe("OrderCreated", shippingService.HandleOrderCreated)

    order, err := orderService.CreateOrder(ctx, input)
    require.NoError(t, err)

    // In-memory bus processes synchronously, so the shipment already exists
    shipment, err := shippingService.GetShipment(ctx, order.ID)
    require.NoError(t, err)
    assert.Equal(t, "pending", shipment.Status)
}
Key Takeaways
- Fat events over thin events. Include enough data for consumers to work independently.
- Idempotency is non-negotiable. Every consumer must handle duplicates.
- Design for out-of-order delivery. It will happen.
- Monitor consumer lag. Eventual consistency has an SLA.
- DLQs are required. Poison messages will block your queues.
Event-driven systems trade implementation complexity for operational flexibility. The complexity is real — but so is the flexibility. Just make sure you’re ready for both.