Lessons from Building Event-Driven Microservices in Go

Event-driven architecture sounds elegant on whiteboards. In production, it’s a minefield of ordering issues, duplicate events, and debugging nightmares. Here’s what I learned building event-driven microservices in Go.

Why Events?

We moved to events because synchronous REST calls between services were creating tight coupling. Service A needed to know about services B, C, and D. Deploying A meant coordinating with three other teams.

Events invert this dependency. A publishes “OrderCreated.” Whoever cares, subscribes. A doesn’t know or care who’s listening.

Event Schema Design

The most important decision you’ll make. Get this wrong and you’ll be fighting schema evolution for years.

type Event struct {
    ID        string          `json:"id"`
    Type      string          `json:"type"`
    Source    string          `json:"source"`
    Time      time.Time       `json:"time"`
    Data      json.RawMessage `json:"data"`
    Version   int             `json:"version"`
}

type OrderCreated struct {
    OrderID    string `json:"order_id"`
    CustomerID string `json:"customer_id"`
    TotalCents int64  `json:"total_cents"`
    Currency   string `json:"currency"`
}

Rules I follow:

  • Always version events. You will change the schema. Consumers need to know which version they’re handling.
  • Use past tense for event names. OrderCreated, not CreateOrder. Events describe facts that happened, not commands.
  • Include enough data. “Fat events” (with full entity state) are almost always better than “thin events” (just IDs). Thin events force consumers to make synchronous calls back to the source, defeating the purpose.
  • Never remove fields. Only add. Treat events like a public API with backward compatibility guarantees.

The Consumer Pattern

Every consumer follows the same structure:

type Consumer struct {
    handler    func(ctx context.Context, event Event) error
    store      IdempotencyStore
    subscriber Subscriber
}

func (c *Consumer) Start(ctx context.Context) error {
    return c.subscriber.Subscribe(ctx, func(msg Message) error {
        var event Event
        if err := json.Unmarshal(msg.Body, &event); err != nil {
            slog.Error("unmarshal event", "error", err)
            msg.Nack()
            return nil
        }

        // Idempotency check
        processed, err := c.store.IsProcessed(ctx, event.ID)
        if err != nil {
            return fmt.Errorf("idempotency check: %w", err)
        }
        if processed {
            slog.Debug("duplicate event, skipping", "event_id", event.ID)
            msg.Ack()
            return nil
        }

        // Process
        if err := c.handler(ctx, event); err != nil {
            slog.Error("handle event", "event_id", event.ID, "error", err)
            msg.Nack()
            return nil
        }

        // Mark processed. If this fails, the event may be redelivered
        // and the handler may run again — one more reason handlers
        // themselves should be idempotent.
        if err := c.store.MarkProcessed(ctx, event.ID); err != nil {
            return fmt.Errorf("mark processed: %w", err)
        }
        msg.Ack()
        return nil
    })
}

Three guarantees:

  1. Idempotent processing — duplicate events are detected and acked without re-running the handler
  2. Explicit ack/nack — nacked messages are redelivered by the broker
  3. Bounded failure handling — parse errors and handler errors are both nacked for retry; events that fail persistently become poison messages, which the dead letter queue below exists to catch

Eventual Consistency Is Real

The hardest mental shift: data across services will be inconsistent for some window of time. A user might see their order as “confirmed” in the orders service but “pending” in the shipping service.

You deal with this by:

  • Accepting it in your UI. Show “processing” states. Don’t pretend things are instant.
  • Designing for convergence. Every consumer must eventually reach the correct state, even if events arrive out of order.
  • Monitoring lag. Measure the time between event publish and consumer processing. Alert when it exceeds your SLA.

Instrumenting the publish side is straightforward:

func publishWithMetrics(ctx context.Context, event Event) error {
    start := time.Now()
    err := publisher.Publish(ctx, event)
    publishLatency.Observe(time.Since(start).Seconds())
    if err != nil {
        publishErrors.Inc()
    }
    return err
}

Handling Out-of-Order Events

Events can arrive out of order. OrderShipped might arrive before OrderCreated if they’re published to different partitions or through different paths.

Our approach: store the event and reconcile later.

func handleOrderShipped(ctx context.Context, event Event) error {
    var data OrderShipped
    if err := json.Unmarshal(event.Data, &data); err != nil {
        return fmt.Errorf("unmarshal OrderShipped: %w", err)
    }

    order, err := repo.GetOrder(ctx, data.OrderID)
    if errors.Is(err, ErrNotFound) {
        // Order hasn't been created in our view yet.
        // Store the event and process it when OrderCreated arrives.
        return repo.StoreUnprocessedEvent(ctx, event)
    }
    if err != nil {
        return fmt.Errorf("get order: %w", err)
    }

    order.Status = "shipped"
    order.ShippedAt = data.ShippedAt
    return repo.UpdateOrder(ctx, order)
}

When OrderCreated arrives, we check for any stored events and replay them in order.

Dead Letter Queues Save Lives

Some events will always fail. Invalid data, bugs in consumers, missing dependencies. Without a DLQ, these poison events block your entire queue.

func (c *Consumer) handleWithDLQ(ctx context.Context, msg Message, maxRetries int) {
    retryCount := msg.Headers.GetInt("x-retry-count")

    if retryCount >= maxRetries {
        slog.Error("max retries exceeded, moving to DLQ",
            "event_id", msg.ID,
            "retry_count", retryCount,
        )
        if err := c.dlqPublisher.Publish(ctx, msg); err != nil {
            // Couldn't park the message; nack so the broker redelivers it
            slog.Error("publish to DLQ", "event_id", msg.ID, "error", err)
            msg.Nack()
            return
        }
        msg.Ack()
        return
    }

    var event Event
    if err := json.Unmarshal(msg.Body, &event); err != nil {
        // Malformed messages will never parse; fast-track them to the DLQ
        msg.Headers.Set("x-retry-count", maxRetries)
        msg.Nack()
        return
    }

    if err := c.handler(ctx, event); err != nil {
        msg.Headers.Set("x-retry-count", retryCount+1)
        msg.Nack()
    } else {
        msg.Ack()
    }
}

We review DLQ messages daily. Sometimes it’s a bug we need to fix and replay. Sometimes it’s genuinely invalid data that should be discarded.

Testing Event-Driven Systems

Unit testing individual handlers is straightforward. Integration testing the full event flow is hard.

What works for us:

  • In-memory event bus for tests. Same interface as production, synchronous execution.
  • Contract tests for event schemas. Producer and consumer agree on the schema. Tests verify both sides are compatible.
  • End-to-end tests with real brokers in CI. Docker Compose with RabbitMQ. Slow but catches real issues.

A minimal flow test against the in-memory bus:

func TestOrderFlow(t *testing.T) {
    ctx := context.Background()
    bus := NewInMemoryBus()
    orderService := NewOrderService(bus)
    shippingService := NewShippingService(bus)

    bus.Subscribe("OrderCreated", shippingService.HandleOrderCreated)

    order, err := orderService.CreateOrder(ctx, input)
    require.NoError(t, err)

    // In-memory bus processes synchronously, so the shipment exists already
    shipment, err := shippingService.GetShipment(ctx, order.ID)
    require.NoError(t, err)
    assert.Equal(t, "pending", shipment.Status)
}

Key Takeaways

  • Fat events over thin events. Include enough data for consumers to work independently.
  • Idempotency is non-negotiable. Every consumer must handle duplicates.
  • Design for out-of-order delivery. It will happen.
  • Monitor consumer lag. Eventual consistency has an SLA.
  • DLQs are required. Poison messages will block your queues.

Event-driven systems trade implementation complexity for operational flexibility. The complexity is real — but so is the flexibility. Just make sure you’re ready for both.