A Practical Journey from Application to Distributed Systems - Part 6
Reliability basics: Outbox pattern + why it matters

In Part 5, we changed QuickBite to an event-driven design. Orders now writes an order and publishes an OrderCreatedevent, and Inventory processes that event later. This is a better design than direct synchronous calls, but it still has an important reliability problem.
Right now, Orders does two separate things:
save the order in Postgres as
PENDINGpublish
OrderCreatedto Kafka
If saving to Postgres works but publishing to Kafka fails, the order gets stuck. It exists in the database, but no other service knows about it. That means the workflow stops halfway through. This is a classic reliability problem in distributed systems. In this part, we will solve it using the Outbox pattern.
What the Outbox pattern does
The Outbox pattern changes the workflow so that saving data and saving the event happen together.
Before (current flow)
Right now, the flow looks like this:
write the order to the database
publish the event to Kafka
That sounds fine at first, but there is a dangerous gap between those two steps.
If the database write succeeds but Kafka publish fails, the order is saved, but the event is lost. That leaves the system in an inconsistent state.
After (with the Outbox pattern)
With the Outbox pattern, the flow changes to this:
write the order to the database
write the event into a database table called
outbox_eventsdo both of those in the same database transaction
a background worker later reads from
outbox_eventsand publishes to Kafka
This is the key idea:
instead of publishing directly to Kafka inside the request flow, we first save the event safely in the database
We are not changing Inventory in this part. Inventory already works fine once it receives the event. The problem is “how to guarantee the event gets published”.
Step 1: Add migrations for the outbox table
The first thing we need for the Outbox pattern is a new database table to store events before they are published to Kafka.
Create this migration file:
services/orders/migrations/000002_outbox.up.sql
CREATE TABLE IF NOT EXISTS outbox_events (
id BIGSERIAL PRIMARY KEY,
event_id TEXT NOT NULL UNIQUE,
topic TEXT NOT NULL,
key TEXT NOT NULL,
payload BYTEA NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ NULL
);
CREATE INDEX IF NOT EXISTS idx_outbox_unpublished
ON outbox_events(id)
WHERE published_at IS NULL;
Now create the down migration:
services/orders/migrations/000002_outbox.down.sql
DROP TABLE IF EXISTS outbox_events;
What this table is for
This table stores events that still need to be published to Kafka.
Instead of trying to publish directly inside the request flow, we first write the event into outbox_events. Later, a background worker will read from this table and publish those events to Kafka.
So this table is basically a safe holding area for messages waiting to be published.
After this step, the Orders service has a place to store events safely in the database before they are sent to Kafka. That is the foundation of the Outbox pattern.
We have not changed the request flow yet, but now we have the table that makes the rest of the pattern possible.
Step 2: Add transaction support in Orders store
Right now, the Orders store probably inserts rows directly using the database pool. That works fine when we only insert one thing at a time. But for the Outbox pattern, we now need to do two inserts together:
insert the order row
insert the outbox event row
And we need both of those to happen inside the same database transaction.
That way:
either both inserts succeed
or neither one is saved
This is what makes the Outbox pattern reliable.
Open:
services/orders/internal/store/orders.go
Add these methods:
import "github.com/jackc/pgx/v5"
// Begin starts a DB transaction.
func (s *OrdersStore) Begin(ctx context.Context) (pgx.Tx, error) {
return s.db.Begin(ctx)
}
// CreatePendingTx inserts an order inside a transaction as PENDING.
func (s *OrdersStore) CreatePendingTx(ctx context.Context, tx pgx.Tx, userID, note, sku string, quantity int) (int64, error) {
var id int64
err := tx.QueryRow(ctx,
`INSERT INTO orders (user_id, note, status, sku, quantity)
VALUES (\(1, \)2, \(3, \)4, $5)
RETURNING id`,
userID, note, StatusPending, sku, quantity,
).Scan(&id)
return id, err
}
Keep your existing Create and GetByID methods. We are not replacing them. We are only adding transaction-aware versions.
Why we need transactions here
The Outbox pattern depends on one important guarantee:
the order row and the outbox event row must be saved together
Without a transaction, this could happen:
order insert succeeds
outbox insert fails
Now the order exists, but the event does not. That is exactly the kind of broken state we are trying to avoid. With a transaction, the database treats both inserts as one unit of work.
So the result is:
both are committed together
or both are rolled back together
Step 3: Create an outbox package in Orders
Now we will create a small package to manage outbox events inside the Orders service.
Open a new folder:
mkdir -p services/orders/internal/outbox
Then create this file:
services/orders/internal/outbox/outbox.go
package outbox
import (
"context"
"time"
"github.com/jackc/pgx/v5"
)
//This struct represents one row in the outbox_events table.
type Event struct {
ID int64 // database row ID
EventID string // unique ID of the event
Topic string // Kafka topic to publish to
Key string // Kafka message key
Payload []byte // raw event bytes, usually JSON in our case
}
//This function inserts a new row into outbox_events.
// The important part is that it takes a `pgx.Tx`, which means it runs inside an existing database transaction.
func InsertTx(ctx context.Context, tx pgx.Tx, evt Event) error {
_, err := tx.Exec(ctx,
`INSERT INTO outbox_events (event_id, topic, key, payload)
VALUES (\(1, \)2, \(3, \)4)`,
evt.EventID, evt.Topic, evt.Key, evt.Payload,
)
return err
}
// This function fetches a batch of unpublished outbox rows.
func FetchUnpublishedForUpdate(ctx context.Context, tx pgx.Tx, limit int) ([]Event, error) {
rows, err := tx.Query(ctx,
`SELECT id, event_id, topic, key, payload
FROM outbox_events
WHERE published_at IS NULL
ORDER BY id
LIMIT $1
FOR UPDATE SKIP LOCKED`, // This locks the selected rows so another worker cannot take the same rows at the same time. If some rows are already locked by another worker, PostgreSQL just skips them and returns other available rows.
limit,
)
if err != nil {
return nil, err
}
defer rows.Close()
var out []Event
for rows.Next() {
var e Event
if err := rows.Scan(&e.ID, &e.EventID, &e.Topic, &e.Key, &e.Payload); err != nil {
return nil, err
}
out = append(out, e)
}
return out, rows.Err()
}
//This function updates the outbox row after Kafka publishing succeeds.
func MarkPublishedTx(ctx context.Context, tx pgx.Tx, id int64) error {
_, err := tx.Exec(ctx,
`UPDATE outbox_events SET published_at = \(2 WHERE id = \)1`,
id, time.Now().UTC(),
)
return err
}
This package is the database helper for the Outbox pattern. It gives us three main operations:
insert a new outbox event inside a transaction
fetch unpublished outbox events
mark an event as published after Kafka publish succeeds
So instead of writing raw outbox SQL in many places, we keep it in one focused package.
Step 4: Add outbox publisher loop
Now that Orders can store unpublished events in the outbox_events table, we need something that actually reads those rows and publishes them to Kafka.
That job is handled by a background publisher loop.
Create this file:
services/orders/internal/outbox/publisher.go
package outbox
import (
"context"
"log"
"time"
"github.com/jackc/pgx/v5/pgxpool"
"github.com/segmentio/kafka-go"
)
type Publisher struct {
DB *pgxpool.Pool // to read and update rows in the outbox table
Writer *kafka.Writer //to publish messages to Kafka
}
// Every 300 milliseconds, the publisher wakes up and checks whether there are any unpublished events waiting in the outbox.
// Because outbox publishing is a background job. It should keep running while the service is running, and keep retrying unpublished events until they are successfully sent.
func (p *Publisher) Run(ctx context.Context) {
ticker := time.NewTicker(300 * time.Millisecond)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
p.publishBatch(ctx)
}
}
}
// This function does one batch of outbox publishing work.
//You can think of it as: “Try to publish up to 25 unpublished events right now.”
func (p *Publisher) publishBatch(ctx context.Context) {
// The publisher starts a database transaction before fetching rows.
tx, err := p.DB.Begin(ctx)
if err != nil {
log.Printf("outbox begin tx failed: %v", err)
return
}
defer func() { _ = tx.Rollback(ctx) }()
// This reads up to 25 unpublished outbox rows.
events, err := FetchUnpublishedForUpdate(ctx, tx, 25)
if err != nil {
log.Printf("outbox fetch failed: %v", err)
return
}
if len(events) == 0 {
_ = tx.Commit(ctx)
return
}
for _, e := range events {
// For every fetched outbox row. This sends the event to Kafka
err := p.Writer.WriteMessages(ctx, kafka.Message{
Topic: e.Topic,
Key: []byte(e.Key),
Value: e.Payload,
})
if err != nil {
// rollback => events stay unpublished, retry later
log.Printf("outbox kafka publish failed: %v", err)
return
}
// If Kafka publish succeeds, we then do. This updates published_at in the database.
if err := MarkPublishedTx(ctx, tx, e.ID); err != nil {
log.Printf("outbox mark published failed: %v", err)
return
}
}
// This final commit makes all those changes permanent.
if err := tx.Commit(ctx); err != nil {
log.Printf("outbox commit failed: %v", err)
}
}
This file adds the background worker that makes the Outbox pattern actually work.
Its job is simple:
look for unpublished events in
outbox_eventspublish them to Kafka
mark them as published
So instead of publishing directly during the HTTP request, Orders now has a dedicated background process for publishing safely.
Step 5: Update create-order handler to write to Outbox instead of Kafka directly
Where: services/orders/internal/http/http.go
Right now, the create-order handler publishes the event to Kafka directly using:
kafkabus.PublishOrderCreated(...)
In this part, we replace that direct publish with the Outbox flow.
The new sequence becomes:
start a database transaction
insert the order row
insert the outbox event row
commit the transaction
return the response
That means the HTTP request no longer depends on Kafka being available immediately.
Now update the “create order” logic to:
// We begin a database transaction
tx, err := s.store.Begin(ctx)
if err != nil {
http.Error(w, "db error", http.StatusInternalServerError)
return
}
defer func() { _ = tx.Rollback(ctx) }()
// This creates the order row inside the transaction. At this point, we have a real order ID that we can use in the event.
id, err := s.store.CreatePendingTx(ctx, tx, req.UserID, req.Note, req.Sku, req.Quantity)
if err != nil {
http.Error(w, "db error", http.StatusInternalServerError)
return
}
// This creates the event payload we eventually want to publish to Kafka.
evt := kafkabus.OrderCreated{
EventID: uuid.New().String(),
Type: "OrderCreated",
Time: time.Now().UTC(),
OrderID: id,
UserID: req.UserID,
Sku: req.Sku,
Quantity: int32(req.Quantity),
}
// This turns the event struct into bytes so it can be stored in the outbox_events table.
payload, _ := json.Marshal(evt)
// This stores the event inside the outbox_events table in the same transaction as the order row
if err := outbox.InsertTx(ctx, tx, outbox.Event{
EventID: evt.EventID,
Topic: "orders.v1",
Key: strconv.FormatInt(id, 10),
Payload: payload,
}); err != nil {
http.Error(w, "db error", http.StatusInternalServerError)
return
}
// Only after both inserts succeed do we commit the transaction.
if err := tx.Commit(ctx); err != nil {
http.Error(w, "db error", http.StatusInternalServerError)
return
}
writeJSON(w, http.StatusAccepted, createOrderResponse{ID: id})
In summary, we remove the direct Kafka publish from the create-order handler. Instead, we store the order row and the event row together inside one transaction, then return 202 Accepted. The actual Kafka publish is now handled later by the outbox worker.
Step 6: Start the outbox publisher from Orders main.go
Now that Orders can store events in the outbox_events table, we need to start the background publisher when the service boots.
Open: services/orders/cmd/orders/main.go
After creating the Kafka bus:
bus := kafkabus.New(...)
start the Outbox publisher:
pub := &outbox.Publisher{
DB: pool,
Writer: bus.OrdersWriter,
}
go pub.Run(ctx)
This creates the background worker that continuously checks the outbox_events table for unpublished events.
Once it finds them, it:
publishes them to Kafka
marks them as published in the database
How to test Outbox
I encountered few problems which you might also. So there are few fixed which we need to do:
- We need to remove
Topicfrom the outbox message (the writer already targetsorders.v1) in/orders/internal/outbox/publisher.go
err := p.Writer.WriteMessages(ctx, kafka.Message{
// Topic: -- removed
Key: []byte(e.Key),
Value: e.Payload,
})
- Add red panda dependency in
inventoryservice
inventory:
...
depends_on:
redpanda:
condition: service_healthy
That's it. Now lets go ahead and test it
This is the most important part of this section, because it proves that the Outbox pattern is actually solving the reliability problem we introduced earlier.
We want to verify two things:
the normal flow still works
events are not lost even when Kafka is temporarily down
Test 1: Normal flow still works
First, make sure the normal event-driven flow still behaves as expected.
Create an order and confirm that it eventually becomes:
CONFIRMEDor
CANCELLED
This tells us that adding the Outbox pattern did not break the existing async workflow.
Test 2: Kafka-down test
This is the test that really proves the Outbox pattern is working.
Step 1: Stop Redpanda
docker compose stop redpanda
Now Kafka is unavailable.
Step 2: Create an order
curl -X POST http://localhost:8080/v1/orders \
-H 'Content-Type: application/json' \
-d '{"userId":"u1","sku":"burger","quantity":1}'
The request should still succeed. That means:
the order is created in Postgres
the outbox event is also saved in Postgres
but the event cannot be published yet because Kafka is down
So at this point, the order should exist in the database as:
PENDING
And Inventory will not process it yet, because it never received the event.
This is the key difference from the old design.
Step 3: Start Redpanda again
docker compose start redpanda
Once Kafka comes back:
- The outbox publisher wakes up, it reads the unpublished event from
outbox_events, it publishes that event to Kafka. Inventory consumes it. Inventory processes the stock reservation. Orders later receives the result and updates the order status
So the order should eventually move from:
PENDING -> CONFIRMED/ CANCELLED
Conclusion
At this point, QuickBite is starting to feel much more like a real production-style backend.
We solved one of the most common reliability problems in event-driven systems: what happens when Kafka is unavailable at the wrong time. With the Outbox pattern in place, Orders no longer risks losing the event just because the broker is temporarily down. The order and the event are stored safely together, and publishing can happen later. That is a big step forward.
But reliability is not finished yet. The next challenges are duplicates, retries, and bad messages. In the next part, we will introduce idempotency so repeated deliveries do not corrupt state, and we will also start making Inventory durable so it can survive restarts properly.




