Saga (Distributed Transaction Pattern)

Distributed systems break transactions apart.

In a monolith, a database transaction can wrap multiple changes. If something fails, everything rolls back.

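In a microservices architecture, that guarantee disappears. Each service owns its own database. Each change is local. There is usually no global transaction manager coordinating atomic commits across services. Protocols such as two-phase commit can play that role, but they require every participant to support them and to hold locks while waiting on the others, so they are rarely used across service boundaries.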

Now imagine a workflow that spans multiple services.

An order is created.
Inventory is reserved.
Payment is charged.
Shipping is scheduled.

If the payment fails after inventory has been reserved, what happens next?

Without coordination, the system can drift into partial completion. One service reflects the change. Another does not. Inventory stays reserved for an order that was never paid for.

The Saga pattern exists to manage this kind of distributed workflow.

A Sequence of Local Transactions

A saga breaks a large, cross-service operation into a sequence of smaller, local transactions.

Each service performs its own update independently. After completing its step, it emits an event or triggers the next action in the sequence.

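If a later step fails, the system does not attempt to roll back with a global transaction. The earlier steps have already committed. Instead, it executes compensating actions: new local transactions that semantically undo the effect of a committed step.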

If payment fails, inventory is released.
If shipping fails, payment is refunded and inventory is released.
In general, when a step cannot complete, every step that already committed is undone through explicit logic, usually in reverse order.

The workflow becomes a chain of forward actions and compensations.
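The chain of forward actions and compensations can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the Step class, the run_saga function, and the lambdas standing in for service calls are all hypothetical names, and real steps would be calls to separate services with their own databases.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]       # forward local transaction
    compensate: Callable[[], None]   # explicit undo for that transaction


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Assumption: a failed step committed nothing, so only the
            # steps that finished before it need to be undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_payment() -> None:
    # Simulated failure of the payment step.
    raise RuntimeError("card declined")


saga = [
    Step("order", lambda: log.append("order created"),
         lambda: log.append("order cancelled")),
    Step("inventory", lambda: log.append("inventory reserved"),
         lambda: log.append("inventory released")),
    Step("payment", fail_payment,
         lambda: log.append("payment refunded")),
    Step("shipping", lambda: log.append("shipping scheduled"),
         lambda: log.append("shipping cancelled")),
]

succeeded = run_saga(saga)
# log is now: order created, inventory reserved,
#             inventory released, order cancelled
```

Shipping is never attempted, and the payment compensation is never run because the payment step itself never completed.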

Coordination Styles

Sagas can be orchestrated or choreographed.

In orchestration, a central coordinator directs each step. It decides what action comes next and which compensation to execute on failure.

In choreography, services react to events. Each service listens for relevant events and triggers its own actions or compensations accordingly.

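Both approaches aim to maintain business consistency across independent systems. Orchestration keeps the workflow logic in one place. Choreography avoids a central component but spreads that logic across the participating services.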
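The choreographed style might look roughly like the following sketch. The EventBus class is a hypothetical in-memory stand-in for a message broker, and the event names (OrderCreated, InventoryReserved, PaymentFailed, and so on) and the card_limit field are invented for illustration. No component knows the whole workflow; each handler only knows which event it reacts to and which event it emits.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.history.append(event)
        for handler in self.handlers[event]:
            handler(payload)


bus = EventBus()
reserved: Set[str] = set()


def reserve_inventory(order: dict) -> None:
    # Inventory service: local change, then announce it.
    reserved.add(order["id"])
    bus.publish("InventoryReserved", order)


def charge_payment(order: dict) -> None:
    # Payment service: succeeds or fails, and says which.
    if order["amount"] > order["card_limit"]:
        bus.publish("PaymentFailed", order)
    else:
        bus.publish("PaymentCharged", order)


def release_inventory(order: dict) -> None:
    # Inventory service again: compensation triggered by a failure event.
    reserved.discard(order["id"])
    bus.publish("InventoryReleased", order)


bus.subscribe("OrderCreated", reserve_inventory)
bus.subscribe("InventoryReserved", charge_payment)
bus.subscribe("PaymentFailed", release_inventory)

bus.publish("OrderCreated", {"id": "o-1", "amount": 120, "card_limit": 100})
# bus.history: OrderCreated, InventoryReserved,
#              PaymentFailed, InventoryReleased
```

An orchestrated version would move the three subscribe lines into a single coordinator that calls each service directly and decides what to do with each result.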

Embracing Imperfection

Sagas are complex because the problem they solve is complex: keeping independent services consistent without a shared transaction.

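Failure handling must be explicit. Compensating actions must be carefully designed. Edge cases multiply. Time becomes a factor: what happens if a service is unavailable during compensation? The compensation has to be retried until it succeeds, which means it must be safe to run more than once.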
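One common answer to the unavailable-service question is to retry the compensation with backoff and to make the compensation idempotent, so that a repeated attempt does no harm. The sketch below assumes a hypothetical InventoryService that is unreachable for its first two calls; compensate_with_retry and its parameters are illustrative names, not part of any library.

```python
import time
from typing import Callable, Set


def compensate_with_retry(
    compensation: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 0.01,
) -> bool:
    """Retry a compensation with exponential backoff; report success."""
    for attempt in range(attempts):
        try:
            compensation()
            return True
        except ConnectionError:
            time.sleep(base_delay * (2 ** attempt))
    # Out of attempts: the caller should escalate, for example by
    # parking the saga for manual review.
    return False


class InventoryService:
    """Hypothetical service that is unavailable for its first two calls."""

    def __init__(self) -> None:
        self.reserved: Set[str] = {"o-1"}
        self.calls = 0
        self.releases = 0

    def release(self, order_id: str) -> None:
        self.calls += 1
        if self.calls <= 2:
            raise ConnectionError("inventory service unavailable")
        # Idempotent: releasing an already-released order changes nothing.
        if order_id in self.reserved:
            self.reserved.remove(order_id)
            self.releases += 1


inventory = InventoryService()
ok = compensate_with_retry(lambda: inventory.release("o-1"))
inventory.release("o-1")  # a duplicate delivery is harmless
```

The third attempt succeeds, and the duplicate call afterwards leaves the release count at one. Without the idempotency check, a retry after a lost response could release the same stock twice.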

Reasoning about the full state of the system becomes more difficult. A saga offers no isolation between its steps, so partial progress may be visible to other requests while later steps or compensations are still running.

Yet the alternative is uncontrolled inconsistency.

The saga pattern accepts that atomic cross-service transactions are impractical and replaces them with structured recovery.

Where It Fits

Sagas are appropriate when business workflows span multiple services and require coordinated outcomes.

They are unnecessary when the whole workflow fits inside a single database transaction.

As architectures become more distributed, the need for deliberate workflow management increases.

Coordinated Recovery

The Saga pattern does not eliminate failure.

It plans for it.

Each step is local. Each failure has a defined response. The system progresses through a conversation of events and compensations.

Consistency is achieved not through atomicity, but through disciplined coordination.

In distributed systems, recovery is part of the design.