Workflow Engines
Orchestrate complex business processes with workflow engines, state machines, and long-running task coordination across distributed services.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
Workflow engines orchestrate complex, multi-step business processes that span services, time, and failure domains. Unlike simple job queues that execute independent tasks, workflows manage state transitions, retries, timeouts, and compensations across distributed systems. Whether processing an e-commerce order, underwriting an insurance policy, or approving a loan, workflow engines ensure each step executes in the right order with proper error handling.
When to Use
Use this resource when:
- Business processes have 5+ sequential steps with failure handling requirements
- Steps need to wait for human approval or external events (hours or days)
- Partial failures require compensating transactions (saga pattern)
- You need audit trails and visibility into long-running process state
Solution
Temporal Workflow (TypeScript)
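import { proxyActivities, ApplicationFailure } from '@temporalio/workflow';

// Illustrative order shape; adapt it to your domain model.
interface Order {
  id: string;
  customerEmail: string;
  total: number;
}

// Activities run outside the workflow sandbox; the workflow calls them through this proxy.
const { sendEmail, sendCompensationEmail, chargePayment, refundPayment, shipOrder } = proxyActivities<{
  sendEmail(email: string, message: string): Promise<void>;
  sendCompensationEmail(order: Order): Promise<void>;
  chargePayment(amount: number): Promise<string>;
  refundPayment(paymentId: string): Promise<void>;
  shipOrder(orderId: string): Promise<string>;
}>({
  startToCloseTimeout: '30 seconds',
  retry: { maximumAttempts: 3 },
});

export async function orderWorkflow(order: Order): Promise<void> {
  await sendEmail(order.customerEmail, 'Order received');
  const paymentId = await chargePayment(order.total);
  if (!paymentId) {
    await sendCompensationEmail(order);
    // A plain Error would fail the workflow task and be retried; ApplicationFailure fails the workflow itself.
    throw ApplicationFailure.nonRetryable('Payment failed');
  }
  try {
    await shipOrder(order.id);
  } catch (err) {
    // Compensation: shipping failed after the charge succeeded, so refund before failing.
    await refundPayment(paymentId);
    throw err;
  }
  await sendEmail(order.customerEmail, 'Order shipped!');
}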
State Machine (Python + transitions)
from transitions import Machine


class OrderWorkflow:
    states = ['pending', 'paid', 'shipped', 'cancelled']

    def __init__(self):
        # Machine adds the trigger methods (pay, ship, cancel) and a `state` attribute to this object.
        self.machine = Machine(
            model=self,
            states=OrderWorkflow.states,
            initial='pending',
            transitions=[
                {'trigger': 'pay', 'source': 'pending', 'dest': 'paid'},
                {'trigger': 'ship', 'source': 'paid', 'dest': 'shipped'},
                # Nothing has been charged yet, so cancelling a pending order needs no refund.
                {'trigger': 'cancel', 'source': 'pending', 'dest': 'cancelled'},
                {'trigger': 'cancel', 'source': 'paid', 'dest': 'cancelled',
                 'after': 'refund_payment'},
            ],
        )

    def refund_payment(self):
        print("Refunding payment...")


order = OrderWorkflow()
order.pay()   # pending -> paid
order.ship()  # paid -> shipped
Camunda BPMN Process
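<?xml version="1.0" encoding="UTF-8"?>
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
                  xmlns:camunda="http://camunda.org/schema/1.0/bpmn"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  targetNamespace="http://bpmn.io/schema/bpmn">
  <bpmn:process id="OrderProcess" isExecutable="true">
    <bpmn:startEvent id="StartEvent" />
    <bpmn:sequenceFlow id="Flow_1" sourceRef="StartEvent" targetRef="CheckInventory" />
    <bpmn:serviceTask id="CheckInventory" camunda:delegateExpression="${inventoryChecker}" />
    <bpmn:sequenceFlow id="Flow_2" sourceRef="CheckInventory" targetRef="Gateway_1" />
    <bpmn:exclusiveGateway id="Gateway_1" default="Flow_4" />
    <!-- Sequence flows are children of the process, not of the gateway -->
    <bpmn:sequenceFlow id="Flow_3" sourceRef="Gateway_1" targetRef="ProcessPayment">
      <bpmn:conditionExpression xsi:type="bpmn:tFormalExpression">${inventoryAvailable}</bpmn:conditionExpression>
    </bpmn:sequenceFlow>
    <bpmn:sequenceFlow id="Flow_4" sourceRef="Gateway_1" targetRef="NotifyOutOfStock" />
    <!-- Placeholder tasks for the two branches; the delegate names are illustrative -->
    <bpmn:serviceTask id="ProcessPayment" camunda:delegateExpression="${paymentProcessor}" />
    <bpmn:serviceTask id="NotifyOutOfStock" camunda:delegateExpression="${outOfStockNotifier}" />
  </bpmn:process>
</bpmn:definitions>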
Explanation
Core concepts:
- Workflow definition: The blueprint describing steps, transitions, and conditions
- Activity: A single unit of work (API call, database update, human task)
- State: The current position in the workflow (persisted for durability)
- Compensation: Reversing already completed steps when a later step fails
- Timer: Delaying execution or setting deadlines for activities
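The compensation concept above can be sketched in a few lines of plain Python. This is a minimal illustration and not how any particular engine implements it: `Step`, `SagaRunner`, and the order functions are hypothetical names, and state lives only in memory, whereas a real engine would persist it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One activity plus the action that undoes it (hypothetical structure)."""
    name: str
    action: Callable[[dict], None]
    compensate: Callable[[dict], None]


@dataclass
class SagaRunner:
    """Runs steps in order; on failure, compensates completed steps in reverse."""
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self, ctx: dict) -> bool:
        completed: List[Step] = []
        for step in self.steps:
            try:
                step.action(ctx)
                completed.append(step)
                self.log.append(f"done:{step.name}")
            except Exception as exc:
                self.log.append(f"failed:{step.name}:{exc}")
                # Undo already completed steps, most recent first.
                for done in reversed(completed):
                    done.compensate(ctx)
                    self.log.append(f"compensated:{done.name}")
                return False
        return True


# Illustrative activities that only mutate a shared context dict.
def reserve(ctx): ctx["reserved"] = True
def release(ctx): ctx["reserved"] = False
def charge(ctx): ctx["charged"] = True
def refund(ctx): ctx["charged"] = False
def ship(ctx): raise RuntimeError("carrier unavailable")
def cancel_shipment(ctx): pass


saga = SagaRunner([
    Step("reserve_stock", reserve, release),
    Step("charge_payment", charge, refund),
    Step("ship_order", ship, cancel_shipment),
])
ctx = {}
ok = saga.run(ctx)
print(ok, ctx, saga.log)
```

Compensating in reverse order mirrors how the steps built on each other: the payment is refunded before the stock reservation it depended on is released.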
When to use workflow engines vs. code:
| Complexity | Approach | Example |
|---|---|---|
| 1-2 steps | Direct function calls | Sending a welcome email |
| 3-4 steps | Code with retry logic | Order processing with payment |
| 5+ steps | Workflow engine | Loan approval with 10+ departments |
| Human tasks | BPMN engine | Insurance claim review |
Variants
| Engine | Model | Best For |
|---|---|---|
| Temporal | Code-as-workflow | Developer-centric; durable execution |
| Camunda | BPMN | Business analyst visibility; human tasks |
| Apache Airflow | DAGs | Data pipelines; scheduled workflows |
| Netflix Conductor | JSON DSL | Microservices orchestration |
| AWS Step Functions | State machines | Serverless; AWS integration |
Best Practices
- Idempotent activities: Running the same activity twice should produce the same result. See message idempotency.
- Idempotency keys: Pass unique keys to external APIs to prevent double charges
- Set timeouts on everything: Give every activity and workflow an explicit timeout (for example, a 10-minute default) so a hung step cannot stall the workflow indefinitely
- Version workflow definitions: New deployments shouldn’t break in-flight workflows
- Query workflow state: Expose APIs to check progress without inspecting internal state
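As a rough illustration of the idempotency-key practice, the sketch below derives a key from the workflow ID and activity name so that every retry of the same activity sends the same key. `PaymentGateway` is an in-memory stand-in for an external API, and all names are hypothetical; real payment providers define their own key format and deduplication window.

```python
import hashlib


class PaymentGateway:
    """In-memory stand-in for an external payment API (hypothetical)."""

    def __init__(self):
        self._results = {}      # idempotency key -> payment id
        self.charges_made = 0   # number of charges actually performed

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        payment_id = f"pay_{self.charges_made}"
        self._results[idempotency_key] = payment_id
        return payment_id


def idempotency_key(workflow_id: str, activity_name: str) -> str:
    """Derive a key that stays the same across retries of one activity."""
    raw = f"{workflow_id}:{activity_name}".encode()
    return hashlib.sha256(raw).hexdigest()


gateway = PaymentGateway()
key = idempotency_key("order-1001", "charge_payment")
first = gateway.charge(key, 4999)
retry = gateway.charge(key, 4999)   # simulated retry after a timeout
print(first, retry, gateway.charges_made)
```

Deriving the key from stable workflow data, instead of generating a random value inside the activity, is what keeps it identical when the engine retries.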
Common Mistakes
- Tight coupling to orchestrator: Business logic bleeding into workflow definitions makes testing hard
- No compensation paths: Failed workflows that already charged the customer need explicit refunds. Learn more in saga pattern.
- Polling instead of events: Checking status on a fixed interval (say, every 30 seconds) wastes resources and adds latency; have external systems signal the workflow through callbacks or events
- Ignoring workflow history: Old completed workflows fill storage; implement retention policies
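- No replay testing: Temporal and similar engines rebuild workflow state by replaying recorded history, so workflow code that is non-deterministic (random values, wall-clock time, direct I/O) breaks on replay; test against recorded histories before deploying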
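The replay problem in the last item can be illustrated with a toy model. This is a simplified sketch and not Temporal's replayer: `ReplayContext`, `FLAGS`, and the workflow functions are hypothetical, and a real engine records far richer events. The idea is that replayed code must issue the same commands, in the same order, as the recorded history.

```python
from typing import Callable, List


class NonDeterminismError(Exception):
    """Raised when replayed code diverges from the recorded history."""


class ReplayContext:
    """Checks each command a workflow issues against a recorded history."""

    def __init__(self, history: List[str]):
        self._history = history
        self._pos = 0

    def execute(self, activity: str) -> None:
        expected = self._history[self._pos] if self._pos < len(self._history) else None
        if activity != expected:
            raise NonDeterminismError(f"expected {expected!r}, got {activity!r}")
        self._pos += 1


# Mutable global state standing in for anything outside the workflow's inputs.
FLAGS = {"fraud_check": True}


def deterministic_workflow(ctx: ReplayContext, order_total: int) -> None:
    ctx.execute("charge_payment")
    if order_total > 100:            # branch depends only on workflow input
        ctx.execute("fraud_check")
    ctx.execute("ship_order")


def flag_dependent_workflow(ctx: ReplayContext, order_total: int) -> None:
    ctx.execute("charge_payment")
    if FLAGS["fraud_check"]:         # branch depends on external mutable state
        ctx.execute("fraud_check")
    ctx.execute("ship_order")


def replays_cleanly(workflow: Callable, history: List[str], *args) -> bool:
    try:
        workflow(ReplayContext(history), *args)
        return True
    except NonDeterminismError:
        return False


# History recorded for an order of 150 while the flag was on.
history = ["charge_payment", "fraud_check", "ship_order"]
FLAGS["fraud_check"] = False         # the flag changes before the replay

good = replays_cleanly(deterministic_workflow, history, 150)
bad = replays_cleanly(flag_dependent_workflow, history, 150)
print(good, bad)
```

Engines such as Temporal ship their own replay test utilities; the sketch only shows why a branch that depends on anything other than workflow input and recorded results is unsafe.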
Frequently Asked Questions
Q: When should I use a workflow engine instead of a message queue? A: Use message queues for independent, parallel tasks. Use workflow engines for coordinated, sequential processes with state.
Q: How do workflow engines handle crashes? A: They persist state after each activity. On restart, they resume from the last completed step.
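A minimal sketch of that checkpoint-and-resume behavior, assuming state is kept in a local JSON file. `DurableRunner` and the step functions are hypothetical; production engines persist an event history in a database instead.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple


class DurableRunner:
    """Persists the index of the next step to run in a JSON file."""

    def __init__(self, state_path: str):
        self.state_path = state_path

    def _load(self) -> int:
        if not os.path.exists(self.state_path):
            return 0
        with open(self.state_path) as f:
            return json.load(f)["next_step"]

    def _save(self, next_step: int) -> None:
        with open(self.state_path, "w") as f:
            json.dump({"next_step": next_step}, f)

    def run(self, steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
        executed = []
        # Start from the checkpoint, skipping steps that already completed.
        for index in range(self._load(), len(steps)):
            name, fn = steps[index]
            fn()
            self._save(index + 1)   # checkpoint after each completed step
            executed.append(name)
        return executed


calls = []
crash = {"enabled": True}

def charge(): calls.append("charge")

def ship():
    if crash["enabled"]:
        raise RuntimeError("worker crashed")   # simulated crash
    calls.append("ship")

def notify(): calls.append("notify")

steps = [("charge", charge), ("ship", ship), ("notify", notify)]
state_file = os.path.join(tempfile.mkdtemp(), "order-1001.json")

try:
    DurableRunner(state_file).run(steps)       # first run dies during "ship"
except RuntimeError:
    pass

crash["enabled"] = False
resumed = DurableRunner(state_file).run(steps)  # a new process would do the same
print(resumed, calls)
```

A crash between running a step and saving the checkpoint would cause that step to run again on restart, which is why activities need to be idempotent.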
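Q: Can business analysts modify workflows without developers? A: Partly. BPMN-based engines (Camunda) let analysts edit the process diagram in a graphical modeler, although service tasks still depend on developer-written code. Code-based engines (Temporal) require developers for every change but offer more flexibility.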
Related Resources
Microservices Architecture — When to Use and When Not To
A practical guide to microservices: benefits, trade-offs, common patterns, and when to choose them over monoliths. Covers decomposition strategies and operational complexity.
Guide: System Design Interview Guide — Key Concepts
A practical guide to system design interviews: scalability, databases, caching, load balancing, microservices, and how to structure your answer.
Guide: CAP Theorem and Database Trade-offs
A practical guide to the CAP theorem: consistency, availability, and partition tolerance. Learn how to choose the right trade-offs for your application.
Recipe: Microservices Communication Patterns
Choose between synchronous and asynchronous communication patterns for resilient microservices architectures.
Doc: ADR Template
A reusable template for Architecture Decision Records that capture context, decision, and consequences.