Ms communication · Microservices · Intermediate

Communication Styles: Sync vs Async

Choose the communication style before the technology — request-response vs event-driven, synchronous vs asynchronous — and understand temporal coupling, cascading failures, and what goes in an event.

🧭 Analogy

A phone call is synchronous: both people must be available at once, and if the other end is slow you’re stuck holding the line. A letter or a posted notice is asynchronous: you drop it in the box and carry on; the recipient reads it whenever they’re free. Neither is “better” — you choose by whether you need an answer right now or can decouple in time.

Style before technology

Newman’s thesis: teams too often pick a technology (REST, gRPC, Kafka) before deciding what style of communication they need. Set technology aside and build a vocabulary first. Moving from in-process to inter-process calls changes three things: performance (a call that crosses the network can no longer be inlined, so payload size and serialization cost now matter), changing interfaces (provider and consumers deploy separately, so backward-incompatible changes force lockstep or phased rollouts), and error handling (new failure modes appear: crash, omission, timing, response, and arbitrary/Byzantine failures). HTTP status codes give errors richer semantics, e.g. a 503 is worth retrying, while retrying a 404 is pointless.
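The retryable-versus-pointless distinction can be sketched as a small classifier. This is a minimal illustration, not a complete retry policy: the function name `should_retry` and the exact set of status codes treated as retryable are assumptions made for the example.

```python
# Assumption: an illustrative set of "worth retrying" HTTP status codes.
# Real policies also consider idempotency, backoff, and Retry-After headers.
RETRYABLE_STATUSES = frozenset({408, 429, 502, 503, 504})


def should_retry(status: int) -> bool:
    """Return True when a later attempt might plausibly succeed."""
    return status in RETRYABLE_STATUSES


if __name__ == "__main__":
    for code in (503, 404):
        print(code, "retry" if should_retry(code) else "give up")
```

A 503 signals a temporary condition on the server side, so trying again later is reasonable; a 404 says the resource is not there, and repeating the same request will not change that.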

The model of styles

graph TD
Start{"What do you need?"}
Start -->|"a reply / something done"| RR["Request-response"]
Start -->|"emit a fact, don't care who acts"| EV["Event-driven"]
RR --> SB["Synchronous blocking<br/>(keep connection open)"]
RR --> AB["Asynchronous nonblocking<br/>(via queue/broker)"]
EV --> ASYNC["Nonblocking asynchronous only"]

First choose request-response vs event-driven; if request-response, choose synchronous vs asynchronous; event-driven is limited to nonblocking asynchronous. Mix and match is the norm.

A poison message shows why async needs a safety valve — without a retry cap a single bad message can take down the whole pool:

graph LR
Q["Queue"] --> W["Worker"]
W -->|"crash, requeue"| Q
W -->|"retries exceed limit"| DLQ["Dead letter queue<br/>(message hospital)"]
DLQ --> H["Human / automated triage"]

Synchronous blocking: simple and familiar, but it creates temporal coupling, leaves the caller exposed to slow downstreams, and lets failures cascade along long call chains (MusicCorp’s fraud example). Remedies: restructure interactions (move fraud detection off the critical path) or go nonblocking.
Asynchronous nonblocking: temporal decoupling, good for long-running work (a warehouse dispatch taking hours or days), at the cost of added complexity and a wider range of choices to make. (Note: async/await-style code that waits on a result is still blocking from the code’s perspective.)
Communication through common data: one service writes to a known location (file, store) that others later consume (data lakes for loose coupling, warehouses for tighter). Simple, ubiquitous, great for large volumes and interoperability, but usually high-latency because consumers have to poll.
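Communication through common data can be sketched with a shared drop directory: one side writes files, the other polls for new ones. The function names `publish_file` and `poll_once`, the `.json` naming convention, and the in-memory `seen` set are all assumptions for illustration; a real consumer would persist its position and poll on a schedule.

```python
import json
import os
from pathlib import Path


def publish_file(drop_dir: Path, name: str, record: dict) -> Path:
    """Producer side: write to a temp name, then rename into place."""
    tmp = drop_dir / f".{name}.tmp"
    tmp.write_text(json.dumps(record))
    final = drop_dir / name
    # Rename so pollers do not pick up a half-written file
    # (the rename is atomic on POSIX filesystems).
    os.replace(tmp, final)
    return final


def poll_once(drop_dir: Path, seen: set) -> list:
    """Consumer side: return records from files not processed before."""
    new_records = []
    for path in sorted(drop_dir.glob("*.json")):
        if path.name not in seen:
            seen.add(path.name)
            new_records.append(json.loads(path.read_text()))
    return new_records
```

The latency of this style is set by how often `poll_once` runs, which is why it suits large batch volumes better than interactions that need a prompt answer.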

Request-response: a “request” beats a “command”

For fetching data or ensuring something is done, prefer a request (examinable, rejectable) over a command. Synchronous keeps a connection open; asynchronous routes via a queue/broker (buffering benefits, but you must correlate and route the response). All forms need time-out handling.
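Correlating an asynchronous reply with its request, and timing out requests that never get one, might look like the sketch below. The class `PendingRequests` and its method names are hypothetical, and the clock is passed in explicitly to keep the example deterministic; a real implementation would sit next to a broker client and store pending state somewhere durable.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PendingRequests:
    """Tracks outstanding requests by correlation ID (illustrative only)."""
    timeout_s: float = 30.0
    _pending: dict = field(default_factory=dict)

    def send(self, payload: dict, now: float) -> dict:
        """Build the outgoing message and remember it until a reply arrives."""
        correlation_id = str(uuid.uuid4())
        self._pending[correlation_id] = (payload, now + self.timeout_s)
        return {"correlation_id": correlation_id, **payload}

    def on_reply(self, reply: dict):
        """Return the original payload for this reply, or None if unknown."""
        entry = self._pending.pop(reply["correlation_id"], None)
        return None if entry is None else entry[0]

    def expire(self, now: float) -> list:
        """Remove and return payloads whose deadline has passed."""
        late = [cid for cid, (_, deadline) in self._pending.items()
                if now >= deadline]
        return [self._pending.pop(cid)[0] for cid in late]
```

Because the reply may arrive at a different instance than the one that sent the request, the pending state generally has to live somewhere all instances can reach, which is part of the extra complexity the asynchronous form brings.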

Event-driven: emit facts, don’t issue orders

A service emits events — factual statements that something happened — without knowing who consumes them, greatly reducing coupling. An event is a fact (the payload); a message is the medium. Implement via message brokers (“keep the middleware dumb, the smarts in the endpoints”).
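The point that the emitter does not know its consumers can be shown with a toy in-process broker. The `Broker` class and the topic name in the usage note are assumptions for illustration; a real broker delivers messages asynchronously and durably, whereas this sketch simply calls handlers in order.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Deliberately dumb middleware: it routes events and nothing more."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Hand the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

A publisher calls `broker.publish("customer.registered", {...})` and is done. Adding a new consumer means one more `subscribe` call and no change to the publisher.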

What’s in an event?

Just an ID — causes a barrage of callbacks and re-adds coupling.
Fully detailed events — Newman’s preference: self-sufficient consumers and a historical record useful for event sourcing. But watch event size, PII leakage, and the fact that event data becomes part of your contract.
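The trade-off between the two payload shapes can be made concrete by asking what a consumer still has to fetch. The event fields and the helper `fields_to_fetch` below are hypothetical, chosen only to illustrate the idea.

```python
# Assumption: two hypothetical shapes for the same "customer registered" fact.
id_only_event = {"customer_id": "c-42"}

detailed_event = {
    "customer_id": "c-42",
    "name": "Sam Example",
    "email": "sam@example.com",   # PII: every consumer of the event sees this
    "registered_at": "2024-01-01T00:00:00Z",
}


def fields_to_fetch(event: dict, needed: set) -> set:
    """Fields a consumer must call back to the emitting service for."""
    return needed - set(event)


if __name__ == "__main__":
    notification_needs = {"customer_id", "name", "email"}
    print(fields_to_fetch(id_only_event, notification_needs))
    print(fields_to_fetch(detailed_event, notification_needs))
```

With the ID-only event, every consumer that needs `name` or `email` calls back to the emitter, which is the barrage of callbacks. With the detailed event nothing is missing, but each field is now something consumers may depend on, so removing or renaming one is a contract change.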

💡 Choreography vs orchestration is next

Once services emit and react to events, you’re choosing between letting them react independently or having one service direct the flow. That trade-off is covered in choreography vs orchestration and applied to workflows in sagas.

⚠️ The poison message and catastrophic failover

A 2006 bank pricing system suffered a catastrophic failover: a crashing worker requeued a poison message that crashed the next worker that picked it up, repeatedly, taking the whole pool down. The fix was a maximum retry limit and a dead letter queue / message hospital. With async messaging you also need correlation IDs and good monitoring — the decoupling buys scalability at a real cost in operational complexity.
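A retry cap with a dead letter queue can be sketched as below. This is an in-memory illustration under stated assumptions: `drain`, `MAX_ATTEMPTS`, and the use of an exception to stand in for a worker crash are all made up for the example, and real brokers usually provide redelivery counts and dead-lettering as configuration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumption: an arbitrary cap chosen for the example


def drain(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages, parking any that fail max_attempts times."""
    queue = deque((message, 0) for message in messages)
    processed, dead_letters = [], []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Stop the loop: send it to the "message hospital" for triage.
                dead_letters.append(message)
            else:
                queue.append((message, attempts))
    return processed, dead_letters


def example_handler(message):
    # Stand-in for a worker that crashes on one particular message.
    if message == "poison":
        raise ValueError("cannot process message")
```

Without the `max_attempts` check the poison message would be requeued forever. With it, the healthy messages are still processed and the bad one ends up where a person or a tool can inspect it.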

🔑 Key insight

There is rarely one right option; expect a mix. Synchronous is simplest but couples you in time; asynchronous events give the loosest coupling but demand correlation, monitoring, and careful failure handling. Choose per interaction, deliberately.

When to use it — and when not

✅ Reach for it when

You are deciding how two services should talk to each other.
You want to reduce temporal coupling or avoid cascading failures.
You need to decide what to put inside an event.

⛔ Think twice when

You have already picked a style and need a specific technology comparison.
You are designing a whole distributed workflow — see sagas.

ms-communication · Choreography vs Orchestration

Two ways to coordinate a multi-service business process: a central orchestrator that commands the flow, or choreographed services reacting to broadcast events — and when to choose each.

ms-communication · Sagas: Distributed Workflows Without Distributed Transactions

Coordinate multiple state changes across services without long-held locks by modeling a process as a sequence of local transactions, with compensating actions for rollback.

ms-operations · Resilience: Timeouts, Retries, Circuit Breakers, Bulkheads

Stability patterns for distributed systems — timeouts, retries, bulkheads, circuit breakers, and idempotency — plus the four aspects of resilience and why it's ultimately a people property.

Check your understanding

1. What is Newman's main thesis about communication?

Teams pick a technology before deciding what style they need; the chapter builds a vocabulary of styles first, deliberately setting technology aside.

2. What problem does synchronous blocking communication create?

It creates temporal coupling: the callee must be up and responsive at the same time as the caller, so a slow or failed downstream can cascade failures back up the call chain.

3. What does Newman recommend putting in an event?

Fully detailed events make consumers self-sufficient and provide a historical record; 'just an ID' causes a barrage of callbacks and adds coupling — but beware size, PII, and contract implications.

4. What fixed the 2006 bank pricing system's catastrophic failover?

A poison message kept crashing each worker that picked it up; a max retry limit and dead letter queue stopped the loop.
