🧭 Analogy
A data product is a packaged consumer good, not a raw ingredient left on a loading dock. It has a label (schema), a manufacturer who stands behind it (owner with SLAs), a shelf where you can find it (catalog), and different package sizes for different shoppers (modes). You don’t reverse-engineer the recipe by breaking into the factory — you buy the product off the shelf.
What a data product is
In data mesh, the second principle elevates data to a first-class product. Crucially, a data product is not just data — it is:
- the data itself,
- the code that builds it,
- the infrastructure that stores and serves it, and
- the ports/modes by which it is accessed.
Like a microservice, it carries single ownership, SLAs, quality guarantees, a release cycle, and clean, standardized access. Its schema together with the stream API forms the contract — captured in the book’s equation Event stream API + event schema == REST API + json schema.
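The four components and the single-owner rule can be made concrete with a small descriptor. This is a minimal sketch, not a standard data mesh API: the `DataProduct` and `Mode` names, the field layout, and the example values (team name, repository, topic, latency figure) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Access ports a product may expose (illustrative set)."""
    EVENT_STREAM = "event_stream"
    PARQUET_FILES = "parquet_files"
    REST_API = "rest_api"


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str               # exactly one owning team
    schema: dict             # field name -> type: the schema half of the contract
    modes: tuple             # ports/modes by which the data is accessed
    build_code: str          # where the code that builds the data lives
    storage: str             # infrastructure that stores and serves the data
    sla_max_latency_ms: int  # one example of an SLA the owner stands behind

    def __post_init__(self):
        # Enforce the two rules stated in the text: one owner, at least one port.
        if not self.owner:
            raise ValueError("a data product needs exactly one owner")
        if not self.modes:
            raise ValueError("a data product needs at least one access mode")


# Hypothetical example values.
sales = DataProduct(
    name="sales",
    owner="sales-domain-team",
    schema={"sale_id": "string", "amount_cents": "long", "sold_at": "timestamp"},
    modes=(Mode.EVENT_STREAM, Mode.PARQUET_FILES),
    build_code="sales-domain/sales-product",
    storage="topic sales.v1",
    sla_max_latency_ms=500,
)
```

Keeping the owner as a single string rather than a list is deliberate here: it makes shared ownership impossible to express.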
Four key factors
- Immutable and time-stamped — queries yield consistent, reproducible results over time and across consumers; changes are published as new incremental events.
- Multimodal: the same product can be served as an event stream, as daily Parquet files in cloud storage, and behind a REST API. Operational systems need the low latency of streams, and streams can be aggregated down into batch files, which makes streams the overarching medium.
- Push vs. pull — pull (REST/SQL/files) is familiar but high-latency and requires polling; push (event streams) notifies subscribers with low latency and is required for real-time operational use. “Like begets like”: the access patterns you choose ripple across the organization.
- Three alignment types — see below.
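The first three factors above can be illustrated with a toy in-memory log. This is a sketch under stated assumptions, not a real broker client: `Event`, `EventStream`, and their methods are hypothetical names, and events are assumed to be published in timestamp order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    value: int
    timestamp: int  # event time, e.g. epoch seconds


class EventStream:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self):
        self._log = []
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        self._log.append(event)
        for callback in self._subscribers:
            callback(event)  # push: subscribers are notified immediately

    def read_from(self, offset):
        return self._log[offset:]  # pull: the consumer has to ask (poll)

    def state_as_of(self, timestamp):
        # Replaying the immutable log up to a timestamp gives the same
        # answer every time, for every consumer (assuming in-order publishing).
        state = {}
        for event in self._log:
            if event.timestamp <= timestamp:
                state[event.key] = event.value
        return state


stream = EventStream()
pushed = []
stream.subscribe(pushed.append)

stream.publish(Event("sale-1", 100, timestamp=10))
stream.publish(Event("sale-1", 120, timestamp=20))  # a change is a new event
stream.publish(Event("sale-2", 50, timestamp=30))

print(stream.state_as_of(15))    # {'sale-1': 100}
print(stream.state_as_of(30))    # {'sale-1': 120, 'sale-2': 50}
print(len(pushed))               # 3
print(len(stream.read_from(1)))  # 2
```

The correction to `sale-1` is appended rather than written over the old value, which is why the query as of time 15 still returns the original amount.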
The key insight
Data mesh was conceived for analytics, but event-driven data products serve operational systems as a powerful “off-label” use: a payments service can read a sales state event in milliseconds — impossible with hourly batch. Teams get real-time event-driven services “for the price of one data product,” which is the strongest argument for internal buy-in.
The three alignment types
Alignment captures how much business logic the owner has applied, trading off owner vs. consumer responsibility.
graph LR
  SRC["Operational source"] --> SA["Source-aligned<br/>general-purpose, e.g. a sales event"]
  SA --> AA["Aggregate-aligned<br/>owner applies logic, e.g. daily sales totals"]
  AA --> CA["Consumer-aligned<br/>customized for one domain, e.g. ad-targeting prediction"]
  SA -.->|"more owner reuse"| NOTE1["lower per-consumer effort upstream"]
  CA -.->|"more consumer specificity"| NOTE2["joins several domains"]
- Source-aligned — general-purpose, close to the operational source (a sales event). Resource-constrained popular domains should prioritize these.
- Aggregate-aligned — the owner applies business logic or aggregation, often mixing in other products (daily sales totals). Popular consumer-side aggregations tend to migrate upstream into formal aggregate-aligned products.
- Consumer-aligned — highly customized for one domain, frequently joining several domains’ products (an ad-targeting prediction event).
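The step from source-aligned to aggregate-aligned can be sketched as a roll-up of sales events into daily totals. The event fields (`sale_id`, `amount_cents`, `sold_at`) and the sample values are assumptions made for this illustration, and bucketing by UTC day is one possible business rule among several.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Source-aligned: general-purpose events close to the operational source.
sales_events = [
    {"sale_id": "s1", "amount_cents": 1200, "sold_at": "2024-03-01T09:15:00+00:00"},
    {"sale_id": "s2", "amount_cents": 800, "sold_at": "2024-03-01T17:40:00+00:00"},
    {"sale_id": "s3", "amount_cents": 500, "sold_at": "2024-03-02T08:05:00+00:00"},
]


def daily_sales_totals(events):
    """Aggregate-aligned: the owner applies the logic (here, summing per UTC day)."""
    totals = defaultdict(int)
    for event in events:
        sold_at = datetime.fromisoformat(event["sold_at"]).astimezone(timezone.utc)
        totals[sold_at.date().isoformat()] += event["amount_cents"]
    return dict(totals)


print(daily_sales_totals(sales_events))
# {'2024-03-01': 2000, '2024-03-02': 500}
```

If every consumer ends up writing this same roll-up, that is the signal described above for moving it upstream into a formal aggregate-aligned product.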
Quality: the medallion model
Quality is tracked separately from alignment via the medallion model: bronze (raw, source-coupled), silver (structured, sanitized, denormalized — ~99.99% pass), and gold (authoritative, rigorously tested — ~99.9999% pass).
graph LR
  B["Bronze<br/>raw, source-coupled"] --> S["Silver<br/>sanitized ~99.99%"]
  S --> G["Gold<br/>authoritative ~99.9999%"]
  Note["Quality is orthogonal to alignment:<br/>a gold product can build on bronze data"] -.-> B
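The approximate pass rates quoted above can be read as thresholds on a validation check. The sketch below is a simplification: it assumes a tier could be judged from the pass rate alone, whereas the tiers also describe structure, sanitization, and testing rigor. The function names and the exact cutoffs are illustrative, taken from the "~" figures in the text.

```python
SILVER_MIN = 0.9999    # ~99.99% of records pass validation
GOLD_MIN = 0.999999    # ~99.9999% of records pass validation


def pass_rate(records, is_valid):
    """Fraction of records that satisfy the validation predicate."""
    records = list(records)
    if not records:
        raise ValueError("cannot compute a pass rate over zero records")
    return sum(1 for record in records if is_valid(record)) / len(records)


def tier_for(rate):
    """Map a pass rate onto the highest tier whose threshold it meets."""
    if rate >= GOLD_MIN:
        return "gold"
    if rate >= SILVER_MIN:
        return "silver"
    return "bronze"


# One bad record in 100,000 meets the silver threshold but not gold.
rate = pass_rate(range(100_000), lambda record: record != 0)
print(tier_for(rate))  # silver
```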
Each new mode multiplies cost — and needs one owner
Streams are the core mode; every additional mode (files, REST) multiplies governance and self-service complexity, so add one only when ROI justifies it. And whether something is a separate product or another mode of one product turns on ownership: multiple modes require a single owner guaranteeing consistency across them.
See also
- Data mesh principles — the four principles this elaborates.
- Schemas and evolution — the contract a product exposes.
- Event notification vs. state transfer — state events as the product payload.
When to use it — and when not
✅ Reach for it when
- Important business data should be reliably reusable across many domains
- You want both operational (low-latency) and analytical (batch) consumers from one source
- Ownership, SLAs, quality tiers, and discoverability need to be explicit
⛔ Think twice when
- Data has exactly one private consumer and no reuse value
- You would add a new access mode with no clear ROI (each mode multiplies governance cost)
- Ownership would be split across several teams (a product needs exactly one owner)
Related topics
Data Mesh Principles
The four principles of data mesh (domain ownership, data as a product, federated governance, and a self-service platform) and why it is as much a social shift as a technical one.
Schemas and Evolution
The data contract behind every event: explicit schemas, compatibility types (forward, backward, full), the schema registry, and how to handle breaking changes.
Event Notification vs. State Transfer
Notification, event-carried state transfer (ECST), state events, and delta events: what each carries, the coupling it implies, and why state events are the right default for shared data.
Check your understanding
1. A data product is composed of…
Like a microservice, a data product bundles data, code, infrastructure, and access ports under single ownership with SLAs and a release cycle.
2. The three alignment types of data products are…
They trade off owner vs. consumer responsibility: source-aligned is general-purpose near the source; aggregate-aligned applies business logic; consumer-aligned is customized for one domain.
3. The medallion quality model (bronze/silver/gold) is…
Quality (bronze raw → gold authoritative) is independent of alignment; the normal path is bronze → silver → gold, and a product can't promise stronger guarantees than its dependencies.
4. Why are event-driven data products a strong selling point for operational systems?
Data mesh was conceived for analytics, but event streams serve operational use 'off-label' at low latency, which drives internal buy-in.