The Architecture Reference

Tenant Isolation Models

Silo, pool, and bridge express how resources map to tenants — but deployment is not isolation; true isolation needs a separate gatekeeper scoped by tenant context.

🧭 Analogy

Giving each tenant their own apartment (silo) makes privacy easier, but it isn’t privacy by itself — if the building’s master key opens every door, a careless concierge can still walk into the wrong unit. Isolation is the lock and the rule that says “this key opens only your door,” whether tenants have separate apartments or share one big loft (pool).

Silo, pool, and the bridge between them

The whole book turns on two terms that describe how application-plane resources map to infrastructure:

  • Silo — a resource dedicated to a tenant.
  • Pool — a resource shared by tenants.

These deliberately avoid the baggage of “multi-tenant,” apply granularly (per resource or group), and don’t map to any single construct. The deployment models combine them:

```mermaid
graph TD
M["Deployment models"] --> FS["Full Stack Silo<br/>(every tenant fully dedicated)"]
M --> FP["Full Stack Pool<br/>(all resources shared)"]
M --> HY["Hybrid Full Stack<br/>(limited silos + pool)"]
M --> MM["Mixed Mode / Bridge<br/>(silo/pool per service)"]
M --> PD["Pod<br/>(group tenants per unit)"]
```

  • Full Stack Silo — every tenant gets an identical, same-version environment (account-per-tenant or VPC-per-tenant). Drivers: compliance, migration, premium tiering. Reduced blast radius, simple cost attribution — but it doesn’t scale to hundreds/thousands of tenants. Mindset: “build as if it were a full-stack pool, then treat each silo as a single-tenant instance of the pool” — never allow per-tenant customization.
  • Full Stack Pool — all resources shared, chasing economies of scale; tenant context becomes essential to every operation. Isolation is harder, noisy-neighbor risk is high, and an outage is an “all-in commitment” hitting all tenants (demanding bulkhead patterns and a higher availability bar).
  • Mixed Mode (the bridge) — fine-grained silo/pool choices service-by-service (e.g., a siloed order service for compliance alongside a pooled ratings service) — maximum flexibility, a compelling middle path.
  • Pod — groups a collection of tenants into a unit of deployment/operation, driven by scale limits, geography, or isolation.
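Because silo and pool apply per resource, a mixed-mode choice is at heart a per-service lookup. The sketch below is a minimal illustration, assuming hypothetical service names and a made-up resource-naming convention. It shows the same tenant landing on a siloed order resource and a pooled ratings resource.

```python
from enum import Enum

class Model(Enum):
    SILO = "silo"   # resource dedicated to one tenant
    POOL = "pool"   # resource shared by tenants

# Hypothetical mixed-mode (bridge) choices, made service by service.
SERVICE_MODELS = {
    "order": Model.SILO,    # e.g. siloed for compliance
    "ratings": Model.POOL,  # pooled for economies of scale
}

def resource_for(service: str, tenant_id: str) -> str:
    """Return the (hypothetical) resource name that serves this tenant."""
    if SERVICE_MODELS[service] is Model.SILO:
        return f"{service}-{tenant_id}"  # one dedicated resource per tenant
    return f"{service}-shared"           # one resource for all tenants
```

`resource_for("order", "acme")` yields `"order-acme"`, and `resource_for("ratings", "acme")` yields `"ratings-shared"`. This mapping only decides placement. As the next section argues, it does not by itself isolate anything.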

Deployment is not isolation

The cardinal SaaS mistake

A Product service with siloed per-tenant databases looks isolated — but if service code substitutes another tenant’s ID, the request succeeds. Deployment is not isolation. Dedicated deployment only makes isolation easier. You need a separate gatekeeper mechanism that uses tenant context to scope access regardless of deployment model. A single leak can be catastrophic, and even “trusted” developer code can unintentionally cross boundaries.
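Here is a minimal sketch of the failure mode and the fix. It uses in-memory dicts as hypothetical per-tenant databases. `get_product_unsafe` trusts whatever tenant ID the code passes. `gatekeeper` instead scopes access by the verified tenant context. All names here are illustrative. In a real system the gatekeeper usually lives outside service code (for example, IAM policies attached to tenant-scoped credentials), and the in-process check only stands in for it.

```python
# Hypothetical siloed storage: one "database" (a dict) per tenant.
DATABASES = {
    "tenant-a": {"p1": "Widget A"},
    "tenant-b": {"p9": "Widget B"},
}

def get_product_unsafe(tenant_id: str, product_id: str):
    # Siloed deployment, yet nothing stops code from passing another tenant's ID.
    return DATABASES[tenant_id].get(product_id)

class TenantContext:
    """Tenant identity taken from a verified token, never from request parameters."""
    def __init__(self, tenant_id: str):
        self.tenant_id = tenant_id

class IsolationError(Exception):
    pass

def gatekeeper(ctx: TenantContext, requested_tenant: str) -> dict:
    # Scopes access by tenant context, whatever the deployment model.
    if requested_tenant != ctx.tenant_id:
        raise IsolationError(f"{ctx.tenant_id} may not access {requested_tenant}")
    return DATABASES[ctx.tenant_id]

def get_product(ctx: TenantContext, requested_tenant: str, product_id: str):
    return gatekeeper(ctx, requested_tenant).get(product_id)
```

Suppose a request from `tenant-a` calls `get_product_unsafe("tenant-b", "p9")`. It quietly returns tenant B's product. Routing the same substitution through `get_product` raises `IsolationError` instead.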

Two more distinctions matter: RBAC ≠ isolation (RBAC scopes by a user’s role; isolation scopes exclusively by tenant context and is identical for all users in a tenant), and filtering by TenantId is not enough — true isolation must make it impossible for a query to reach another tenant’s data.
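The sketch below keeps RBAC and isolation as two separate checks. The roles, record shape, and function names are hypothetical. The isolation check reads only tenant context, so it behaves the same for every user in a tenant. The role check decides which actions are allowed. Even a tenant admin therefore cannot reach another tenant's record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    user_id: str
    tenant_id: str
    role: str  # hypothetical roles: "admin" or "viewer"

@dataclass(frozen=True)
class Record:
    record_id: str
    tenant_id: str

# RBAC: which actions each (hypothetical) role may perform.
ROLE_ACTIONS = {"admin": {"read", "write"}, "viewer": {"read"}}

def isolation_allows(user: User, record: Record) -> bool:
    # Depends only on tenant context, never on the user's role.
    return user.tenant_id == record.tenant_id

def rbac_allows(user: User, action: str) -> bool:
    return action in ROLE_ACTIONS.get(user.role, set())

def authorize(user: User, record: Record, action: str) -> bool:
    # Both must pass; a powerful role never widens tenant scope.
    return isolation_allows(user, record) and rbac_allows(user, action)
```

This is still an application-level check. On its own it is the kind of tenant filter the paragraph above warns is not enough, so it complements an external gatekeeper and does not replace one.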

The three isolation categories

From coarse to fine:

  1. Full-stack isolation — a dedicated stack per tenant (straightforward).
  2. Resource-level isolation — shared compute, but an entire resource (database, bucket, queue) is dedicated per tenant.
  3. Item-level isolation — inside a shared resource where tenants’ items are commingled — the hardest, because available constructs shrink (e.g., a DynamoDB shared table with an IAM Condition on dynamodb:LeadingKeys).
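For item-level isolation on a shared DynamoDB table, the isolation policy can be a template with the tenant ID filled in. This minimal sketch assumes the table's partition key holds the tenant ID. The function names, table ARN, and action list are hypothetical. Rendered once per tenant at provisioning time, the policy suits deployment-time isolation. Rendered per request, it could serve as a runtime session policy. `leading_keys_allowed` is a simplified local approximation of the condition, not IAM's policy evaluator.

```python
import json
from typing import Any, Dict, Iterable

def tenant_item_policy(table_arn: str, tenant_id: str) -> Dict[str, Any]:
    """Render a per-tenant IAM policy for a shared DynamoDB table (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
            "Resource": table_arn,
            "Condition": {
                # Every partition key the request touches must equal this tenant's ID.
                "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [tenant_id]}
            },
        }],
    }

def leading_keys_allowed(policy: Dict[str, Any], requested_keys: Iterable[str]) -> bool:
    """Simplified local check: all requested partition keys must be in the allowed list."""
    condition = policy["Statement"][0]["Condition"]["ForAllValues:StringEquals"]
    allowed = set(condition["dynamodb:LeadingKeys"])
    return all(key in allowed for key in requested_keys)

policy = tenant_item_policy("arn:aws:dynamodb:us-east-1:123456789012:table/Orders", "tenant-a")
print(json.dumps(policy, indent=2))  # JSON to attach at deploy time or pass as a session policy
```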

Deployment-time vs. runtime enforcement

  • Deployment-time isolation (siloed compute) — inject tenant context into a templatized policy at deployment, with no dependency on application code; compliance shifts to provisioning.
  • Runtime isolation (pooled compute) — code dynamically acquires tenant-scoped credentials per request via an isolation manager — e.g., STS assume_role() with a scoped policy. Optionally inject scoped credentials from outside the service (an API Gateway / Lambda authorizer).

```mermaid
sequenceDiagram
participant R as Request with tenant context
participant S as Service code
participant IM as Isolation manager
participant STS as STS assume_role
participant D as Shared resource
R->>S: invoke
S->>IM: acquire scoped credentials
IM->>STS: assume role with scoped policy
STS-->>IM: tenant-scoped credentials
IM-->>S: credentials
S->>D: access only this tenant data
```
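The runtime flow in the diagram can be sketched in Python. `IsolationManager`, `ScopedCredentials`, the method names, and the ARNs are hypothetical. The `assume_role` keyword arguments (`RoleArn`, `RoleSessionName`, `Policy`) and the `Credentials` response fields match boto3's STS client. A `FakeSts` stand-in replaces that client so the sketch runs offline.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass(frozen=True)
class ScopedCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str

class IsolationManager:
    """Hypothetical isolation manager: exchanges tenant context for scoped credentials."""

    def __init__(self, sts_client: Any, role_arn: str, table_arn: str):
        self._sts = sts_client  # e.g. boto3.client("sts") in a real deployment
        self._role_arn = role_arn
        self._table_arn = table_arn

    def _session_policy(self, tenant_id: str) -> str:
        # The session policy narrows the role's permissions to this tenant's items.
        return json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": self._table_arn,
                "Condition": {"ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [tenant_id]}},
            }],
        })

    def credentials_for(self, tenant_id: str) -> ScopedCredentials:
        response = self._sts.assume_role(
            RoleArn=self._role_arn,
            RoleSessionName=f"tenant-{tenant_id}",  # assumes IDs satisfy STS session-name rules
            Policy=self._session_policy(tenant_id),
        )
        c = response["Credentials"]
        return ScopedCredentials(c["AccessKeyId"], c["SecretAccessKey"], c["SessionToken"])

class FakeSts:
    """Offline stand-in for boto3's STS client; records each assume_role call."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def assume_role(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"Credentials": {"AccessKeyId": "AKIAEXAMPLE",
                                "SecretAccessKey": "secret", "SessionToken": "token"}}
```

With STS, a session's effective permissions are the intersection of the role's identity-based policies and the session policy. Service code holding these credentials therefore cannot exceed either one.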

Hide it through interception

Push isolation out of developers’ hands with interception (aspects, middleware, wrappers, and sidecars), so business logic never handles tenant scoping directly.

For scale, cache scoped credentials with a TTL, keep credential helpers in-process, and use policy templates to avoid hitting IAM policy limits. Default to pooled and make silo “earn its way out.”

Testing isolation is one of the hardest parts of getting it right. Inject an invalid tenant context deep in the app (even via chaos engineering) and assert that no data is returned.
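The interception and caching advice can be sketched as a Python decorator. Every name here is hypothetical: `TtlCache`, `tenant_scoped`, and `SharedTable` are illustrative. `load_credentials` stands in for a credential call such as the STS `assume_role` flow described earlier, and `SharedTable` stands in for a pooled resource that enforces credential scope itself. The handler never reads or passes a tenant ID.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Tuple

class TtlCache:
    """Caches per-tenant scoped credentials until they expire (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: skip the credential round trip
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

def tenant_scoped(cache: TtlCache, load_credentials: Callable[[str], Any]):
    """Decorator that intercepts a request and injects tenant-scoped credentials."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request: Dict[str, Any]):
            # Assumption: request["tenant_id"] was set upstream from a verified token.
            creds = cache.get_or_load(request["tenant_id"], load_credentials)
            return handler(request, creds)
        return wrapper
    return decorator

class SharedTable:
    """Stand-in for a pooled resource that enforces the credential scope itself."""

    def __init__(self, items: List[Dict[str, Any]]):
        self._items = items

    def scan(self, creds: Dict[str, str]) -> List[Dict[str, Any]]:
        return [i for i in self._items if i["tenant_id"] == creds["scope"]]

LOADS: List[str] = []  # records each simulated credential fetch

def load_credentials(tenant_id: str) -> Dict[str, str]:
    LOADS.append(tenant_id)
    return {"scope": tenant_id}

TABLE = SharedTable([
    {"tenant_id": "tenant-a", "order": 1},
    {"tenant_id": "tenant-b", "order": 2},
])
CACHE = TtlCache(ttl_seconds=300)

@tenant_scoped(CACHE, load_credentials)
def list_orders(request: Dict[str, Any], creds: Dict[str, str]) -> List[Dict[str, Any]]:
    # Business logic: no tenant ID is read or passed here.
    return TABLE.scan(creds)
```

Repeated calls for the same tenant reuse the cached credentials until the TTL lapses. An isolation test can call `list_orders({"tenant_id": "tenant-unknown"})` and assert that the result is empty.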

When to use it — and when not

✅ Reach for it when

  • Choosing a deployment model per resource: dedicated (silo), shared (pool), or mixed (bridge)
  • Enforcing that no tenant can ever reach another tenant's data
  • Designing for compliance, blast-radius limits, or premium tiering

⛔ Think twice when

  • Assuming a siloed/dedicated resource is automatically isolated
  • Relying on RBAC or simple tenant-ID filtering as your isolation mechanism
  • Siloing everything by default — make silo earn its way out

Check your understanding

1. What do 'silo' and 'pool' mean?

Silo means a resource dedicated to a single tenant; pool means a resource shared by tenants. The terms apply granularly per resource and avoid the baggage of 'multi-tenant.' A mixed/bridge model combines them service-by-service.

2. Why is 'deployment is not isolation' the foundational argument?

Dedicated deployment only makes isolation easier. You need a separate gatekeeper that scopes access by tenant context regardless of deployment model.

3. What is item-level isolation?

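Item-level isolation scopes access to individual items inside a shared resource where tenants' items are commingled. It is the finest of the three categories (full-stack, resource-level, item-level) and the hardest, because fewer enforcement constructs are available at that granularity (e.g., DynamoDB LeadingKeys conditions).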

4. How does runtime isolation typically obtain tenant-scoped access for pooled compute?

Runtime isolation dynamically acquires scoped credentials per request; deployment-time isolation injects a templatized policy into siloed compute with no dependency on app code.
