Designing Multi-Tenant SaaS APIs with Node.js and FastAPI

Multi-tenancy is one of those architectural decisions that looks deceptively simple in a design doc and quietly turns into the most expensive system you own. Every authentication call, every database query, every cache key, every log line, every background job has to carry a tenant identity correctly — or you ship a cross-tenant data leak. This post walks through the production patterns that hold up under load, the trade-offs between isolation models, and how to implement them concretely in Node.js (Express/Fastify) and Python (FastAPI) backends.

The Three Isolation Models

Before writing a single line of code, decide which of three tenancy models you are building. The model dictates everything downstream — connection pooling, migrations, blast radius during incidents, and compliance posture.

1. Shared database, shared schema. All tenants share the same tables. Every row carries a tenant_id. Cheapest to operate, easiest to scale horizontally, hardest to isolate at the storage layer. Cross-tenant bugs are catastrophic and silent.
2. Shared database, separate schema. One PostgreSQL/MySQL database, one schema per tenant. Reasonable isolation at the SQL layer (SET search_path in Postgres), but schema-per-tenant migrations become operationally painful past a few hundred tenants.
3. Database per tenant. Maximum isolation. Required for regulated workloads (HIPAA, certain financial verticals) and large enterprise customers who insist on it contractually. Operationally expensive: backups, migrations, connection limits, and per-tenant capacity planning all multiply.

Most B2B SaaS systems start at model 1 and selectively promote large or regulated tenants to dedicated schemas or databases. Build for that promotion path from day one — your tenant identity model needs to be portable across all three.
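One way to keep the tenant identity portable is to store where a tenant lives separately from who the tenant is. The sketch below is a hypothetical placement record, not a prescribed schema: the names IsolationModel, TenantPlacement, and promote, and the tenant_<id> naming scheme, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IsolationModel(Enum):
    SHARED_SCHEMA = "shared_schema"      # rows tagged with tenant_id
    SEPARATE_SCHEMA = "separate_schema"  # one schema per tenant
    DEDICATED_DATABASE = "dedicated_db"  # one database per tenant


@dataclass(frozen=True)
class TenantPlacement:
    tenant_id: str
    model: IsolationModel
    database: str = "shared"      # hypothetical logical database name
    schema: Optional[str] = None  # only set for SEPARATE_SCHEMA


def promote(placement: TenantPlacement, model: IsolationModel) -> TenantPlacement:
    """Return the placement after a move; tenant_id itself never changes."""
    tid = placement.tenant_id
    if model is IsolationModel.SEPARATE_SCHEMA:
        return TenantPlacement(tid, model, "shared", f"tenant_{tid}")
    if model is IsolationModel.DEDICATED_DATABASE:
        return TenantPlacement(tid, model, f"tenant_{tid}")
    return TenantPlacement(tid, model)
```

Because tokens, cache keys, and logs only ever reference tenant_id, promoting a tenant changes the placement record and nothing else that clients can see.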

Tenant Identity: The Foundation

Every request must arrive at your service with a verified, non-spoofable tenant identity. Two patterns dominate in practice:

JWT-embedded tenant claim. The access token carries a tid (tenant id) claim, signed by your identity provider (Auth0, Cognito, Keycloak, or in-house OIDC). The API verifies the JWT signature and uses tid as the source of truth.
Subdomain-based routing. acme.api.example.com resolves the tenant before authentication. Useful for branded customer URLs, but always cross-check the subdomain against the token’s tid — never trust the subdomain alone.

A common, dangerous mistake is reading tenant_id from a header or request body. If the client controls it, the attacker controls it. The tenant id must come from a cryptographically verified source.

FastAPI: Tenant Resolution Middleware

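from contextvars import ContextVar

from fastapi import Depends, FastAPI, HTTPException, Request
from jose import JWTError, jwt

# PUBLIC_KEY (the identity provider's RS256 verification key) and projects_repo
# are assumed to be defined elsewhere in the application.
_current_tenant: ContextVar[str] = ContextVar("current_tenant")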

app = FastAPI()

async def require_tenant(request: Request) -> str:
    auth = request.headers.get("authorization", "")
    if not auth.startswith("Bearer "):
        raise HTTPException(401, "missing bearer token")
    try:
        claims = jwt.decode(
            auth[7:],
            key=PUBLIC_KEY,
            algorithms=["RS256"],
            audience="api.example.com",
        )
    except JWTError:
        raise HTTPException(401, "invalid token")

    tid = claims.get("tid")
    if not tid:
        raise HTTPException(403, "no tenant in token")

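    # Cross-check the tenant subdomain (acme.api.example.com) when one is present.
    # Requests to the bare api.example.com host carry no subdomain to compare.
    host = request.url.hostname or ""
    if host.endswith(".api.example.com"):
        subdomain_tid = host.removesuffix(".api.example.com")
        if subdomain_tid != tid:
            raise HTTPException(403, "tenant mismatch")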

    _current_tenant.set(tid)
    return tid

@app.get("/v1/projects")
async def list_projects(tenant_id: str = Depends(require_tenant)):
    return await projects_repo.list_for_tenant(tenant_id)

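The ContextVar is critical: it carries tenant identity into async DB clients, structured loggers, and any asyncio task spawned while handling the request (each task gets a copy of the current context), without requiring every function signature to carry it explicitly. Work handed to another process, such as a queue worker, does not inherit it and has to receive the tenant id in its payload.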
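To make the propagation concrete, here is a minimal standalone sketch using only the standard library. The function names (audit_log, handle_request) are hypothetical stand-ins for a logger call and a request handler; no FastAPI is involved.

```python
import asyncio
from contextvars import ContextVar
from typing import List

current_tenant: ContextVar[str] = ContextVar("current_tenant")


async def audit_log(event: str) -> str:
    # Reads the tenant from context; nothing is passed in explicitly.
    await asyncio.sleep(0)
    return f"{current_tenant.get()}:{event}"


async def handle_request(tenant_id: str) -> str:
    current_tenant.set(tenant_id)
    # create_task copies the current context, so the child task sees tenant_id.
    task = asyncio.create_task(audit_log("login"))
    return await task


async def main() -> List[str]:
    # gather() wraps each coroutine in its own task with its own context copy,
    # so the two simulated requests cannot overwrite each other's tenant.
    return await asyncio.gather(handle_request("acme"), handle_request("globex"))
```

Running asyncio.run(main()) yields one entry per simulated request, each tagged with its own tenant, even though both handlers interleave on the same event loop.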

Node.js: AsyncLocalStorage for Tenant Context

Node has a direct equivalent in node:async_hooks:

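import { AsyncLocalStorage } from "node:async_hooks";
import type { NextFunction, Request, Response } from "express";

export const tenantStore = new AsyncLocalStorage<{ tenantId: string }>();

// verifyJwt is assumed to check the signature and return the claims, or throw.
export function tenantMiddleware(req: Request, res: Response, next: NextFunction) {
  let claims: { tid?: string } | undefined;
  try {
    claims = verifyJwt(req.headers.authorization);
  } catch {
    return res.status(401).end();
  }
  if (!claims?.tid) return res.status(403).end();
  // Everything downstream of next() runs inside this tenant's store.
  tenantStore.run({ tenantId: claims.tid }, () => next());
}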

export function currentTenant(): string {
  const ctx = tenantStore.getStore();
  if (!ctx) throw new Error("tenant context missing");
  return ctx.tenantId;
}

The discipline you need: never read req.body.tenantId or req.query.tenantId. The only legitimate read is currentTenant(), which is bound to the authenticated identity.

Data Isolation at the Query Layer

A correctly-set tenant context is necessary but not sufficient — every query still has to filter by it. There are three defenses, and you want all three.

Defense 1: Repository-level Enforcement

Wrap your data access so the tenant id is impossible to omit:

class ProjectRepository {
  async list() {
    const tid = currentTenant();
    return this.db.query(
      "SELECT id, name FROM projects WHERE tenant_id = $1",
      [tid],
    );
  }
}

Forbid raw query execution outside the repository layer with a lint rule or wrapper that requires a tenant id argument.
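A wrapper of that kind might look like the following Python sketch. It uses an in-memory SQLite database purely so the example runs anywhere; TenantScopedDb and its textual check for a :tenant_id placeholder are illustrative assumptions, and a production guard would more likely parse the SQL or rely on a query builder.

```python
import sqlite3
from typing import Any, Dict, List, Optional, Tuple


class TenantScopedDb:
    """Hypothetical wrapper: every query must bind :tenant_id."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def query(
        self, sql: str, tenant_id: str, params: Optional[Dict[str, Any]] = None
    ) -> List[Tuple]:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        if ":tenant_id" not in sql:
            # Crude textual check; it only proves the placeholder is present.
            raise ValueError("query does not filter by :tenant_id")
        bound = {**(params or {}), "tenant_id": tenant_id}
        return self._conn.execute(sql, bound).fetchall()


def demo() -> List[Tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (id INTEGER, tenant_id TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?, ?)",
        [(1, "acme", "roadmap"), (2, "globex", "launch")],
    )
    db = TenantScopedDb(conn)
    return db.query(
        "SELECT id, name FROM projects WHERE tenant_id = :tenant_id", "acme"
    )
```

Calling demo() returns only the acme row, and a query without the placeholder is rejected before it reaches the database.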

Defense 2: PostgreSQL Row-Level Security

For the shared-schema model, Postgres RLS is the strongest in-database guardrail:

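ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS unless it is forced; superusers and
-- BYPASSRLS roles always bypass it, so the app must not connect as one.
ALTER TABLE projects FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON projects
  USING (tenant_id = current_setting('app.current_tenant')::uuid);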

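In your connection wrapper (named tenant_connection here for illustration), open a transaction and set the GUC on every checkout:

from contextlib import asynccontextmanager

@asynccontextmanager
async def tenant_connection(pool, tenant_id: str):
    async with pool.acquire() as conn:
        # A transaction-local setting only lives as long as the transaction.
        async with conn.transaction():
            await conn.execute(
                "SELECT set_config('app.current_tenant', $1, true)",
                tenant_id,
            )
            yield conn

The true (is_local) argument scopes the setting to the current transaction, which is why the wrapper opens one: outside a transaction the value would vanish as soon as the set_config statement finished. The setting is discarded at commit or rollback, so it cannot leak to the next user of a pooled connection. Even an application bug that omits the WHERE tenant_id = ... clause now returns only the current tenant’s rows instead of another tenant’s data.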

Defense 3: Per-Tenant Connection Routing

For mixed shared/dedicated topologies, route based on a tenant catalog:

const catalogEntry = await tenantCatalog.lookup(tid);
const pool = catalogEntry.dedicated
  ? pools.dedicated.get(catalogEntry.cluster)
  : pools.shared;

Cache the catalog aggressively (5–60 seconds) with a stampede-safe loader. Tenant directory lookups happen on every request and become the busiest endpoint in your system.
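A stampede-safe loader can be sketched with nothing but asyncio. In this illustrative version, CatalogCache and its loader callback are hypothetical names; the idea is that concurrent misses for the same tenant share one in-flight load instead of each hitting the catalog store.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


class CatalogCache:
    def __init__(
        self, loader: Callable[[str], Awaitable[dict]], ttl_seconds: float = 30.0
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self._inflight: Dict[str, asyncio.Task] = {}

    async def lookup(self, tenant_id: str) -> dict:
        cached = self._entries.get(tenant_id)
        if cached and cached[0] > time.monotonic():
            return cached[1]
        task = self._inflight.get(tenant_id)
        if task is None:
            # First caller starts the load; later callers await the same task.
            task = asyncio.create_task(self._load(tenant_id))
            self._inflight[tenant_id] = task
        # shield() keeps the shared load alive if one waiter is cancelled.
        return await asyncio.shield(task)

    async def _load(self, tenant_id: str) -> dict:
        try:
            entry = await self._loader(tenant_id)
            self._entries[tenant_id] = (time.monotonic() + self._ttl, entry)
            return entry
        finally:
            self._inflight.pop(tenant_id, None)
```

This cache is per process. With many service instances, each instance still reloads on its own schedule, which is usually acceptable at a TTL of a few seconds.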

Connection Pool Math

Pool sizing is where multi-tenant systems quietly die. A typical Postgres instance can handle ~200–500 concurrent connections before per-connection overhead dominates, because every connection is a separate backend process with its own memory. With N service instances and M tenants, naive per-tenant pools can open up to N × M × pool-size connections and blow past that limit immediately.

Practical rules:

Use a single shared pool for the shared-database model. Apply RLS rather than per-tenant pools.
PgBouncer (transaction mode) in front of Postgres lets you keep thousands of client-side connections while Postgres sees a few hundred. Transaction-mode pooling does not preserve session-level state, so design accordingly: no session-level SET (use SET LOCAL or set_config(..., true) inside a transaction), no session-level advisory locks held across transactions, and no LISTEN.
Dedicated pools for dedicated databases, sized to the expected concurrent in-flight queries per tenant, not the tenant’s user count. Most tenants idle.
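The arithmetic behind these rules is worth writing down. The figures below (20 instances, 300 tenants, pool sizes of 5 and 10) are made-up illustrative numbers, and total_connections is a hypothetical helper that computes the worst case where every pool fills up.

```python
def total_connections(instances: int, pools_per_instance: int, pool_size: int) -> int:
    """Worst-case server connections if every pool is fully used."""
    return instances * pools_per_instance * pool_size


# Naive: every instance keeps a pool of 5 for each of 300 tenants.
naive = total_connections(instances=20, pools_per_instance=300, pool_size=5)

# Shared: every instance keeps one shared pool of 10.
shared = total_connections(instances=20, pools_per_instance=1, pool_size=10)

print(naive, shared)  # 30000 200
```

The naive layout asks for two orders of magnitude more connections than the range quoted above, while the shared pool stays inside it.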

Background Jobs and Async Work

Tenant context must flow into every queue producer and consumer. A job that runs without tenant context is a bug — there is no “system” tenant for tenant-scoped work.

await queue.enqueue("report.generate", {
  tenantId: currentTenant(),
  reportId,
});

In the worker:

async function handle(job) {
  await tenantStore.run({ tenantId: job.data.tenantId }, async () => {
    await generateReport(job.data.reportId);
  });
}

The same pattern applies to scheduled jobs. A nightly “send digest emails” cron should iterate the tenant catalog and run each tenant inside its own context, not query across tenants in a single transaction.
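For the Python side, a scheduled job following that rule could be sketched like this. The names send_digest, run_for_tenant, and nightly_digest are hypothetical, and the tenant list stands in for a real catalog query.

```python
import asyncio
from contextvars import ContextVar
from typing import Dict, Iterable

current_tenant: ContextVar[str] = ContextVar("current_tenant")


async def send_digest() -> str:
    # Tenant-scoped work reads the tenant from context, like a request handler.
    return f"digest sent for {current_tenant.get()}"


async def run_for_tenant(tenant_id: str) -> str:
    token = current_tenant.set(tenant_id)
    try:
        return await send_digest()
    finally:
        # Restore the previous state so no tenant's context outlives its run.
        current_tenant.reset(token)


async def nightly_digest(tenant_ids: Iterable[str]) -> Dict[str, str]:
    results: Dict[str, str] = {}
    for tenant_id in tenant_ids:
        try:
            results[tenant_id] = await run_for_tenant(tenant_id)
        except Exception as exc:
            # One tenant's failure is recorded and does not stop the others.
            results[tenant_id] = f"failed: {exc}"
    return results
```

Each tenant's work opens and closes its own context (and, in a real system, its own transaction), so a failure or a slow tenant cannot affect another tenant's data.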

Caching Without Cross-Tenant Bleeds

Cache keys must include the tenant id. Always. The most common bug in early-stage SaaS is a Redis key like user:42 that returns a user from a different tenant because two tenants both have a user id of 42.

const key = `t:${tenantId}:user:${userId}`;

Better, encode this in your cache wrapper so callers cannot construct raw keys:

cache.get("user", userId); // becomes t:<tid>:user:<id>

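For shared caches, also think about eviction: a noisy tenant should not evict another tenant’s hot data. Key prefixes do not partition memory, and maxmemory-policy applies to the whole instance, but Redis Cluster with per-tenant key prefixes plus maxmemory-policy set to allkeys-lfu handles this well enough at small scale, because frequently read keys tend to survive eviction regardless of owner. At larger scale, dedicate separate Redis instances or clusters to premium tenants; logical databases (SELECT n) share one memory limit and eviction policy, and Redis Cluster only supports database 0.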
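A cache wrapper that builds the tenant-prefixed key itself could be sketched in Python as follows. TenantCache is a hypothetical name, and a plain dict stands in for Redis so the example is runnable on its own.

```python
from contextvars import ContextVar
from typing import Any, Dict, Optional

current_tenant: ContextVar[str] = ContextVar("current_tenant")


class TenantCache:
    def __init__(self, store: Optional[Dict[str, Any]] = None) -> None:
        # A dict stands in for the real cache client in this sketch.
        self._store: Dict[str, Any] = store if store is not None else {}

    def _key(self, kind: str, ident: Any) -> str:
        # ContextVar.get() raises LookupError when no tenant is bound,
        # so a call outside a tenant context fails instead of guessing.
        return f"t:{current_tenant.get()}:{kind}:{ident}"

    def get(self, kind: str, ident: Any) -> Any:
        return self._store.get(self._key(kind, ident))

    def set(self, kind: str, ident: Any, value: Any) -> None:
        self._store[self._key(kind, ident)] = value
```

Callers only ever pass the kind and the id, so two tenants that both have a user 42 can never collide on a key.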

Rate Limiting and Quotas

Per-tenant rate limits are a first-class concern, not a nice-to-have. A single misbehaving tenant can saturate your event loop and degrade everyone else.

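Implement quotas at the gateway when possible (NGINX with limit_req_zone keyed on a tenant header such as $http_x_tenant, Envoy with a rate limit service, AWS API Gateway usage plans). A header-keyed limit is only safe if a trusted layer in front sets that header from the verified token and strips any client-supplied value; otherwise a client can dodge its limit by changing the header. When the gateway cannot see the tenant (for example, because it lives inside the JWT), enforce in the application using a token-bucket implementation backed by Redis. Use INCR with EXPIRE for simple fixed windows; use a Lua script for accurate sliding windows or atomic token bucket refills.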

-- KEYS[1]: bucket hash { tokens, last }
-- ARGV: now (ms), refill rate (tokens per second), capacity
local now = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local capacity = tonumber(ARGV[3])
local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last')
-- A missing bucket starts full.
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now
-- Refill for the time elapsed since the last successful take.
tokens = math.min(capacity, tokens + (now - last) * rate / 1000)
if tokens < 1 then return 0 end
redis.call('HSET', KEYS[1], 'tokens', tokens - 1, 'last', now)
-- Idle buckets expire after 60 s; an expired bucket starts full again.
redis.call('PEXPIRE', KEYS[1], 60000)
return 1
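The refill arithmetic is easy to get wrong, so it can help to mirror it in plain Python and unit-test it without a Redis server. The sketch below follows the same refill-and-take steps as the script above but omits the key expiry; Bucket and take are hypothetical names, and it assumes now_ms never goes backwards.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Bucket:
    tokens: float
    last_ms: int


def take(
    buckets: Dict[str, Bucket], key: str, now_ms: int, rate: float, capacity: float
) -> int:
    """Return 1 if a token was taken, 0 if the caller should be throttled."""
    bucket = buckets.get(key)
    tokens = bucket.tokens if bucket else capacity  # a new bucket starts full
    last = bucket.last_ms if bucket else now_ms
    # Refill for the elapsed time, capped at capacity (rate is tokens/second).
    tokens = min(capacity, tokens + (now_ms - last) * rate / 1000)
    if tokens < 1:
        return 0  # state is left untouched, as in the Lua version
    buckets[key] = Bucket(tokens - 1, now_ms)
    return 1
```

With a capacity of 2 and a rate of 1 token per second, two immediate calls succeed, a third is rejected, and the next success comes one second after the bucket was emptied.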

Observability: Tenant as a First-Class Label

Every log line, metric, and trace must carry tenant_id. Without it, you cannot answer the most basic operational question: which tenant is causing this incident?

Logs. Inject tenant_id via your structured logger’s context binding (pino’s child loggers, Python’s LoggerAdapter, structlog context vars).
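Metrics. Be careful: a histogram labeled by tenant_id produces a separate set of time series for every tenant (one per bucket, plus sum and count). With 10,000 tenants that is unsustainable in Prometheus. Use exemplars, or bucket metrics into “top-N noisy tenants” + “everyone else”, ideally at instrumentation time, since a recording rule only aggregates series that have already been ingested.
Traces. Add tenant_id as a span attribute. Span attributes are stored per span rather than as one time series per label value, so high cardinality is far less of a problem than in Prometheus, but storage backends (Tempo, Jaeger) still need to be sized for it.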
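To put rough numbers on the metrics warning, the sketch below counts series and collapses the label at instrumentation time rather than in a recording rule. The helper names and the figure of 12 histogram buckets are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Set

OTHER = "other"


def top_tenants(request_counts: Counter, n: int) -> Set[str]:
    """Pick the n busiest tenants; only these keep their own label value."""
    return {tenant for tenant, _ in request_counts.most_common(n)}


def metric_label(tenant_id: str, top: Set[str]) -> str:
    """Label value to attach to a metric: the tenant itself or 'other'."""
    return tenant_id if tenant_id in top else OTHER


def series_count(label_values: int, histogram_buckets: int) -> int:
    # Each label value yields one series per bucket, plus _sum and _count.
    return label_values * (histogram_buckets + 2)


per_tenant = series_count(label_values=10_000, histogram_buckets=12)
collapsed = series_count(label_values=20 + 1, histogram_buckets=12)
print(per_tenant, collapsed)  # 140000 294
```

That is for a single histogram with no other labels; every additional label (route, status code) multiplies both numbers again.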

Trade-offs and Failure Modes

A few realities worth internalizing:

Shared-schema multi-tenancy is fast but unforgiving. A single missing WHERE tenant_id is a P0 incident. Invest in RLS, repository-level guardrails, and integration tests that explicitly probe for cross-tenant access.
Noisy-neighbor problems are inevitable. One tenant generating 10x the traffic will degrade everyone unless you implement per-tenant concurrency limits and quotas. Plan for the largest 1% of tenants from day one.
Migrations get harder over time. Schema migrations on a 500GB shared table touching every tenant can lock writes for minutes. Use online tooling (pg_repack on Postgres; gh-ost or pt-online-schema-change on MySQL), or move heavy tenants to dedicated databases before this becomes the only option.
Compliance promotion is a product feature. When a customer signs a regulated contract, you need a runbook for moving them from shared to dedicated infrastructure — pre-built, tested, and ideally automated.

Closing

A correct multi-tenant API is not the result of one good decision; it is the cumulative product of dozens of small, disciplined ones: how tenant identity is extracted, how queries are written, how connection pools are sized, how cache keys are constructed, how jobs are enqueued, how logs are tagged. The patterns above — JWT-derived tenant claims, RLS-backed query isolation, context-bound async propagation, per-tenant quotas, and tenant-labeled observability — give you a system that scales horizontally, isolates correctly, and survives the kind of incidents that kill less rigorous designs. Build for the promotion path from shared to dedicated infrastructure from the beginning, and the architecture stays viable from your first ten customers to your ten-thousandth.