
Architecture

This page is the deeper version of the diagram on the Overview. It covers the multi-tenant model, where tenant identity comes from, how auth and CORS flow through the v3 stack, and how scheduled work is invoked.

Generation: v3 (current)

| Layer | Today |
|---|---|
| Frontend | CloudFront → S3 (per-tenant in prod, shared SPA on dev). |
| API Gateway | HTTP APIs, one per backend service. |
| Backend compute | Shared Lambdas. No per-tenant EC2 or ECS. |
| Routing | api-router Lambda fans out by X-Tenant-Id. |
| Auth | One Cognito user pool per tenant; backend validates the JWT against the tenant's pool. |
| Data | One Aurora PostgreSQL cluster, database per tenant inside it. |
| Directory | TenantDirectory in DynamoDB: single source of truth for tenant → DB / Cognito / routing. |
| Scheduling | EventBridge Scheduler, one rule per tenant at rate(5 minutes). |

Earlier generations (v1 = EC2 per tenant, v2 = ECS Fargate per tenant) are fully decommissioned. The ECS clusters, the shared per-tenant ALB, and the ECS-Exec VPC endpoints were all deleted on 2026-05-06.

Multi-tenant model

There is one swishing-game-backend Lambda for all tenants. Tenant identity is not baked into the deploy; it arrives on every request, and the Lambda resolves the per-tenant resources by that id.

  • Per-tenant DB. TenantDirectory[PK=TENANT#<id>][SK=DB] holds secret_arn. getDbPool({ tenantId }) reads the tenant secret and caches a per-tenant pg.Pool (max=2, idleTimeoutMillis=5000).
  • Per-tenant Cognito. TenantDirectory[SK=COGNITO] holds the pool id + app client. AWS SDK calls to cognito-idp:Admin* target the tenant pool.
  • Per-tenant routing. TenantDirectory[SK=ROUTING] holds backend_base_url. In v3 every active tenant points at the same shared Lambda (api.swishing.cards).
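The per-tenant pool cache described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: makeGetDbPool, loadSecret, and createPool are hypothetical names, with the last two standing in for the secret lookup and for `new pg.Pool(...)`. Only the getDbPool({ tenantId }) call shape and the max=2 / idleTimeoutMillis=5000 settings come from the text.

```javascript
// Sketch only: loadSecret and createPool are injected stand-ins (hypothetical).
// In a real service they would wrap the secret lookup and `new pg.Pool(...)`.
function makeGetDbPool({ loadSecret, createPool }) {
  const pools = new Map(); // tenantId -> Promise resolving to that tenant's pool

  return function getDbPool({ tenantId }) {
    if (!tenantId) {
      return Promise.reject(new Error('tenantId is required'));
    }
    let pending = pools.get(tenantId);
    if (!pending) {
      pending = (async () => {
        const secret = await loadSecret(tenantId);
        // Pool limits taken from the text: max=2, idleTimeoutMillis=5000.
        return createPool({ ...secret, max: 2, idleTimeoutMillis: 5000 });
      })();
      // Evict failed lookups so a later request can retry.
      pending.catch(() => pools.delete(tenantId));
      pools.set(tenantId, pending);
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved pool means concurrent requests for the same tenant share a single secret read within one warm Lambda instance.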

If a request doesn't carry a recognizable tenant id, api-router rejects it before the backend ever sees it.
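As a rough illustration of that gate, the sketch below rejects a request unless its tenant id resolves to a routing row. It is an assumption-laden sketch: routeRequest is a hypothetical name, a Map stands in for TenantDirectory, the composite key string is a simplification of the PK/SK pair, and the 400 status and response shape are guesses rather than the router's documented behavior.

```javascript
// Sketch only: `directory` is a Map standing in for TenantDirectory (hypothetical).
// The 400 status code and the return shape are assumptions.
function routeRequest(headers, directory) {
  const tenantId = headers['x-tenant-id'];
  const routing = tenantId
    ? directory.get(`TENANT#${tenantId}|ROUTING`)
    : undefined;

  if (!routing) {
    // Missing or unknown tenant: stop here, the backend is never invoked.
    return { forward: false, statusCode: 400, error: 'unknown or missing tenant' };
  }
  return { forward: true, tenantId, target: routing.backend_base_url };
}
```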

Where tenant identity comes from

Two sources, in priority order:

  1. HTTP requests: X-Tenant-Id header. Frontend attaches this; api-router validates it against TenantDirectory; the backend reads req.headers['x-tenant-id'] via tenantIdFromReq().
  2. Scheduled / EventBridge invocations: event.tenantId on the Lambda event payload. The Lambda dispatcher (lambda.js) routes based on event.trigger; for game-transition events the tenant comes from event.tenantId.
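The two sources above might be read along these lines. This is a sketch under assumptions: tenantIdFromReq, event.trigger, and event.tenantId are named in the text, but the function bodies, the dispatch helper, and the handlers object are hypothetical.

```javascript
// Sketch only: bodies are illustrative, not the code in lambda.js.
function tenantIdFromReq(req) {
  const raw = req.headers['x-tenant-id'];
  const value = Array.isArray(raw) ? raw[0] : raw;
  return typeof value === 'string' && value.trim() !== '' ? value.trim() : null;
}

// `handlers` is a hypothetical object with one function per invocation type.
function dispatch(event, handlers) {
  if (event && event.trigger === 'game-transition') {
    if (typeof event.tenantId !== 'string' || event.tenantId === '') {
      throw new Error('game-transition event is missing tenantId');
    }
    // Scheduled path: the tenant comes from the event payload.
    return handlers.gameTransition({ tenantId: event.tenantId });
  }
  // HTTP path: the tenant is read later from the X-Tenant-Id header.
  return handlers.http(event);
}
```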

There is no TENANT_ID environment variable. That was a v2 artifact; it was stripped during the v3 refactor.

Request flow (authenticated game request)

Auth flow

  • Identity provider: Cognito User Pool per tenant. Each pool issues its own JWTs.
  • Token validation: the backend verifies the bearer token against the tenant's pool by id (looked up via TenantDirectory[SK=COGNITO]). The token issuer is checked against the expected pool URL.
  • Authorization on the wire: Authorization: Bearer <jwt> header.
  • Tenant scoping: X-Tenant-Id is validated against the token's iss claim. A token from one tenant's pool cannot be used to address another tenant's resources. This is the IDOR fix that landed 2026-05-11.
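The tenant-scoping check can be illustrated with the sketch below. It compares the token's iss claim with the issuer URL of the tenant's pool, which for Cognito has the form https://cognito-idp.<region>.amazonaws.com/<userPoolId>. The function names and the region / userPoolId field names are hypothetical, and the sketch deliberately skips signature verification, which a real backend must perform against the pool's published keys before trusting any claim.

```javascript
// Sketch only: checks the issuer claim, NOT the signature.
// Field names on `cognitoRow` (region, userPoolId) are assumptions.
function expectedIssuer({ region, userPoolId }) {
  return `https://cognito-idp.${region}.amazonaws.com/${userPoolId}`;
}

function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('malformed JWT');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

function assertTokenMatchesTenant(token, cognitoRow) {
  const { iss } = decodeJwtPayload(token);
  if (iss !== expectedIssuer(cognitoRow)) {
    // A token minted by another tenant's pool must not address this tenant.
    const err = new Error('token issuer does not match tenant pool');
    err.statusCode = 403; // assumed status code
    throw err;
  }
}
```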

CORS

CORS lives at API Gateway, not Express:

  • Source of truth: the API Gateway HTTP API CORS config on gateway.*.
  • Backend: Express cors() was removed during the v3 refactor; the Lambda has only a 3-line OPTIONS-204 handler so the $default route doesn't 404 on preflight.
  • Why: keeping CORS in the gateway means any future split (per-service Lambda, multi-region) doesn't drag duplicated CORS configs along.
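An OPTIONS-204 handler of the kind described above could look like this. It is a sketch, not the handler in the backend: the name preflight is hypothetical, and it is written as a plain Express-style middleware so it sets no CORS headers itself and leaves those to API Gateway.

```javascript
// Sketch only: Express-style middleware; the name `preflight` is hypothetical.
// It answers preflight requests with 204 and passes everything else through.
function preflight(req, res, next) {
  if (req.method === 'OPTIONS') return res.status(204).end();
  next();
}
```

In an Express app this would be registered ahead of the routes, for example with app.use(preflight).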

Scheduler

Each active tenant has its own EventBridge Scheduler entry under the game-transitions (prod) / game-transitions-dev (dev) group, firing at rate(5 minutes). The target is lambda:Invoke on the shared swishing-game-backend Lambda with { trigger: 'game-transition', tenantId: <id> }.
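For illustration, the input for one tenant's schedule could be assembled as below and passed to CreateScheduleCommand from @aws-sdk/client-scheduler. This is a sketch under assumptions: buildGameTransitionSchedule is a hypothetical helper, the schedule naming scheme is invented, and both ARNs are placeholders supplied by the caller. The group names, the rate expression, and the payload shape come from the text.

```javascript
// Sketch only: builds a CreateSchedule input object; it does not call AWS.
function buildGameTransitionSchedule({ tenantId, env, lambdaArn, roleArn }) {
  return {
    Name: `game-transition-${tenantId}`, // hypothetical naming scheme
    GroupName: env === 'prod' ? 'game-transitions' : 'game-transitions-dev',
    ScheduleExpression: 'rate(5 minutes)',
    FlexibleTimeWindow: { Mode: 'OFF' },
    Target: {
      Arn: lambdaArn, // the shared swishing-game-backend Lambda
      RoleArn: roleArn, // role that Scheduler assumes to invoke the target
      Input: JSON.stringify({ trigger: 'game-transition', tenantId }),
    },
  };
}
```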

A daily reconcile cron was retired on 2026-05-12; the rate-based per-tenant schedule is self-healing without it.

The tenant-teardown schedule lives in a separate scheduler group, also named tenant-teardown.

Tenant provisioning + teardown

  • Provisioning: SQS swishing-internal-provisioning[-dev] → swishing-provision-worker Lambda, which creates the DB, the Cognito pool, and the TenantDirectory rows, then auto-syncs templates from S3. There is no per-tenant runtime deploy step; the new tenant just becomes addressable through the shared Lambda.
  • Teardown: the swishing-tenant-teardown Lambda runs on a schedule and removes DB/Cognito/directory for tenants that have signaled the end of their lifecycle.
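The TenantDirectory rows written during provisioning might be shaped as in the sketch below. The PK=TENANT#<id> key, the DB / COGNITO / ROUTING sort keys, and the secret_arn and backend_base_url attributes come from this page. The helper name buildDirectoryRows and the Cognito attribute names user_pool_id and app_client_id are assumptions.

```javascript
// Sketch only: returns the three directory items for one tenant.
// Cognito attribute names are assumed; the text only says the row holds
// "the pool id + app client".
function buildDirectoryRows({ tenantId, secretArn, userPoolId, appClientId, backendBaseUrl }) {
  const PK = `TENANT#${tenantId}`;
  return [
    { PK, SK: 'DB', secret_arn: secretArn },
    { PK, SK: 'COGNITO', user_pool_id: userPoolId, app_client_id: appClientId },
    { PK, SK: 'ROUTING', backend_base_url: backendBaseUrl },
  ];
}
```

Teardown would then remove the same three keys for the tenant, which is one reason to keep the key layout in a single helper.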

Documentation surfaces

| Hostname | Backed by |
|---|---|
| docs.internal.[dev.]swishing.cards | This portal (the overview you're reading). |
| api.[dev.]swishing.cards/docs | swishing-game-backend Swagger UI. |
| gateway.swishing.cards/docs | api-router Swagger UI. |
| api.auth.[dev.]swishing.cards/docs | auth-api Swagger UI. |
| api.internal.[dev.]swishing.cards/docs | internal-api Swagger UI. |
| api.demo.swishing.cards/docs | demo-api Swagger UI. |
| marketing.swishing.cards/docs | marketing-api Swagger UI. |

All /docs endpoints are gated by the same Microsoft Entra SSO via Lambda-side OIDC middleware (middleware/oidcDocs.js per service). See the 2026-05-12 docs consolidation runbook for the full pivot story.

Where to go next

  • Services — every service in one table with its /docs URL.
  • AWS inventory — every AWS resource and its role.
  • Runbooks — indexed operational walkthroughs.