@@ -772,3 +772,34 @@ The updated scope correctly addresses all three Critical findings and all seven
| Critical | 2 | GoTrue anonymous auth env var; Edge Functions not in default self-hosted stack |
| Recommended | 8 | argon2 Docker native build; iron-session v8 API; persistQueryClient adapter packages; serwist sw.ts authoring; t3-env over raw zod + SUPABASE_URL naming fix; tini confirmation; Caddy/Docker internal URL networking; Vitest/Playwright scoping |
| No New Action | — | All first-review findings confirmed adopted in updated scope |
+
+---
+
+## Realtime Scaling Limits (added 2026-05-08)
+
+Self-hosted Supabase Realtime is adequate for the MVP and for the low thousands of concurrent users with the current single-container config. This section documents the known limits so that future capacity work has a baseline.
+
+**Architecture today:** one `supabase-realtime` container (BEAM/Elixir), one logical replication slot on Postgres, the `postgres_cdc_rls` extension evaluating RLS per subscriber per change, and a single Postgres instance shared by Realtime, PostgREST, and Auth.
+
+**Comfortable limits (single-node):**
+
+- ~10–30k concurrent WebSocket connections per BEAM node (RAM-bound).
+- Hundreds of writes/sec on watched `public.movies` rows.
+- `REPLICA IDENTITY FULL` on `movies` is cheap because rows are ~1 KB; it would be expensive on wide or large tables.
+
+**Failure modes at scale:**
+
+1. **Single realtime container = single fan-out CPU.** Hot groups (e.g., 100+ users in one list, all subscribed) trigger one policy evaluation per subscriber on every UPDATE, so cost grows linearly with group size. The result is CPU saturation, not a crash. Mitigation: cluster Realtime via libcluster (distributed BEAM), which needs DNS-based discovery and the `DNS_CLUSTER_QUERY` env var wired into compose.
+2. **Single logical replication slot.** A stuck or slow subscriber forces Postgres to retain WAL, which can fill the disk. Mitigation: monitor how far `pg_replication_slots.confirmed_flush_lsn` lags behind the current WAL position, and alert before WAL fills the volume.
+3. **Shared Postgres connection pool.** Realtime, PostgREST, Auth, and cron all hit the same database. At ~1000+ concurrent users, add **PgBouncer** in transaction-pooling mode in front of Postgres; raise `max_connections` only as a stopgap.
+4. **`postgres_cdc_rls` per-subscriber RLS evaluation.** The current `movies` SELECT policy is cheap (one membership check). If policies grow more complex (joins, multi-table subqueries), evaluation cost compounds with subscriber count.
+5. **Tenant table is a single point of config.** `_realtime.tenants` holds encrypted DB credentials with `DB_ENC_KEY=supabaserealtime`. Rotating that key requires re-encrypting the tenant row.
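
The replication-slot monitoring in failure mode 2 can be sketched as follows. This is a minimal illustration, not the project's actual monitoring: `SLOT_LAG_SQL` uses the built-in Postgres functions `pg_wal_lsn_diff` and `pg_current_wal_lsn`, while the helper names and the `max_fraction` alert threshold are hypothetical. The pure-Python helpers show how lag is derived from two textual LSNs when only raw values are available.

```python
# Query to run on the primary; returns retained-WAL lag in bytes per logical slot.
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a byte position.

    An LSN is two hexadecimal numbers: the high and low 32 bits of a 64-bit offset.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the slot's consumer has not yet confirmed."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)


def should_alert(lag_bytes: int, volume_free_bytes: int, max_fraction: float = 0.25) -> bool:
    """Hypothetical rule: alert once retained WAL exceeds a fraction of free disk space."""
    return lag_bytes > volume_free_bytes * max_fraction
```

For example, `slot_lag_bytes("1/0", "0/FFFFFFF0")` yields 16 bytes of unconfirmed WAL. The 25% default is only a placeholder; the right threshold depends on the size of the WAL volume.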
+
+**Capacity triggers — when to act:**
+
+- Realtime container CPU sustained >70% → cluster.
+- WS connect failures or `phx_close` storms → check tenant config + connection pool.
+- WAL volume growth >10%/day with no corresponding DB write growth → check replication slot lag.
+- p95 update-broadcast latency >500ms → fan-out bottleneck.
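
The triggers above can be expressed as a small rule check, for instance inside an alerting script. This is a sketch under stated assumptions: the `Metrics` fields and the `capacity_actions` function are hypothetical names, the WS-failure signal is reduced to a boolean supplied by the operator, and "no corresponding DB write growth" is approximated as write growth staying at or below the same 10%/day threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    realtime_cpu_pct: float             # sustained CPU of the realtime container
    ws_failures_or_close_storm: bool    # WS connect failures or phx_close storms observed
    wal_growth_pct_per_day: float       # growth of the WAL volume
    db_write_growth_pct_per_day: float  # growth of DB write volume over the same window
    p95_broadcast_latency_ms: float     # p95 update-to-broadcast latency


def capacity_actions(m: Metrics) -> List[str]:
    """Return the follow-up actions whose trigger thresholds are exceeded."""
    actions: List[str] = []
    if m.realtime_cpu_pct > 70:
        actions.append("cluster Realtime")
    if m.ws_failures_or_close_storm:
        actions.append("check tenant config and connection pool")
    # Assumption: WAL growth is unexplained if writes did not grow past the same threshold.
    if m.wal_growth_pct_per_day > 10 and m.db_write_growth_pct_per_day <= 10:
        actions.append("check replication slot lag")
    if m.p95_broadcast_latency_ms > 500:
        actions.append("investigate fan-out bottleneck")
    return actions
```

Feeding in a snapshot such as `Metrics(85, False, 12, 1, 620)` flags clustering, replication slot lag, and the fan-out bottleneck, while a healthy snapshot returns an empty list.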
+
+**Out of scope for MVP.** Flag this work in `PROJECT_SCOPE.md` under Phase 9 (capacity) or Phase 10 (launch-readiness) when traffic projections justify it.