1 month ago · 8444d3d
--- a/research/TECHFILE.md
+++ b/research/TECHFILE.md
@@ -772,3 +772,34 @@ The updated scope correctly addresses all three Critical findings and all seven
 
 | Critical      | 2     | GoTrue anonymous auth env var; Edge Functions not in default self-hosted stack                                                                                                                                                                   |
 | Recommended   | 8     | argon2 Docker native build; iron-session v8 API; persistQueryClient adapter packages; serwist sw.ts authoring; t3-env over raw zod + SUPABASE_URL naming fix; tini confirmation; Caddy/Docker internal URL networking; Vitest/Playwright scoping |
 | No New Action | —     | All first-review findings confirmed adopted in updated scope                                                                                                                                                                                     |
+
+---
+
+## Realtime Scaling Limits (added 2026-05-08)
+
+Self-hosted Supabase Realtime is adequate for the MVP and for the low thousands of concurrent users with the current single-container config. This section documents the known limits so that future capacity work has a baseline.
+
+**Architecture today:** one `supabase-realtime` container (BEAM/Elixir), one logical replication slot from Postgres, the `postgres_cdc_rls` extension evaluating RLS per subscriber per change, and a single shared Postgres for Realtime + PostgREST + Auth.
+
+**Comfortable limits (single-node):**
+
+- ~10–30k concurrent WebSocket connections per BEAM node (RAM-bound).
+- Hundreds of writes/sec on watched `public.movies` rows.
+- `REPLICA IDENTITY FULL` on `movies` is cheap because rows are ~1KB; it would be expensive on wide/large tables.
+
+**Failure modes at scale:**
+
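+1. **Single realtime container = single fan-out CPU.** Hot groups (e.g., 100+ users subscribed to one list) trigger policy evaluation that grows linearly with subscriber count on every UPDATE. The symptom is CPU saturation, not a crash. Mitigation: cluster Realtime via libcluster (distributed Erlang), which needs DNS-based node discovery and the cluster DNS env var wired into compose. The stock Supabase compose file exposes this as `DNS_NODES`; confirm whether `DNS_CLUSTER_QUERY` applies to the Realtime version in use.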
+2. **Single logical replication slot.** A stuck or slow subscriber bloats WAL on Postgres and can fill the disk. Mitigation: monitor `pg_replication_slots.confirmed_flush_lsn` lag; alert before WAL fills the volume.
+3. **Shared Postgres connection pool.** Realtime + PostgREST + Auth + cron all hit the same DB. At ~1000+ concurrent users, add **pgbouncer** in transaction-pooling mode in front of Postgres; raise `max_connections` only as a stopgap.
+4. **`postgres_cdc_rls` per-subscriber RLS evaluation.** The current `movies` SELECT policy is cheap (one membership check). If policies grow more complex (joins, multi-table subqueries), evaluation cost compounds with subscriber count.
+5. **Tenant table is a single point of config.** `_realtime.tenants` holds encrypted DB credentials with `DB_ENC_KEY=supabaserealtime`. Rotating that key requires re-encrypting the tenant row.
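The replication-slot check in failure mode 2 can be sketched as follows. This is a minimal illustration, not the project's monitoring code: the SQL uses the standard `pg_wal_lsn_diff` and `pg_current_wal_lsn` functions (PostgreSQL 10+), while the helper names and the 25% alert fraction are assumptions chosen for the example.

```python
# Query to run against the shared Postgres instance (PostgreSQL 10+ names).
# It reports how many bytes of WAL each logical slot is still holding back.
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """Client-side equivalent of pg_wal_lsn_diff(current, confirmed)."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)


def should_alert(lag_bytes: int, volume_free_bytes: int,
                 max_fraction: float = 0.25) -> bool:
    """Hypothetical policy: alert once retained WAL exceeds a fraction of free disk.

    The 0.25 default is an assumption for illustration, not a project setting.
    """
    return lag_bytes > volume_free_bytes * max_fraction
```

Running `SLOT_LAG_SQL` on a schedule and feeding `lag_bytes` into `should_alert` gives an alert that fires before the WAL volume fills, which is the intent of the mitigation above.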
			
 
+
+**Capacity triggers (when to act):**
+
+- Realtime container CPU sustained >70% → cluster Realtime.
+- WS connect failures or `phx_close` storms → check tenant config + connection pool.
+- WAL volume growth >10%/day with no corresponding DB write growth → check replication slot lag.
+- p95 update-broadcast latency >500ms → fan-out bottleneck.
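The triggers above can be expressed as a small rule check, sketched here under stated assumptions. The thresholds (70%, 10%/day, 500 ms) come from the list; the `RealtimeMetrics` field names, the `WRITE_GROWTH_TOLERANCE_PCT` value used to decide what counts as "no corresponding write growth", and the treatment of any connect failure as a trigger are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumption: write growth at or below this rate counts as "no corresponding growth".
WRITE_GROWTH_TOLERANCE_PCT = 1.0


@dataclass
class RealtimeMetrics:
    """Hypothetical snapshot of the signals named in the capacity triggers."""
    cpu_sustained_pct: float
    ws_connect_failures: int
    phx_close_storm: bool
    wal_growth_pct_per_day: float
    db_write_growth_pct_per_day: float
    p95_broadcast_latency_ms: float


def triggered_actions(m: RealtimeMetrics) -> List[str]:
    """Return the follow-up actions whose trigger condition is currently met."""
    actions: List[str] = []
    if m.cpu_sustained_pct > 70:
        actions.append("cluster Realtime")
    if m.ws_connect_failures > 0 or m.phx_close_storm:
        actions.append("check tenant config and connection pool")
    if (m.wal_growth_pct_per_day > 10
            and m.db_write_growth_pct_per_day <= WRITE_GROWTH_TOLERANCE_PCT):
        actions.append("check replication slot lag")
    if m.p95_broadcast_latency_ms > 500:
        actions.append("investigate fan-out bottleneck")
    return actions
```

Keeping the rules in one function makes the thresholds easy to review alongside this section; how the metrics are collected is left open.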
			
 
+
+**Out of scope for MVP.** Flag this in `PROJECT_SCOPE.md` under Phase 9 (capacity) or Phase 10 (launch-readiness) when traffic projections justify the work.