
Project Context

What This File Is

This is the persistent context file for Claude Code. Keep it concise and useful.

CLAUDE.md Maintenance Rules

  • Keep this file under 160 lines
  • Only record decisions, not explanations of common knowledge
  • When adding something, check if anything existing is now outdated and remove it
  • Use short bullet points, not paragraphs
  • No boilerplate or filler text
  • If a section grows past 10 items, consolidate or prune the least relevant ones
  • Format: what we chose + why in one line (e.g., "vanilla JS over React — embeddable, no build step")

Stack

  • Python 3 (stdlib only) — fetch, parse, analyze, build_members, build_app, enrich_roster
  • pytest — classifier unit tests + KPI parity gate against legacy/
  • Vanilla JS (IIFE, no framework, no bundler) — frontend must drop into 3rd-party hosts
  • Chart.js 4.4.0 + SortableJS 1.15.2 — vendored locally under template/vendor/, no CDN
  • Data source: clerk.house.gov XML + senate.gov XML (no API key); Congress.gov v3 API for roster enrichment (key in .env)

Architecture Decisions

  • One unified vote-record schema for both chambers (votes.jsonl) — analyze.py is chamber-agnostic
  • analyze.py is pure (no I/O); consumed by build_members.py workers
  • build_members.py uses multiprocessing.Pool with initializer-shared records (no per-task serialization)
  • Per-member JSON written via tmp + os.replace (atomic); members/ dir wiped on each build to avoid stale files
  • manifest.json is the picker index AND carries per-member KPI dict (k) so ranking.html needs only one fetch
  • build_app.py inlines the full manifest as <script type="application/json" id="polisci-manifest"> so file:// works for the picker
  • Per-member JSON still requires HTTP (78 MB total — too large to inline)
  • Chart canvases wrapped in .chart-canvas-wrap with fixed-height parent .chart-frame — prevents Chart.js infinite-growth feedback loop
  • All CSS scoped under #polisci-root — required by inline-embed contract (CSS audit gates this at 0 violations)
  • data-base="./data/" on #polisci-root lets host pages relocate the data dir
  • Senate vote XMLs use LIS member IDs (e.g. S270); Congress.gov uses bioguide — lis_to_bioguide.json crosswalk built by name+state+party match (Congress.gov v3 does not expose LIS reliably)
  • Roster merge: vote-derived roster gets enriched with full_name, district, served_from/to, photo_url, served_partial from members_directory.json
  • ID regex hard limit: ^[A-Z]\d{6}$|^S\d{3,4}$ validated on every URL read AND against manifestById
  • All upstream strings rendered via textContent / createElement — zero innerHTML (security-audited)
  • localStorage namespaced polisci:v119:*; only lastMember persisted in MVP
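A minimal sketch of two of the decisions above: the member-ID hard limit and the tmp + os.replace write. The function names `is_valid_member_id` and `write_json_atomic` are hypothetical, not taken from the project; only the regex and the write strategy come from this file.

```python
import json
import os
import re
import tempfile

# Pattern quoted from the ID-regex decision: bioguide (letter + 6 digits) or LIS (S + 3-4 digits).
MEMBER_ID_RE = re.compile(r"^[A-Z]\d{6}$|^S\d{3,4}$")


def is_valid_member_id(member_id: str) -> bool:
    """Return True only when the whole string matches the hard-limit pattern."""
    # fullmatch also rejects a trailing newline, which a bare "$" would tolerate.
    return MEMBER_ID_RE.fullmatch(member_id) is not None


def write_json_atomic(path: str, payload: dict) -> None:
    """Write JSON to a temp file in the target directory, then swap it in with os.replace."""
    directory = os.path.dirname(path) or "."
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if the write or the swap failed, then re-raise.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers never see a half-written member file with this approach: they see either the old file or the new one.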

Pipeline

fetch.py → parse.py → enrich_roster.py → parse.py → pytest → build_members.py → build_app.py
  • fetch.py — idempotent network fetch into data/<C>/{house,senate}/cache/
  • parse.py — XML → votes.jsonl + roster.json; rejects upstream strings containing <, >, control chars; merges members_directory.json if present
  • enrich_roster.py — Congress.gov API → members_directory.json + LIS crosswalk; cached responses in api_cache/; 350ms throttle; sends User-Agent: polisci-pipeline/1.0 (API rejects without one)
  • build_members.py — parallel pool; passes full chamber records to analyze.aggregate (NOT pre-filtered by member) so absences count as N/A rows
  • build_app.py — wipes + recreates results/<C>/; injects manifest into HTML heads; writes README with CSP + iframe snippet
  • build_all.py — one-command wrapper; runs parse.py twice (before and after enrich) so directory data merges in
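The upstream-string rejection in parse.py could look roughly like the sketch below. The function name `is_safe_upstream_string` is hypothetical, and treating Unicode category Cc as the definition of "control chars" is an assumption; the real check may be narrower or wider.

```python
import unicodedata


def is_safe_upstream_string(value: str) -> bool:
    """Return False if the string contains '<', '>', or any control character."""
    for ch in value:
        if ch in "<>":
            return False
        # Assumption: "control chars" means Unicode category Cc (C0/C1 controls,
        # which includes tab, newline, and NUL).
        if unicodedata.category(ch) == "Cc":
            return False
    return True
```

Accented names such as "Velázquez" pass this check, since only angle brackets and control characters are rejected.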

Data Layout

  • data/<C>/{house,senate}/{cache/, votes.jsonl, roster.json} — raw + parsed per-chamber
  • data/<C>/members/<id>.json — per-member metrics (~80 KB each)
  • data/<C>/manifest.json — picker index w/ KPI k field per member
  • data/<C>/members_directory.json — Congress.gov roster (~551 members)
  • data/<C>/lis_to_bioguide.json — Senate ID crosswalk
  • data/<C>/api_cache/ — cached Congress.gov responses
  • data/<C>/build_report.json — per-build success/failure log
  • results/<C>/ — embeddable artifact (the shipping output)

Frontend Pages

  • app.html — single-member dashboard: sidebar filters (chamber/party/state), typeahead, 8 KPI tiles, 5 charts, sortable vote table
  • compare.html — overlay up to 6 members across 5 comparison charts; per-member color-coded pills
  • ranking.html — rank House or Senate members by any of 14 metrics; row click opens member dashboard in new tab
  • Nav between pages: Member · Compare · Rankings in <header>
  • Member IDs in URL only (?id=, ?ids=, ?c=&m=&o=&p=) — deep-linkable, validated against manifest

Embedding Modes

  • Standalone: open app.html directly (over HTTP)
  • Iframe: sandbox="allow-scripts allow-same-origin" referrerpolicy="no-referrer"
  • Inline: copy <link> + <script> + <div id="polisci-root" data-base="..."> into host page
  • Recommended CSP documented in results/<C>/README.md

Verified Properties

  • KPI parity vs legacy/: 8/8 PASS (script: tests/parity_check.py; report: research/PHASE3_PARITY.md)
  • analyze.py classifier: 22 pytest tests against frozen XML fixtures
  • CSS namespace audit: 0 violations (report: research/PHASE6_CSS_AUDIT.md)
  • HTTP smoke (all assets + manifest + sample member + 404 path): 10/10
  • No external network at runtime; vendor scripts loaded locally
  • build_members.py for 552 members: ~5 s on local CPU

Security

  • API key in .env (gitignored); rotated via Congress.gov sign-up flow; redacted from DOCUMENTATION.md §2
  • .gitignore: .env, __pycache__/, *.pyc, data/*/cache/, data/*/api_cache/
  • Upstream-string validation in parse.py rejects <, >, control chars at XML parse time
  • All DOM mutations via textContent / createElement / replaceChildren; grep for innerHTML returns empty across template/ and results/
  • No eval, no Function(), no inline event handlers

Conventions

  • One JS file per page (app.js, compare.js, ranking.js); IIFE wrappers
  • ~200 lines of filter/typeahead logic duplicated between app.js and compare.js — accepted for v1 to avoid coupling
  • Python scripts: CLI via argparse, --congress N (default 119), shebang + executable
  • No comments in code unless WHY is non-obvious (e.g. security validation, LIS fallback rationale)
  • Tests live in tests/ (not the test-engineer sandbox) per project requirement

Known Issues / Deferred (see NOTES.md for full list)

  • Editorial label wording ("Helped Republicans", "Blocked Dem-Backed") — compliance flagged; user kept current copy for MVP, must revisit before publication
  • compare.html "Voted against own party" uses monthly['Helped Neither'] as proxy — analyze.py doesn't emit a true monthly own-party-defection series
  • Visible "How to read this" caveats panel deferred to v1.1 (limitations live in DOCUMENTATION.md §10)
  • localStorage only persists lastMember; filter state lives in URL only
  • Cross-browser manual smoke not run (no headless browser in build env); _iframe_test.html + _embed_test.html available in results/119/
  • 120th-Congress dry run impossible until 120th data exists

Active Tasks

  • (none — Phase 7 closed)