119th Congress Voting Dashboard — Interactive SPA Rewrite (FINAL v1)

Context

The user has iteratively built a working voting-dashboard pipeline in this session: fetch.py, parse.py, and analyze.py are written and produce a unified schema (1,342 cached XML rollcalls — 553 House + 789 Senate) for the 119th Congress. Eight standalone dashboard HTML files were previously generated (Massie, Khanna, AOC, Omar, MTG, Jordan, Donalds, Graham) and now live in legacy/.

The user now wants:

  1. A dashboard for every member of the 119th Congress (~535 House + Senate).
  2. A single interactive web page with a member picker that re-renders without a page reload.
  3. A comparison page for overlaying multiple representatives on shared charts.
  4. Framework-free so it can be embedded into a third-party site (no React/Vue/Next).
  5. Static-file data hosting — per-member JSON served alongside the page.
  6. Member selector: searchable typeahead dropdown PLUS sidebar filters (chamber, party, state).
  7. Structure must generalize to future Congresses via a --congress N CLI arg.
  8. Cheap re-analysis: change analyze.py, re-render all member JSON in seconds; no re-fetch.

This plan incorporates feedback from PM + programmer (Consult) + security (Consult) + compliance (Consult) agents. Decisions locked by the user:

  • Labels: keep current "Helped Republicans / Helped Democrats / Blocked Dem-Backed / Blocked GOP-Backed" wording. Editorial-bias concern flagged by compliance will be captured in NOTES.md (not on dashboards).
  • Comparison page: ship all 5 overlay charts in MVP. Programmer/PM concern about derivative charts will be captured in NOTES.md.
  • Congress.gov API key: redact from DOCUMENTATION.md, move to .env (gitignored) with a rotation note. The key has never been used by the pipeline.

Target File Layout

polisci/
├── DOCUMENTATION.md            # existing — UPDATE §2 (redact key), §8 (file layout), §9 (regeneration)
├── NOTES.md                    # NEW — captures known concerns we chose not to fix in MVP
├── PROJECT_SCOPE.md            # NEW — written by PM agent (only it may edit)
├── .env                        # NEW — gitignored — holds CONGRESS_GOV_API_KEY
├── .gitignore                  # NEW — at least: .env, __pycache__/, *.pyc, data/*/cache/
├── fetch.py                    # existing — unchanged
├── parse.py                    # existing — minor: validate upstream strings (reject <, >, NUL)
├── analyze.py                  # existing — unchanged (output shape stable)
├── enrich_roster.py            # NEW — Congress.gov API → data/<C>/members_directory.json (complete roster)
├── build_members.py            # NEW — write data/<C>/members/<id>.json + manifest.json (parallelized)
├── build_app.py                # NEW — copy template/ → results/<C>/; embed manifest version; copy data/
├── build_all.py                # NEW — orchestrator: fetch → parse → build_members → build_app
├── tests/                      # NEW — pytest unit tests for analyze.py with frozen fixtures
│   ├── fixtures/
│   │   ├── partisan_house.xml
│   │   ├── bipartisan_house.xml
│   │   ├── partisan_senate.xml
│   │   └── failed_blocking_senate.xml
│   └── test_analyze.py
├── template/                   # NEW — input templates copied at build time
│   ├── app.html                # single-member dashboard shell
│   ├── compare.html            # comparison shell
│   ├── app.js                  # shared frontend logic
│   ├── app.css                 # shared styles, all selectors namespaced under #polisci-root
│   └── vendor/                 # NEW — pinned local copies (no CDN)
│       ├── chart.umd.min.js    # Chart.js 4.4.0
│       └── sortable.min.js     # SortableJS 1.15.2
├── data/119/
│   ├── house/{cache/, votes.jsonl, roster.json}
│   ├── senate/{cache/, votes.jsonl, roster.json}
│   ├── members/<id>.json       # NEW — per-member metrics (~80 KB each)
│   ├── manifest.json           # NEW — array of {id,n,p,s,c,district,served_partial} for picker
│   ├── members_directory.json  # NEW — complete 119th roster from Congress.gov
│   ├── lis_to_bioguide.json    # NEW — Senate ID crosswalk
│   ├── api_cache/              # NEW — cached Congress.gov responses (idempotent)
│   └── build_report.json       # NEW — per-build success/failure log
├── results/119/                # output — entire dir is the embeddable artifact
│   ├── app.html
│   ├── compare.html
│   ├── app.js
│   ├── app.css
│   ├── vendor/{chart…, sortable…}
│   ├── data/
│   │   ├── manifest.json
│   │   └── members/<id>.json
│   └── README.md               # NEW — embedding instructions + recommended CSP/sandbox snippet
└── legacy/                     # existing — archived after Milestone 0 validation passes

Frontend Architecture

app.html — single-member view

  • URL: app.html?id=M001184 (no ?c= — Congress is implicit in the deploy path; programmer recommendation)
  • Sidebar (collapsible on mobile): Chamber checkboxes (House / Senate), Party checkboxes (R / D / I), State multi-select populated from manifest.
  • Searchable typeahead: case-insensitive substring + initials match ("AOC" → Ocasio-Cortez). Iterates manifest as an array (not object) for sort/filter speed. Filtered live by sidebar.
  • On selection: fetch data/members/<id>.json?v={manifest.version} (cache-busted on classifier changes), then mutate existing Chart.js datasets in place and call chart.update('none') — no teardown/rebuild per switch. All 5 charts created once at page init.
  • State persistence: URL via history.replaceState for filter typing, history.pushState only on member selection. localStorage as Could-Have (per PM), namespaced as polisci:v119:lastMember, validated against manifest on read.
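The typeahead rule above (case-insensitive substring plus initials match, narrowed by the sidebar) can be pinned down with a small reference sketch. It is written in Python only to keep the matching rule testable next to the pipeline; the shipped version would be vanilla JS in app.js. The function names are hypothetical, and treating a query as a prefix of the initials (so "AO" also matches) is an assumption, not something the plan specifies.

```python
import re

def initials(name: str) -> str:
    """Lowercased first letter of each name part, splitting on spaces and hyphens."""
    parts = re.split(r"[\s\-]+", name.strip())
    return "".join(p[0] for p in parts if p).lower()

def matches(query: str, name: str) -> bool:
    """True if query is a case-insensitive substring of name or a prefix of its initials."""
    q = query.strip().lower()
    if not q:
        return True  # empty query: show everything the sidebar allows
    return q in name.lower() or initials(name).startswith(q)

def filter_manifest(members, query, chambers=None, parties=None, states=None):
    """Apply sidebar filters first, then the typeahead query, over the manifest array."""
    out = []
    for m in members:
        if chambers and m["c"] not in chambers:
            continue
        if parties and m["p"] not in parties:
            continue
        if states and m["s"] not in states:
            continue
        if matches(query, m["n"]):
            out.append(m)
    return out
```

With this rule, matches("AOC", "Alexandria Ocasio-Cortez") holds because the hyphenated surname contributes two initials.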

compare.html — overlay view

  • URL: compare.html?ids=M001184,K000389,O000172 (shareable; cap parse at 6 IDs; validate each against manifest)
  • Same sidebar + typeahead. Multi-select pills (color-coded by member-assigned color).
  • All 5 overlay charts per user decision:
    1. Alignment-over-time (line, one per member; switcher for which alignment class)
    2. Voted-against-own-party rate over time (line)
    3. Side-by-side KPI grouped bar (% against GOP, % against Dem, Lone Wolf %, Participation %, Blocked counts)
    4. Defection scatter (X: % against GOP, Y: % against Dem; one dot per member, party color)
    5. Vote-distribution grouped bar (Yea/Nay/Present/Not Voting per member)
  • Pill click opens member's app.html?id=<id> in a new tab.
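The ?ids= handling described above (cap the parse at 6 IDs, validate each against the manifest) might look like the sketch below. It uses the ID pattern given under Security posture; the function name and the choice to silently drop bad tokens and duplicates are assumptions. The browser code would be JS, but the same rule in Python is handy for a build-time or test-time check.

```python
import re

# Pattern from the plan: House bioguide (letter + 6 digits) or Senate LIS (S + 3 digits).
ID_PATTERN = re.compile(r"[A-Z]\d{6}|S\d{3}")
MAX_IDS = 6

def parse_ids(raw: str, manifest_ids: set) -> list:
    """Return well-formed, known, de-duplicated IDs from the first MAX_IDS tokens."""
    accepted = []
    for token in raw.split(",")[:MAX_IDS]:  # cap the parse itself, not just the result
        token = token.strip()
        if (
            ID_PATTERN.fullmatch(token)      # fullmatch avoids the trailing-newline quirk of "$"
            and token in manifest_ids        # allowlist check against the manifest
            and token not in accepted
        ):
            accepted.append(token)
    return accepted
```

Capping before validation means a long hostile query string costs at most six regex checks.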

Embedding contract

Three modes, all supported by the same artifact in results/119/:

  1. Standalone: open app.html directly.
  2. Iframe: <iframe src="…/app.html" sandbox="allow-scripts allow-same-origin" referrerpolicy="no-referrer"> (snippet documented in results/119/README.md).
  3. Inline: copy <div id="polisci-root" data-base="./data/">…</div> + <link>/<script> tags into a host page. All CSS namespaced under #polisci-root; data-base attribute makes the data path host-configurable.

No external requests after page load (Chart.js + SortableJS pinned locally). Recommended host CSP documented in results/119/README.md.

Security posture

  • All upstream strings rendered via textContent, never innerHTML. Bill links built via createElement('a') + textContent + validated href.
  • parse.py rejects any upstream string containing <, >, or control characters with a build-report warning (gov XML should never legitimately contain these).
  • localStorage: namespaced (polisci:v119:*); values are regex-validated and then checked against the manifest allowlist on read.
  • Query strings: id matched against ^[A-Z]\d{6}$|^S\d{3}$ (House bioguide or Senate LIS) AND verified to appear as a member id in the manifest before any fetch or DOM use; ids capped at 6 entries, each validated the same way.
  • No postMessage API in v1 — frame-boundary attack surface stays closed.
  • No CDN, no SRI question — vendored locally.
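For the parse.py bullet above, a minimal sketch of the upstream-string check follows. The helper name and the warning record layout are hypothetical, and it assumes values are whitespace-normalized before the check (tab and newline count as control characters here).

```python
import re

# '<', '>', any C0 control character (NUL included), and DEL.
_FORBIDDEN = re.compile(r"[<>\x00-\x1f\x7f]")

def check_upstream(value: str, field: str, warnings: list) -> bool:
    """Return True if value is safe to keep; otherwise record a build-report warning."""
    if _FORBIDDEN.search(value):
        warnings.append({
            "field": field,
            "value_repr": repr(value)[:80],  # repr keeps control characters visible in the log
            "reason": "forbidden character in upstream string",
        })
        return False
    return True
```

Rejected values would be reported through build_report.json rather than passed on to the frontend, so the textContent rule stays a second line of defense instead of the only one.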

Build Pipeline

python3 fetch.py          --congress 119                   # idempotent network fetch (clerk.house.gov + senate.gov)
python3 parse.py          --congress 119                   # XML → votes.jsonl + roster.json
python3 enrich_roster.py  --congress 119                   # Congress.gov API → complete roster + LIS crosswalk (NEW)
pytest tests/                                              # gate: classifier behavior frozen
python3 build_members.py  --congress 119                   # parallel; writes per-member JSON + manifest.json + build_report.json
python3 build_app.py      --congress 119                   # template + vendor → results/<C>/
# OR one command:
python3 build_all.py      --congress 119

build_members.py requirements

  • Loads votes.jsonl + roster.json once in the parent process.
  • Uses multiprocessing.Pool(min(8, os.cpu_count() or 1)) to run analyze.aggregate() per member; the "or 1" fallback covers os.cpu_count() returning None, which min() would otherwise reject.
  • Per-member JSON includes a _meta block (compliance-required reproducibility metadata):

    "_meta": {
      "schema_version": 1,
      "pipeline_version": "1.0.0",
      "classifier_hash": "<sha256 of analyze.py>",
      "data_snapshot_date": "2026-05-24",
      "source_xml_count": {"house": 553, "senate": 789}
    }
    
  • Atomic writes: write to *.tmp, then os.replace. Per-member failures log to build_report.json and continue, never abort the batch.

  • manifest.json is shipped as an array of objects (programmer recommendation), with a top-level version field for client-side cache-busting:

    {
      "version": "<pipeline_version>+<data_snapshot_date>",
      "members": [{"id":"M001184","n":"Thomas Massie","p":"R","s":"KY","c":"H"}, …]
    }
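The atomic-write and continue-on-failure requirements above can be sketched as follows. compute_metrics is a hypothetical stand-in for analyze.aggregate(), whose real signature is not shown in this plan; the report layout is likewise an assumption.

```python
import json
import os
from multiprocessing import Pool

def atomic_write_json(path, payload):
    """Write to <path>.tmp, then os.replace, so readers never see a half-written file."""
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)

def compute_metrics(member_id):
    """Hypothetical stand-in for analyze.aggregate()."""
    if not member_id:
        raise ValueError("empty member id")
    return {"id": member_id, "votes": 0}

def build_one(task):
    """Worker: build one member file; return (id, None) on success or (id, error text)."""
    member_id, out_dir = task
    try:
        payload = compute_metrics(member_id)
        atomic_write_json(os.path.join(out_dir, f"{member_id}.json"), payload)
        return member_id, None
    except Exception as exc:  # one bad member must not abort the batch
        return member_id, f"{type(exc).__name__}: {exc}"

def build_members(member_ids, out_dir, report_path):
    """Fan out over a process pool and write a build report with every outcome."""
    os.makedirs(out_dir, exist_ok=True)
    workers = min(8, os.cpu_count() or 1)
    with Pool(workers) as pool:
        results = pool.map(build_one, [(m, out_dir) for m in member_ids])
    report = {
        "ok": [m for m, err in results if err is None],
        "failed": {m: err for m, err in results if err is not None},
    }
    atomic_write_json(report_path, report)
    return report
```

Returning the error as data from the worker, instead of letting it propagate, is what keeps pool.map from raising on the first failure.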
    

build_app.py requirements

  • More than a cp: copies template/ into results/<C>/, stamps the manifest version into app.html / compare.html as inline JSON (saves a fetch), copies data/<C>/members/ and data/<C>/manifest.json into results/<C>/data/, and writes the recommended-CSP README.md.
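One way to stamp the manifest version into the HTML shells is a non-executing JSON data block, sketched below. The placeholder token and the element id are hypothetical. A script element with type="application/json" is not executed by the browser, so this approach should stay compatible with a script-src 'self' policy; app.js would read it via textContent and JSON.parse.

```python
import json

PLACEHOLDER = "__POLISCI_BOOT__"  # hypothetical marker placed in template/app.html

def stamp_version(html: str, version: str) -> str:
    """Replace the placeholder with an inline JSON payload carrying the manifest version."""
    if PLACEHOLDER not in html:
        raise ValueError("placeholder missing from template")
    payload = json.dumps({"version": version})
    # Escape "<" so the payload can never terminate the surrounding <script> element.
    payload = payload.replace("<", "\\u003c")
    return html.replace(PLACEHOLDER, payload, 1)

template = '<script type="application/json" id="polisci-boot">__POLISCI_BOOT__</script>'
stamped = stamp_version(template, "1.0.0+2026-05-24")
```

The escaped form is still valid JSON, so the reader side needs no special handling.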

analyze.py test suite (compliance-required)

Frozen fixture XMLs covering: partisan-line vote, bipartisan vote, tied/Split party position, member absent, Aye/No vs Yea/Nay normalization, failed-measure blocking case (House and Senate). Each test asserts the metrics dict for a known member.
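To show the shape such tests could take, here is a self-contained sketch covering two of the listed cases (Aye/No normalization and a tied party position). norm_vote and majority_position are illustrative stand-ins written for this example; they are not the project's _norm_vote or _majority_position, whose exact behavior is not shown in this plan. The real suite would import from analyze.py and load the frozen XML fixtures instead.

```python
from collections import Counter

_ALIASES = {"aye": "Yea", "yea": "Yea", "no": "Nay", "nay": "Nay"}

def norm_vote(raw: str) -> str:
    """Map House-style Aye/No onto Yea/Nay; leave other values (Present, Not Voting) alone."""
    cleaned = raw.strip()
    return _ALIASES.get(cleaned.lower(), cleaned)

def majority_position(votes) -> str:
    """Return 'Yea', 'Nay', or 'Split' when the party's Yea and Nay counts tie."""
    counts = Counter(v for v in map(norm_vote, votes) if v in ("Yea", "Nay"))
    if counts["Yea"] == counts["Nay"]:
        return "Split"
    return "Yea" if counts["Yea"] > counts["Nay"] else "Nay"

def test_aye_no_normalized():
    assert norm_vote("Aye") == "Yea"
    assert norm_vote("No") == "Nay"
    assert norm_vote("Present") == "Present"

def test_tied_party_is_split():
    assert majority_position(["Aye", "No", "Not Voting"]) == "Split"

def test_majority_ignores_non_votes():
    assert majority_position(["Yea", "Yea", "Nay", "Present"]) == "Yea"
```

Running pytest on a file like this collects the three test_ functions automatically.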


NOTES.md (NEW)

This file captures concerns we considered but chose NOT to address in MVP:

  1. Editorial label wording (compliance Finding 1, High). Current labels ("Helped Republicans", "Blocked Dem-Backed") attribute intent/agency the math doesn't measure. Neutral alternatives proposed ("Aligned with R majority", "Voted Nay on failed D-backed measures"). Decision: keep current wording per user preference; revisit before any third-party publication.
  2. compare.html chart scope (PM + programmer, Medium). Voted-against-own-party overlay and vote-distribution comparison are derivative of single-member data. Decision: ship all 5 charts per user preference; trim if user-research shows redundancy.
  3. Visible caveats panel (compliance Finding 2). §10 of DOCUMENTATION.md lists material limitations (procedural vs substantive votes treated equally, blocking is per-share-not-marginal, lone-wolf ≤5 is heuristic, voice votes invisible). Recommend adding a collapsible "How to read this" panel on dashboards. Deferred — surface in v1.1.
  4. localStorage as MVP feature. Listed as Could-Have; cut if Phase 3 runs long.
  5. 120th Congress dry-run. No live data exists yet; pipeline generalization tested by code review only until then.

Implementation Phases & Verification

Phase 0 — Reset (immediate)

  • Mark superseded TaskCreate items 14, 15, 16 as deleted (they targeted the pre-pivot plan).
  • Create .env with the existing API key; create .gitignore listing .env + __pycache__/ + *.pyc + data/*/cache/ (matching the target file layout); redact DOCUMENTATION.md §2 to reference <see .env> and add rotation guidance.
  • Create NOTES.md with the items above.
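Once the key lives in .env, scripts need a way to read it without a CLI flag. A dependency-free sketch, assuming plain KEY=value lines with optional quotes and # comments (the function names are hypothetical):

```python
def load_env(path=".env"):
    """Parse simple KEY=value lines; ignores blanks, comments, and malformed lines."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def get_api_key(path=".env"):
    """Return CONGRESS_GOV_API_KEY from the .env file or raise a clear error."""
    key = load_env(path).get("CONGRESS_GOV_API_KEY")
    if not key:
        raise RuntimeError(f"CONGRESS_GOV_API_KEY not set in {path}")
    return key
```

Failing loudly on a missing key keeps a misconfigured checkout from silently producing an incomplete roster.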

Phase 0.5 — Complete-roster enrichment (NEW)

Building a dashboard for every member of the 119th Congress requires a complete roster, not just members who appear in vote XMLs. Members who died, resigned, or were sworn in mid-term but never cast a vote (rare but possible) would otherwise be missing.

Source: Congress.gov API /member/congress/119 (uses the CONGRESS_GOV_API_KEY now in .env). This is the first justified use of the API key since the project began; the Clerk XML alone cannot answer "who served in the 119th."

New script: enrich_roster.py

  • Paginates https://api.congress.gov/v3/member/congress/119?currentMember=false&limit=250&format=json&api_key=<key> (≈540 members across ≈3 pages).
  • Throttled 350 ms between requests; reads key from .env (never from CLI).
  • Writes data/119/members_directory.json keyed by bioguide:

    {
      "M001184": {
        "bioguide": "M001184",
        "lis": null,                       # populated for senators in a second pass
        "full_name": "Thomas Massie",
        "party": "R",
        "state": "KY",
        "district": "4",
        "chamber": "House",
        "served_from": "2012-11-13",
        "served_to": null,                 # null if currently serving
        "photo_url": "...",                # if available
        "source": "congress.gov/v3"
      },
      ...
    }
    
  • Second pass: for each Senate member, fetch https://api.congress.gov/v3/member/<bioguide>?api_key=<key> to read the LIS member ID (needed to join with the Senate vote cache that uses LIS IDs, not bioguide). Cache results to data/119/lis_to_bioguide.json for the reverse map.
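The paginate-and-throttle loop for the first pass might be sketched like this. Two things are assumptions rather than confirmed details: that pages are walked with an offset query parameter, and that each response carries its records under a "members" key. The fetcher is injected so the loop can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.congress.gov/v3/member/congress/{congress}"

def make_fetcher(api_key, congress=119):
    """Build a fetch_page(offset, limit) callable that hits the live API."""
    def fetch_page(offset, limit):
        query = urllib.parse.urlencode({
            "currentMember": "false", "limit": limit, "offset": offset,  # offset: assumed
            "format": "json", "api_key": api_key,
        })
        url = BASE_URL.format(congress=congress) + "?" + query
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    return fetch_page

def fetch_all_members(fetch_page, limit=250, delay_s=0.35, sleep=time.sleep):
    """Collect every page, pausing delay_s between requests; stop on a short page."""
    members, offset = [], 0
    while True:
        batch = fetch_page(offset, limit).get("members", [])  # "members" key: assumed
        members.extend(batch)
        if len(batch) < limit:
            return members
        offset += limit
        sleep(delay_s)  # 350 ms throttle from the plan
```

Usage would be fetch_all_members(make_fetcher(key)), with key read from .env; a cache layer in data/119/api_cache/ would wrap fetch_page.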

Merge step in parse.py (modification, not new file): when writing roster.json, union the vote-derived roster with members_directory.json so every 119th-Congress member is represented, even if votes is empty for them. Members with zero votes get a served_partial: true flag so app.js can auto-render a member note like "This member did not cast any roll-call votes during the period analyzed (served {served_from} – {served_to}). Dashboards reflect that absence."
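A sketch of the union described above, under stated assumptions: both inputs are dicts keyed by bioguide (Senate entries already mapped from LIS via lis_to_bioguide.json), vote-derived fields override directory fields on conflict, and the set of members with at least one vote is passed in separately. None of these shapes are confirmed by the plan.

```python
def merge_roster(vote_roster, directory, ids_with_votes):
    """Union the vote-derived roster with the Congress.gov directory.

    Members absent from ids_with_votes are flagged served_partial=True so the
    frontend can render the explanatory member note.
    """
    merged = {bioguide: dict(entry) for bioguide, entry in directory.items()}
    for bioguide, entry in vote_roster.items():
        # Assumption: fields observed in the vote XML win over directory fields.
        merged.setdefault(bioguide, {}).update(entry)
    for bioguide, entry in merged.items():
        entry["served_partial"] = bioguide not in ids_with_votes
    return merged
```

Copying each directory entry before updating keeps the cached members_directory.json data untouched if the merge is rerun.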

Validation: when enrichment succeeds, assert that the merged roster has at least 535 entries and that every entry has bioguide + chamber + party + state. Fail the build if not.

Failure mode: if the Congress.gov API is unreachable or rate-limited, the build falls back to the vote-derived roster with a warning logged to build_report.json. build_members.py still produces dashboards for every member found; readers see a banner explaining the roster may be incomplete.
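The validation rule and the fallback path could be expressed roughly as below. The helper names and the "warnings" list inside the report are hypothetical; catching OSError is a deliberate choice here because urllib's URLError derives from it.

```python
REQUIRED_FIELDS = ("bioguide", "chamber", "party", "state")

def validate_roster(roster, minimum=535):
    """Return a list of human-readable problems; an empty list means the roster passes."""
    problems = []
    if len(roster) < minimum:
        problems.append(f"roster has {len(roster)} entries, expected at least {minimum}")
    for key, entry in roster.items():
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        if missing:
            problems.append(f"{key}: missing {', '.join(missing)}")
    return problems

def load_directory_or_fallback(load_directory, report):
    """Try the enrichment loader; on network failure log a warning and return None."""
    try:
        return load_directory()
    except OSError as exc:
        report.setdefault("warnings", []).append(
            f"Congress.gov enrichment unavailable ({exc}); using vote-derived roster only"
        )
        return None
```

A caller would treat a None result as the signal to skip the 535-entry assertion and set the incomplete-roster banner.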

This phase runs after parse.py and before build_members.py. Throttled fetch takes ≈3 min cold; idempotent (cached in data/119/api_cache/).

Phase 1 — Foundation (PM Phase 1 = MVP enabler)

  • build_members.py with parallel pool + atomic writes + _meta block + build_report.json.
  • manifest.json array format with version field.
  • Skeleton template/app.html / app.js / app.css (structure only, no charts yet).
  • Vendor Chart.js 4.4.0 + SortableJS 1.15.2 into template/vendor/.
  • pytest tests/test_analyze.py passes against frozen fixtures.
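The _meta block and the manifest version string from the build_members.py requirements could be assembled as in this sketch. The function names are hypothetical; the field names and the "<pipeline_version>+<data_snapshot_date>" format come from the plan.

```python
import hashlib

def classifier_hash(path="analyze.py"):
    """SHA-256 of the classifier source, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_meta(analyze_path, snapshot_date, house_count, senate_count,
               pipeline_version="1.0.0"):
    """Reproducibility metadata embedded in every per-member JSON."""
    return {
        "schema_version": 1,
        "pipeline_version": pipeline_version,
        "classifier_hash": classifier_hash(analyze_path),
        "data_snapshot_date": snapshot_date,
        "source_xml_count": {"house": house_count, "senate": senate_count},
    }

def manifest_version(meta):
    """Cache-busting version string: <pipeline_version>+<data_snapshot_date>."""
    return f"{meta['pipeline_version']}+{meta['data_snapshot_date']}"
```

Note that this version string changes with the snapshot date and pipeline version only; bumping pipeline_version when analyze.py changes is what makes the cache-bust fire on classifier edits.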

Phase 2 — Single-member view

  • Sidebar filters + typeahead (vanilla JS, ~300 LOC).
  • Member JSON fetch + in-place Chart.js dataset updates for all 5 charts (vote distribution, alignment doughnut, blocking bars, alignment-over-time line, with/against stacked).
  • Sortable/filterable vote table (port logic from legacy/, switch all innerHTML of upstream strings to textContent).
  • URL deep-linking (pushState on selection, replaceState on filter typing).
  • Member-note banner support (MTG case still needs to surface).

Phase 3 — Milestone 0 validation gate (HARD GATE)

  • Regenerate the 8 legacy members through the new pipeline.
  • Diff KPI numbers against legacy/*.html (extract via grep on KPI placeholders).
  • Confirm member-note banner shows for MTG, deep-links work, vendor scripts load locally with no CDN traffic.
  • If parity fails: fix analyze.py / build_members.py, do NOT proceed.

Phase 4 — Full member build

  • build_members.py --congress 119 for all ~535. Confirm completion in <60 s.
  • Smoke-test 10 randomly chosen members across both chambers + all parties.
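Picking the 10 smoke-test members so that both chambers and all parties are represented can be done with a small stratified draw over the manifest. This is a sketch under the assumption that manifest entries carry the id, c (chamber), and p (party) fields shown earlier; the function name is hypothetical.

```python
import random

def smoke_sample(members, k=10, seed=None):
    """Pick k members, guaranteeing one per (chamber, party) group where k allows."""
    rng = random.Random(seed)
    groups = {}
    for m in members:
        groups.setdefault((m["c"], m["p"]), []).append(m)
    # One representative per chamber/party combination first.
    picked = [rng.choice(group) for group in groups.values()][:k]
    chosen = {m["id"] for m in picked}
    # Fill the remaining slots uniformly at random from everyone else.
    rest = [m for m in members if m["id"] not in chosen]
    picked += rng.sample(rest, min(k - len(picked), len(rest)))
    return picked
```

Passing a fixed seed makes a failing smoke run reproducible; leaving it as None gives a fresh sample each time.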

Phase 5 — Comparison view

  • compare.html + multi-select pills (reuse sidebar/typeahead from app.js).
  • All 5 overlay charts implemented.
  • Shareable URL (?ids=…, validated + capped at 6).

Phase 6 — Embedding, security hardening, polish

  • Namespace audit of all CSS under #polisci-root; test inline-div embed in a throwaway host page.
  • data-base attribute support; iframe embed test.
  • results/119/README.md with recommended CSP + sandbox snippet:

    Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self';
    img-src 'self' data:; connect-src 'self'; frame-ancestors <host-domain>;
    base-uri 'none'; form-action 'none'
    <iframe src="…/app.html" sandbox="allow-scripts allow-same-origin" referrerpolicy="no-referrer">
    
  • Cross-browser smoke test (Chrome, Firefox, Safari).

Phase 7 — Documentation & close-out

  • Update DOCUMENTATION.md §8 (new file layout), §9 (new regeneration commands), §11 (change-log entries).
  • Verify PROJECT_SCOPE.md reflects shipped reality (have PM agent revise once if needed).
  • Delete legacy/ after Phase 3 gate passes AND user confirms.

Verification

End-to-end success means:

python3 fetch.py --congress 119     # exits 0; cache already populated; near-zero fetch
python3 parse.py --congress 119     # writes votes.jsonl + roster.json
pytest tests/                       # all green
python3 build_members.py --congress 119  # writes ≥535 JSONs in <60 s; build_report.json shows 0 failures
python3 build_app.py --congress 119      # writes results/119/{app.html, compare.html, data/, vendor/, README.md}
# Open results/119/app.html in a browser:
#   - Pick a member from dropdown → dashboard re-renders without page load
#   - Filter by Senate / Party R / State KY → typeahead shows Rand Paul, McConnell
#   - URL updates on selection, shareable; reload restores state
#   - Network panel shows ONLY local requests (vendor + manifest + selected member JSON)
# Open results/119/compare.html:
#   - Select Massie + Khanna + AOC → 5 charts overlay with color-coded lines/dots
#   - URL updates with ?ids=…; reload restores state
# Open results/119/README.md → recommended CSP + iframe sandbox snippet visible

Run-once additional checks:

  • grep -RE "innerHTML\s*=" template/ results/ returns no matches against upstream strings.
  • grep -RE "cdn\.|cdnjs|jsdelivr|unpkg" template/ results/ returns nothing.
  • grep "g9axyby" DOCUMENTATION.md returns nothing (key redacted).
  • grep -E '^\.env$' .gitignore returns the line.
  • python3 -c "import json; m=json.load(open('results/119/data/manifest.json')); print(len(m['members']))" reports ≥535.

Execution Strategy (parallel programmer agents)

After plan approval, the very first action is to copy this plan to /home/user/polisci/research/PLAN.md so a fresh context can pick it up. The user will then clear context. The next session should orchestrate implementation with multiple programmer agents in parallel wherever the work is independent. Dependent steps must run sequentially.

Parallelization map

Phase 0 (3 parallel programmer agents):

  • Agent A: create .env + .gitignore; redact API key in DOCUMENTATION.md §2.
  • Agent B: write NOTES.md with the 5 deferred concerns.
  • Agent C: download Chart.js 4.4.0 + SortableJS 1.15.2 into template/vendor/.

Phase 1 (4 parallel programmer agents + 1 sequential gate):

  • Agent A: write build_members.py (parallel pool, atomic writes, _meta, build_report.json, manifest array+version).
  • Agent B: write tests/fixtures/*.xml + tests/test_analyze.py. Synthesize fixtures from real cached XML.
  • Agent C: write skeleton template/app.html + template/app.css (structure only, namespaced under #polisci-root, no chart logic yet).
  • Agent D: write enrich_roster.py (Phase 0.5) + modify parse.py to merge members_directory.json into roster.json.
  • Sequential gate after all four: run enrich_roster.py, run parse.py, run pytest tests/ and confirm green; run build_members.py and confirm ≥535 JSONs + manifest emitted.

Phase 2 (1 programmer agent — sequential, deeply interlinked frontend logic):

  • Single agent writes all of template/app.js: manifest loader, sidebar filters, typeahead, member-fetch + in-place Chart.js updates for all 5 charts, sortable/filterable table, URL deep-linking, member-note banner. Splitting risks state-management drift.

Phase 3 (1 programmer agent — sequential validation gate):

  • Agent runs build_members.py for the 8 legacy member IDs, opens the new dashboards, extracts KPI numbers from legacy/*.html, diffs, reports.

Phase 4 (1 programmer agent — sequential, bulk build):

  • Full build_members.py --congress 119 + 10-random smoke test.

Phase 5 (5 parallel programmer agents — one per overlay chart):

  • Each agent implements one of the 5 comparison charts in template/compare.html + adds its render function to app.js. Shared scaffolding (multi-select pills, URL state) implemented by one designated agent first, then the 5 chart agents fork off.

Phase 6 (3 parallel programmer agents):

  • Agent A: build_app.py orchestrator + results/119/README.md with CSP + sandbox snippets.
  • Agent B: CSS namespacing audit + inline-div embed test in a throwaway host page.
  • Agent C: data-base attribute support + iframe embed test + cross-browser smoke (Chrome, Firefox, Safari via headless if available).

Phase 7 (1 programmer agent + 1 PM agent):

  • Programmer: update DOCUMENTATION.md §8, §9, §11.
  • PM (sequential, after programmer): revise PROJECT_SCOPE.md to reflect shipped reality.

Coordination rules

  • Each parallel batch is a single message with multiple Agent tool calls.
  • Each agent prompt MUST cite the plan path (/home/user/polisci/research/PLAN.md) and the specific section/files it owns.
  • Trust-but-verify: after each parallel batch completes, read the actual diffs (not just the agent summaries) before launching the next phase.
  • If a parallel agent's work depends on another's output, escalate to sequential rather than guess.

Critical Files (to be created or modified)

  • CREATE: enrich_roster.py, build_members.py, build_app.py, build_all.py, NOTES.md, .env, .gitignore, template/app.html, template/compare.html, template/app.js, template/app.css, template/vendor/{chart,sortable}.min.js, tests/test_analyze.py, tests/fixtures/*.xml, results/119/README.md, research/PLAN.md (copy of this plan)
  • MODIFY: DOCUMENTATION.md (§2 redact key, §8 file layout, §9 regen commands, §11 change-log), parse.py (add upstream-string validation + merge with members_directory.json), PROJECT_SCOPE.md (PM agent owns this)
  • DELETE (after Phase 3 gate): legacy/

Reused Existing Code

  • analyze.py (aggregate, classify_vote, _norm_vote, _majority_position, LONE_WOLF_THRESHOLD) — unchanged; consumed by build_members.py in worker processes.
  • parse.py schema (chamber-unified votes.jsonl + roster.json) — consumed unchanged.
  • legacy/*.html rendering logic for KPI cards, the 5 chart types, and the sortable/filterable table: port to vanilla JS in template/app.js, replacing innerHTML with textContent wherever upstream strings are rendered.
  • DOCUMENTATION.md §6 (Classification) and §10 (Limitations) — referenced from dashboards; not duplicated.