The user has iteratively built a working voting-dashboard pipeline in this session: fetch.py, parse.py, and analyze.py are written and produce a unified schema (1,342 cached XML rollcalls — 553 House + 789 Senate) for the 119th Congress. Eight standalone dashboard HTML files were previously generated (Massie, Khanna, AOC, Omar, MTG, Jordan, Donalds, Graham) and now live in legacy/.
The user now wants:
--congress N CLI arg.
analyze.py, re-render all member JSON in seconds; no re-fetch.
This plan incorporates feedback from PM + programmer (Consult) + security (Consult) + compliance (Consult) agents. Decisions locked by the user:
NOTES.md (not on dashboards).
NOTES.md.
DOCUMENTATION.md, move to .env (gitignored) with a rotation note. The key has never been used by the pipeline.
polisci/
├── DOCUMENTATION.md # existing — UPDATE §2 (redact key), §8 (file layout), §9 (regeneration)
├── NOTES.md # NEW — captures known concerns we chose not to fix in MVP
├── PROJECT_SCOPE.md # NEW — written by PM agent (only it may edit)
├── .env # NEW — gitignored — holds CONGRESS_GOV_API_KEY
├── .gitignore # NEW — at least: .env, __pycache__/, *.pyc, data/*/cache/
├── fetch.py # existing — unchanged
├── parse.py # existing — minor: validate upstream strings (reject <, >, NUL and other control chars); merge members_directory.json into roster.json
├── analyze.py # existing — unchanged (output shape stable)
├── enrich_roster.py # NEW — Congress.gov API → data/<C>/members_directory.json (complete roster)
├── build_members.py # NEW — write data/<C>/members/<id>.json + manifest.json (parallelized)
├── build_app.py # NEW — copy template/ → results/<C>/; embed manifest version; copy data/
├── build_all.py # NEW — orchestrator: fetch → enrich_roster → parse → pytest → build_members → build_app
├── tests/ # NEW — pytest unit tests for analyze.py with frozen fixtures
│ ├── fixtures/
│ │ ├── partisan_house.xml
│ │ ├── bipartisan_house.xml
│ │ ├── partisan_senate.xml
│ │ └── failed_blocking_senate.xml
│ └── test_analyze.py
├── template/ # NEW — input templates copied at build time
│ ├── app.html # single-member dashboard shell
│ ├── compare.html # comparison shell
│ ├── app.js # shared frontend logic
│ ├── app.css # shared styles, all selectors namespaced under #polisci-root
│ └── vendor/ # NEW — pinned local copies (no CDN)
│ ├── chart.umd.min.js # Chart.js 4.4.0
│ └── sortable.min.js # SortableJS 1.15.2
├── data/119/
│ ├── house/{cache/, votes.jsonl, roster.json}
│ ├── senate/{cache/, votes.jsonl, roster.json}
│ ├── members/<id>.json # NEW — per-member metrics (~80 KB each)
│ ├── manifest.json # NEW — {version, members: [{id,n,p,s,c,district,served_partial}, …]} for picker
│ ├── members_directory.json # NEW — complete 119th roster from Congress.gov
│ ├── lis_to_bioguide.json # NEW — Senate ID crosswalk
│ ├── api_cache/ # NEW — cached Congress.gov responses (idempotent)
│ └── build_report.json # NEW — per-build success/failure log
├── results/119/ # output — entire dir is the embeddable artifact
│ ├── app.html
│ ├── compare.html
│ ├── app.js
│ ├── app.css
│ ├── vendor/{chart…, sortable…}
│ ├── data/
│ │ ├── manifest.json
│ │ └── members/<id>.json
│ └── README.md # NEW — embedding instructions + recommended CSP/sandbox snippet
└── legacy/ # existing — the 8 archived dashboards; Phase 3 validation baseline, removed once that gate passes and the user confirms
app.html: single-member view
app.html?id=M001184 (no ?c=; Congress is implicit in the deploy path, per programmer recommendation)
manifest member list as an array (not a keyed object) for sort/filter speed. Filtered live by sidebar.
data/members/<id>.json?v={manifest.version} (cache-busted on classifier changes), then mutate existing Chart.js datasets in place and call chart.update('none'), with no teardown/rebuild per switch. All 5 charts created once at page init.
history.replaceState for filter typing, history.pushState only on member selection. localStorage as Could-Have (per PM), namespaced as polisci:v119:lastMember, validated against manifest on read.
compare.html: overlay view
compare.html?ids=M001184,K000389,O000172 (shareable; cap parse at 6 IDs; validate each against manifest)
app.html?id=<id> in a new tab.
Three modes, all supported by the same artifact in results/119/:
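app.html directly.
<iframe src="…/app.html" sandbox="allow-scripts allow-same-origin" referrerpolicy="no-referrer"> (snippet documented in results/119/README.md). allow-same-origin lets the framed page fetch its own data/ files; per MDN, combining it with allow-scripts only isolates the frame when the artifact is served from a different origin than the host page, because same-origin framed content can remove its own sandbox.
<div id="polisci-root" data-base="./data/">…</div> + <link>/<script> tags into a host page. All CSS namespaced under #polisci-root; data-base attribute makes the data path host-configurable.
No external requests after page load (Chart.js + SortableJS pinned locally). Recommended host CSP documented in results/119/README.md.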
textContent, never innerHTML. Bill links built via createElement('a') + textContent + validated href.
parse.py rejects any upstream string containing <, >, or control characters with a build-report warning (gov XML should never legitimately contain these).
polisci:v119:*); values regex-validated against the manifest allowlist on read.
id matched against ^[A-Z]\d{6}$|^S\d{3}$ (bioguide ID or Senate LIS ID) AND verified to be a manifest key before any fetch or DOM use; ids capped at 6.
No postMessage API in v1; the frame-boundary attack surface stays closed.
python3 fetch.py --congress 119 # idempotent network fetch (clerk.house.gov + senate.gov)
python3 enrich_roster.py --congress 119 # Congress.gov API → complete roster + LIS crosswalk (NEW); must precede parse.py's merge step
python3 parse.py --congress 119 # XML → votes.jsonl + roster.json (merged with members_directory.json)
pytest tests/ # gate: classifier behavior frozen
python3 build_members.py --congress 119 # parallel; writes per-member JSON + manifest.json + build_report.json
python3 build_app.py --congress 119 # template + vendor → results/<C>/
# OR one command:
python3 build_all.py --congress 119
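The one-command path could be sketched as below. This is a minimal, hypothetical build_all.py: the step order follows the command list above, with enrich_roster.py placed before parse.py because parse.py's merge step reads members_directory.json. The injectable `runner` argument (defaulting to `subprocess.run`) is an assumption added so the sequencing can be checked without touching the network.

```python
import argparse
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Hypothetical build_all.py sketch: step names come from the plan; the
# runner injection and argv layout are assumptions for illustration.
STEPS: List[Sequence[str]] = [
    ("fetch.py",),
    ("enrich_roster.py",),  # before parse.py: its merge reads members_directory.json
    ("parse.py",),
    ("-m", "pytest", "tests/"),  # gate: classifier behavior frozen
    ("build_members.py",),
    ("build_app.py",),
]


def build_commands(congress: int) -> List[List[str]]:
    """Return one argv list per step; only the scripts take --congress."""
    commands = []
    for step in STEPS:
        argv = [sys.executable, *step]
        if step[0] != "-m":
            argv += ["--congress", str(congress)]
        commands.append(argv)
    return commands


def run_all(congress: int, runner: Callable[..., object] = subprocess.run) -> None:
    """Run steps in order; check=True raises CalledProcessError on the first failure."""
    for argv in build_commands(congress):
        runner(argv, check=True)


def main(argv: Optional[List[str]] = None,
         runner: Callable[..., object] = subprocess.run) -> None:
    parser = argparse.ArgumentParser(description="Run the full polisci build.")
    parser.add_argument("--congress", type=int, required=True)
    args = parser.parse_args(argv)
    run_all(args.congress, runner=runner)
```

In the real script, `if __name__ == "__main__": main()` would be the entry point. Because `check=True` stops at the first non-zero exit, a red pytest run blocks build_members.py and build_app.py, which is what the gate is for.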
build_members.py requirements
Load votes.jsonl + roster.json once in the parent process.
Use multiprocessing.Pool(min(8, os.cpu_count() or 1)) to run analyze.aggregate() per member (os.cpu_count() can return None).
Per-member JSON includes a _meta block (compliance-required reproducibility metadata):
"_meta": {
"schema_version": 1,
"pipeline_version": "1.0.0",
"classifier_hash": "<sha256 of analyze.py>",
"data_snapshot_date": "2026-05-24",
"source_xml_count": {"house": 553, "senate": 789}
}
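One way to assemble that block is sketched below. The helper names and the idea of passing the XML counts and snapshot date in as arguments are assumptions; the plan only fixes the field names and that classifier_hash is the SHA-256 of analyze.py.

```python
import hashlib
from pathlib import Path
from typing import Dict

SCHEMA_VERSION = 1
PIPELINE_VERSION = "1.0.0"


def classifier_hash(analyze_path: Path) -> str:
    """SHA-256 hex digest of analyze.py's raw bytes."""
    return hashlib.sha256(analyze_path.read_bytes()).hexdigest()


def build_meta(analyze_path: Path, snapshot_date: str,
               xml_counts: Dict[str, int]) -> dict:
    """Return the per-member _meta block (hypothetical helper)."""
    return {
        "schema_version": SCHEMA_VERSION,
        "pipeline_version": PIPELINE_VERSION,
        "classifier_hash": classifier_hash(analyze_path),
        "data_snapshot_date": snapshot_date,
        "source_xml_count": dict(xml_counts),
    }
```

Computing the block once in the parent and attaching the same dict to every member keeps all 535+ files consistent within a build; any edit to analyze.py changes classifier_hash even if pipeline_version is not bumped.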
Atomic writes: write to *.tmp in the same directory as the target, then os.replace. Per-member failures are logged to build_report.json and the batch continues; one failure never aborts it.
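The pool, atomic-write, and failure-logging requirements could fit together roughly as follows. This is a sketch, not the build_members.py implementation: `compute` stands in for a module-level wrapper around analyze.aggregate() (its real signature is not given here), and the report fields (`total`, `ok`, `failures`) are assumptions. The Pool initializer hands votes/roster to each worker once, which also works under the spawn and forkserver start methods where globals are not inherited.

```python
import json
import os
import tempfile
from multiprocessing import Pool
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple

# Per-process state set once by the Pool initializer, so votes/roster are
# shipped to each worker once instead of once per task.
_STATE: Dict[str, Any] = {}


def atomic_write_json(path: Path, obj: Any) -> None:
    """Write JSON to a *.tmp file in the target directory, then os.replace it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(obj, fh, separators=(",", ":"))
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic when tmp and path share a filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def _init_worker(compute: Callable, votes: Any, roster: Any, out_dir: Path) -> None:
    _STATE.update(compute=compute, votes=votes, roster=roster, out_dir=out_dir)


def _run_one(member_id: str) -> Tuple[str, Optional[str]]:
    """Compute and write one member; return an error string instead of raising."""
    try:
        metrics = _STATE["compute"](member_id, _STATE["votes"], _STATE["roster"])
        atomic_write_json(_STATE["out_dir"] / f"{member_id}.json", metrics)
        return member_id, None
    except Exception as exc:  # one bad member must not abort the batch
        return member_id, f"{type(exc).__name__}: {exc}"


def build_members(member_ids: List[str], compute: Callable, votes: Any, roster: Any,
                  out_dir: Path, report_path: Path,
                  processes: Optional[int] = None) -> dict:
    processes = processes or min(8, os.cpu_count() or 1)
    init = (compute, votes, roster, Path(out_dir))
    if processes == 1:  # serial path: handy for debugging and tests
        _init_worker(*init)
        results = [_run_one(m) for m in member_ids]
    else:
        with Pool(processes, initializer=_init_worker, initargs=init) as pool:
            results = list(pool.imap_unordered(_run_one, member_ids))
    failures = {m: err for m, err in results if err}
    report = {"total": len(results), "ok": len(results) - len(failures),
              "failures": failures}
    atomic_write_json(Path(report_path), report)
    return report
```

`compute` must be a top-level (picklable) function for the multi-process path. Workers write their own files, so disk I/O is parallelized too, and a crash mid-write leaves at most a stray *.tmp, never a truncated <id>.json.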
manifest.json ships its member list as an array of objects (programmer recommendation), wrapped in an object with a top-level version field for client-side cache-busting:
{
"version": "<pipeline_version>+<data_snapshot_date>",
"members": [{"id":"M001184","n":"Thomas Massie","p":"R","s":"KY","c":"H"}, …]
}
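A possible shape for the manifest builder is sketched below. It assumes roster entries use the members_directory.json field names (bioguide, full_name, party, state, district, chamber) plus the served_partial flag; the default sort order is an illustrative choice, not something the plan specifies.

```python
from typing import Iterable


def manifest_version(pipeline_version: str, snapshot_date: str) -> str:
    """The plan's version string: "<pipeline_version>+<data_snapshot_date>"."""
    return f"{pipeline_version}+{snapshot_date}"


def build_manifest(roster: Iterable[dict], pipeline_version: str,
                   snapshot_date: str) -> dict:
    """Build the picker manifest: a version field plus a compact member array.

    Assumes roster entries use members_directory.json field names.
    """
    members = [
        {
            "id": r["bioguide"],
            "n": r["full_name"],
            "p": r["party"],
            "s": r["state"],
            "c": "H" if r["chamber"] == "House" else "S",
            "district": r.get("district"),
            "served_partial": bool(r.get("served_partial", False)),
        }
        for r in roster
    ]
    # Stable default order for the sidebar (hypothetical choice).
    members.sort(key=lambda m: (m["c"], m["s"], m["n"]))
    return {"version": manifest_version(pipeline_version, snapshot_date),
            "members": members}
```

With this version format, a classifier edit only busts the ?v= cache if pipeline_version is bumped or the snapshot date changes.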
build_app.py requirements
cp: copies template/ into results/<C>/, stamps the manifest version into app.html / compare.html as inline JSON (saves a fetch), copies data/<C>/members/ and data/<C>/manifest.json into results/<C>/data/, and writes the recommended-CSP README.md.
analyze.py test suite (compliance-required)
Frozen fixture XMLs covering: partisan-line vote, bipartisan vote, tied/Split party position, member absent, Aye/No vs Yea/Nay normalization, failed-measure blocking case (House and Senate). Each test asserts the metrics dict for a known member.
This file captures concerns we considered but chose NOT to address in MVP:
localStorage as MVP feature. Listed as Could-Have; cut if Phase 3 runs long.
.env with the existing API key; create .gitignore listing .env + __pycache__/ + *.pyc + data/*/cache/; redact DOCUMENTATION.md §2 to reference <see .env> and add rotation guidance.
NOTES.md with the items above.
Building a dashboard for every member of the 119th Congress requires a complete roster, not just members who appear in vote XMLs. Members who died, resigned, or were sworn in mid-term without ever casting a roll-call vote (rare but possible) would otherwise be missing.
Source: Congress.gov API /member/congress/119 (uses the CONGRESS_GOV_API_KEY
now in .env). This is the first justified use of the API key since the
project began; the House Clerk and Senate vote XML alone cannot answer "who served in the 119th?"
New script: enrich_roster.py
https://api.congress.gov/v3/member/congress/119?currentMember=false&limit=250&format=json&api_key=<key> (≈540 members across ≈3 pages).
.env (never from CLI).
Writes data/119/members_directory.json keyed by bioguide:
{
"M001184": {
"bioguide": "M001184",
"lis": null, # populated for senators in a second pass
"full_name": "Thomas Massie",
"party": "R",
"state": "KY",
"district": "4",
"chamber": "House",
"served_from": "2012-11-13",
"served_to": null, # null if currently serving
"photo_url": "...", # if available
"source": "congress.gov/v3"
},
...
}
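Paging through the member list could look like the sketch below. It assumes each response page carries a "members" array and a "pagination" object whose "next" URL may omit api_key (so the key is re-appended if missing); verify both against the Congress.gov API docs. The key is passed in by the caller after reading it from .env, and `fetch_json` is injectable so the caching fetcher or a test stub can be swapped in. Mapping raw records onto the members_directory.json fields is left out because the response field names are not confirmed here.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

LIST_URL = "https://api.congress.gov/v3/member/congress/119"


def http_get_json(url: str) -> dict:
    """Plain GET returning parsed JSON (no retries; a real run should add them)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_members(api_key: str, fetch_json: Callable[[str], dict] = http_get_json,
                 delay_s: float = 1.0) -> Iterator[dict]:
    """Yield member records page by page, following pagination["next"].

    Assumption: each page has a "members" list and a "pagination" object
    whose "next" URL may omit api_key.
    """
    query = urllib.parse.urlencode(
        {"currentMember": "false", "limit": "250", "format": "json", "api_key": api_key}
    )
    url: Optional[str] = f"{LIST_URL}?{query}"
    while url:
        page = fetch_json(url)
        yield from page.get("members", [])
        url = page.get("pagination", {}).get("next")
        if url and "api_key=" not in url:
            sep = "&" if "?" in url else "?"
            url = f"{url}{sep}{urllib.parse.urlencode({'api_key': api_key})}"
        if url and delay_s > 0:
            time.sleep(delay_s)  # simple throttle between pages
```

With limit=250, roughly 540 members come back in three requests (250 + 250 + the remainder).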
Second pass: for each Senate member, fetch
https://api.congress.gov/v3/member/<bioguide>?api_key=<key> to read the
LIS member ID (needed to join with the Senate vote cache that uses LIS IDs,
not bioguide). Cache results to data/119/lis_to_bioguide.json for the
reverse map.
Merge step in parse.py (modification, not new file): when writing
roster.json, union the vote-derived roster with members_directory.json so
every 119th-Congress member is represented, even if votes is empty for them.
Members with zero votes get a served_partial: true flag so app.js can
auto-render a member note like
"This member did not cast any roll-call votes during the period analyzed
(served {served_from} – {served_to}). Dashboards reflect that absence."
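The merge step could be sketched as a union keyed by bioguide. The vote-derived entry shape is hypothetical: it assumes each entry already carries a bioguide (Senate LIS IDs crosswalked via lis_to_bioguide.json) and a `vote_count`, and it lets vote-derived fields win when both sources define the same key.

```python
from typing import Dict, List


def merge_roster(vote_roster: List[dict], directory: Dict[str, dict]) -> List[dict]:
    """Union the vote-derived roster with members_directory.json by bioguide.

    Assumptions (hypothetical, not parse.py's actual schema): vote-derived
    entries carry "bioguide" (Senate LIS IDs already crosswalked) and a
    "vote_count"; when both sources have a field, the vote-derived value wins.
    """
    merged: Dict[str, dict] = {bid: dict(info) for bid, info in directory.items()}
    for entry in vote_roster:
        bid = entry["bioguide"]
        merged[bid] = {**merged.get(bid, {}), **entry}
    for entry in merged.values():
        # Zero recorded votes -> app.js shows the "did not cast any votes" note.
        entry["served_partial"] = entry.get("vote_count", 0) == 0
    return sorted(merged.values(), key=lambda e: e["bioguide"])
```

Directory-only members keep their served_from/served_to fields, which the member note template interpolates.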
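Validation: assert that the combined House + Senate roster.json has ≥ 535 entries and that every entry has
bioguide + chamber + party + state. Fail the build if not (when members_directory.json is available; see the failure mode below).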
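The check could be written as an explicit function that raises, as sketched below. The exception class and message format are hypothetical. Raising instead of using a bare `assert` matters because Python strips assert statements when run with `-O`, which would silently disable the gate.

```python
from typing import Dict, List, Sequence

REQUIRED_FIELDS: Sequence[str] = ("bioguide", "chamber", "party", "state")


class RosterValidationError(ValueError):
    """Raised when the merged roster is too small or has incomplete entries."""


def validate_roster(entries: List[Dict], minimum: int = 535) -> None:
    """Check the combined House + Senate roster; raise listing every problem."""
    problems: List[str] = []
    if len(entries) < minimum:
        problems.append(f"roster has {len(entries)} entries; expected >= {minimum}")
    for i, entry in enumerate(entries):
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        if missing:
            problems.append(
                f"entry {i} ({entry.get('bioguide', '?')}): missing {', '.join(missing)}"
            )
    if problems:
        raise RosterValidationError("; ".join(problems))
```

Collecting every problem before raising gives build_report.json one complete list instead of failing on the first gap.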
Failure mode: if the Congress.gov API is unreachable or rate-limited,
the build falls back to the vote-derived roster with a warning logged to
build_report.json. build_members.py still produces dashboards for every
member found; readers see a banner explaining the roster may be incomplete.
This phase runs after fetch.py and before both parse.py (whose merge step reads
members_directory.json) and build_members.py. Throttled
fetch takes ≈3 min cold; idempotent (cached in data/119/api_cache/).
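The cached, throttled fetch could be sketched as a small callable that wraps any JSON fetcher (for example the `http_get_json` helper assumed earlier for paging). Two design choices here are assumptions: the cache file name is a SHA-256 of the URL with api_key stripped and query params sorted, so rotating the key does not invalidate api_cache/ and the secret never appears in file names; and the clock/sleep functions are injectable so the throttle is testable.

```python
import hashlib
import json
import os
import time
import urllib.parse
from pathlib import Path
from typing import Callable


def cache_key(url: str) -> str:
    """Hash the URL with api_key removed so the secret never shapes cache names."""
    parts = urllib.parse.urlsplit(url)
    query = sorted((k, v) for k, v in urllib.parse.parse_qsl(parts.query) if k != "api_key")
    clean = urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))
    return hashlib.sha256(clean.encode("utf-8")).hexdigest()


class CachedFetcher:
    """Idempotent, throttled JSON fetcher backed by a directory of cached responses."""

    def __init__(self, cache_dir: Path, fetch_json: Callable[[str], dict],
                 min_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_json = fetch_json
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.sleep = sleep
        self._last = float("-inf")  # no wait before the first network call

    def __call__(self, url: str) -> dict:
        path = self.cache_dir / f"{cache_key(url)}.json"
        if path.exists():  # cache hit: no network, no throttle
            return json.loads(path.read_text(encoding="utf-8"))
        wait = self.min_interval_s - (self.clock() - self._last)
        if wait > 0:
            self.sleep(wait)
        data = self.fetch_json(url)
        self._last = self.clock()
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        os.replace(tmp, path)  # never leave a half-written cache entry
        return data
```

A re-run with a warm cache makes no network calls, which is what lets the later end-to-end check expect a near-zero fetch.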
build_members.py with parallel pool + atomic writes + _meta block + build_report.json.
manifest.json array format with version field.
template/app.html / app.js / app.css (structure only, no charts yet).
template/vendor/.
pytest tests/test_analyze.py passes against frozen fixtures.
legacy/, switch all innerHTML of upstream strings to textContent).
pushState on selection, replaceState on filter typing).
legacy/*.html (extract via grep on KPI placeholders).
analyze.py / build_members.py, do NOT proceed.
build_members.py --congress 119 for all ~535. Confirm completion in <60 s.
compare.html + multi-select pills (reuse sidebar/typeahead from app.js).
?ids=…, validated + capped at 6).
#polisci-root; test inline-div embed in a throwaway host page.
data-base attribute support; iframe embed test.
results/119/README.md with recommended CSP + sandbox snippet:
Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self';
img-src 'self' data:; connect-src 'self'; frame-ancestors <host-domain>;
base-uri 'none'; form-action 'none'
<iframe src="…/app.html" sandbox="allow-scripts allow-same-origin" referrerpolicy="no-referrer">
Cross-browser smoke test (Chrome, Firefox, Safari).
DOCUMENTATION.md §8 (new file layout), §9 (new regeneration commands), §11 (change-log entries).
PROJECT_SCOPE.md reflects shipped reality (have PM agent revise once if needed).
legacy/ after Phase 3 gate passes AND user confirms.
End-to-end success means:
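python3 fetch.py --congress 119 # exits 0; cache already populated; near-zero fetch
python3 enrich_roster.py --congress 119 # writes members_directory.json + lis_to_bioguide.json (cached API responses)
python3 parse.py --congress 119 # writes votes.jsonl + roster.json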
pytest tests/ # all green
python3 build_members.py --congress 119 # writes one JSON per roster member (≥535) in <60 s; build_report.json shows 0 failures
python3 build_app.py --congress 119 # writes results/119/{app.html, compare.html, data/, vendor/, README.md}
# Open results/119/app.html in a browser:
# - Pick a member from dropdown → dashboard re-renders without page load
# - Filter by Senate / Party R / State KY → typeahead shows Rand Paul, McConnell
# - URL updates on selection, shareable; reload restores state
# - Network panel shows ONLY local requests (vendor + manifest + selected member JSON)
# Open results/119/compare.html:
# - Select Massie + Khanna + AOC → 5 charts overlay with color-coded lines/dots
# - URL updates with ?ids=…; reload restores state
# Open results/119/README.md → recommended CSP + iframe sandbox snippet visible
Run-once additional checks:
grep -RE "innerHTML\s*=" --exclude-dir=vendor template/ results/ returns no matches against upstream strings (vendored libraries are excluded because they may use innerHTML internally).
grep -RE "cdn\.|cdnjs|jsdelivr|unpkg" template/ results/ returns nothing.
grep "g9axyby" DOCUMENTATION.md returns nothing (key redacted).
grep -E '^\.env$' .gitignore returns the line.
python3 -c "import json; m=json.load(open('results/119/data/manifest.json')); print(len(m['members']))" reports ≥535.
After plan approval, the very first action is to copy this plan to
/home/user/polisci/research/PLAN.md so a fresh context can pick it up.
The user will then clear context. The next session should orchestrate
implementation with multiple programmer agents in parallel wherever
the work is independent. Dependent steps must run sequentially.
Phase 0 (3 parallel programmer agents):
.env + .gitignore; redact API key in DOCUMENTATION.md §2.
NOTES.md with the 5 deferred concerns.
template/vendor/.
Phase 1 (4 parallel programmer agents + 1 sequential gate):
build_members.py (parallel pool, atomic writes, _meta, build_report.json, manifest array+version).
tests/fixtures/*.xml + tests/test_analyze.py. Synthesize fixtures from real cached XML.
template/app.html + template/app.css (structure only, namespaced under #polisci-root, no chart logic yet).
enrich_roster.py (Phase 0.5) + modify parse.py to merge members_directory.json into roster.json.
enrich_roster.py, run parse.py, run pytest tests/ and confirm green; run build_members.py and confirm ≥535 JSONs + manifest emitted.
Phase 2 (1 programmer agent; sequential, deeply interlinked frontend logic):
template/app.js: manifest loader, sidebar filters, typeahead, member-fetch + in-place Chart.js updates for all 5 charts, sortable/filterable table, URL deep-linking, member-note banner. Splitting risks state-management drift.
Phase 3 (1 programmer agent; sequential validation gate):
build_members.py for the 8 legacy member IDs, opens the new dashboards, extracts KPI numbers from legacy/*.html, diffs, reports.
Phase 4 (1 programmer agent; sequential, bulk build):
build_members.py --congress 119 + 10-random smoke test.
Phase 5 (5 parallel programmer agents; one per overlay chart):
template/compare.html + adds its render function to app.js. Shared scaffolding (multi-select pills, URL state) implemented by one designated agent first, then the 5 chart agents fork off.
Phase 6 (3 parallel programmer agents):
build_app.py orchestrator + results/119/README.md with CSP + sandbox snippets.
data-base attribute support + iframe embed test + cross-browser smoke (Chrome, Firefox, Safari via headless if available).
Phase 7 (1 programmer agent + 1 PM agent):
DOCUMENTATION.md §8, §9, §11.
PROJECT_SCOPE.md to reflect shipped reality.
Agent tool calls.
/home/user/polisci/research/PLAN.md) and the specific section/files it owns.
enrich_roster.py, build_members.py, build_app.py, build_all.py, NOTES.md, .env, .gitignore, template/app.html, template/compare.html, template/app.js, template/app.css, template/vendor/{chart,sortable}.min.js, tests/test_analyze.py, tests/fixtures/*.xml, results/119/README.md, research/PLAN.md (copy of this plan)
DOCUMENTATION.md (§2 redact key, §8 file layout, §9 regen commands, §11 change-log), parse.py (add upstream-string validation + merge with members_directory.json), PROJECT_SCOPE.md (PM agent owns this)
legacy/
analyze.py (aggregate, classify_vote, _norm_vote, _majority_position, LONE_WOLF_THRESHOLD): unchanged; consumed by build_members.py in worker processes.
parse.py schema (chamber-unified votes.jsonl + roster.json): consumed unchanged.
legacy/*.html rendering logic for KPI cards, the 5 chart types, the sortable/filterable table: port to vanilla JS in template/app.js with innerHTML → textContent substitution for upstream strings.
DOCUMENTATION.md §6 (Classification) and §10 (Limitations): referenced from dashboards; not duplicated.