
Methodology

How the 119th Congress voting-dashboard metrics are computed, and what they do and don't measure.

1. Data sources

  • House roll calls — Clerk of the House XML, one file per roll-call vote (https://clerk.house.gov/evs/<year>/roll<NNN>.xml). User-facing index: https://clerk.house.gov/Votes
  • Senate roll calls — Senate.gov LIS XML, one file per roll-call vote (https://www.senate.gov/legislative/LIS/roll_call_votes/...). User-facing index: https://www.senate.gov/legislative/votes_new.htm
  • Member roster — Congress.gov v3 API (/member/congress/119) supplements vote-derived rosters so members who served but never cast a recorded vote are still represented.
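As an illustration of the per-vote file layout, the House URL pattern above can be expanded as follows. This is a minimal sketch: the helper name is hypothetical, and zero-padding the roll number to three digits is an assumption read off the <NNN> placeholder. The Senate path is elided in the source and is not reconstructed here.

```python
# Template mirrors the House pattern quoted in the data-sources list.
HOUSE_URL_TEMPLATE = "https://clerk.house.gov/evs/{year}/roll{number:03d}.xml"


def house_roll_url(year: int, number: int) -> str:
    """Return the Clerk XML URL for one House roll call (hypothetical helper)."""
    # Assumption: <NNN> means the roll number zero-padded to three digits.
    return HOUSE_URL_TEMPLATE.format(year=year, number=number)
```

For example, house_roll_url(2025, 7) yields https://clerk.house.gov/evs/2025/roll007.xml.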

Coverage for the 119th Congress as of the most recent build: 553 House + 789 Senate roll-call votes; 552 distinct members (449 House + 103 Senate).

2. Vote normalization

Each member's recorded vote on a roll call is one of Yea, Nay, Aye, No, Present, Not Voting. For analysis, Aye/Yea are merged into Yea and No/Nay are merged into Nay — the procedural/substantive distinction is preserved only in the per-vote table column.
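The merge rule can be sketched as a small lookup. The function name and the choice to return None for Present / Not Voting are illustrative assumptions, not necessarily how analyze.py represents them.

```python
from typing import Optional

# Aye/Yea collapse to Yea; No/Nay collapse to Nay.
_MERGE = {"Yea": "Yea", "Aye": "Yea", "Nay": "Nay", "No": "Nay"}


def normalize_vote(recorded: str) -> Optional[str]:
    """Map a recorded vote to 'Yea' or 'Nay'.

    Returns None for Present, Not Voting, or any other value
    (an assumption made for this sketch).
    """
    return _MERGE.get(recorded.strip())
```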

For every vote, each party's majority position is computed:

party_position = Yea   if yea > nay
               = Nay   if nay > yea
               = Split otherwise (tie or zero)
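The rule above translates directly into code. A minimal sketch, assuming yea and nay are a single party's normalized vote counts on one roll call:

```python
def party_position(yea: int, nay: int) -> str:
    """Return a party's majority position on one roll call."""
    if yea > nay:
        return "Yea"
    if nay > yea:
        return "Nay"
    # Tie, or no Yea/Nay votes recorded for this party.
    return "Split"
```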

3. Alignment classification

Per vote, the member's normalized Yea/Nay is compared to each party's majority position:

| Condition | Label |
|---|---|
| Member matches BOTH party majorities | Helped Both |
| Member matches only Republican majority | Helped Republicans |
| Member matches only Democratic majority | Helped Democrats |
| Member matches NEITHER party majority | Helped Neither |
| Member did not cast a Yea/Nay | N/A: <state> |

  • Helped Both arises on bipartisan votes (post-office namings, suspension calendar items, broadly popular measures).
  • Helped Neither arises when the member votes against the majority position of both parties, typically as part of a small protest/defector cluster.
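The classification table can be expressed as a single function. This is a sketch under stated assumptions: the function name and argument order are hypothetical, and a Split party position is treated as matching no one, which follows from comparing the member's Yea/Nay against it.

```python
_MERGE = {"Yea": "Yea", "Aye": "Yea", "Nay": "Nay", "No": "Nay"}


def classify_alignment(recorded: str, rep_position: str, dem_position: str) -> str:
    """Label one member vote against both party majority positions."""
    vote = _MERGE.get(recorded)
    if vote is None:
        # Present, Not Voting, etc. keep their recorded state in the label.
        return f"N/A: {recorded}"
    matches_rep = vote == rep_position
    matches_dem = vote == dem_position
    if matches_rep and matches_dem:
        return "Helped Both"
    if matches_rep:
        return "Helped Republicans"
    if matches_dem:
        return "Helped Democrats"
    return "Helped Neither"
```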

4. Blocking analysis

A blocking win is recorded when ALL of the following hold:

  1. The member voted Nay.
  2. The measure failed (result matches fail, reject, not agreed, not passed, not invoked).
  3. One party's majority voted Yea and the other party's majority did not (a partisan defeat, not a bipartisan one).
  • blocked = "Democrat" — Democratic majority was Yea, Republican majority was not Yea, member voted Nay → counted as helping sink a Democrat-backed measure.
  • blocked = "Republican" — symmetric.
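The three conditions combine as in the sketch below. The function name is hypothetical, and matching the result text with a case-insensitive substring search is an assumption about how "result matches" is implemented.

```python
import re
from typing import Optional

# Result strings that count as a failed measure, per the list above.
FAIL_PATTERN = re.compile(
    r"fail|reject|not agreed|not passed|not invoked", re.IGNORECASE
)


def blocked_party(
    member_vote: str, result: str, rep_position: str, dem_position: str
) -> Optional[str]:
    """Return 'Democrat' or 'Republican' for a blocking win, else None.

    member_vote is the normalized vote; positions are Yea / Nay / Split.
    """
    if member_vote != "Nay" or not FAIL_PATTERN.search(result):
        return None
    if dem_position == "Yea" and rep_position != "Yea":
        return "Democrat"
    if rep_position == "Yea" and dem_position != "Yea":
        return "Republican"
    # Both majorities Yea (bipartisan defeat) or neither Yea: not a blocking win.
    return None
```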

This is a per-member participation count, not a marginal causal estimate: the metric credits the member individually for a defeat that may have involved hundreds of other Nay votes.

5. Voted-with / voted-against by party majority

Across votes where each party had a definite majority (not Split), the member's normalized vote is counted as matching (with) or differing (against) that party's majority. The results are reported as KPI tiles and as a stacked bar chart of raw counts and percentages.
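A tally for one party might look like the following sketch. The function name and input shape are hypothetical, and skipping votes where the member did not cast a Yea/Nay is an assumption the text does not state explicitly.

```python
from typing import Dict, Iterable, Tuple


def tally_with_against(votes: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count matches and mismatches against one party's majority.

    votes: (member_normalized_vote, party_position) pairs, one per roll call.
    """
    tally = {"with": 0, "against": 0}
    for member_vote, position in votes:
        # Skip Split positions; also skip non-Yea/Nay member votes (assumption).
        if position == "Split" or member_vote not in ("Yea", "Nay"):
            continue
        tally["with" if member_vote == position else "against"] += 1
    return tally
```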

6. Lone-wolf defection

A vote counts as a lone-wolf defection when:

  1. The member's own party (per roster) had a definite majority position.
  2. The member's normalized vote opposed that majority.
  3. The number of fellow same-party defectors was at most the chamber threshold (5 in the House, 3 in the Senate).

This metric identifies stubborn outliers: members who repeatedly break with their own caucus when very few others do.
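The three conditions can be sketched as below. The function name is hypothetical, and the sketch assumes that "fellow same-party defectors" excludes the member being evaluated; the text does not say whether the member counts toward the threshold.

```python
# Chamber thresholds from condition 3.
LONE_WOLF_THRESHOLD = {"House": 5, "Senate": 3}


def is_lone_wolf(member_vote: str, party_yea: int, party_nay: int, chamber: str) -> bool:
    """Decide whether one vote is a lone-wolf defection.

    party_yea / party_nay are normalized counts for the member's own party,
    with the member's own vote included.
    """
    if member_vote not in ("Yea", "Nay") or party_yea == party_nay:
        return False  # no Yea/Nay cast, or own party was Split
    majority = "Yea" if party_yea > party_nay else "Nay"
    if member_vote == majority:
        return False  # voted with own party
    defectors = party_nay if majority == "Yea" else party_yea
    fellow_defectors = defectors - 1  # assumption: exclude the member
    return fellow_defectors <= LONE_WOLF_THRESHOLD[chamber]
```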

7. Monthly trend

Each vote's date is bucketed by YYYY-MM. The four primary alignment classes (Helped Republicans / Helped Democrats / Helped Both / Helped Neither) are summed per month and rendered as a multi-series line chart.
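The bucketing step might be sketched as follows, assuming vote dates are available as ISO YYYY-MM-DD strings (the source XML date formats may differ and would need parsing first). The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

PRIMARY_CLASSES = (
    "Helped Republicans",
    "Helped Democrats",
    "Helped Both",
    "Helped Neither",
)


def monthly_trend(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Sum primary alignment labels per YYYY-MM bucket.

    rows: (iso_date, alignment_label) pairs for one member.
    """
    months: Dict[str, Counter] = defaultdict(Counter)
    for iso_date, label in rows:
        if label in PRIMARY_CLASSES:  # N/A labels are not charted
            months[iso_date[:7]][label] += 1
    return dict(months)
```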

8. Ranking metrics

The Rankings page sorts the chosen chamber by any of:

  • Raw counts: Total Votes, Yeas, Nays, Voted With/Against GOP, Voted With/Against Dem, Lone Wolf Votes
  • Percentages of votes cast: Participation %, Voted With/Against GOP %, Voted With/Against Dem %, Lone Wolf %

Percentage metrics exclude members with zero votes cast (denominator undefined).
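The zero-denominator exclusion can be illustrated with a small sketch. The dictionary keys name, votes_cast, and lone_wolf are hypothetical field names chosen for the example, not the pipeline's schema.

```python
from typing import Dict, List, Optional, Tuple


def pct_of_votes_cast(count: int, votes_cast: int) -> Optional[float]:
    """Return count as a percentage of votes cast, or None if undefined."""
    if votes_cast == 0:
        return None
    return 100.0 * count / votes_cast


def rank_by_pct(members: List[Dict], count_key: str) -> List[Tuple[str, float]]:
    """Rank members by a percentage metric, highest first.

    Members with zero votes cast are left out of the ranking.
    """
    scored = []
    for member in members:
        pct = pct_of_votes_cast(member[count_key], member["votes_cast"])
        if pct is not None:
            scored.append((member["name"], pct))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```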

9. Comparison view

The comparison view overlays up to six members on five charts: alignment over time, "voted against own party" rate over time (using Helped Neither as a proxy — see Limitations §10), grouped KPI bar, defection scatter (% against GOP vs % against Dem), and vote distribution.

10. Known limitations and caveats

  • Roll-call votes only. Voice votes, unanimous-consent agreements, and motions adopted without a recorded vote are invisible. A member's silence on a controversial measure that passed by voice cannot be detected here.
  • Procedural vs. substantive votes are treated equally. Bill-subject classification is not attempted. Heavy weighting of procedural-calendar votes can inflate "Helped Republicans" / "Helped Democrats" counts vs. the substantive picture.
  • "Helped Both" interpretation. A bipartisan vote that passes overwhelmingly is a real instance of the member helping both sides — but it can visually dilute the more interesting partisan classes.
  • Blocking-wins attribution. Each blocking tally credits the member individually for a defeat that may have involved hundreds of other Nay votes. The metric is a count of partisan defeats the member's Nay vote belonged to, not a marginal causal estimate.
  • Lone-wolf threshold (≤5 House / ≤3 Senate) is a heuristic.
  • "Voted against own party" overlay is a proxy. The comparison-page monthly chart uses Helped Neither per month as an approximation; the pipeline does not currently emit a true monthly own-party-defection series.
  • Mid-term resignations / replacements are not flagged on dashboards beyond the served_partial banner for members who cast zero recorded votes. A low participation count may reflect resignation, illness, or campaigning for higher office — check the underlying date pattern in the votes table.
  • Editorial label wording. Labels such as "Helped Republicans" and "Blocked Dem-Backed" attribute intent that the math does not measure directly. They describe a counting relationship between the member's vote and party majorities, not motive.

11. Reproducibility

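Each per-member JSON includes a _meta block with schema_version, pipeline_version, classifier_hash (SHA-256 of analyze.py), data_snapshot_date, and per-chamber source_xml_count. Because the hash covers the whole file, it changes whenever analyze.py changes: identical hashes indicate identical classification logic, while a changed hash may also reflect edits that do not alter behavior, such as comments or formatting.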
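To check a build against a local copy of the classifier, the hash can be recomputed with the standard library. A minimal sketch, assuming classifier_hash is stored as a lowercase hex digest of the file's raw bytes; the function name and default path are illustrative.

```python
import hashlib
from pathlib import Path


def classifier_hash(path: str = "analyze.py") -> str:
    """Return the SHA-256 hex digest of the classifier source file."""
    # Hashes raw bytes, so any edit (including whitespace) changes the digest.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

Comparing this value with _meta.classifier_hash in a per-member JSON indicates whether that file was produced by the same version of analyze.py.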