
Detect member-elect / replaced / died-in-office cases with structural banners

Three classes of edge-case members were either silently mislabeled or had
no explanatory note, even when the data was clearly anomalous:

1. Member-elect who declined the seat (Matt Gaetz / FL-1): appeared in the
   119th roll-call data once, on the opening-day quorum call, and never
   again, with no full name, no served dates, and no banner. Against the
   total=553 denominator, his 0.0% participation read as absenteeism.

2. Members who left mid-term (Waltz, Sherrill, Greene, Green): KPIs reflected
   only their partial service, but the dashboard neither said so nor linked
   to their successor.

3. Members who died in office (Grijalva R., Turner, Connolly): same problem,
   plus no death year shown.

Fixes:

- enrich_roster.py: rescue pass — for every vote-derived House bioguide
  missing from the bulk /v3/member/congress/<C> response, fetch the
  individual /v3/member/<bg> endpoint (carries full name, terms, death
  year even for never-seated members).

- enrich_roster.py: replacement-linking pass — pair predecessor↔successor
  by (state, district) within the 119th window using each term's
  per-Congress data; emit replaces/replaced_by bioguide refs.

- enrich_roster.py: detail-enrichment pass — for every member on a
  replacement chain, fetch /v3/member/<bg> for accurate per-Congress
  startYear/endYear/district/deathYear (the bulk listing only carries
  chamber + startYear; insufficient for "served 2025–2025" copy).

- parse.py: propagate congress_term, death_year, current_member, replaces,
  replaced_by into the per-chamber roster merge.

- build_members.py: emit those fields in per-member JSON; flag members with
  voting==0 who are not delegates, not partial-term, and not deceased as
  "unseated" (un=true) in the manifest.

- template/app.js: branch the banner by status with priority delegate >
  unseated > died > replaced_by > replaces > partial > noVotes. Predecessor
  and successor are rendered as in-app links via a state.membersById lookup.
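The replacement-linking pass above reduces to grouping House members by seat and ordering each group. The sketch below is a simplified, standalone approximation of that logic, not the enrich_roster.py code itself; it assumes each directory entry is a plain dict with `chamber`, `state`, `district`, `served_to` and an optional `congress_term` dict, and the function name `link_replacements` is hypothetical.

```python
from typing import Any


def link_replacements(directory: dict[str, dict[str, Any]]) -> int:
    """Pair predecessor/successor House members sharing a (state, district) seat.

    Hypothetical standalone sketch: mutates entries in place, setting
    'replaced_by' on the predecessor and 'replaces' on the successor.
    Returns the number of pairs linked.
    """
    seats: dict[tuple[str, str], list[str]] = {}
    for bg, e in directory.items():
        if not (e.get("chamber") or "").lower().startswith("house"):
            continue  # only House seats are paired
        term = e.get("congress_term") or {}
        district = term.get("district") or e.get("district")
        if not e.get("state") or district is None:
            continue
        seats.setdefault((e["state"], str(district)), []).append(bg)

    def sort_key(bg: str) -> tuple[int, int]:
        e = directory[bg]
        # Unknown start year sorts last, as in the diff.
        start = (e.get("congress_term") or {}).get("startYear") or 9999
        # Members whose service has ended sort before those still serving.
        still_serving = e.get("served_to") is None
        return (start, 1 if still_serving else 0)

    pairs = 0
    for bgs in seats.values():
        ordered = sorted(bgs, key=sort_key)
        # Consecutive members on the same seat form a predecessor/successor chain.
        for pred, succ in zip(ordered, ordered[1:]):
            directory[pred]["replaced_by"] = succ
            directory[succ]["replaces"] = pred
            pairs += 1
    return pairs
```

Because ties on start year are broken by whether service has ended, a member who left in the same year a successor arrived still sorts first. The heuristic assumes each seat holds a single linear chain within the Congress.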

8 replacement pairs now linked (FL-1 Gaetz→Patronis, FL-6 Waltz→Fine,
AZ-7 Grijalva R.→Grijalva A., TX-18 Turner→Menefee, VA Connolly→Walkinshaw,
GA Greene→Fuller, TN Green→Van Epps, NJ Sherrill→Mejia) and Gaetz now has
his full name + member-elect-declined banner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Max, 1 month ago
commit 5697d8e9bc
7 files changed with 719 additions and 90 deletions
  1. build_members.py (+26 -0)
  2. data/119/house/roster.json (+133 -0)
  3. data/119/manifest.json (+0 -0)
  4. data/119/members_directory.json (+331 -64)
  5. enrich_roster.py (+129 -15)
  6. parse.py (+5 -0)
  7. template/app.js (+95 -11)

build_members.py (+26 -0)

@@ -54,6 +54,11 @@ def _worker(task):
             "served_to": m.get("served_to"),
             "served_partial": bool(m.get("served_partial", False)),
             "is_delegate": is_delegate,
+            "congress_term": m.get("congress_term"),
+            "death_year": m.get("death_year"),
+            "current_member": m.get("current_member"),
+            "replaces": m.get("replaces"),
+            "replaced_by": m.get("replaced_by"),
             "metrics": metrics,
             "_meta": _WORKER_META,
         }
@@ -163,6 +168,27 @@ def main(argv=None):
                 entry["sp"] = True
             if result.get("is_delegate"):
                 entry["dl"] = True
+            if result.get("replaces"):
+                entry["rs"] = result["replaces"]
+            if result.get("replaced_by"):
+                entry["rb"] = result["replaced_by"]
+            term = result.get("congress_term") or {}
+            if term.get("startYear"):
+                entry["sy"] = term["startYear"]
+            if term.get("endYear"):
+                entry["ey"] = term["endYear"]
+            if result.get("death_year"):
+                entry["dy"] = result["death_year"]
+            # Mark "unseated" — member appears in vote data (in manifest) but
+            # never actually cast a Yea/Nay; not a delegate, did not die.
+            # Likely a member-elect who resigned or declined the seat before
+            # serving (e.g. Gaetz, 119th).
+            mx_voting = (result.get("metrics") or {}).get("voting") or 0
+            if (mx_voting == 0
+                    and not result.get("is_delegate")
+                    and not result.get("served_partial")
+                    and not result.get("death_year")):
+                entry["un"] = True
             mx = result.get("metrics") or {}
             entry["k"] = {
                 "total": mx.get("total", 0),

data/119/house/roster.json (+133 -0)

File diff suppressed because it is too large.

data/119/manifest.json (+0 -0)

File diff suppressed because it is too large.

data/119/members_directory.json (+331 -64)

File diff suppressed because it is too large.

enrich_roster.py (+129 -15)

@@ -109,25 +109,28 @@ def _state_code(member):
 
 
 def _latest_chamber(member):
-    terms = (member.get("terms") or {}).get("item") or member.get("terms") or []
-    if isinstance(terms, dict):
-        terms = terms.get("item") or []
+    terms = _terms_list(member)
     if not terms:
         return ""
-    # Sort by startYear if available
     def sk(t): return t.get("startYear") or 0
     last = sorted(terms, key=sk)[-1]
     return (last.get("chamber") or "").strip()
 
 
-def _served_dates(member):
-    terms = (member.get("terms") or {}).get("item") or member.get("terms") or []
+def _terms_list(member):
+    terms = member.get("terms")
     if isinstance(terms, dict):
-        terms = terms.get("item") or []
+        return terms.get("item") or []
+    if isinstance(terms, list):
+        return terms
+    return []
+
+
+def _served_dates(member):
+    terms = _terms_list(member)
     if not terms:
         return None, None
-    starts = []
-    ends = []
+    starts, ends = [], []
     for t in terms:
         sy = t.get("startYear")
         ey = t.get("endYear")
@@ -140,6 +143,19 @@ def _served_dates(member):
     return served_from, served_to
 
 
+def _congress_term(member, congress):
+    """Find the term for the target Congress; returns dict or None."""
+    for t in _terms_list(member):
+        if t.get("congress") == congress:
+            return {
+                "startYear": t.get("startYear"),
+                "endYear": t.get("endYear"),
+                "district": str(t["district"]) if t.get("district") is not None else None,
+                "chamber": t.get("chamber"),
+            }
+    return None
+
+
 def _scan_for_lis(obj):
     """Recursively scan obj for any key matching LIS pattern; return string value or None."""
     if isinstance(obj, dict):
@@ -161,7 +177,7 @@ def _scan_for_lis(obj):
     return None
 
 
-def _normalize_member(m):
+def _normalize_member(m, congress=None):
     bioguide = (m.get("bioguideId") or "").strip()
     if not bioguide:
         return None
@@ -175,22 +191,30 @@ def _normalize_member(m):
         first = m.get("firstName") or ""
         last = m.get("lastName") or ""
         name = (first + " " + last).strip()
-    # If name is "Last, First" prefer invertedOrderName? Use as-is otherwise.
     if "," in name and not m.get("directOrderName"):
         parts = [p.strip() for p in name.split(",", 1)]
         if len(parts) == 2:
             name = parts[1] + " " + parts[0]
     photo = ((m.get("depiction") or {}).get("imageUrl")) or None
+    # Per-Congress term — most accurate source of district, start/end year for
+    # this Congress (matters for mid-term resignations and special-election entrants).
+    term = _congress_term(m, congress) if congress is not None else None
+    if term and term.get("district") is not None:
+        district = term["district"]
+    term_chamber = (term or {}).get("chamber") or chamber
     return {
         "bioguide": bioguide,
         "lis": None,
         "full_name": name,
         "party": _party_letter(m),
         "state": _state_code(m),
-        "district": district if chamber.lower() == "house" else None,
-        "chamber": chamber,
+        "district": district if (term_chamber or "").lower().startswith("house") else None,
+        "chamber": term_chamber,
         "served_from": served_from,
         "served_to": served_to,
+        "congress_term": term,
+        "death_year": m.get("deathYear"),
+        "current_member": m.get("currentMember"),
         "photo_url": photo,
         "source": "congress.gov/v3",
     }
@@ -232,7 +256,7 @@ def main():
                   f"0 senators with LIS resolved; {len(warnings)} warnings")
             return 0
         for m in data.get("members") or []:
-            norm = _normalize_member(m)
+            norm = _normalize_member(m, args.congress)
             if norm:
                 directory[norm["bioguide"]] = norm
         nxt = ((data.get("pagination") or {}).get("next")) or None
@@ -292,13 +316,103 @@ def main():
                 resolved += 1
 
     out_dir.mkdir(parents=True, exist_ok=True)
+    # Fallback: individual lookups for House bioguide IDs that appear in vote
+    # data but are missing from the per-Congress directory. Catches people who
+    # were members-elect (appear in opening-day quorum XML) but never seated,
+    # e.g. Matt Gaetz in the 119th.
+    house_roster_path = out_dir / "house" / "roster.json"
+    rescued = 0
+    if house_roster_path.exists():
+        house_roster = json.loads(house_roster_path.read_text())
+        missing = [bg for bg in house_roster
+                   if re.match(r"^[A-Z]\d{6}$", bg) and bg not in directory]
+        if missing:
+            print(f"enrich_roster: rescuing {len(missing)} House bioguide(s) missing from bulk directory",
+                  file=sys.stderr)
+            for bg in missing:
+                url = f"{API_BASE}/member/{bg}?format=json&api_key={api_key}"
+                data = _fetch(url, cache_dir, warnings, label=f"member/{bg}")
+                if data is None:
+                    continue
+                member = (data.get("member") or {})
+                norm = _normalize_member(member, args.congress)
+                if norm:
+                    directory[bg] = norm
+                    rescued += 1
+
+    # Replacement-linking pass — pair predecessor↔successor by (state, district)
+    # within the target Congress. Heuristic: any House seat with >1 member whose
+    # 119th term touches the Congress window. Sort by startYear (and then by
+    # served_to is-null) to determine order.
+    seats = {}
+    for bg, e in directory.items():
+        if not (e.get("chamber") or "").lower().startswith("house"):
+            continue
+        term = e.get("congress_term") or {}
+        if term.get("congress") and term["congress"] != args.congress:
+            continue  # shouldn't happen, but safe
+        state = e.get("state")
+        district = (term.get("district") if term else None) or e.get("district")
+        if not state or district is None:
+            continue
+        seats.setdefault((state, str(district)), []).append(bg)
+    pairs = 0
+    for key, bgs in seats.items():
+        if len(bgs) < 2:
+            continue
+        def sortkey(bg):
+            e = directory[bg]
+            term = e.get("congress_term") or {}
+            start = term.get("startYear") or 9999
+            # served_to None => still serving => sort last
+            ended = e.get("served_to") is not None
+            return (start, 0 if ended else 1)
+        ordered = sorted(bgs, key=sortkey)
+        for i in range(len(ordered) - 1):
+            pred, succ = ordered[i], ordered[i + 1]
+            directory[pred]["replaced_by"] = succ
+            directory[succ]["replaces"] = pred
+            pairs += 1
+    if pairs:
+        print(f"enrich_roster: linked {pairs} House predecessor↔successor pair(s)",
+              file=sys.stderr)
+
+    # Per-Congress term + death_year live on the individual /member/{bg} response
+    # (the bulk listing only carries chamber + startYear). For accurate banner
+    # copy on replacement chains, fetch the individual record for every member
+    # who is on either side of a replacement pair. Cached, so re-runs are free.
+    enrich_targets = set()
+    for bg, e in directory.items():
+        if e.get("replaces") or e.get("replaced_by"):
+            enrich_targets.add(bg)
+    if enrich_targets:
+        print(f"enrich_roster: fetching detail for {len(enrich_targets)} replacement-chain members",
+              file=sys.stderr)
+        for bg in sorted(enrich_targets):
+            url = f"{API_BASE}/member/{bg}?format=json&api_key={api_key}"
+            data = _fetch(url, cache_dir, warnings, label=f"member-detail/{bg}")
+            if data is None:
+                continue
+            member = (data.get("member") or {})
+            term = _congress_term(member, args.congress)
+            if term:
+                directory[bg]["congress_term"] = term
+                # If individual endpoint reports a per-Congress district, prefer it.
+                if term.get("district") is not None:
+                    directory[bg]["district"] = term["district"]
+            if member.get("deathYear") is not None:
+                directory[bg]["death_year"] = member.get("deathYear")
+            if member.get("currentMember") is not None:
+                directory[bg]["current_member"] = member.get("currentMember")
+
     (out_dir / "members_directory.json").write_text(
         json.dumps(directory, indent=2, sort_keys=True))
     (out_dir / "lis_to_bioguide.json").write_text(
         json.dumps(lis_map, indent=2, sort_keys=True))
 
     print(f"enrich_roster: {len(directory)} members directory written; "
-          f"{resolved} senators with LIS resolved; {len(warnings)} warnings")
+          f"{resolved} senators with LIS resolved; {rescued} House rescues; "
+          f"{pairs} replacements linked; {len(warnings)} warnings")
     for w in warnings[:10]:
         print(f"  warn: {w}", file=sys.stderr)
     return 0
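The `_terms_list` / `_congress_term` refactor exists because `terms` arrives in two shapes. The sketch below restates both helpers standalone, assuming (as the two branches in `_terms_list` imply) that the bulk listing nests terms under an `item` key while the individual `/member/<bg>` response returns a plain list; the sample payloads in the test are hypothetical.

```python
from typing import Any, Optional


def terms_list(member: dict[str, Any]) -> list[dict[str, Any]]:
    """Normalize member['terms'] to a list, accepting {'item': [...]} or [...]."""
    terms = member.get("terms")
    if isinstance(terms, dict):
        return terms.get("item") or []
    if isinstance(terms, list):
        return terms
    return []  # missing or unexpected shape


def congress_term(member: dict[str, Any], congress: int) -> Optional[dict[str, Any]]:
    """Return start/end year, district and chamber for one Congress, or None."""
    for t in terms_list(member):
        if t.get("congress") == congress:
            district = t.get("district")
            return {
                "startYear": t.get("startYear"),
                "endYear": t.get("endYear"),
                # District is stringified so "1" and 1 group to the same seat.
                "district": str(district) if district is not None else None,
                "chamber": t.get("chamber"),
            }
    return None
```

A term without a `congress` field never matches, so for a bulk-style record (which, per the commit message, carries only chamber and startYear) `congress_term` returns None; the detail-enrichment pass refetches the individual record to fill that gap.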

parse.py (+5 -0)

@@ -259,6 +259,11 @@ def parse_chamber(congress, chamber):
                 "photo_url": entry.get("photo_url"),
                 "bioguide": bioguide,
                 "lis": entry.get("lis"),
+                "congress_term": entry.get("congress_term"),
+                "death_year": entry.get("death_year"),
+                "current_member": entry.get("current_member"),
+                "replaced_by": entry.get("replaced_by"),
+                "replaces": entry.get("replaces"),
             }
             # Overwrite vote-derived state with directory state — vote XMLs
             # report "XX" for territorial delegates (AS/DC/GU/MP/PR/VI).

template/app.js (+95 -11)

@@ -300,24 +300,108 @@
     MP: 'the Northern Mariana Islands', PR: 'Puerto Rico', VI: 'the U.S. Virgin Islands'
   };
 
+  function memberLink(bg) {
+    // Returns a span (or anchor if the bg is in the manifest) — caller appends to a node.
+    var entry = state.membersById[bg];
+    if (!entry) {
+      var span = document.createElement('span');
+      span.textContent = bg;
+      return span;
+    }
+    var a = document.createElement('a');
+    a.textContent = entry.n + ' (' + entry.p + '-' + entry.s + ')';
+    a.href = 'app.html?id=' + encodeURIComponent(bg);
+    return a;
+  }
+
+  function appendText(node, text) {
+    node.appendChild(document.createTextNode(text));
+  }
+
   function renderNote(m) {
-    var partial = m.served_partial === true;
-    var noVotes = !m.metrics || m.metrics.total === 0;
+    els.note.replaceChildren();
+    els.note.classList.add('is-hidden');
+
+    var voting = (m.metrics && m.metrics.voting) || 0;
+    var total = (m.metrics && m.metrics.total) || 0;
     var isDelegate = m.is_delegate === true;
+    var partial = m.served_partial === true;
+    var term = m.congress_term || {};
+    var startYear = term.startYear;
+    var endYear = term.endYear;
+    var died = !!m.death_year;
+    // Never-seated: appears in vote data (total > 0) but cast no Yea/Nay AND
+    // was not a delegate, did not die, and was not in the directory as a
+    // currently-serving member with prior tenure. Examples: Gaetz (119th).
+    var unseated = (
+      !isDelegate && !partial && voting === 0 && total > 0 && !died
+    );
+
     if (isDelegate) {
       var terr = TERRITORY_NAMES[m.state] || m.state;
-      els.note.textContent = 'Note: This member is the non-voting delegate from ' + terr +
-        '. House delegates may vote in committees and on amendments in the Committee of the Whole, ' +
-        'but cannot vote on final passage on the House floor. Their low participation rate is structural, not absenteeism.';
+      appendText(els.note,
+        'Note: This member is the non-voting delegate from ' + terr +
+        '. House delegates may vote in committees and on amendments in the ' +
+        'Committee of the Whole, but cannot vote on final passage on the ' +
+        'House floor. Their low participation rate is structural, not absenteeism.');
+      els.note.classList.remove('is-hidden');
+      return;
+    }
+
+    if (unseated) {
+      appendText(els.note,
+        'Note: This member appears once in the 119th Congress roll-call data ' +
+        '(typically on the opening-day quorum call) but cast no recorded ' +
+        'votes — likely a member-elect who resigned, declined the seat, or ' +
+        'was otherwise never seated. The Total Votes denominator (' + total +
+        ') reflects all House roll calls, not their attendance.');
+      if (m.replaced_by) {
+        appendText(els.note, ' The seat was subsequently filled by ');
+        els.note.appendChild(memberLink(m.replaced_by));
+        appendText(els.note, '.');
+      }
       els.note.classList.remove('is-hidden');
-    } else if (partial || noVotes) {
+      return;
+    }
+
+    if (died) {
+      appendText(els.note, 'Note: This member died in office in ' + m.death_year + '. ');
+      if (startYear) appendText(els.note, 'Their 119th-Congress service ran from ' + startYear + ' until their death. ');
+      appendText(els.note, 'KPIs reflect only the votes they cast before then');
+      if (m.replaced_by) {
+        appendText(els.note, '. The seat was subsequently filled by ');
+        els.note.appendChild(memberLink(m.replaced_by));
+      }
+      appendText(els.note, '.');
+      els.note.classList.remove('is-hidden');
+      return;
+    }
+
+    if (m.replaced_by) {
+      appendText(els.note, 'Note: This member left office during the 119th Congress');
+      if (startYear && endYear) appendText(els.note, ' (served ' + startYear + '–' + endYear + ')');
+      appendText(els.note, '. They were succeeded by ');
+      els.note.appendChild(memberLink(m.replaced_by));
+      appendText(els.note, '. KPIs reflect only the portion of the term they served.');
+      els.note.classList.remove('is-hidden');
+      return;
+    }
+
+    if (m.replaces) {
+      appendText(els.note, 'Note: This member entered the 119th Congress mid-term, succeeding ');
+      els.note.appendChild(memberLink(m.replaces));
+      appendText(els.note, '. KPIs reflect only the portion of the term they have served so far.');
+      els.note.classList.remove('is-hidden');
+      return;
+    }
+
+    if (partial || total === 0) {
       var endDate = m.served_to || 'present';
-      els.note.textContent = 'This member did not cast roll-call votes during the period analyzed (served ' +
-        (m.served_from || '?') + ' – ' + endDate + '). The dashboards below reflect that absence.';
+      appendText(els.note,
+        'This member did not cast roll-call votes during the period analyzed ' +
+        '(served ' + (m.served_from || '?') + ' – ' + endDate + '). ' +
+        'The dashboards below reflect that absence.');
       els.note.classList.remove('is-hidden');
-    } else {
-      els.note.textContent = '';
-      els.note.classList.add('is-hidden');
     }
   }
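The branch order in `renderNote` is a first-match priority list. The Python sketch below restates that ordering as a table so it can be tested apart from the DOM; `banner_kind`, `BANNER_RULES` and the returned labels are hypothetical, and `m` is assumed to be the per-member JSON emitted by build_members.py.

```python
from typing import Any, Callable, Optional


def _metric(m: dict[str, Any], key: str) -> int:
    """Read a metric, treating a missing metrics dict or value as 0."""
    return (m.get("metrics") or {}).get(key) or 0


# Checked in order; the first predicate that matches selects the banner.
BANNER_RULES: list[tuple[str, Callable[[dict[str, Any]], bool]]] = [
    ("delegate", lambda m: m.get("is_delegate") is True),
    # Delegates are already handled by the rule above.
    ("unseated", lambda m: (m.get("served_partial") is not True
                            and _metric(m, "voting") == 0
                            and _metric(m, "total") > 0
                            and not m.get("death_year"))),
    ("died", lambda m: bool(m.get("death_year"))),
    ("replaced_by", lambda m: bool(m.get("replaced_by"))),
    ("replaces", lambda m: bool(m.get("replaces"))),
    ("partial_or_no_votes", lambda m: (m.get("served_partial") is True
                                       or _metric(m, "total") == 0)),
]


def banner_kind(m: dict[str, Any]) -> Optional[str]:
    """Return the label of the first matching rule, or None for no banner."""
    for label, matches in BANNER_RULES:
        if matches(m):
            return label
    return None
```

As written, the build_members.py predicate for the manifest's `un` flag omits the `total > 0` check, so for a member with no roll calls at all the manifest says `un=true` while the page falls through to the no-votes banner.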
 

Some files were not shown because too many files changed in this diff.