OK: fix vote scraping logic #4773
@jessemortenson I am not sure about the motion text. Could you please explain it? http://webserver1.lsb.state.ok.us/cf/2023-24%20SUPPORT%20DOCUMENTS/votes/House/SB4_VOTES.HTM From my reading of the code, the House vote's motion text should be "DO PASS" and the Senate's motion text should be empty. I guess the general vote's motion text was "house passage" or "senate passage".
Sure, let me try to provide some context and please feel free to ask more questions if it is not clear.
http://webserver1.lsb.state.ok.us/cf/2023-24%20SUPPORT%20DOCUMENTS/votes/House/SB4_VOTES.HTM On this page, I think the http://webserver1.lsb.state.ok.us/cf/2023-24%20SUPPORT%20DOCUMENTS/votes/Senate/SB4_VOTES.HTM On this page, I actually see two vote events:
@jessemortenson Please review it.
Thank you. I think there is some more work to fix SB169
SJR 22 http://www.oklegislature.gov/BillInfo.aspx?Bill=SJR22&session=2300
@jessemortenson I've fixed the motion text issue.
I can see how this is getting challenging :) When I ran the full scrape (
I seem to be able to get past those errors by editing in some quick checks for things being None or xpath not returning anything, etc. But I am not sure if these edits are actually productive, or if I'm hurting your intended logic. So I'd love for you to look at my diff below and see if you want to put some of this in place, or if it points to issues you can solve more effectively:

diff --git a/scrapers/ok/bills.py b/scrapers/ok/bills.py
index 7af09a5b5..1e49d98fb 100644
--- a/scrapers/ok/bills.py
+++ b/scrapers/ok/bills.py
@@ -284,15 +284,19 @@ class OKBillScraper(Scraper):
             else:
                 chamber = "upper"
 
-            rcs_p = header.xpath(
+            rcs_xpath = header.xpath(
                 "following-sibling::p[contains(., '***')][1]/preceding-sibling::p[contains(., 'RCS#')][1]"
-            )[0]
-            rcs_line = rcs_p.xpath("string()").replace("\xa0", " ")
-            rcs = re.search(r"RCS#\s+(\d+)", rcs_line).group(1)
-            if rcs in seen_rcs:
-                continue
+            )
+            if rcs_xpath:
+                rcs_p = rcs_xpath[0]
+                rcs_line = rcs_p.xpath("string()").replace("\xa0", " ")
+                rcs = re.search(r"RCS#\s+(\d+)", rcs_line).group(1)
+                if rcs in seen_rcs:
+                    continue
+                else:
+                    seen_rcs.add(rcs)
             else:
-                seen_rcs.add(rcs)
+                continue
 
             committee_motion = None
             committees = [
@@ -324,7 +328,7 @@ class OKBillScraper(Scraper):
                 "Subcommittee",
             ]
 
-            motion_text = motions.get(rcs)
+            motion_text = motions.get(rcs, '')
             committee_motion = None
 
             if "Do Pass" in motion_text or "Committee" in motion_text:
@@ -345,6 +349,9 @@ class OKBillScraper(Scraper):
                     continue
                 if "motion by Senator" in line_text:
+                    if committee_motion is None:
+                        # we are assuming committee motion below is a string, lets be sure
+                        committee_motion = ''
                     do_pass_motion = line_text.strip().title()
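For reference, the guards in the diff above could also be pulled into small helpers. This is only a minimal sketch, not the scraper's actual code: `extract_rcs`, `unseen_rcs` and `motion_text_for` are hypothetical names, and the sketch works on plain strings rather than lxml nodes so it can run on its own. It also covers one case the diff leaves open, where a paragraph contains "RCS#" but the regex still fails to match and `.group(1)` would raise.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Set

# Same pattern as in the diff: "RCS#", whitespace, then the roll-call number.
RCS_PATTERN = re.compile(r"RCS#\s+(\d+)")


def extract_rcs(text: str) -> Optional[str]:
    """Return the roll-call number from a line, or None if there is no match."""
    match = RCS_PATTERN.search(text.replace("\xa0", " "))
    return match.group(1) if match else None


def unseen_rcs(lines: Iterable[str], seen: Optional[Set[str]] = None) -> Iterator[str]:
    """Yield each roll-call number once, skipping lines without one."""
    seen = set() if seen is None else seen
    for line in lines:
        rcs = extract_rcs(line)
        if rcs is None or rcs in seen:
            continue
        seen.add(rcs)
        yield rcs


def motion_text_for(motions: Dict[str, Optional[str]], rcs: str) -> str:
    """Look up motion text, turning a missing or None entry into ''."""
    return motions.get(rcs) or ""
```

With `motion_text_for`, a check such as `"Do Pass" in motion_text` is safe even when the lookup finds nothing, because the value is always a string.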
@jessemortenson Thanks for your update. I've fixed the issue. Please review it.
Thanks Bray - first spot check I did showed an error that I think we saw before: the second vote (Committee vote) on a given page incorrectly reflects the counts and votes of the full body vote.
Details: HB2943
vote_event_7bfd61b1-b166-11ee-89f0-29a5f35db3d4.json
vote_event_7bfd61af-b166-11ee-89f0-29a5f35db3d4.json
vote_event_7bfd61ae-b166-11ee-89f0-29a5f35db3d4.json
vote_event_7bfd61b0-b166-11ee-89f0-29a5f35db3d4.json
vote_event_7bfd61ae-b166-11ee-89f0-29a5f35db3d4.json
So I think there is at least that bug to fix. Also attaching my full scrape output directory for convenience.
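One way to keep a committee vote from picking up the full-body counts is to split the page into one section per vote before tallying anything. The sketch below only illustrates that idea and is not the fix in this PR. It assumes each vote block ends with a line containing `***` (as the xpath in the earlier diff suggests) and that counts appear as lines like `YEAS: 40`, which is a simplified, hypothetical format. `split_vote_sections` and `tally` are made-up names.

```python
import re
from typing import Dict, List

# Hypothetical count format, e.g. "YEAS: 40" or "NAYS: 5".
COUNT_PATTERN = re.compile(r"(YEAS|NAYS):\s+(\d+)")


def split_vote_sections(lines: List[str]) -> List[List[str]]:
    """Group lines into vote sections, closing a section at each '***' line."""
    sections: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if "***" in line:
            if current:
                sections.append(current)
            current = []
        else:
            current.append(line)
    if current:  # keep trailing lines that were not closed by '***'
        sections.append(current)
    return sections


def tally(section: List[str]) -> Dict[str, int]:
    """Read counts from a single section only, never from the whole page."""
    counts: Dict[str, int] = {}
    for line in section:
        match = COUNT_PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] = int(match.group(2))
    return counts
```

Because `tally` only ever sees one section's lines, the second vote on a page cannot reuse the first vote's numbers.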
@jessemortenson I've fixed it. Please review it again.
Thanks, I'm running the scrape now and will check
Looks good, thanks for the fix! Merging in
Description
I've updated the vote scraping logic.