todo.md

TODO

  • push

  • clean up notebook

  • go to larger corpus

  • characterize different kinds of files and resources

  • Filter out Tableaux and other files

  • Filter out tokens that aren't terribly helpful.

    • d7 - matches a subset of the file name, but it's not clear where it shows up in the tokens
    • t6, t7, a80, t12 - don't seem to show up in file names
    • write a routine that will grab the tokens for something that matches
    • ok. parsing is making a mess. look at that.
    • 'cti' is another problem
  • make notes of file name patterns that we have pulled out.

  • work on nnmf
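
A rough sketch of the "grab the tokens for something that matches" routine from the list above. The tokenizer here is an assumption (split on anything that isn't a letter or digit, then lowercase); the real notebook parser may differ, and `tokenize` / `tokens_for_matches` are hypothetical names.

```python
import re

# Assumed tokenizer: split on any run of non-alphanumeric characters.
# The actual parsing in the notebook may work differently.
_SPLIT = re.compile(r"[^A-Za-z0-9]+")


def tokenize(name):
    """Return the lowercase tokens of a file name (hypothetical helper)."""
    return [t.lower() for t in _SPLIT.split(name) if t]


def tokens_for_matches(file_names, pattern):
    """Map each file name that matches `pattern` to its token list."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {name: tokenize(name) for name in file_names if rx.search(name)}


if __name__ == "__main__":
    names = [
        "Sheet_48_Benewah_2022-05-06_212304.csv",
        "illinois_demo_race05_20_2022.csv",
    ]
    for name, toks in tokens_for_matches(names, "benewah").items():
        print(name, toks)
```

Running this against a problem token such as d7 or cti should show which file names produce it and what the neighboring tokens look like.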

DONE

  • leave as is. zip codes are ok. we'll just have to characterize those.
    • 'illinois_demo_race05_20_2022.csv'
    • 'illinois_demo_gender05_21_2022.csv'
    • 'county_historical_cases_2022-05-10_162303.cs
    • 06 07 05 03 'Sheet_48_Benewah_2022-05-06_212304.csv'
    • push pre-compile all regex
  • filter out time stamp patterns
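
For reference, a minimal sketch of what timestamp filtering with pre-compiled regexes could look like. The two patterns are guesses based only on the file names listed above (MM_DD_YYYY and YYYY-MM-DD with an optional _HHMMSS suffix), and `TIMESTAMP_PATTERNS` / `strip_timestamps` are hypothetical names, not the actual code that was pushed.

```python
import re

# Compiled once at import time so they are not rebuilt for every file name.
# Patterns are assumptions inferred from the example file names above.
TIMESTAMP_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}(?:_\d{6})?"),  # 2022-05-06 or 2022-05-06_212304
    re.compile(r"\d{2}_\d{2}_\d{4}"),             # 05_20_2022
]


def strip_timestamps(name):
    """Remove every substring of `name` that matches a timestamp pattern."""
    for rx in TIMESTAMP_PATTERNS:
        name = rx.sub("", name)
    return name


if __name__ == "__main__":
    for name in (
        "illinois_demo_race05_20_2022.csv",
        "Sheet_48_Benewah_2022-05-06_212304.csv",
    ):
        print(name, "->", strip_timestamps(name))
```

Note that the date in the illinois files abuts the preceding word (race05_20_2022), so the patterns deliberately do not require a separator in front of the date.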