college-emails


Colleges send so many email advertisements that they clutter the inbox. This project analyzes that spam and looks at some interesting trends in the college-related emails I've received over the past year.

A Svelte frontend of statistics hosted on Netlify.

A number of Node.js scripts to parse emails and get college data. These are designed to be run through GitHub Actions, but can also be run locally.

Run all scripts, including downloading emails and generating statistics.

A set of utilities used to download and parse the data found in data.

Run Locally

To create the same type of visualization locally for your own emails, follow these steps.

Setup

  • Clone this repository (git clone https://github.com/louismeunier/college-emails.git)
  • Delete client/src/data.json, client/src/dates.json, and client/src/updated.json.
  • While in the directory containing the repo, run cd client && yarn && cd ../scripts && yarn to install dependencies.

Authentication

  • To access your emails, you'll need to authenticate with the Gmail API.
  • Follow these steps to create the project.
  • Enable the Gmail API with the scope 'https://www.googleapis.com/auth/gmail.readonly'
  • IMPORTANT: Make sure you add your email address as a tester for your application. Otherwise, as your project is unverified, it will not work.
  • Download your credentials as JSON, and save them to scripts/credentials.json
  • Run node scripts. If your setup was done correctly, it should prompt you to visit a URL and authenticate, then save the resulting token to scripts/token.json.
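
To make the "visit a URL" step less opaque, here is a minimal sketch of how a consent URL can be assembled from the downloaded credentials. It is not taken from this repository's scripts, which may well rely on a Google client library instead. The helper name buildConsentUrl is hypothetical, and the sketch assumes credentials.json uses Google's usual layout, with the client details under an "installed" or "web" key.

```javascript
// Read-only Gmail scope named in the steps above.
const GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly";

// Hypothetical helper (not part of this repo): builds a Google OAuth 2.0
// consent URL from the parsed contents of scripts/credentials.json.
function buildConsentUrl(credentials) {
  // Assumption: client details sit under "installed" (desktop) or "web".
  const client = credentials.installed || credentials.web;
  const hasRedirect =
    client && Array.isArray(client.redirect_uris) && client.redirect_uris.length > 0;
  if (!client || !client.client_id || !hasRedirect) {
    throw new Error("credentials.json is missing client_id or redirect_uris");
  }
  const params = new URLSearchParams({
    client_id: client.client_id,
    redirect_uri: client.redirect_uris[0],
    response_type: "code",
    scope: GMAIL_READONLY_SCOPE,
    access_type: "offline", // request a refresh token so the saved token can be renewed
  });
  return `https://accounts.google.com/o/oauth2/v2/auth?${params.toString()}`;
}
```

If the helper throws, the credentials file is probably not the OAuth client JSON described in the steps above.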

Generating the data

  • Run node scripts a second time. It should now actually run the program, and regularly print output to the screen indicating progress.
  • Note: it can take quite a while for the scripts to run, around 1.5 minutes per 1000 emails.
  • When completed, the scripts should print some tables of output, as well as some statistics of how well the run went.
  • It should also have created three new files: client/src/dev_data.json, client/src/dev_dates.json, and client/src/dev_updated.json. Remove the dev_ prefix from each file name.
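
The renaming in the last step can also be scripted. The following is a small sketch rather than part of the repository: the function names stripDevPrefix and promoteDevFiles are hypothetical, and it assumes the generated files are the only dev_-prefixed JSON files in client/src.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns the file name without a leading "dev_", or null if it has no such prefix.
function stripDevPrefix(fileName) {
  return fileName.startsWith("dev_") ? fileName.slice("dev_".length) : null;
}

// Hypothetical helper: renames every dev_*.json file in `dir` and
// returns the [from, to] pairs that were applied.
function promoteDevFiles(dir) {
  const renamed = [];
  for (const name of fs.readdirSync(dir)) {
    const target = stripDevPrefix(name);
    if (target === null || path.extname(name) !== ".json") continue;
    fs.renameSync(path.join(dir, name), path.join(dir, target));
    renamed.push([name, target]);
  }
  return renamed;
}
```

From the repository root, calling promoteDevFiles("client/src") would apply the renames. Existing files with the target names may be overwritten, which is consistent with the setup step that deletes them first.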

Creating the visuals

  • Run cd client && yarn dev.

Data Credits

The dataset of college websites, names, locations, and other details was found here.

Note

Emails are linked to their respective colleges via the domain name of the sender, so some emails cannot be linked to any college and are left out of the final statistics. These account for only ~2% of all the emails parsed per run.
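
To illustrate domain-based linking, here is a minimal sketch. It is not the repository's actual matching code: the function names, the subdomain fallback, and the Map keyed by college website domain are all assumptions made for the example.

```javascript
// Extracts the lowercase domain from a From header such as
// "Admissions <admissions@apply.example.edu>".
function senderDomain(fromHeader) {
  const match = /@([A-Za-z0-9.-]+)/.exec(fromHeader);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical matcher: tries the full sender domain first, then each
// parent domain, and returns the first college found (or null).
function matchCollege(fromHeader, collegesByDomain) {
  const domain = senderDomain(fromHeader);
  if (!domain) return null;
  const labels = domain.split(".");
  // Stop at two labels so a bare TLD such as "edu" is never used as a key.
  for (let i = 0; i <= labels.length - 2; i++) {
    const candidate = labels.slice(i).join(".");
    if (collegesByDomain.has(candidate)) return collegesByDomain.get(candidate);
  }
  return null;
}

// Example lookup table with a made-up college.
const colleges = new Map([["example.edu", "Example University"]]);
matchCollege("Admissions <admissions@apply.example.edu>", colleges); // "Example University"
matchCollege("Campaign <news@mail.vendor.com>", colleges); // null
```

The second call shows one plausible way an email can end up unlinked: a sender domain that does not appear anywhere in the college dataset.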