Email Spam Filter

This project, built for my data analysis and visualization class, is an email spam filter that uses supervised learning to classify emails as either spam (unwanted) or ham (legitimate).

I used two supervised learning algorithms, K-Nearest Neighbors (KNN) and Naive Bayes, and compared their performance. To train and evaluate these classifiers, I used the Enron spam email dataset, which consists of approximately 34,000 emails. Once the classifiers were trained, I ran them in a Jupyter Notebook to predict whether new emails are spam or ham.
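To illustrate how the two algorithms differ, here is a minimal sketch that uses only the Python standard library. It is not the code from this project's notebook: the toy word-count data, the function names `knn_predict` and `nb_predict`, the choice of Euclidean distance for KNN, and the Laplace-smoothed multinomial form of Naive Bayes are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy features: counts of the words ["free", "meeting", "money"].
X_train = [
    [3, 0, 2],  # spam
    [2, 0, 3],  # spam
    [0, 2, 0],  # ham
    [0, 3, 1],  # ham
]
y_train = ["spam", "spam", "ham", "ham"]


def knn_predict(X, y, x_new, k=3):
    """Predict by majority vote among the k nearest training rows (Euclidean distance)."""
    dists = sorted((math.dist(row, x_new), label) for row, label in zip(X, y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def nb_predict(X, y, x_new, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, scored in log space."""
    n_features = len(x_new)
    best_label, best_score = None, -math.inf
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        # Log prior: fraction of training emails with this label.
        score = math.log(len(rows) / len(X))
        # Per-word totals within this class.
        word_totals = [sum(col) for col in zip(*rows)]
        total = sum(word_totals)
        for count, word_total in zip(x_new, word_totals):
            # Smoothed log likelihood of each word, weighted by its count.
            score += count * math.log((word_total + alpha) / (total + alpha * n_features))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


new_email = [2, 0, 1]  # "free" twice, "money" once
print(knn_predict(X_train, y_train, new_email))
print(nb_predict(X_train, y_train, new_email))
```

KNN stores the training data and defers all work to prediction time, while Naive Bayes summarizes each class by its word frequencies. That difference matters on a dataset of roughly 34,000 emails, where comparing a new email against every training email becomes expensive.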

Goals

  • Explore and implement the KNN and Naive Bayes algorithms.
  • Gain hands-on experience in preprocessing text data, specifically converting emails into numeric features that a model can process.
  • Set up a supervised learning problem and analyze the results.
  • Understand and follow a typical end-to-end supervised machine learning workflow.
  • Work with a large, real text dataset.
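The preprocessing goal above (turning emails into numeric features) can be sketched as a simple bag-of-words encoding. This is an illustrative example rather than the project's actual pipeline: the helper names `tokenize`, `build_vocabulary`, and `vectorize`, the `max_words` cutoff, and the two sample emails are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(emails, max_words=1000):
    """Return the most frequent words across all emails, in a fixed order."""
    counts = Counter()
    for email in emails:
        counts.update(tokenize(email))
    return [word for word, _ in counts.most_common(max_words)]


def vectorize(email, vocabulary):
    """Convert one email into a list of word counts aligned with the vocabulary."""
    counts = Counter(tokenize(email))
    return [counts[word] for word in vocabulary]


# Hypothetical sample emails, not taken from the Enron dataset.
emails = ["Free money, claim your FREE prize", "Meeting moved to Monday"]
vocab = build_vocabulary(emails)
features = [vectorize(e, vocab) for e in emails]
print(vocab)
print(features)
```

Each email becomes a fixed-length vector whose positions correspond to vocabulary words, which is the kind of input both KNN and Naive Bayes can consume.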

Dataset

I used the Enron spam email dataset for this project. You can download the dataset using the following links: