Sparkify Project

This project is the capstone of the Udacity Data Scientist Nanodegree. The code I wrote, with some commentary, is in the two notebooks in this repo. In the project I explore a large (12 GB) events dataset, extract a few features, and train a model to predict user churn.

Read more here: https://medium.com/@vicuum/predicting-churn-sparkify-music-9617602ed5c2

Results Summary

I used the events data to build a dataset with several key features and trained a random forest model that reached an F1 score of 0.79. The most impactful features were the numbers of songs listened to each week. See the notebooks for more detail.
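As an illustration of the kind of feature described above, here is a minimal sketch of counting songs per user per week. It is not the notebook code: it uses plain Python instead of pyspark, and the field names `userId`, `page`, and `ts`, the `"NextSong"` page value, and millisecond timestamps are all assumptions about the event schema. The helper name `weekly_song_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def weekly_song_counts(events):
    """Count song plays per (user, ISO year, ISO week) from raw event dicts."""
    counts = Counter()
    for event in events:
        # Assumption: a song play is logged as an event with page == "NextSong".
        if event["page"] != "NextSong":
            continue
        # Assumption: "ts" is a Unix timestamp in milliseconds.
        played_at = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
        iso_year, iso_week, _ = played_at.isocalendar()
        counts[(event["userId"], iso_year, iso_week)] += 1
    return counts


# Tiny hand-made sample for illustration only, not real Sparkify data.
sample_events = [
    {"userId": "u1", "page": "NextSong", "ts": 1538352000000},
    {"userId": "u1", "page": "NextSong", "ts": 1538355600000},
    {"userId": "u1", "page": "Home", "ts": 1538359200000},
    {"userId": "u2", "page": "NextSong", "ts": 1538956800000},
]

counts = weekly_song_counts(sample_events)
for (user, year, week), n_songs in sorted(counts.items()):
    print(user, year, week, n_songs)
```

On the full dataset the same grouping would more likely be done in pyspark, as in the notebooks. The per-week counts would then be pivoted into one column per week before training.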

Files

The only included files are the two notebooks:

1. Exploration and Features.ipynb
2. Modeling and Conclusions.ipynb

Requirements

The code cannot be run as-is because the data is not included in this repo. It should be possible to retrieve the full dataset from Udacity's public link in the second notebook, if it has not been taken down. Important packages are:

pandas
matplotlib
seaborn
pyspark
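Before opening the notebooks, it may help to check that these packages are importable. The following is a small sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the check only confirms that each package can be found, not that its version matches what the notebooks were written against.

```python
from importlib.util import find_spec

# Package names taken from the list above.
REQUIRED_PACKAGES = ["pandas", "matplotlib", "seaborn", "pyspark"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```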

vrlambert/DSND-Term2-Capstone

Folders and files

Latest commit

History

Repository files navigation

Sparkify Project

Results Summary

Files

Requirements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages