In this project (a university assignment), I used a dataset of bank customers’ data that was collected
through phone calls. The data includes various variables about each customer and whether
they subscribed to one of the bank’s services. The dataset consists of around 38,000 records. The aim of this project is to try four different
classification methods to predict whether a customer will subscribe, based on
the data available about them.
The classification methods I used are the following:
- K-Nearest Neighbor
- Random Forest Classification
- Naïve Bayes Classifier
- Support Vector Machines
To compare the methods, I evaluate each one's predictions using the accuracy metric (the share of customers classified correctly) and choose the method with the highest accuracy.
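
As a rough illustration of that comparison step, here is a minimal Python sketch using only the standard library. The helper names `accuracy` and `best_method` are hypothetical, and the labels and predictions are made-up toy values, not results from this project. It only shows how accuracy is computed and how the highest-scoring method would be selected.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_method(y_true, predictions):
    """Score every method and return (name of the best one, all scores)."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy placeholder data (1 = subscribed, 0 = did not subscribe).
# These values are invented for illustration and are NOT project results.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
predictions = {
    "K-Nearest Neighbor":           [1, 0, 0, 0, 0, 0, 0, 0],
    "Random Forest Classification": [1, 0, 0, 1, 0, 0, 0, 0],
    "Naïve Bayes Classifier":       [1, 1, 0, 1, 1, 0, 1, 1],
    "Support Vector Machines":      [0, 0, 0, 0, 0, 0, 0, 0],
}

best, scores = best_method(y_true, predictions)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Selected:", best)
```

Note that a model predicting "no" for every customer can still reach a fairly high accuracy when most customers do not subscribe, so accuracy is best read alongside the class balance of the data.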
At the end of the project, I also attempt to cluster the customers based on specific variables about them.
To cluster the customers, I use the Partitioning Around Medoids (PAM) method. I also attempt to characterize each cluster by its demographics and average customer profile.
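
To give an idea of how Partitioning Around Medoids works, here is a simplified Python sketch using only the standard library. It is not the implementation used in this project. The function names, the Manhattan distance, the naive initialization (the first `k` points instead of PAM's usual BUILD step) and the two-feature toy points are all assumptions made for illustration. The core idea it shows is the swap phase: replace a medoid with a non-medoid point whenever that lowers the total distance of all points to their nearest medoid.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=manhattan):
    """Simplified PAM: greedy medoid swaps until no swap lowers the cost."""
    medoids = list(range(k))  # assumption: naive init with the first k points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try swapping medoid i for the candidate point.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels, best


# Toy points with two hypothetical, already-scaled features per customer.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, labels, cost = pam(points, k=2)

for j, m in enumerate(medoids):
    members = [p for p, lab in zip(points, labels) if lab == j]
    print(f"cluster {j}: medoid={points[m]}, members={members}")
```

Because each medoid is an actual data point, it can be read as a representative customer for its cluster, which is helpful when describing the cluster's typical profile.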
The dataset can be found here: https://www.kaggle.com/datasets/pankajbhowmik/bank-marketing-campaign-subscriptions