The dataset describes potential insurance buyers. The purpose of this project is to predict whether a person is willing to buy a policy or not. The data comes from resources provided in a hackathon hosted by Analytics Vidhya. Thanks to them for the opportunity.
More information about the data can be found in data.md.
- Removing unnecessary features first.
- Replacing null values and cleaning the data.
- Data wrangling: mean, std, max, and correlation.
- Univariate analysis: to understand each single feature precisely.
- Bivariate and multivariate analysis: to draw out insights and inferences.
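
The first three steps above can be sketched roughly as follows. This is only an illustration on a made-up frame: the column names (`customer_id`, `policy_duration`, `premium`, `response`) and the median fill are assumptions, not the actual schema or the exact choices made in the notebook (see data.md for the real columns).

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset; all column names are hypothetical.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "policy_duration": [2.0, np.nan, 5.0, 1.0, np.nan],
    "premium": [11000.0, 15000.0, 9000.0, 30000.0, 12000.0],
    "response": [0, 1, 0, 1, 0],
})

# 1. Drop a feature that carries no predictive information (here, an identifier).
df = df.drop(columns=["customer_id"])

# 2. Replace nulls. The median is one simple choice, assumed here for illustration.
df["policy_duration"] = df["policy_duration"].fillna(df["policy_duration"].median())

# 3. Summary statistics (mean, std, max) and pairwise correlations.
summary = df.describe()
corr = df.corr()

print(summary.loc[["mean", "std", "max"]])
print(corr)
```

`describe()` and `corr()` cover the wrangling step in two calls; the univariate and multivariate steps then build on the cleaned frame.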
First, the recovered insurance types are examined. The count plot makes it clear that the dataset contains more individual accounts than joint ones.
Next, the distribution of customers across the listed cities is analyzed with a seaborn countplot.
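
A minimal sketch of such count plots is shown here. The frame and the column names `account_type` and `city_code` are hypothetical placeholders for the real columns, and the `Agg` backend is used only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "account_type": ["Individual", "Individual", "Joint", "Individual", "Joint", "Individual"],
    "city_code": ["C1", "C2", "C1", "C3", "C1", "C2"],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Number of customers per account type.
sns.countplot(data=df, x="account_type", ax=axes[0])
axes[0].set_title("Customers by account type")

# Number of customers per city, ordered from most to least frequent.
city_order = df["city_code"].value_counts().index
sns.countplot(data=df, x="city_code", order=city_order, ax=axes[1])
axes[1].set_title("Customers by city")

fig.tight_layout()

# The same counts as numbers, useful for checking what the plot shows.
counts = df["account_type"].value_counts()
print(counts)
```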
In the process, the distribution of the policy premium recovered so far is found to be right-skewed.
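
Right skew can be checked numerically as well as visually. The sketch below uses synthetic log-normal values as a stand-in for the premium column (an assumption made purely for illustration; the real values live in the dataset): a positive skewness and a mean above the median both indicate a long right tail.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed values standing in for the real premium column.
rng = np.random.default_rng(0)
premium = pd.Series(rng.lognormal(mean=9.5, sigma=0.5, size=1000), name="premium")

# Sample skewness: values above 0 indicate a right (positive) skew.
skewness = premium.skew()

# In a right-skewed distribution the long tail pulls the mean above the median.
mean_value = premium.mean()
median_value = premium.median()

print(f"skewness = {skewness:.2f}")
print(f"mean = {mean_value:.0f}, median = {median_value:.0f}")
```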
The policy types and the percentage of customers who bought each type were explored, comparing individual and joint account holders.
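
One way to get such percentages is a row-normalized cross-tabulation. This is a sketch on toy data, with `account_type` and `policy_type` as hypothetical column names; it is not necessarily how the notebook computes them.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "account_type": ["Individual", "Individual", "Individual", "Joint",
                     "Joint", "Individual", "Joint", "Individual"],
    "policy_type": [1, 2, 1, 3, 1, 3, 3, 2],
})

# normalize="index" makes each row sum to 1, so multiplying by 100 gives the
# percentage of each policy type within an account type.
pct = pd.crosstab(df["account_type"], df["policy_type"], normalize="index") * 100

print(pct.round(1))
```

Normalizing by row rather than over the whole table keeps the two groups comparable even though individual accounts are more numerous.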
The analysis also identified the Goldilocks zone of policy-holding duration, that is, the range of durations at which policy holders tend to buy another policy.
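
A simple way to look for such a zone is to compute the purchase rate per holding duration. The sketch below assumes a binary `response` column (1 = bought) and a `policy_duration` column in years; both names and all values are made up for illustration.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "policy_duration": [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
    "response":        [0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Mean of a 0/1 column is the share of customers who bought, per duration.
rate = df.groupby("policy_duration")["response"].mean()

# Duration with the highest purchase rate in this toy sample.
best_duration = rate.idxmax()

print(rate)
print("highest purchase rate at duration:", best_duration)
```

On real data, durations with very few customers give noisy rates, so it is worth checking the group sizes before reading much into a peak.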
Finally, a compilation of different plots is produced using seaborn pair plots.
The detailed insights from the EDA are documented in the notebook.