The EDA for this project consisted of:
- Checking missing values
- Looking at the distribution of the target variable (churn)
- Looking at numerical and categorical variables
Functions and methods:
- `df.isnull().sum()` - returns the number of null values in each column of the dataframe.
- `df.x.value_counts()` - returns the number of occurrences of each category in the series `x`. The `normalize=True` argument returns the proportion of each category instead of the count. In this project, the mean of `churn` is equal to the churn rate obtained with the `value_counts` method.
- `round(x, y)` - rounds the number `x` to `y` decimal places.
- `df[x].nunique()` - returns the number of unique values in the series `x`.
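
The calls listed above can be tried on a small example. This is a minimal sketch, not the project's code: the dataframe and its column names (`contract`, `tenure`, `churn`) are made up for illustration, and `churn` is assumed to be encoded as 0/1, which is why its mean matches the share of churned customers.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "contract": ["monthly", "yearly", "monthly", "monthly", None],
    "tenure": [1, 24, 5, 12, 3],
    "churn": [1, 0, 1, 0, 0],  # assumed to be encoded as 0/1
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# Absolute counts and proportions of each churn value.
churn_counts = df.churn.value_counts()
churn_share = df.churn.value_counts(normalize=True)

# With a 0/1 target, the mean equals the proportion of 1s (the churn rate).
global_churn_rate = round(df.churn.mean(), 2)

# Number of distinct categories; missing values are not counted by default.
n_contract_types = df["contract"].nunique()

print(missing_per_column)
print(churn_counts)
print(churn_share)
print(global_churn_rate)
print(n_contract_types)
```

Note that `value_counts(normalize=True)` returns fractions between 0 and 1; multiply by 100 if percentages are needed.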
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.