Metis Project 3: Why Are My Customers Leaving? - Using Logistic Regression to Interpret Churn Data
We just finished week 6 at Metis in Seattle, Washington, USA. The past weeks have gone by quickly: we are halfway through the program, and the skills we have learned are amazing.
On my last project, I worked on predicting NBA player salaries, and the feedback I received was extremely useful. Thank you.
For this project, we applied classification methods discussed in class to solve a business problem. The project was done individually. I decided to focus on a company's churn data to figure out what sort of customers are leaving, and I used logistic regression to do it. I used Python for coding and Tableau for data visualization. The code and data for this project can be found on GitHub.
My initial plan was to use data from the Economist to cluster countries and figure out what style of leadership matters for their economic growth. This was based on a discussion with my fellow Schwarzman scholar, Lorem Aminathia, on the model of leadership needed to ensure Africa's growth. Unfortunately, the data did not have enough features to properly evaluate this problem.
Challenge:
“We were consulted by Infinity, a hypothetical internet service provider, to figure out their churn (which customers are leaving) and where their Growth Team can focus.”
Data:
The data is IBM's telco customer churn dataset, available on Kaggle.
Approach:
The Minimum Viable Product (MVP) for our client was to address the following points:
1. Figure out the number of customers churning.
2. Find out the most frequent types of customers churning.
3. Provide recommendations on next steps for the program.
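The first point, counting churners, can be sketched in a few lines of pandas. This is a minimal illustration rather than the project's actual code: the helper name `churn_summary`, the column name `Churn` with values "Yes"/"No", and the four sample rows are all assumptions made for the example.

```python
import pandas as pd


def churn_summary(df: pd.DataFrame, churn_col: str = "Churn") -> dict:
    """Return total customers, number churned, and the churn rate."""
    total = len(df)
    churned = int((df[churn_col] == "Yes").sum())
    return {"total": total, "churned": churned, "rate": churned / total}


# Tiny hypothetical sample. With the real data, the frame would instead
# come from pd.read_csv on the file downloaded from Kaggle.
sample = pd.DataFrame({
    "customerID": ["a", "b", "c", "d"],
    "Churn": ["No", "Yes", "No", "No"],
})

summary = churn_summary(sample)
print(summary)
```

The same function applied to the full dataset would give the headline churn rate for the client.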
Steps:
The following steps were taken to produce results. These are general data science steps toward a solution and are usually iterative.
1. Data gathering from our data sources
2. Data cleaning
3. Feature extraction and cleaning
4. Data insights
5. Client recommendations
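As a rough illustration of the cleaning and feature extraction steps, the sketch below coerces a text column to numbers, drops rows that fail to parse, and one-hot encodes a categorical column. The column names (`InternetService`, `TotalCharges`, `Churn`), the sample rows, and the helper `prepare` are assumptions for the example, not taken from the project code.

```python
import pandas as pd

# Hypothetical rows that mimic the shape of a churn table.
raw = pd.DataFrame({
    "InternetService": ["DSL", "Fiber optic", "No", "Fiber optic"],
    "TotalCharges": ["29.85", "151.65", " ", "820.5"],  # " " stands in for a missing value
    "Churn": ["No", "Yes", "No", "Yes"],
})


def prepare(df: pd.DataFrame):
    """Clean the raw table and return (features, target)."""
    df = df.copy()
    # Data cleaning: parse text as numbers; unparseable entries become NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna(subset=["TotalCharges"])
    # Feature extraction: binary target plus one-hot columns for the category.
    y = (df["Churn"] == "Yes").astype(int)
    X = pd.get_dummies(
        df.drop(columns=["Churn"]), columns=["InternetService"], dtype=int
    )
    return X, y


X, y = prepare(raw)
print(X)
print(y.tolist())
```

Dropping unparseable rows is only one option; imputing a value is another, and which is better depends on how many rows are affected.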
Insights and Reasons
After downloading, cleaning, and aggregating the data, the following stood out:
1. About 26% of customers are churning. Of the roughly 7,000 customers in the data, close to 2,000 churned.
2. Logistic regression (accuracy score of 80%) identified the features of the customers most likely and least likely to churn.
The image shows the features associated with a customer churning or staying. Something that surprised me was that fiber optic users were more likely to churn than digital subscriber line (DSL) users, who are on a different type of internet service. It is surprising because fiber optic service is usually faster than DSL. One possible explanation is that fiber optic service is usually more expensive than DSL, and users may be getting tired of paying the premium for it.
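To show how logistic regression coefficients can be read this way, here is a minimal sketch using scikit-learn on synthetic data. Everything in it is an assumption for illustration: the feature names `fiber_optic` and `two_year_contract`, the generating weights, and the sample size. It does not reproduce the project's model or its 80% accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical binary features.
X = pd.DataFrame({
    "fiber_optic": rng.integers(0, 2, n),
    "two_year_contract": rng.integers(0, 2, n),
})

# Synthetic ground truth (assumed): fiber raises churn odds, long contracts lower them.
logit = -1.0 + 1.5 * X["fiber_optic"] - 2.0 * X["two_year_contract"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# Positive coefficient: higher churn odds. Negative: lower churn odds.
coefs = pd.Series(model.coef_[0], index=X.columns).sort_values()
odds_ratios = np.exp(coefs)  # multiplicative change in odds per feature
accuracy = model.score(X_test, y_test)

print(coefs)
print(odds_ratios)
print("test accuracy:", accuracy)
```

A coefficient describes an association given the other features in the model, so a positive weight on a fiber optic indicator does not by itself show that the service causes customers to leave.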
Recommendation
An immediate next step for the growth team is to offer fiber optic customers who are about to leave the option to switch to DSL service.
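One way the growth team could act on this is to shortlist fiber optic customers whose predicted churn probability is high. The sketch below assumes that "about to leave" means a model score at or above a threshold. The 0.5 cutoff, the column names, the helper `switch_offer_candidates`, and the sample scores are all hypothetical.

```python
import pandas as pd


def switch_offer_candidates(scored: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Return fiber optic customers at or above the churn-probability threshold."""
    is_fiber = scored["InternetService"] == "Fiber optic"
    at_risk = scored["churn_probability"] >= threshold
    return scored[is_fiber & at_risk].sort_values(
        "churn_probability", ascending=False
    )


# Hypothetical scores, e.g. from a fitted model's predict_proba output.
scored = pd.DataFrame({
    "customerID": ["a", "b", "c", "d"],
    "InternetService": ["Fiber optic", "DSL", "Fiber optic", "Fiber optic"],
    "churn_probability": [0.82, 0.91, 0.30, 0.55],
})

candidates = switch_offer_candidates(scored)
print(candidates)
```

The threshold is a business choice: a lower value reaches more customers at the cost of making the offer to some who would have stayed anyway.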