Machine Learning & Churn Rate: Accurate vs. Actionable Models

Summary

As companies work to become more "data-driven", they face clear trade-offs when applying machine-learning models to business challenges. One such trade-off is between a model's accuracy and whether the relevant business unit can interpret and act on its insights.

In this project, I find that Data Scientists cannot focus only on building the most accurate models: we need to balance accuracy with interpretability and actionability so that our insights are useful. Data Scientists can provide more value to their teams and companies when they understand the business's strategy.

CONTENTS

  • Introduction
  • Exploratory Data Analysis
  • Machine Learning Models
  • Evaluating the Algorithms
  • Model Selection and Recommendations
  • Conclusion
  • Appendix

Introduction

BACKGROUND

Telco is a landline telephone subscription company that has been losing customers; senior management at Telco is worried about its customer churn rate, the rate at which customers stop paying for their service. Customer churn rate is a key performance indicator (KPI) that every subscription-based company needs to minimize: a low churn rate helps a subscription-based company maintain its revenue flow and avoid the costly process of acquiring new customers.

For this project, I use the Telco Dataset to address the problem of churn. Acting as a Data and Strategy Analyst at Telco, I build machine-learning models using Logistic Regression, Decision Tree and Random Forest methods to understand why customers churned (Churn = Yes) and to predict which customers are most likely to churn next. To evaluate and improve my predictions, I also use Confusion Matrices, Error Rate plots, Feature Importance plots, and Boosting.

Based on my findings, I write recommendations to Telco's senior management team on how they can retain the customers who are most likely to leave. Given that few members of my audience understand machine-learning models, I evaluate the benefits and drawbacks of each model and pick the ones that maximize the usefulness of the insights, so that Telco's strategy team can implement my findings.

* I got inspiration for some of my models from the work of Susan Li, srepho and BlastChar.

OBJECTIVES

- Compare machine-learning methods for predicting churn and explaining why customers churned
- Decide which model is the best for communicating to senior management
- Use this model to identify which customers are most likely to churn next
- Provide implementable recommendations for Telco's senior management to decrease churn rate

Exploratory Data Analysis

First, I read in the data and load the R packages.

There are 7043 customers represented in this dataset, and it has 21 variables. Each customer is assigned a unique customerID. Tenure indicates how many months the customer has stayed with the company. MultipleLines indicates whether the customer has multiple telephone lines connected to their account.

DATA CLEANING

There are 11 NAs in the TotalCharges variable, and I remove the rows that contain them. I also clean the MultipleLines and SeniorCitizen columns. Finally, I remove the customerID column: it is a unique identifier stored as a factor with over 7000 levels, so it carries no predictive information and would distort the models I build.
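The same cleaning steps can be sketched in Python (the analysis itself was done in R, and the exact recoding is not shown here). This sketch assumes each row is a dict of strings read from the Telco CSV and that the missing TotalCharges values are blank strings. The recoding is hypothetical: "No phone service" is folded into "No" for MultipleLines, and SeniorCitizen is mapped from 0/1 to "No"/"Yes".

```python
def clean_rows(rows):
    """Drop rows with a missing TotalCharges value and tidy two columns.

    Hypothetical recoding: MultipleLines "No phone service" -> "No",
    SeniorCitizen "0"/"1" -> "No"/"Yes". customerID is dropped because it
    identifies a customer rather than describing their behavior.
    """
    cleaned = []
    for row in rows:
        total = row["TotalCharges"].strip()
        if total == "":
            continue  # skip rows whose TotalCharges is missing
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["TotalCharges"] = float(total)
        if new_row["MultipleLines"] == "No phone service":
            new_row["MultipleLines"] = "No"
        new_row["SeniorCitizen"] = "Yes" if new_row["SeniorCitizen"] == "1" else "No"
        del new_row["customerID"]
        cleaned.append(new_row)
    return cleaned


rows = [
    {"customerID": "A", "TotalCharges": "29.85", "MultipleLines": "No phone service", "SeniorCitizen": "0"},
    {"customerID": "B", "TotalCharges": " ", "MultipleLines": "No", "SeniorCitizen": "1"},
    {"customerID": "C", "TotalCharges": "1889.5", "MultipleLines": "Yes", "SeniorCitizen": "1"},
]
print(len(clean_rows(rows)))  # 2: customer B is dropped
```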

WHICH CUSTOMERS CHURNED?

Methods: DescTools and ggplot2 packages

1869 of the 7032 customers in this dataset churned in the last month, an alarmingly high churn rate of 26.6%. I subset out the customers who churned for further analysis.
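The churn rate is the share of customers labelled "Yes" in the Churn column. A short Python check (a sketch; the analysis itself used R) confirms that 1869 / 7032 rounds to 26.6%, where 7032 is the original 7043 customers minus the 11 rows removed during cleaning.

```python
def churn_rate(churn_labels):
    """Fraction of customers whose Churn label is "Yes"."""
    churned = sum(1 for label in churn_labels if label == "Yes")
    return churned / len(churn_labels)


# Counts taken from the text: 1869 churners out of 7032 customers.
labels = ["Yes"] * 1869 + ["No"] * (7032 - 1869)
print(f"{churn_rate(labels):.1%}")  # 26.6%
```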

CHURNED CUSTOMERS: DEMOGRAPHICS

These ggplots summarize basic demographic information about the customers who churned. Most of these customers appear to be single, have no dependents and are not senior citizens. Roughly equal numbers of female and male customers churned.

CHURNED CUSTOMERS: TELCO USAGE

Tenure

Tenure for churned customers (in blue) is much shorter on average, and the distribution is right-skewed: churn is concentrated very early in customers' tenure, in the first few months.
Converting tenure to a factor variable, we see that 20.3% of the customers who churned left within their first month, 6.6% within their second month, and 5% within their third month.
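Treating tenure as a factor amounts to counting churned customers at each tenure value and dividing by the number of churners. A Python sketch of that tabulation, using made-up tenures rather than the Telco data:

```python
from collections import Counter


def tenure_shares(churned_tenures, months=(1, 2, 3)):
    """Share of churned customers whose tenure equals each given month.

    Counting each tenure value separately mirrors treating tenure as a factor.
    """
    counts = Counter(churned_tenures)
    total = len(churned_tenures)
    return {month: counts[month] / total for month in months}


# Hypothetical tenures (in months) for five churned customers.
print(tenure_shares([1, 1, 2, 3, 5]))  # {1: 0.4, 2: 0.2, 3: 0.2}
```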

Contract

88.6% of the customers who churned were using a Month-to-month contract.

CHURNED CUSTOMERS: PAYMENTS

Total Charges

Most customers who churned had paid Telco less than $500 in charges.

Monthly Charges

Most customers who churned in the last month were paying between $60 and $100 for Telco services.

Machine Learning Models

HOW ACCURATELY CAN WE PREDICT CHURN?

Methods: Logistic Regression, Decision Tree, Random Forest

To provide recommendations for Telco's senior management, I build, improve and compare machine-learning models to identify how Telco can decrease churn. First, I create my training and testing datasets.

In my training set, I have 1402 customers who churned and 3873 who did not churn.
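These sizes are consistent with a stratified 75/25 split: 1402 + 3873 = 5275 customers, 75% of the 7032, and 1402 / 5275 is about 26.6%, the same churn rate as the full dataset. How the split was actually made is not stated; the Python sketch below is one hypothetical way to do it, sampling 75% of each class separately and rounding each class's share up.

```python
import math
import random


def stratified_split(labels, train_frac=0.75, seed=42):
    """Return (train_indices, test_indices) with train_frac of each class in train."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = math.ceil(len(indices) * train_frac)  # round each class's share up
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


labels = ["Yes"] * 1869 + ["No"] * 5163
train, test = stratified_split(labels)
print(sum(labels[i] == "Yes" for i in train), sum(labels[i] == "No" for i in train))  # 1402 3873
```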

LOGISTIC REGRESSION MODEL 1

Improvement Method: ANOVA

I included all the variables in my logistic regression.

A positive value in the "Estimate" column means that the variable is associated with higher odds of churn, holding the other variables constant; a negative value means it is associated with lower odds of churn. These are associations, not proof that a variable causes customers to churn.
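Because logistic regression models the log-odds of churn, a coefficient is easiest to explain to a non-technical audience as an odds ratio, exp(Estimate). A Python sketch with made-up coefficients (not the values from my model):

```python
import math


def odds_ratio(estimate):
    """Multiplicative change in the odds of churn for a one-unit increase in a
    numeric variable, or for a factor level relative to its baseline level."""
    return math.exp(estimate)


def churn_probability(intercept, estimates, values):
    """Logistic link: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    eta = intercept + sum(b * x for b, x in zip(estimates, values))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical: an estimate of -0.7 roughly halves the odds of churn.
print(round(odds_ratio(-0.7), 2))  # 0.5
```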

From this summary, many variables do not appear to be important for predicting whether a customer will churn: they have high p-values and no significance asterisks. Contract type, TotalCharges and tenure are the most statistically significant variables.

HOW ACCURATE IS LOGISTIC REGRESSION MODEL 1?

On the testing dataset, Logistic Regression Model 1 correctly predicts whether a customer will churn about 80.4% of the time.
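Accuracy is the share of test customers whose predicted label matches their actual label. The probability cut-off used is not stated; this Python sketch assumes the common choice of 0.5 and uses toy inputs.

```python
def classify(probabilities, threshold=0.5):
    """Label a customer "Yes" (will churn) when the predicted probability exceeds threshold."""
    return ["Yes" if p > threshold else "No" for p in probabilities]


def accuracy(predicted, actual):
    """Fraction of customers whose predicted label equals the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


predicted = classify([0.81, 0.12, 0.45, 0.66])
print(predicted)  # ['Yes', 'No', 'No', 'Yes']
print(accuracy(predicted, ["Yes", "No", "Yes", "Yes"]))  # 0.75
```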

ANOVA

An ANOVA table (for a logistic regression, strictly an analysis of deviance) helps me identify which features are the most important, so that I can simplify my logistic regression.

The table adds the variables one at a time, in the order they enter the model, and shows how much each one reduces the residual deviance. Adding InternetService, tenure and MultipleLines significantly reduces the residual deviance. StreamingTV and StreamingMovies have low p-values, but each provides only a small reduction in residual deviance.
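For a logistic regression, residual deviance is -2 times the log-likelihood, and the drop in deviance from adding a term is compared with a chi-squared distribution whose degrees of freedom equal the number of coefficients the term adds. A Python sketch of both pieces with toy numbers; the p-value helper only covers the one-degree-of-freedom case (for example a Yes/No variable), not multi-level factors such as InternetService.

```python
import math


def deviance(outcomes, probabilities):
    """Binomial deviance, -2 * log-likelihood, for 0/1 outcomes and fitted probabilities."""
    log_lik = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )
    return -2.0 * log_lik


def chisq_pvalue_1df(deviance_drop):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(deviance_drop / 2.0))


# Toy example: a model predicting 0.5 for everyone versus a sharper model.
y = [1, 0, 1, 0]
drop = deviance(y, [0.5] * 4) - deviance(y, [0.8, 0.2, 0.8, 0.2])
print(round(drop, 3), round(chisq_pvalue_1df(drop), 3))
```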

To make the model easier to interpret, I limit it to the most relevant variables identified in the first summary: those significant at the 0.001 level (***).

LOGISTIC REGRESSION MODEL 2

Improvement Method: 2nd iteration

HOW ACCURATE IS LOGISTIC REGRESSION MODEL 2?

Surprisingly, the simpler Logistic Regression Model 2 predicts churn better on my testing dataset, with 81.2% accuracy. This suggests that my first logistic regression somewhat overfit the training data.

DECISION TREE

Improvement Method: Boosting

A tree model divides Telco customers into groups based on the factors that increase their likelihood of churning. Each branch point is a decision rule, such as a condition on Contract type, that splits the customers into two groups.
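A tree algorithm chooses each split by how well it separates churners from non-churners. Many implementations, including R's rpart by default, score a split with Gini impurity; which criterion was used here is not stated, so treat this Python sketch, with a toy dataset, as illustrative only.

```python
def gini(labels):
    """Gini impurity 2p(1 - p) of a group of "Yes"/"No" churn labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == "Yes") / len(labels)
    return 2 * p * (1 - p)


def split_impurity(rows, feature, value):
    """Size-weighted Gini impurity of the two branches feature == value / != value."""
    left = [r["Churn"] for r in rows if r[feature] == value]
    right = [r["Churn"] for r in rows if r[feature] != value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]
print(gini([r["Churn"] for r in rows]))  # 0.5 before splitting
print(split_impurity(rows, "Contract", "Month-to-month"))  # 0.0: a perfect split
```

A lower weighted impurity means a better split, so in this toy data the tree would split on Contract first.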

The Decision Tree is slightly less accurate than the Logistic Regression: 79.3% on the test set, about 2 percentage points below Logistic Regression Model 2's 81.2%.

BOOSTING

Boosting fits a sequence of small trees, each one focusing on the errors left by the trees before it, and ranks each variable by its importance for predicting customer churn. Using boosting for the tree model lets me rank which variables should be the highest priority for Telco's senior management, so that Telco can be strategic in how it allocates time and resources for retaining customers.
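The ranking comes from each variable's total contribution to the improvement in fit across all trees, and R's gbm package reports these contributions as a "relative influence" rescaled to sum to 100. A Python sketch of that final rescaling and ranking step, with hypothetical raw scores:

```python
def relative_influence(raw_importance):
    """Rescale raw importance scores so they sum to 100, ranked from largest to smallest."""
    total = sum(raw_importance.values())
    scaled = {name: 100.0 * score / total for name, score in raw_importance.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores, not the model's actual output.
print(relative_influence({"tenure": 30.0, "Contract": 60.0, "OnlineSecurity": 10.0}))
# [('Contract', 60.0), ('tenure', 30.0), ('OnlineSecurity', 10.0)]
```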

This summary shows that, according to the training data, Contract type has by far the greatest relative influence on predicting whether a customer churns. It is followed by tenure and Online Security; Online Security does not appear in the Decision Tree.

RANDOM FOREST

Improvement Method: Error Plot + Tuning

Now I use the caret package to build my random forest model.

HOW ACCURATE IS THE RANDOM FOREST MODEL?

On the training data, the Random Forest model has 79.37% accuracy, about the same as a single decision tree. But what about the testing data?
On the testing data, the Random Forest is slightly more accurate, at 79.57%.

ERROR PLOT

I use this error plot to check whether adding more trees to the random forest significantly reduces the error rate. Beyond about 200 trees, none of the error rates changes noticeably, so I limit my model to 200 trees.
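Reading the plateau off the plot can also be done programmatically: take the smallest number of trees whose error is within a small tolerance of the lowest error. A Python sketch with hypothetical error rates and an arbitrary tolerance of 0.002:

```python
def trees_needed(error_by_ntree, tolerance=0.002):
    """Smallest ntree whose error is within `tolerance` of the lowest error.

    error_by_ntree maps a number of trees to an error rate, e.g. out-of-bag error.
    """
    best = min(error_by_ntree.values())
    for ntree in sorted(error_by_ntree):
        if error_by_ntree[ntree] <= best + tolerance:
            return ntree


# Hypothetical error rates read off an error plot.
errors = {50: 0.215, 100: 0.205, 200: 0.2005, 400: 0.2, 500: 0.2002}
print(trees_needed(errors))  # 200
```

The tolerance encodes what counts as "no significant difference"; a stricter tolerance would pick more trees.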

Tuning mtry, the number of variables randomly sampled as split candidates at each node, shows that mtry = 2 gives the lowest error.

FIT THE NEW TREES AFTER TUNING

Accuracy improved from 79.57% to 80.82%. Additionally, sensitivity improved from 0.8953 to 0.9101.
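Sensitivity depends on which class is treated as "positive". When no positive class is given, caret's confusionMatrix() uses the first factor level, which for a Churn factor with levels "No" and "Yes" is "No"; if that default applied here, a sensitivity of 0.9101 measures how well the model recognizes customers who stay, not churners. A Python sketch, with toy labels, showing how the choice swaps the two numbers:

```python
def sensitivity_specificity(predicted, actual, positive="Yes"):
    """Return (sensitivity, specificity) with `positive` as the positive class.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if a == positive and p == positive)
    fn = sum(1 for p, a in pairs if a == positive and p != positive)
    tn = sum(1 for p, a in pairs if a != positive and p != positive)
    fp = sum(1 for p, a in pairs if a != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


actual = ["Yes", "Yes", "No", "No", "No"]
predicted = ["Yes", "No", "No", "No", "Yes"]
print([round(x, 3) for x in sensitivity_specificity(predicted, actual)])  # [0.5, 0.667]
print([round(x, 3) for x in sensitivity_specificity(predicted, actual, positive="No")])  # [0.667, 0.5]
```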

WHICH FEATURES ARE THE MOST IMPORTANT ACCORDING TO THE RANDOM FOREST MODEL?

Tenure, TotalCharges and Contract type are the three most influential variables in the Random Forest model.

Evaluating the Algorithms: which model is best?

For the purposes of presenting my findings to Telco's senior management, I judge each model not only on its accuracy but also on its interpretability and on how easily its findings can be implemented.

LOGISTIC REGRESSION

  • Accuracy for Test set prediction: 81.2%
  • Pros: The logistic regression was my most accurate model. It also shows the management team the direction of each variable's relationship with churn, and therefore whether to try to increase or decrease it. For example, the ContractOne (one year contract) and ContractTwo (two year contract) levels had negative estimates in the model. This suggests that moving more customers onto these types of contracts could decrease churn.
  • Cons: Many variables in this model are significant (with *** asterisks and low p-values), so it does not tell senior management which variable to prioritize first.

DECISION TREE MODEL

  • Accuracy for Test set prediction: 79.3%
  • Pros: The single decision tree was the simplest and most easily interpretable model. It creates a visual hierarchy for each of the variables affecting churn, and prioritizes Contract type first, followed by InternetService, tenure and MonthlyCharges. It could provide an effective set of strategies for senior management.
  • Cons: This is the least accurate model, and may be overly simplistic.

RANDOM FOREST MODEL

  • Accuracy for Test set prediction: 80.82%
  • Pros: This model was very accurate, and tuning reduced its error rate and improved its sensitivity. The VarImportance function ranked the variables by their importance to the model's predictions.
  • Cons: This model is a "black box", so it is difficult to interpret. The VarImportance plot ranks the variables but does not show how they affect churn, for example whether a longer tenure raises or lowers it.

Model Selection and Recommendations

Based on the pros and cons of each method, I have decided to proceed with the Decision Tree as my model to help senior management prioritize which variable to address to improve customer retention. I will then use Logistic Regression Model 2's findings to indicate how senior management should address each variable in order to reduce churn.

Decision Tree
Logistic Regression Model 2

The Decision Tree is the simplest model to explain because it shows a clear visual hierarchy of what Telco should focus on first: moving customers from Month-to-Month contracts to One or Two year contracts where possible.

Combining the Decision Tree with the positive and negative estimates of Logistic Regression Model 2, Telco can treat each variable as a lever associated with higher or lower churn. For example, the Payment Methods associated with lower churn are Credit Card and Mailed Check, whereas customers paying by Electronic Check are significantly more likely to churn.

INSIGHTS FOR SENIOR MANAGEMENT: WHAT MAKES CUSTOMERS MORE LIKELY TO CHURN?

These findings are based on the Logistic Regressions:

  • Contract: Customers using a Month-to-month contract are significantly more likely to churn than customers using a One year or Two year contract.
  • Tenure: Customers with shorter Tenure were more likely to churn.
  • Payment Method: Customers paying by Electronic Check were more likely to churn than those paying automatically by Credit Card or by Mailed Check.
  • Total Charges: Customers with higher Total Charges were more likely to churn, once tenure and the other variables in the model are held constant.
  • Paperless Billing: Customers who used Paperless Billing were more likely to churn.
  • Multiple Lines: Customers who had Multiple Lines were more likely to churn.
  • Tech Support: Customers with no Tech Support were more likely to churn.
  • Phone Service: Customers with no Phone Service were more likely to churn.

CHURN CUSTOMER PROFILE: CUSTOMERS MOST LIKELY TO CHURN NEXT

Based on the Decision Tree, the customers most likely to churn have the following profile:

  • Contract: Month-to-Month
  • Internet Service: Yes
  • Tenure: < 15.5 months

IDENTIFY WHICH CUSTOMERS WILL CHURN NEXT

Method: Subset the dataset based on the Churn Profile and Boosting results

Using this profile, I create a subset of customers who are vulnerable to churn, which I call "vulnerable customers".
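The subset is a filter on the three profile conditions. A Python sketch, assuming rows keyed by the Telco column names with numeric tenure. Two choices here are hypothetical: "Internet Service: Yes" is read as any InternetService value other than "No", and customers who have already churned are excluded, since the aim is to find who will churn next.

```python
def vulnerable_customers(rows, max_tenure=15.5, exclude_churned=True):
    """customerIDs matching the Decision Tree churn profile."""
    return [
        r["customerID"]
        for r in rows
        if r["Contract"] == "Month-to-month"
        and r["InternetService"] != "No"  # assumption: any internet service counts as "Yes"
        and r["tenure"] < max_tenure
        and not (exclude_churned and r["Churn"] == "Yes")  # assumption: current customers only
    ]


customers = [
    {"customerID": "A", "Contract": "Month-to-month", "InternetService": "Fiber optic", "tenure": 3, "Churn": "No"},
    {"customerID": "B", "Contract": "Two year", "InternetService": "DSL", "tenure": 3, "Churn": "No"},
    {"customerID": "C", "Contract": "Month-to-month", "InternetService": "No", "tenure": 3, "Churn": "No"},
    {"customerID": "D", "Contract": "Month-to-month", "InternetService": "DSL", "tenure": 20, "Churn": "No"},
    {"customerID": "E", "Contract": "Month-to-month", "InternetService": "DSL", "tenure": 5, "Churn": "Yes"},
]
print(vulnerable_customers(customers))  # ['A']
```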

There are 333 vulnerable customers. Telco can track these customerIDs to measure which of these customers churn once a retention solution is implemented, and thus gauge how effective that solution is. To make this possible, I list the first 20 customerIDs of these customers.

HOW COULD TELCO PROACTIVELY PREVENT THESE CUSTOMERS FROM LEAVING? WHAT MIGHT A CUSTOMER RETENTION PROGRAM LOOK LIKE?

In order of priority for retaining customers, Telco could implement the following solutions:

  • Contract: Contract type is by far the most important variable for customer retention, and should be Telco's priority. Telco could advertise the ease of One Year or Two Year contracts for Month-to-Month customers to encourage them to switch, and could cease to offer the Month-to-Month contracts to new customers. They could also provide a small cash-back incentive for customers who switch.
  • Tenure: This is highly dependent on the Contract type of each customer.
  • Internet Service: Telco could create a bundled deal for customers who want Fiber Optic internet service as well as phone service.
  • Payment Method: Telco could advertise the ease of switching to automatic Credit Card payments for customers who pay by Electronic Check. Telco could cease to offer Electronic Checks as an option for payment to new customers who register.
  • Total Charges: As customers are sensitive to the total amount they are charged, Telco can sweeten the deal by offering discounts tied to other levers, such as switching Payment Method or Contract type.
  • Multiple Lines: Telco could create a cheaper package for customers who want multiple telephone lines in their home.

Conclusion

From this challenge, I learnt that it is important to include Data Scientists in the strategy and business model planning of a company, if their work is to be valuable for solving business problems.

Exploring a variety of machine learning models showed me how the simplicity of a single Decision Tree might be much more useful to a senior management team that wants to understand churn rate than a much more complicated "black box" model like Random Forest. In this case, Telco can probably afford to sacrifice a couple of percentage points of accuracy on the test data. But which model to choose is ultimately a judgment call for the Data Scientist. If I were building a classification model for healthcare, for example, 0.5 percentage points of accuracy might make a big difference to outcomes, and I might have picked a "black box" model.

In the Telco case, the real challenge was to generate a model whose findings offered insight into how customers behaved, and from that create tangible, resource-efficient steps the company can take to reduce churn.

Appendix

I experimented with some other models we learnt in class to confirm my findings that the Logistic Regression was appropriate.

The Logistic Regression (glm) performs marginally better than other models such as K Nearest Neighbors (knn), and about the same as Linear Discriminant Analysis (lda).

If you enjoyed this article, subscribe to read more of my work and follow my journey...

Machine Learning & Churn Rate: Accurate vs. Actionable Models

Summary

As companies work to become more "data-driven", there are clear trade-offs when it comes to applying machine-learning models to solve business challenges. One such tradeoff is between the accuracy of a model, and whether insights from that model can be interpreted by the relevant business unit and acted upon.

In this project, I discover that Data Scientists cannot just focus on creating the most accurate models: we need to balance accuracy with interpretability and action so that our insights are useful. Data Scientists can provide more value to their teams and companies when they understand the business's strategy.

CONTENTS

  • Introduction
  • Exploratory Data Analysis
  • Machine Learning Models
  • Evaluating the Algorithms
  • Model Selection and Recommendations
  • Conclusion
  • Appendix

Introduction

BACKGROUND

Telco is a landline telephone subscription company which has been losing customers; senior management at Telco is worried about their customer churn rate, or the rate at which customers stop paying for their service. Customer churn rate is a key performance indicator (KPI) that every subscription-based company needs to minimize: a low churn rate helps a subscription-based company maintain their revenue flow and avoid the costly process of acquiring new customers.

For this project, I will be using the Telco Dataset to address the problem of churn rate. Acting as a Data and Strategy Analyst at Telco, I create machine-learning algorithms using Logistic Regression, Random Forest and Decision Tree methods to understand why customers churned (Churn = Yes) and predict which customers are most likely to churn next. To calculate and improve my predictions, I also use Confusion Matrixes, Error Rate plots, Feature Importance plots, and Boosting.

Based on my findings, I write recommendations to Telco's senior management team as to how they can retain the customers who are most likely to leave. Given that few members of my audience understand machine-learning models, I evaluate the benefits and draw-backs of each model, and pick those which maximize the usefulness of insights gained, to ensure that Telco's strategy team can implement my findings.

* I got inspiration for some of my models from Susan Li,  srepho and BlastChar's work.

OBJECTIVES

- Compare machine-learning methods to predict why customers churned
- Decide which model is the best for communicating to senior management
- Use this model to identify which customers are most likely to churn next
- Provide implementable recommendations for Telco's senior management to decrease churn rate

Exploratory Data Analysis

First, I read in the data and load R packages

Screenshot 2018-04-28 22.13.23.png

There are 7043 customers represented in this dataset, and this dataset has 21 variables.  Each customer is assigned a unique customerID. Tenure indicates how many months the customer has stayed with the company. MultipleLines means that the customer has multiple telephone lines connected to their account.

DATA CLEANING

Screenshot 2018-04-28 22.16.04.png

There are 11 NAs in the TotalCharges variable. I choose to remove the rows that contain NAs. I also clean the MultipleLines column and SeniorCitizen column. I remove the CustomerID column because it is a factor with over 7000 levels, and will corrupt the models I build.

WHICH CUSTOMERS CHURNED?

Methods: DescTools function and GGPlot2 package

1869 of Telco's 7032 customers in this dataset have churned in the last month. Their churn rate is alarmingly high, at 26.6%. I subset out the Telco customers who churned in the last month.
1869 of Telco's 7032 customers in this dataset have churned in the last month. Their churn rate is alarmingly high, at 26.6%. I subset out the Telco customers who churned in the last month.

CHURNED CUSTOMERS: DEMOGRAPHICS

These ggplots summarize some basic demographic information about the customers who churned. Most of these customers seem to not have dependents, are single and are not senior citizens. There were about an equal number of female and male customers wh…
These ggplots summarize some basic demographic information about the customers who churned. Most of these customers seem to not have dependents, are single and are not senior citizens. There were about an equal number of female and male customers who churned.

CHURNED CUSTOMERS: TELCO USAGE

Tenure

Tenure for churned customers (in blue) is on average much shorter, and the distribution is right-skewed. The majority of churns occur very early in their tenure, within the first five months.
Tenure for churned customers (in blue) is on average much shorter, and the distribution is right-skewed. The majority of churns occur very early in their tenure, within the first five months.
Converting tenure to a factor variable, we see that 20.3% of customers who churned left within their first month. 6.6% left within their second month, and 5% left within their third month.
Converting tenure to a factor variable, we see that 20.3% of customers who churned left within their first month. 6.6% left within their second month, and 5% left within their third month.

Contract

88.6% of the customers who churned were using a Month-to-month contract.
88.6% of the customers who churned were using a Month-to-month contract.

CHURNED CUSTOMERS: PAYMENTS

Total Charges

Most customers who churned had paid Telco less than $500 in charges.
Most customers who churned had paid Telco less than $500 in charges.

Monthly Charges

Most customers who churned in the last month were paying between $60 and $100 for Telco services.
Most customers who churned in the last month were paying between $60 and $100 for Telco services.

Machine Learning Models

HOW ACCURATELY CAN WE PREDICT CHURN?

Methods: Logistic Regression, Decision Tree, Random Forest

To provide recommendations for Telco's senior management, I conduct, improve and compare machine-learning models to identify how they can decrease churn. First, I create my training and testing datasets.

In my training set, I have 1402 customers who churned and 3873 who did not churn.
In my training set, I have 1402 customers who churned and 3873 who did not churn.

LOGISTIC REGRESSION MODEL 1

Improvement: Anova

I included all the variables in my logistic regression.

Screenshot 2018-04-28 22.33.29.png
If a variable has a positive value in the "Estimate" column, it is more likely to cause a customer to churn.  If a variable has a negative value in the "Estimate" column, it is less likely to cause a customer to churn.
If a variable has a positive value in the "Estimate" column, it is more likely to cause a customer to churn.
If a variable has a negative value in the "Estimate" column, it is less likely to cause a customer to churn.

From this summary, we can see that many variables are not important for predicting whether or not a customer will churn, based on the lack of asterisks and a high p value. Contract type, Total charges and tenure are the most statistically significant variables.

HOW ACCURATE IS LOGISTIC REGRESSION MODEL 1?

This Logistic Regression will correctly predict whether a customer will churn about 80.4% of the time.
This Logistic Regression will correctly predict whether a customer will churn about 80.4% of the time.

ANOVA

Using an Analysis of Variance (ANOVA) helps me identify which features are the most important, so that I can simplify my logistic regression.

Screenshot 2018-04-28 22.37.31.png

As we add each variable, we can see the drop in deviance of the residuals. Adding InternetService, tenure and MultipleLines significantly reduces the residual deviance. Even though StreamingTV and StreamingMovies have low p values, they also only provide a small reduction in residual deviance.

To simplify the model to be interpreted, I limit the model to the most relevant variables I identified in the first summary, or those with at least a 0.001 level of significance (***).

LOGISTIC REGRESSION MODEL 2

Improvement Method: 2nd iteration

Screenshot 2018-04-28 22.39.23.png

HOW ACCURATE IS LOGISTIC REGRESSION MODEL 2?

Screenshot 2018-04-28 22.40.39.png

Surprisingly, Logistic Regression Model 2 better predicts churn on my testing dataset, with 81.2% accuracy. This suggests that my first logistic regression overfit my training data.

DECISION TREE

Improvement Method: Boosting

We can use a tree model to divide Telco customers based on factors that increase their likelihood to churn. Each branch indicates a decision boundary that divides up the customers.  

Screenshot 2018-04-28 22.41.55.png
Screenshot 2018-04-28 22.42.05.png
The Decision Tree is about 2.5% less accurate than the Logistic Regression.
The Decision Tree is about 2.5% less accurate than the Logistic Regression.

BOOSTING

The Boosting method ranks each variable based on its importance for predicting customer churn. Using boosting for the tree model allows me to rank the variables which should be the highest priority to Telco's senior management, so that Telco could be very strategic in how it allocates time and resources for retaining customers.

Screenshot 2018-04-28 22.45.00.png
Screenshot 2018-04-28 22.45.13.png

This summary shows that Contract type has by far the greatest relative influence on the customer's decision to churn, according to the training data. This is followed by tenure and Online Security, which does not appear in the Decision Tree.

RANDOM FOREST

Improvement method: Error Plot + Tuning

Now I use the caret package to build my random forest model.

Screenshot 2018-04-28 22.45.48.png

HOW ACCURATE IS THE RANDOM FOREST MODEL?

On the training data, the Random Forest model has 79.37% accuracy, about the same as a single decision tree. But what about the testing data?
On the training data, the Random Forest model has 79.37% accuracy, about the same as a single decision tree. But what about the testing data?
Random Forest is slightly more accurate for our testing data, at 79.57%.
Random Forest is slightly more accurate for our testing data, at 79.57%.

ERROR PLOT

Screenshot 2018-04-28 22.47.13.png

I used this error plot to determine whether an increase in the number of trees in the random forest model will lead to a significant decrease in error rate. We see that above about 200 trees, there is no significant difference in any of the error rates, so I can limit my model to 200 trees.

Mtry = 2 results in the lowest error for the model.
Mtry = 2 results in the lowest error for the model.

FIT THE NEW TREES AFTER TUNING

Accuracy improved from 79.57% to 80.82%. Additionally, sensitivity improved from 0.8953 to 0.9101.
Accuracy improved from 79.57% to 80.82%. Additionally, sensitivity improved from 0.8953 to 0.9101.

WHICH FEATURES ARE THE MOST IMPORTANT ACCORDING TO THE RANDOM FOREST MODEL?

Tenure, TotalCharges and Contract type are the three most influential variables in the Random Forest model.
Tenure, TotalCharges and Contract type are the three most influential variables in the Random Forest model.

Evaluating the Algorithms: which model is best?

For the purposes of presenting my findings to Telco's senior management, I am judging each model on not only its accuracy but also its interpretability and the ease with which its findings can be implemented.

LOGISTIC REGRESSION

  • Accuracy for Test set prediction: 81.2%
  • Pros: The logistic regression was my most accurate model. It also tells the management team whether they should try to increase or decrease each variable affecting churn rate. For example, ContractOne (one year contract) and ContractTwo (two year contract) outcomes had negative estimates in the model. This means that increasing the number of customers with these types of contracts will decrease churn.
  • Cons: There are too many significant variables (with *** asterisks and low p-values) in this model, so it does not tell senior management which variable should be prioritized first.

DECISION TREE MODEL

  • Accuracy for Test set prediction: 79.3%
  • Pros: The single decision tree was the simplest and most easily interpretable model. It creates a visual hierarchy for each of the variables affecting churn, and prioritizes Contract type first, followed by InternetService, tenure and MonthlyCharges. It could provide an effective set of strategies for senior management.
  • Cons: This is the least accurate model, and may be overly simplistic.

RANDOM FOREST MODEL

  • Accuracy for Test set prediction: 80.82%
  • Pros: This model was very accurate, and tuning improved the error rate and sensitivity of the model. The VarImportance function ranked the variables in order of importance to senior management.
  • Cons: This model is a "black box", so is difficult to interpret. Using the VarImportance plot does not give insights into how these variables impact churn.

Model Selection and Recommendations

Based on the pros and cons of each method, I have decided to proceed with the Decision Tree as my model to help senior management prioritize which variable to address to improve customer retention. I will then use Logistic Regression Model 2's findings to indicate how senior management should address each variable in order to reduce churn.

Decision Tree
Logistic Regression Model 2

The Decision Tree is the simplest model to explain because it shows a clear visual hierarchy of what Telco should focus on first: moving customers from Month-to-Month contracts to One Year or Two Year contracts wherever possible.

Combining the Decision Tree with the positive and negative estimates of Logistic Regression Model 2, Telco can treat each variable as a lever that increases or decreases churn. For example, the Payment Methods associated with lower churn are Credit Cards and Mailed Checks, whereas customers paying through Electronic Check are significantly more likely to churn.

INSIGHTS FOR SENIOR MANAGEMENT: WHAT MAKES CUSTOMERS MORE LIKELY TO CHURN?

These findings are based on the Logistic Regressions:

  • Contract: Customers using a Month-to-month contract are significantly more likely to churn than customers using a One year or Two year contract.
  • Tenure: Customers with shorter Tenure were more likely to churn.
  • Payment Method: Customers using an Electronic Check were more likely to churn than those who paid automatically using a Credit Card or by a Mailed Check.
  • Total Charges: Customers who had higher Total Charges were more likely to churn.
  • Paperless Billing: Customers who used Paperless Billing were more likely to churn.
  • Multiple Lines: Customers who had Multiple Lines were more likely to churn.
  • Tech Support: Customers with no Tech Support were more likely to churn.
  • Phone Service: Customers with no Phone Service were more likely to churn.

CHURN CUSTOMER PROFILE: CUSTOMERS MOST LIKELY TO CHURN NEXT

Based on the Decision Tree, the customers most likely to churn match the following profile:

  • Contract: Month-to-Month
  • Internet Service: Yes
  • Tenure: < 15.5 months
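The profile above is a single path through the Decision Tree, so it can be written as a simple filter. A minimal Python sketch, assuming the column values used in the Telco dataset ("Month-to-month" for Contract; "DSL", "Fiber optic" or "No" for InternetService) and reading "Internet Service: Yes" as any value other than "No". The `Customer` type, `is_vulnerable` function and sample customerIDs are hypothetical.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    customer_id: str
    contract: str          # "Month-to-month", "One year" or "Two year"
    internet_service: str  # "DSL", "Fiber optic" or "No"
    tenure: int            # months with Telco

def is_vulnerable(c: Customer) -> bool:
    """Churn profile: Month-to-Month contract, has internet service, tenure under 15.5 months."""
    return (
        c.contract == "Month-to-month"
        and c.internet_service != "No"
        and c.tenure < 15.5
    )

# Hypothetical customers, for illustration only
customers = [
    Customer("0001-AAAA", "Month-to-month", "Fiber optic", 3),
    Customer("0002-BBBB", "Two year", "DSL", 40),
    Customer("0003-CCCC", "Month-to-month", "No", 5),
    Customer("0004-DDDD", "Month-to-month", "DSL", 15),
]

vulnerable = [c for c in customers if is_vulnerable(c)]
# List the first 20 vulnerable customerIDs (here only two match)
print([c.customer_id for c in vulnerable[:20]])
```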

IDENTIFY WHICH CUSTOMERS WILL CHURN NEXT

Method: Subset the dataset based on the Churn Profile and Boosting results

Using the customer profile, I create a subset of customers who are vulnerable to churn, called "vulnerable customers".

Screenshot 2018-04-28 23.02.30.png

There are 333 vulnerable customers. Telco can keep track of these customerIDs to measure which of these customers churn once a solution is implemented, and thus gauge how effective that solution is. To enable them to do this, I list the first 20 customerIDs of these customers.
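One way to turn that tracking into a measurement is to compute the churn rate of the tracked cohort. Holding out a random part of the cohort as an untreated comparison group is an assumption added here, not part of the original analysis; it helps separate the program's effect from churn that would have happened anyway. The function names and IDs below are hypothetical.

```python
import random

def cohort_churn_rate(cohort_ids, churned_ids):
    """Share of a tracked cohort whose customerIDs appear in the churned set."""
    cohort = set(cohort_ids)
    if not cohort:
        raise ValueError("cohort must not be empty")
    return len(cohort & set(churned_ids)) / len(cohort)

def split_cohort(cohort_ids, seed=0):
    """Reproducibly split the cohort into (treated, control) halves at random."""
    ids = sorted(cohort_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical cohort and outcomes, for illustration only
cohort = [f"CUST-{i:03d}" for i in range(10)]
treated, control = split_cohort(cohort)
churned_after_program = {"CUST-001", "CUST-004", "CUST-007"}

treated_rate = cohort_churn_rate(treated, churned_after_program)
control_rate = cohort_churn_rate(control, churned_after_program)
# A positive difference means the program group churned less than the held-out group
print(f"treated={treated_rate:.2f} control={control_rate:.2f} diff={control_rate - treated_rate:+.2f}")
```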

Screenshot 2018-04-28 23.02.52.png

HOW COULD TELCO PROACTIVELY PREVENT THESE CUSTOMERS FROM LEAVING? WHAT MIGHT A CUSTOMER RETENTION PROGRAM LOOK LIKE?

In order of priority for retaining customers, Telco could implement the following solutions:

  • Contract: Contract type is by far the most important variable for customer retention, and should be Telco's priority. Telco could advertise the ease of One Year or Two Year contracts for Month-to-Month customers to encourage them to switch, and could cease to offer the Month-to-Month contracts to new customers. They could also provide a small cash-back incentive for customers who switch.
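  • Tenure: This is highly dependent on the Contract type of each customer, since customers on longer contracts stay longer by design; moving customers onto One Year or Two Year contracts should therefore also raise tenure.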
  • Internet Service: Telco could create a bundled deal for customers who want Fiber Optic internet service as well as phone service.
  • Payment Method: Telco could advertise the ease of switching to automatic Credit Card payments for customers who pay by Electronic Check. Telco could cease to offer Electronic Checks as an option for payment to new customers who register.
  • Total Charges: As customers are sensitive to the total amount they are charged, Telco can sweeten the deal by offering discounts tied to other levers, such as switching Payment Method or Contract type.
  • Multiple Lines: Telco could create a cheaper package for customers who want multiple telephone lines in their home.

Conclusion

From this challenge, I learnt that it is important to include Data Scientists in a company's strategy and business model planning if their work is to be valuable for solving business problems.

Exploring a variety of machine learning models showed me how a single Decision Tree, thanks to its simplicity, can be much more useful to a senior management team that wants to understand churn rate than a much more complicated "black box" model like Random Forest. In this case, Telco can probably afford to sacrifice a couple of percentage points of accuracy on the test data. But the question of which model to choose is ultimately a judgment call for the Data Scientist. If I were building a classification model for use in healthcare, for example, half a percentage point of accuracy might make a big difference to the outcome, and I might have picked a "black box" model.
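To put the accuracy trade-off in concrete terms, the gap between the most accurate model (Logistic Regression, 81.2%) and the Decision Tree (79.3%) can be translated into customers. The test-set size of 1,000 below is a hypothetical round number, not the size of the actual test set.

```latex
\begin{aligned}
\Delta_{\text{accuracy}} &= 81.2\% - 79.3\% = 1.9 \text{ percentage points} \\
\text{extra misclassified customers} &\approx 0.019 \times 1000 = 19 \text{ per } 1000 \text{ test customers}
\end{aligned}
```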

In the Telco case, the real challenge was to build a model whose findings gave insight into how customers behave, and from that to create tangible, resource-efficient steps the company can take to reduce churn.

Appendix

I experimented with some other models we learnt in class to confirm my findings that the Logistic Regression was appropriate.

Screenshot 2018-04-28 23.10.57.png
Screenshot 2018-04-28 23.14.11.png
Screenshot 2018-04-28 23.14.19.png

The Logistic Regression (glm) also performs marginally better than other models such as K Nearest Neighbor (knn), and about the same as Linear Discriminant Analysis (lda).
