Metis Project 2: Predicting NBA Player Salaries using Linear Regression
We just finished our third week at Metis in Seattle, Washington, USA. These past weeks have been a roller coaster of learning amazing material in statistics, Python and linear algebra.
For our first project, I worked with other students to provide recommendations for Women in Technology, and that experience was amazing.
For our second project, we worked individually and used linear regression to predict or interpret data on a topic of our choosing. I decided to focus on the NBA because of my rekindled love for the game after watching last season’s tumultuous finals between the Toronto Raptors and the Golden State Warriors.
Even though I worked on this project alone, my Metis classmate Fatima Loumaini and my instructors helped me understand the theory.
A big shoutout goes to my ex-managers at Goldman Sachs, who gave me feedback on my model and on how to create compelling visualizations. Thank you, Rose Chen, David Chan and Samanth Muppidi (inside joke).
Goal
The goal of this project is to predict NBA players’ salaries per season based on their statistics using linear regression. Both players and team managers can use it to evaluate the impact a particular player is making on a team and to decide whether to increase the player’s salary or trade the player.
Notes:
I am taking the non-traditional approach of explaining my results first. Anyone who is interested in the technicalities of the entire project can read the remainder of the blog and view the code and presentations.
Results and Insights:
Growing Salaries and Injury Impacts. Predicting Victor Oladipo’s Salaries:
The model was tested on Victor Oladipo’s per-season stats from 2017 to 2019. Victor was the Most Improved Player in 2018. Using a selection algorithm, the most important stats for a player were selected to predict his salary.
From the charts below, we can see that the ratio of Victor’s actual salary to his stats increased from 2017 to 2018 and stayed fixed in 2019. My model predicted that his salary should have increased from 2017 to 2018 (though not by as much as his actual salary did) and then decreased slightly in 2019. We can also see that Victor’s stats improved from 2017 to 2018 and dipped slightly in 2019.
Observations
In the real world, Victor made a huge impact on his team, the Indiana Pacers, from 2017 to 2019, but he suffered an injury that knocked him out for the season in 2019. This injury reduced the impact he made on his team, hence the decrease in his stats. One reason we do not see a change in his salary is that he is currently on a multi-year contract, and such contracts are usually guaranteed despite injuries.
Growing Salaries, Growing Impacts. Predicting Giannis Antetokounmpo’s Salaries:
The model was tested on the stats of Giannis, who was the Most Improved Player in 2017, and the results were used to create the charts below.
From the charts, we can see that the ratio of Giannis’s actual salaries to his stats increased from 2017 to 2019, and my model also predicted that his salary should have increased over the same period. Giannis’s stats saw a steady increase from 2017 to 2019 as well. It is worth noting that my model says Giannis should have been making more than his actual salaries from 2017 to 2019.
Observations
Compared with reality, Giannis improved greatly in 2017 and signed a multi-year contract that season, which explains the increase in his salaries. My model predicted that, because of Giannis’s impact on his team, he should be earning more money. Giannis, however, cares more about building the Milwaukee Bucks franchise and is willing to grow with the organization.
Growing Salaries, Declining Impact. Predicting Jimmy Butler’s Salaries:
Finally, the model was evaluated on the stats of Jimmy Butler, who was the Most Improved Player in 2015, to generate the charts below.
The charts show that the ratio of Jimmy’s actual salaries to his stats increased from 2017 to 2019, while my model predicted that his salary should have decreased over that period.
We can see that Jimmy’s stats decreased slightly from 2017 to 2019. It is worth noting that, based on his stats, my model says Jimmy should be making less than his actual salaries.
In actuality, Jimmy’s stats saw a steady decrease from 2017 to 2019 as he moved from the Chicago Bulls to the Minnesota Timberwolves in 2018 and then to the Philadelphia 76ers in 2019. Two things help explain this pattern of rising salaries and falling stats: a player’s brand also adds to his value, and a player who switches teams needs time to adjust to the new team’s style of play. So it is not surprising that Jimmy’s stats decreased over time.
If you made it this far, then you are interested in the technical details. Enjoy the read below; I welcome any constructive feedback.
NBA Introduction:
The National Basketball Association is a men's professional basketball league in North America, composed of 30 teams. It is one of the four major professional sports leagues in the United States and Canada, and is widely considered to be the premier men's professional basketball league in the world.
Find the major stats for the NBA in 2019 below:
Approach:
The approach for this project was to use specific player stats to predict salaries with linear regression. I used the Lasso algorithm to select the player statistics that most affected a player’s salary.
Steps:
The following steps were taken in achieving my goals for this project.
Data scraping and cleaning.
Data and feature engineering.
Model validation and selection.
Model prediction and evaluation.
Data Scraping and Cleaning:
The data for this project was scraped from:
Basketball Reference: a website that contains basketball players’ stats.
I selected basketball player stats and salaries from 2017 - 2019 for this project.
I chose around 20 unique stats per player.
The Python script that was used to scrape the data can be found on my GitHub page.
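The actual scraping script lives on GitHub, so the snippet below is only a rough sketch of the kind of cleaning that scraped stats tables often need. It assumes a hypothetical table in which header rows repeat inside the data and traded players appear once per team plus a combined "TOT" row; the column names, player names and values are made up for illustration and are not taken from my dataset.

```python
import pandas as pd

# Hypothetical raw scrape: one repeated header row and a traded player
# who appears three times (combined "TOT" row plus one row per team).
raw = pd.DataFrame({
    "Player": ["Player A", "Player", "Player B", "Player B", "Player B"],
    "Tm": ["IND", "Tm", "TOT", "CHI", "MIN"],
    "PTS": ["23.1", "PTS", "18.7", "20.0", "17.5"],
})

# Drop rows that are just the table header repeated inside the body.
clean = raw[raw["Player"] != "Player"].copy()

# Scraped values arrive as strings, so convert the stat column to numbers.
clean["PTS"] = pd.to_numeric(clean["PTS"])

# Keep one row per player, preferring the combined "TOT" row when it exists.
clean["is_total"] = clean["Tm"].eq("TOT")
clean = (
    clean.sort_values("is_total", ascending=False)
    .drop_duplicates(subset="Player", keep="first")
    .drop(columns="is_total")
    .sort_values("Player")
    .reset_index(drop=True)
)

print(clean)
```

Keeping the combined row is one possible choice; averaging the per-team rows would be another.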
Data and Feature Engineering:
After running the Lasso algorithm for feature selection, I was able to narrow the 20 unique stats down to the 5 that most affected a player’s salary. They are:
The player’s age
The minutes played per game
The defensive rebounds per game
The personal fouls per game
The average points made per game
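To illustrate how Lasso can perform this kind of selection, here is a minimal sketch on synthetic data, assuming scikit-learn is available. It is not my project code: the column names, the `alpha` value and the generated numbers are all assumptions chosen for the example. The idea is that the L1 penalty shrinks the coefficients of uninformative stats to exactly zero, so the stats with non-zero coefficients are the ones that get kept.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the scraped table: 300 player-seasons, 8 stats.
# The names are hypothetical; the last three are pure noise.
stat_names = ["age", "minutes", "def_reb", "fouls", "points",
              "noise_1", "noise_2", "noise_3"]
X = rng.normal(size=(300, len(stat_names)))

# Made-up log-salary that depends only on the first five stats.
true_coef = np.array([0.5, 0.8, 0.4, 0.3, 1.0, 0.0, 0.0, 0.0])
log_salary = 15.0 + X @ true_coef + rng.normal(scale=0.1, size=300)

# Lasso penalizes coefficient size, so put the features on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# alpha controls the penalty strength; 0.1 is an arbitrary choice for this sketch.
lasso = Lasso(alpha=0.1).fit(X_scaled, log_salary)

# Keep the stats whose coefficients were not shrunk to zero.
selected = [name for name, coef in zip(stat_names, lasso.coef_) if coef != 0.0]
print(selected)
```

In practice the penalty strength would be tuned, for example with cross-validation, rather than fixed by hand.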
The image below shows a heat map of the correlations between the selected NBA stats and the salaries. Notice that the salaries are log-transformed to bring them onto a scale comparable with the features, and that all the stats are positively correlated with the salaries, which suggests that linear regression is a reasonable fit for this problem.
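As a small illustration of the log transform and of the correlations a heat map like this is built from, here is a sketch using NumPy on made-up numbers. The points-per-game values and the salary formula are assumptions for the example, not my data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical points-per-game values and salaries that grow
# multiplicatively with them (so the raw salaries are right-skewed).
points = rng.uniform(2.0, 30.0, size=200)
salary = 1_000_000 * np.exp(0.1 * points + rng.normal(scale=0.3, size=200))

# Log-transform the salaries to compress the long right tail.
log_salary = np.log(salary)

# Pearson correlation of the stat with the raw and the transformed target.
raw_corr = np.corrcoef(points, salary)[0, 1]
log_corr = np.corrcoef(points, log_salary)[0, 1]

print(f"correlation with raw salary: {raw_corr:.2f}")
print(f"correlation with log salary: {log_corr:.2f}")
```

Computing the same pairwise correlation for every selected stat gives the matrix that a heat map displays.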
Model Validation and Selection:
I split my data into train and validation sets before fitting the model on the train set. I got an R-squared of 0.42, which means the model explains roughly 42% of the variability in salaries.
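The split, fit and score steps can be sketched as follows, assuming scikit-learn and synthetic data. The split ratio, random seeds and coefficients are assumptions for illustration, so the printed score will not match my 42%.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic stand-in: 400 player-seasons, 5 selected stats, noisy log-salary.
X = rng.normal(size=(400, 5))
coef = np.array([0.2, 0.4, 0.2, 0.1, 0.5])
log_salary = 15.0 + X @ coef + rng.normal(scale=0.7, size=400)

# Hold out 20% of the rows for validation (the ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, log_salary, test_size=0.2, random_state=42
)

# Fit on the train set only, then score on the unseen validation set.
model = LinearRegression().fit(X_train, y_train)
val_pred = model.predict(X_val)
r2 = r2_score(y_val, val_pred)
print(f"validation R^2: {r2:.2f}")

# The model predicts log-salary, so exponentiate to get back to dollars.
predicted_salary = np.exp(val_pred)
```

R-squared is the share of the target’s variance that the predictions account for, so a value near 0.42 leaves more than half of the variation unexplained by the selected stats.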
Model Prediction and Evaluation:
After training my model with my train set, I got the predicted salaries for each player from 2017-2019. The insights of this project can be found in the Results and Insights section.
Conclusions:
Players and team managers can use the NBA prediction model to work together more effectively when creating contracts, and to have a standardized way of evaluating impact.
Future Works:
Collect more NBA data from 2008 - 2019.
Include features such as out-of-season injuries, contract start dates, and a player’s brand value.
Figure out ways for players to improve specific stats.
Below is my presentation for the project at Metis. Looking forward to your feedback.