Premier League OLS Regression

Regression model for the Barclays Premier League

View project on GitHub

Introduction

Which is a better predictor of a team's success over the course of a season: home form or away form? This model aims to answer that question by examining the home and away records of every team in the Barclays Premier League over the past four seasons and determining which better predicts a team's final league position. The model also looks at which key features matter at home and which matter away, and at what any difference between the two might mean.

Methods and Libraries

This model uses Ordinary Least Squares (OLS) regression. OLS estimates the unknown parameters of a linear regression model by minimizing the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation. Data was collected from WhoScored and TransferMarkt. Key libraries used include StatsModels, Pandas, NumPy, and PyPlot.
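As a minimal sketch of the least-squares idea described above, the snippet below fits a one-feature model with NumPy's `np.linalg.lstsq`. It is not the project's actual StatsModels code, which is not reproduced here; the helper name `fit_ols` and all numbers are made up for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients (intercept first) that minimize
    the sum of squared residuals between y and the fitted line."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Made-up illustrative data: goal differential vs. final league position.
goal_diff = np.array([[40.0], [25.0], [10.0], [-5.0], [-20.0], [-35.0]])
position = np.array([1.0, 4.0, 8.0, 12.0, 16.0, 20.0])

coef = fit_ols(goal_diff, position)
residuals = position - (coef[0] + coef[1] * goal_diff[:, 0])
print("intercept, slope:", coef)
print("sum of squared residuals:", float(residuals @ residuals))
```

In this toy data a larger goal differential goes with a numerically lower (better) league position, so the fitted slope is negative.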

Feature Selection & Results

Combined: Goal Differential, Pass Completion %, Fouls Per Game
Home: Yellow Cards, Red Cards
Away: Penalty Differential

With these features selected for the home and away models respectively, the average R-squared over the past four seasons was 0.90 for home form versus 0.85 for away form, suggesting that a team's home form is a better predictor of final league position than its away form.
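The comparison above rests on averaging one R-squared value per season for each venue. The sketch below shows one way that calculation could look, assuming an OLS fit with an intercept. The helper names `r_squared` and `average_r_squared`, the feature weights, and the randomly generated seasons are all hypothetical placeholders, not the project's data or code.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R-squared of an OLS fit with an intercept:
    1 - (residual sum of squares / total sum of squares)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def average_r_squared(seasons):
    """Mean R-squared over a list of (features, league_position) pairs."""
    return float(np.mean([r_squared(X, y) for X, y in seasons]))

# Synthetic stand-in data: 4 seasons, 20 teams, 3 features per venue.
rng = np.random.default_rng(0)
weights = np.array([-2.0, -1.0, 0.5])  # hypothetical feature weights

def synthetic_season(noise):
    X = rng.normal(size=(20, 3))
    y = 10.5 + X @ weights + rng.normal(scale=noise, size=20)
    return X, y

home = [synthetic_season(1.0) for _ in range(4)]
away = [synthetic_season(1.5) for _ in range(4)]

print("average home R-squared:", average_r_squared(home))
print("average away R-squared:", average_r_squared(away))
```

With real data, each season's feature matrix would hold the selected home or away features for the 20 teams, and the target would be final league position.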

Additionally, the model determined that Goal Differential, Pass Completion Percentage, and Fouls Per Game are important features both at home and away. Perhaps more interestingly, Yellow Cards and Red Cards were important features for predicting home results, while Penalty Differential was an important feature for predicting away results.

Authors & Additional Resources

Chris Clouten (@triplec1988)

For more in-depth analysis, please view the full repository, which includes a PDF presentation, an analysis paper, and the source code.