Using Logistic Regression to predict shot results


In Part I, we took a deep dive into shot data and trends based on three key variables: distance, angle, and a categorical variable identifying headed shots. We developed an understanding of the distributions and probabilities associated with shots and goals by representing, transforming and visualizing the data. Here, we use this data to develop a model to predict goals.


Since our response variable (shot result) is categorical, we must apply classification methods to create a predictive model. To introduce this type of approach, let’s look at an illustrative example. …
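As a separate, minimal sketch of what a classification model for a binary shot result can look like (this is not the article's own illustrative example), the Python below fits a logistic regression by plain gradient descent. The data is synthetic and invented purely for illustration. The feature scaling, the "true" coefficients, and the names `fit_logistic`, `predict_proba` and `make_shot` are all assumptions, not taken from the series.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Estimated probability that a shot with features x is a goal."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights w and intercept b by batch gradient descent on log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_shot(rng):
    """Generate one SYNTHETIC shot: [distance, angle, header] and a 0/1 goal label.

    Distance and angle are scaled to [0, 1]; the coefficients below are made up.
    """
    distance = rng.uniform(0.0, 1.0)
    angle = rng.uniform(0.0, 1.0)
    header = 1.0 if rng.random() < 0.2 else 0.0
    true_logit = -1.0 - 3.0 * distance + 2.0 * angle - 0.5 * header
    goal = 1 if rng.random() < sigmoid(true_logit) else 0
    return [distance, angle, header], goal


rng = random.Random(0)
shots = [make_shot(rng) for _ in range(500)]
X = [features for features, _ in shots]
y = [goal for _, goal in shots]

w, b = fit_logistic(X, y)

# Compare a close, wide-angle shot with a distant, narrow-angle one.
close_shot = predict_proba(w, b, [0.1, 0.8, 0.0])
far_shot = predict_proba(w, b, [0.9, 0.2, 0.0])
print(f"close shot: {close_shot:.3f}, far shot: {far_shot:.3f}")
```

The model outputs a probability rather than a hard label, which is why logistic regression suits a categorical response such as goal or no goal. In practice a library implementation would usually replace the hand-written gradient loop.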

Using event data to visualize trends and probabilities associated with shots


Here, I will introduce the concept of expected goals (xG) and explore event data. This is the first part of a three-part series on expected goals. Part II will center on constructing a machine-learning model from this event data, while Part III will explore the applications, strengths and deficiencies of this model.

What is xG?

Results in football, more so than in any other sport, can be greatly influenced by random moments and “luck.” Near misses, deflected shots, goalkeeping errors, and controversial refereeing decisions can, on their own, dictate the final result. Football is a game of inches.

These effects…

Ian Dragulet

Football, Data, Physics and more Football
