class: center, middle, inverse, title-slide .title[ #
A Regularized Adjusted Plus-Minus Model in Soccer
] .subtitle[ ##
with Box Score Prior
] .author[ ###
Boyuan Zhang (New York University)
] .institute[ ###
Collaborators: Phong Hoang (Denison), Edvin Tran Hoac (Wesleyan)
] .institute[ ###
Advisors: Prof. Kostas Pelechrinis (UPitt), Prof. Ron Yurko (CMU)
] .date[ ###
Carnegie Mellon Sports Analytics Conference
October 29, 2022
]

---
class: inverse center middle

# How can we evaluate the impact of individual players on their team’s performance over a given period of time?

---
# Plus-Minus (+/-) Model

## Concept:
Keeps track of the net change in the score when a given player is either on or off the field

## Formula:
.center[**Plus-Minus for Any Player = (Team Goals Scored - Team Goals Allowed) While That Player Is on the Field**]

## Benefits:
- Identifies a player’s implied effect on his team’s goal difference while he is on the field
- Required data are readily available: player lineups, substitution records with their times, and goals scored with their times
- Can be applied in any league, to any match, at any time

## Problems:
- A player’s plus-minus depends on who else is on the field: his apparent effect on the goal differential changes as teammates and opponents change during the game

---
# Adjusted Plus-Minus (APM) Model

## Concept:
Adjusts the basic plus-minus over a given time period to account for both the teammates and the opponents on the field.
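
To see what is being adjusted, here is a minimal sketch of the unadjusted plus-minus. The `stints` data and the `plus_minus` helper are hypothetical illustrations, not the pipeline used in this work.

```python
from collections import defaultdict

# Hypothetical stints: (home players on the field, away players on the field,
# goals scored by home, goals scored by away) while the lineups stayed fixed.
stints = [
    ({"A", "B"}, {"X", "Y"}, 1, 0),
    ({"A", "C"}, {"X", "Y"}, 0, 1),
    ({"B", "C"}, {"X", "Z"}, 2, 0),
]

def plus_minus(stints):
    """Sum each player's team goal difference over the stints he played."""
    totals = defaultdict(int)
    for home, away, goals_home, goals_away in stints:
        diff = goals_home - goals_away
        for player in home:
            totals[player] += diff  # home players are credited the home margin
        for player in away:
            totals[player] -= diff  # away players get the opposite margin
    return dict(totals)

ratings = plus_minus(stints)
```

In this toy data player A ends at 0 and player B at +3, although they shared the first stint: the raw number mixes a player's own contribution with that of whoever played alongside him.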

## Formula:
`$$\frac{T_{total}}{T_{j}} \Delta S = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_ix_i + \cdots + \beta_nx_n + \epsilon$$`

- `\(\Delta S\)`: Score differential, `\(S_{home} - S_{away}\)`
- `\(T_j\)`: Length of time segment, the interval in which no substitutions or expulsions occurred, for `\(j = 1, \dots, R\)` segments
- `\(\beta_0\)`: Average home team advantage over all teams in the competition
- `\(\beta_i\)`: Influence of player `\(i\)` on goal differential, for `\(i = 1, \dots, N\)` players in competition
- `\(x_i\)`: Player appearance index:
  - +1: Player `\(i\)` is playing at home
  - 0: Player `\(i\)` is not playing
  - -1: Player `\(i\)` is playing away

---
# Adjusted Plus-Minus (APM) Model

`$$\beta^* = \arg \min_{\beta} ||\Delta S - tX\beta||_2^2$$`

## Interpretation:
An APM rating indicates how many additional goals a player contributes to his team’s scoring margin over the span of a typical game, relative to a league-average player, whose APM value is zero.

## Benefits:
- Reflects the impact of each player on his team’s scoring margin after controlling for the strength of every teammate and every opponent during each minute he is on the field

## Problems:
- High variance, overfitting, and sensitivity to noise
- Multicollinearity: coaches use some groups of players together far more often than others, and a player cannot share the field with every other teammate, so individual effects are hard to separate

---
# Regularized Adjusted Plus-Minus (RAPM) Model

## Concept:
Adds regularization to the APM model to improve model accuracy.
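
APM and its regularized version are fitted on the same design matrix. A minimal sketch of the appearance index encoding (+1 home, 0 not playing, -1 away) follows; the `stints` data and the `appearance_row` helper are hypothetical, and the home-advantage intercept is left out.

```python
# Hypothetical stints: players on the field for the home and the away side.
stints = [
    {"home": ["A", "B"], "away": ["X", "Y"]},
    {"home": ["A", "B"], "away": ["X", "Z"]},
    {"home": ["X", "Y"], "away": ["A", "B"]},
]

# One column per player, in a fixed (alphabetical) order.
players = sorted({p for s in stints for side in ("home", "away") for p in s[side]})

def appearance_row(stint, players):
    """Encode one stint as +1 (home), -1 (away), or 0 (not playing)."""
    row = []
    for p in players:
        if p in stint["home"]:
            row.append(1)
        elif p in stint["away"]:
            row.append(-1)
        else:
            row.append(0)
    return row

X = [appearance_row(s, players) for s in stints]

# Players who never play apart get identical columns, so plain least squares
# cannot tell their contributions apart.
columns = {p: [row[j] for row in X] for j, p in enumerate(players)}
inseparable = columns["A"] == columns["B"]
```

Here A and B always appear together, so their columns coincide. This is the multicollinearity problem in its most extreme form.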

## Formula:
`$$\beta^* = \arg \min_{\beta} ||\Delta S - tX\beta||_2^2 + \lambda||\beta||_2^2$$`

## Benefits:
- Significantly reduces standard errors relative to the APM model and provides more accurate predictions

## Problems:
- Multicollinearity: coaches use some groups of players together far more often than others, and a player cannot share the field with every other teammate, so individual effects are hard to separate

---
# Problematic Nature of Soccer

<div class="figure" style="text-align: center">
<img src="https://www.intraocular.net/posts/how-augmented-apm-works/scoring_vs_subs_sports-1.png" alt="Matano, et al (2018). Augmenting adjusted plus-minus in soccer with FIFA ratings." width="50%" height="50%" />
<p class="caption">Matano, et al (2018). Augmenting adjusted plus-minus in soccer with FIFA ratings.</p>
</div>

--

- Few substitutions `\(\Longrightarrow\)` multicollinearity between features
- Few goals `\(\Longrightarrow\)` sparse response variable

---
# Current State of Soccer Player Rating

## Action-based Player Ratings

.center[![](image/action_based_player_rating.png)]

---
# Current State of Soccer Player Rating

## Video Game Player Ratings

.center[![](image/video_game_player_rating.png)]

---
# Current State of Soccer Player Rating

Matano, F., L. F. Richardson, T. Pospisil, C. Eubanks, and J. Qin (2018): “[`Augmenting Adjusted Plus-Minus in Soccer with FIFA Ratings`](https://arxiv.org/abs/1810.08032),” _Carnegie Mellon Sports Analytics Conference_.

- Recasts APM in a Bayesian framework and incorporates FIFA ratings into the prior distribution
- Shows that Augmented APM predicts better than standard APM and than a model using only FIFA ratings
- Shows that Augmented APM decorrelates players who are highly collinear

<img src="image/augumented_apm.png" width="50%" height="50%" style="display: block; margin: auto;" />

---
# RAxGPM with Box Score Prior

## Previous Formula:
`$$\beta^* = \arg \min_{\beta} ||\Delta G - tX\beta||_2^2 + \lambda||\beta||_2^2$$`

## New Formula:
`$$\beta^* = \arg \min_{\beta} ||\Delta \mathbb{xG} - tX\beta||_2^2 + \lambda||\beta - \beta_{prior}||_2^2$$`

- `\(\Delta \mathbb{xG}\)`: Expected goal differential, `\(\mathbb{xG}_{home} - \mathbb{xG}_{away}\)`
- `\(\beta_{prior}\)`: Prior value for each player learned from **box score data**

## Idea:
- Less sparse response variable: expected goals accrue from every shot, not only from goals
- Less collinearity between players

---
# Model Pipeline

.center[![](image/structure.png)]

---
class: inverse center middle

# Example: English Premier League 2021-22 Season

---
# Data: Prior Stage

## Box Score Data: EPL seasons 2020-21 and 2021-22
- 3,420 observations and more than 180 features
- only players with at least 900 minutes in the corresponding season are considered
- after variable selection, around 30 features remain, covering:
  - scoring, creating, dribbling, passing, defensive actions, etc.

## FIFA Ratings 2022 Data: before the start of the 2021-22 season
- only one overall rating collected for each unique player

---
# Data: Prior Stage

## Number of Players by Position Group

<img src="presentation_files/figure-html/number_by_position-1.png" width="100%" />

---
# Model Training: Prior

## Feature Importance in Prior Value by Position

.center[![](image/feature_importance.png)]

---
# Model Training: Prior

## Comparing FIFA Rating and Prior Value Distributions

<img src="presentation_files/figure-html/fifa_rating_prior_distribution-1.png" width="100%" />

---
# Data: RAPM Stage

## Match Summary data: EPL season 2021-22
- collected line-ups, substitutions, and all events for every game
- created stints, each with a start time and a length:
  - a new stint starts whenever a substitution, a red card, or a goal occurs
- 4000 stints over 380 matches

## Shooting Data: EPL season 2021-22
- collected shooting information with the corresponding expected goals for every shot in the season

---
# Data: RAPM Stage

## Match Summary data: EPL season 2021-22

<img src="image/stint_distribution.png" width="60%" height="60%" style="display: block; margin: auto;" />

---
# Data: RAPM Stage

.center[![](image/rapm_table.png)]

---
# Model Training: RAPM

.center[![](image/model_training_1.png)]

---
# Model Training: RAPM

.center[![](image/model_training_2.png)]

---
# Model Testing: Predictability

.center[![](image/model_testing.png)]

---
# Model Testing: Predictability

10-fold cross-validation with accuracy measured by RMSE:

- **RAPM_only**: RAPM model without any prior
- **RAPM_FIFA**: RAPM model with FIFA rating directly as prior
- **RAPM_box**: RAPM model with prior created from box score data

<img src="image/comparison_result.png" width="50%" height="50%" style="display: block; margin: auto;" />

---
# Result: Top 10 Players

| # | Player | Team | Pos | Min | FIFA | FIFA_rating | Box | box_rating |
|---|--------|------|-----|-----|------|-------------|-----|------------|
| 1 | Mohamed Salah | Liverpool | FW | 2762 | 89 | 0.12174911 | 90.13389 | 0.1497441 |
| 2 | Reece James | Chelsea | DF | 1864 | 81 | 0.02262148 | 87.35088 | 0.1168662 |
| 3 | İlkay Gündoğan | Manchester City | MF | 1857 | 85 | 0.07182136 | 87.76583 | 0.1142307 |
| 4 | Sadio Mané | Liverpool | FW | 2819 | 89 | 0.11953468 | 87.65664 | 0.1121862 |
| 5 | João Cancelo | Manchester City | DF | 3227 | 86 | 0.06593877 | 88.39155 | 0.1061954 |
| 6 | Paul Pogba | Manchester Utd | MF | 1349 | 87 | 0.11021187 | 86.09682 | 0.1060792 |
| 7 | Harry Kane | Tottenham | FW | 3232 | 90 | 0.14763053 | 86.78122 | 0.1040146 |
| 8 | Son Heung-min | Tottenham | FW | 3006 | 89 | 0.13393589 | 86.69345 | 0.1037975 |
| 9 | Alisson | Liverpool | GK | 3240 | 89 | 0.11339948 | 87.27274 | 0.1029704 |
| 10 | Phil Foden | Manchester City | FW | 2128 | 84 | 0.05020127 | 87.30791 | 0.1029035 |
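
Ratings of this kind come from minimizing the prior-informed ridge objective shown on the RAxGPM slide. As a minimal, dependency-free sketch under stated assumptions: the time weights are taken to be already folded into `X` and `y`, and `solve` and `ridge_with_prior` are hypothetical helpers, not the code used for the results. The substitution `g = beta - beta_prior` turns the problem into an ordinary ridge regression on the residual `y - X beta_prior`.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ridge_with_prior(X, y, beta_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_prior||^2."""
    n, p = len(X), len(beta_prior)
    # Part of the response that the prior alone does not explain.
    resid = [y[i] - sum(X[i][j] * beta_prior[j] for j in range(p)) for i in range(n)]
    # Normal equations for the shift g: (X'X + lam * I) g = X' resid.
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) + (lam if a == c else 0.0)
          for c in range(p)] for a in range(p)]
    b = [sum(X[i][a] * resid[i] for i in range(n)) for a in range(p)]
    shift = solve(A, b)
    return [beta_prior[j] + shift[j] for j in range(p)]

# Hypothetical example: two home players who are never separated.
X = [[1, 1], [1, 1], [1, 1]]
y = [0.3, 0.1, 0.2]
beta = ridge_with_prior(X, y, beta_prior=[0.15, -0.05], lam=1.0)
```

The two players have identical columns, so the data move both estimates by the same amount and the gap between them (0.2) comes entirely from the prior. With a zero prior they would receive identical ratings.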
---
# Result: Top 11-20 Players

| # | Player | Team | Pos | Min | FIFA | FIFA_rating | Box | box_rating |
|---|--------|------|-----|-----|------|-------------|-----|------------|
| 11 | Ederson | Manchester City | GK | 3330 | 89 | 0.10897867 | 88.14836 | 0.10238281 |
| 12 | Rúben Dias | Manchester City | DF | 2402 | 87 | 0.09109630 | 87.23700 | 0.09924433 |
| 13 | Rodri | Manchester City | MF | 2884 | 86 | 0.07158721 | 87.50666 | 0.09765550 |
| 14 | Roberto Firmino | Liverpool | FW | 990 | 85 | 0.08445760 | 85.44965 | 0.09521369 |
| 15 | Gabriel Jesus | Manchester City | FW | 1877 | 83 | 0.04402586 | 86.32616 | 0.09429259 |
| 16 | Raheem Sterling | Manchester City | FW | 2128 | 88 | 0.11026872 | 86.64789 | 0.09390230 |
| 17 | Luis Díaz | Liverpool | FW | 958 | 80 | 0.01084548 | 85.32550 | 0.09277333 |
| 18 | Kevin De Bruyne | Manchester City | MF | 2201 | 91 | 0.15216341 | 86.60226 | 0.09247912 |
| 19 | Cristiano Ronaldo | Manchester Utd | FW | 2456 | 91 | 0.15745070 | 85.19735 | 0.09123895 |
| 20 | Aymeric Laporte | Manchester City | DF | 2828 | 86 | 0.07031695 | 86.94222 | 0.08930280 |
---
# Result: Bottom 10 Players

| # | Player | Team | Pos | Min | FIFA | FIFA_rating | Box | box_rating |
|---|--------|------|-----|-----|------|-------------|-----|------------|
| 1 | Jeremy Ngakia | Watford | DF | 903 | 69 | -0.13144729 | 69.64005 | -0.12464625 |
| 2 | Shandon Baptiste | Brentford | MF | 910 | 66 | -0.17288570 | 70.38732 | -0.11385921 |
| 3 | Chris Wood | Burnley | FW | 2694 | 79 | 0.01453329 | 70.97824 | -0.09518406 |
| 4 | Chris Wood | Newcastle Utd | FW | 2694 | 79 | 0.01453329 | 70.97824 | -0.09518406 |
| 5 | Aaron Lennon | Burnley | MF | 1551 | 73 | -0.07654352 | 71.67621 | -0.09010116 |
| 6 | Ashley Young | Aston Villa | DF | 1250 | 78 | -0.00644885 | 72.19038 | -0.08966248 |
| 7 | Cucho | Watford | FW | 1147 | 75 | -0.04432508 | 72.11470 | -0.08795021 |
| 8 | Mads Roerslev | Brentford | DF | 1240 | 67 | -0.14916419 | 72.14783 | -0.08398980 |
| 9 | Marc Albrighton | Leicester City | FW | 1132 | 76 | -0.03443061 | 72.65074 | -0.08230763 |
| 10 | Jóhann Berg Guðmundsson | Burnley | MF | 1102 | 75 | -0.04856484 | 72.45273 | -0.08168250 |
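
The predictability comparison of RAPM_only, RAPM_FIFA, and RAPM_box relies on 10-fold cross-validation scored by RMSE. A minimal sketch of such an evaluation loop follows. The `fit` callback is a hypothetical stand-in for any of the three models, `fit_mean` is only a placeholder, and the interleaved fold assignment is an assumption (it also assumes at least `k` observations).

```python
import math

def kfold_rmse(X, y, fit, k=10):
    """Average out-of-fold RMSE; fit(X_train, y_train) returns a predictor."""
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        sq_err = [(y[i] - predict(X[i])) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / k

def fit_mean(X_train, y_train):
    """Placeholder model: always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda row: mean

# Hypothetical data: 20 stints with a constant response.
X = [[1, -1]] * 20
y = [0.5] * 20
score = kfold_rmse(X, y, fit_mean)
```

Each model is refitted on nine folds and scored on the held-out fold, so a lower average RMSE indicates better out-of-sample prediction of the stint-level response.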
---
# Model Comparison

## Relation

<img src="presentation_files/figure-html/output_relation-1.png" width="100%" />

---
# Model Comparison

## Distribution

<img src="presentation_files/figure-html/output_comparison-1.png" width="100%" />

---
# Observation

## Position Value

<img src="presentation_files/figure-html/output_position-1.png" width="100%" />

---
# Discussion

## Future Work
- adding more data to increase the sample size and improve model training and prediction accuracy
- employing more supervised learning techniques to improve the quality of the prior model
- bagging to improve the stability and accuracy of the RAxGPM model
- choice of response variable: could we find a better measurement for soccer?
- incorporating tracking data to build a more comprehensive model
- constructing a Bayesian framework to quantify the model’s uncertainty through distributions

## Applications
- predicting players’ market value and salary
- recommending optimal line-ups
- predicting game results and simulating league outcomes
- evaluating players across different leagues

---
# References

- Hvattum, L. (2019). A comprehensive review of plus-minus ratings for evaluating individual players in team sports. International Journal of Computer Science in Sport.
- Matano, F., Richardson, L. F., Pospisil, T., Eubanks, C., & Qin, J. (2018). Augmenting adjusted plus-minus in soccer with FIFA ratings. arXiv preprint arXiv:1810.08032.
- Rosenbaum, D. T. (2004, April 30). Picking the difference makers for the All-NBA Teams. 82games.com. Retrieved July 28, 2022, from https://www.82games.com/comm30.html
- Sill, J. (2010). Improved NBA adjusted +/- using regularization and out-of-sample testing. In Proceedings of the 2010 MIT Sloan Sports Analytics Conference.
- Zhang, B., Tran Hoac, E., & Hoang, P. (2022). A RAPM Model for Soccer Player Ratings. https://www.stat.cmu.edu/cmsac/sure/2022/showcase/soccer_rapm.html

---
class: inverse center middle

# Thanks!

<br />

[@GaryBoyuanZhang](https://twitter.com/GaryBoyuanZhang)