A New Approach to Quantify Soccer Players’ Readiness through Machine Learning Techniques

Mandorino, Mauro; Tessitore, Antonio; Leduc, Cédric; Persichetti, Valerio; Morabito, Manuel; Lacome, Mathieu

doi:10.3390/app13158808


by Mauro Mandorino ¹,², Antonio Tessitore ², Cédric Leduc ³,⁴, Valerio Persichetti ¹, Manuel Morabito ¹ and Mathieu Lacome ¹,⁵,*

¹ Performance and Analytics Department, Parma Calcio 1913, 43121 Parma, Italy
² Department of Movement, Human and Health Sciences, University of Rome “Foro Italico”, 00135 Rome, Italy
³ Carnegie Applied Rugby Research (CARR) Center, Institute for Sport, Physical Activity and Leisure, Carnegie School of Sport, Leeds Beckett University, Leeds LS6 3QS, UK
⁴ Sport Science and Medicine Department, Crystal Palace FC, London SE25 6PU, UK
⁵ Sport Expertise and Performance Laboratory, French National Institute of Sports (INSEP), 75012 Paris, France
* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(15), 8808; https://doi.org/10.3390/app13158808

Submission received: 22 June 2023 / Revised: 27 July 2023 / Accepted: 28 July 2023 / Published: 30 July 2023

(This article belongs to the Special Issue Analytics in Sports Sciences: State of the Art and Future Directions)


Abstract

Previous studies have shown that variation in PlayerLoad (PL) could be used to detect fatigue in soccer players. Machine learning techniques (ML) were used to develop a new locomotor efficiency index (LEI) based on the prediction of PL. Sixty-four elite soccer players were monitored during an entire season. GPS systems were employed to collect external load data, which in turn were used to predict PL during training/matches. Random Forest Regression (RF) produced the best performance (mean absolute percentage error = 0.10 ± 0.01) and was included in further analyses. The difference between the PL value predicted by the ML model and the real one was calculated, individualized for each player using a z-score transformation (LEI), and interpreted as a sign of fatigue (negative LEI) or neuromuscular readiness (positive LEI). A linear mixed model was used to analyze how LEI changed according to the period of the season, day of the week, and weekly load. Regarding seasonal variation, the lowest and highest LEI values were recorded at the beginning of the season and in the middle of the season, respectively. On a weekly basis, our results showed lower values on match day − 2, while high weekly training loads were associated with a reduction in LEI.

Keywords: soccer; fatigue; machine learning; training load; PlayerLoad

1. Introduction

With the increasing competitiveness of soccer at the elite level, and the resulting increase in in-game and training intensity [1], monitoring load and the associated player fatigue has become mainstream. To quantify training load, wearable sensors such as micro-electromechanical systems (MEMSs) containing global positioning system (GPS) capabilities and accelerometers are becoming widely used by practitioners. Despite advancements in knowledge regarding external load monitoring, challenges remain regarding the assessment of players’ responses to load and associated fatigue in elite soccer environments [2]. Common fatigue monitoring methods in soccer encompass subjective (e.g., wellness questionnaires) and objective measures (e.g., Countermovement Jump [CMJ]) [3]. However, several issues with such tests remain, such as players’ buy-in, the lack of a direct effect on on-pitch performance, and the lack of specificity of the tests [4]. One avenue that is currently being explored to improve fatigue monitoring strategies is the use of the triaxial accelerometers within MEMSs to capture three-dimensional movements, which are commonly quantified using a vector termed PlayerLoad™ (PL) [5,6]. PL is generally defined as the magnitude of changes in acceleration and has been used as a global indicator of body load or to detect a fatigue condition in soccer players.

For example, PL collected during standardized runs has been shown to be a valid, reliable, and sensitive method to monitor players’ neuromuscular status [4,7]. Such an approach has the advantage of being time-efficient as well as task-specific. However, integrating such a procedure into warm-ups limits the decisions practitioners can make with respect to potential training modifications after the test. Instead, Rowell et al. [8] monitored PL during standardized small-sided games to potentially detect neuromuscular fatigue. After showing that PL-derived metrics could be used reliably, they observed an increase in PL per minute during small-sided games after a high previous weekly training load. The authors suggested that such a change in players’ activity might be related to potential fatigue status. However, because previous values are used to make comparisons, such a procedure requires a high degree of standardization (e.g., rules, pitch dimensions) to ascertain that the changes observed are not driven by contextual variables. This standardization is rarely seen in elite football, as tactical and physical needs usually drive the design of these games. As such, there is still a need for more robust monitoring variables and/or analyses that could be used with confidence, irrespective of daily training contexts [9].

In this regard, Lacome et al. [9] proposed a direct comparison between the output of a predictive model (i.e., multivariate individual regression predicting heart rate responses from external load variables) and the actual value measured (i.e., heart rate) as a surrogate measure of player fitness. An advantage of such an approach is that it allows one to make an inference about a player’s readiness without testing them, therefore overcoming the above-mentioned limitations of player monitoring strategies. While this approach seems promising, it only assesses player fitness. Consequently, developing similar approaches to estimate players’ neuromuscular status could be worthwhile. Considering how PL might be affected by player neuromuscular status, it seems pertinent to examine the potential use of this variable in the context of a direct comparison between actual and predicted values.

For this reason, the main objective of this study was to develop a novel method to assess player neuromuscular status using machine learning techniques (ML). First, the ability to predict PL based on the external load data collected during training and matches was assessed. The second aim of this study was to analyze how the index derived from this approach fluctuates over the course of the entire season, within the week, and in relation to the cumulative weekly training load.

2. Materials and Methods

2.1. Participants

The study was conducted during the 2022/23 soccer season, involving a total of sixty-four elite male players from the first team (n: 24, age: 23.9 ± 4.7 years, body mass: 80.0 ± 6.5 kg, height: 184.6 ± 5.4 cm), U19 team (n: 19, age: 18.4 ± 0.7 years, body mass: 75.5 ± 8.8 kg, height: 181.3 ± 6.8 cm), and U18 team (n: 21, age: 17.5 ± 0.49 years, body mass: 70.3 ± 5.3 kg, height: 182.1 ± 7.4 cm) of a professional Italian soccer club. Data were obtained from the club as the players were routinely monitored throughout the course of the season. Therefore, ethics committee clearance, as would be sought in usual research procedures, was not required [10]. Nevertheless, to guarantee team and player confidentiality, all data were anonymized before the analysis, and the research was conducted in accordance with the Declaration of Helsinki.

2.2. External Load Data Collection

Players’ external load was recorded during 511 training sessions (first team, n = 199, duration: 67.7 ± 14.1 min; U19, n = 166, duration: 85.6 ± 18.2 min; U18, n = 146, duration: 85.6 ± 13.5 min) and 116 official matches (first team, n = 38; U19, n = 40; U18, n = 38). An average of 169 ± 31 observations per player were recorded. Subjects who registered fewer than 85 observations were excluded from further analysis in order to remove players who only partially played during the season due to injuries or any other reason (e.g., transfer to another club). Thus, a total of 10,857 individual observations were collected. External training/match load was obtained using the WIMU Pro system (RealTrack Systems, Almería, Spain), which consists of various inertial sensors (three 3D gyroscopes with 8000°/s full-scale output range, a 3D magnetometer, a 10-Hz global positioning system, a 20-Hz ultra-wide band) whose validity and reliability have been previously tested [11,12,13]. The GPS devices were placed between the players’ scapulae using a tight-fitting vest. Eighteen different external load metrics were extracted (Table 1).

3. Data Analysis

The main objective of this study was to develop a novel method to assess player neuromuscular status by using machine learning techniques (ML) and external load data to predict players’ training/match PL. Therefore, multiple ML models were built using a training dataset, where each example represented a player’s session and consisted of a vector of features describing that player’s external load. The different ML models were employed to predict the training/match PL, which was our target variable. The PL accumulated during training and matches was calculated as suggested by RealTrack Systems and as presented in a previous study [11].

3.1. Algorithm Selection

Features (Table 1) such as external load data, player position (center-back, full-back, midfielder, winger, forward), and session type (training, match) were considered as predictors in the ML models and modeled on the target variable (PL). The algorithms selected were as follows:

Extreme gradient boosting (XGBoost)
Random Forest Regression (RF)
Linear Regression (LR)

Describing the underlying mathematical functions of the ML models is beyond the scope of this study. However, XGBoost (i.e., a boosting algorithm) and RF (i.e., a bagging algorithm) are based on ensemble learning methods, and they were selected for their efficiency in regression problems and their ability to identify non-linear interactions in high-dimensional data [14,15,16,17]. In contrast, LR was selected for its simplicity in identifying the linear relationship between the independent and dependent variables.

3.2. Data Pre-Processing

Pre-processing techniques were employed to maximize the performance of each model. First of all, partial (i.e., the player trained less than 90% of the total session), individual (i.e., the player trained separately from the team to do an individual session), and rehabilitation (i.e., the player trained as part of a rehabilitation program) sessions were excluded from the analysis in order to remove any confounding factors that could affect the results. Therefore, only training sessions that were fully completed by the players were considered. Categorical predictors (player position, type of training session) were subjected to one-hot encoding before being inserted into the ML models. In addition, prior to linear regression training, all of the features were normalized using a min-max scaler (MMS). Normalization ensured that all of the features contributed fairly to the learning process [18]. XGBoost and RF, on the other hand, as tree-based algorithms, did not require normalization [19].
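As an illustration of the pre-processing steps described above, the following minimal Python sketch excludes partial sessions, one-hot encodes the categorical predictors, and applies min-max scaling to a single feature. The field names (`position`, `session_type`, `total_distance`, `completed_fraction`) and the sample values are hypothetical and are not taken from the study's dataset; the authors' actual pipeline is not reported at this level of detail.

```python
# Hypothetical sessions; field names and values are illustrative only.
sessions = [
    {"position": "midfielder", "session_type": "training",
     "total_distance": 5200.0, "completed_fraction": 1.0},
    {"position": "winger", "session_type": "match",
     "total_distance": 10400.0, "completed_fraction": 1.0},
    {"position": "forward", "session_type": "training",
     "total_distance": 3100.0, "completed_fraction": 0.6},  # partial session
]

POSITIONS = ["center-back", "full-back", "midfielder", "winger", "forward"]
SESSION_TYPES = ["training", "match"]


def one_hot(value, categories):
    """Return a 0/1 indicator vector with a single 1 at the matching category."""
    return [1 if value == category else 0 for category in categories]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Keep only sessions completed for at least 90% of their duration.
full_sessions = [s for s in sessions if s["completed_fraction"] >= 0.9]

# Scale the numeric feature, then append it to the encoded categorical features.
scaled_distance = min_max_scale([s["total_distance"] for s in full_sessions])
rows = [
    one_hot(s["position"], POSITIONS) + one_hot(s["session_type"], SESSION_TYPES) + [d]
    for s, d in zip(full_sessions, scaled_distance)
]
```

In a cross-validated workflow, the scaling bounds would normally be computed on the fitting folds only and then applied to the held-out fold, so that no information leaks from the evaluation data.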

3.3. Feature Elimination, Hyperparameter Tuning, and Cross-Validation

Recursive Feature Elimination (RFE) was performed to remove correlated features that could increase the risk of overfitting [20]. The RFE algorithm was implemented to identify the most relevant features for predicting PL values. After this initial process, a Randomized Search was used to tune hyperparameters in RF and XGBoost. LR does not present any tunable hyperparameters. Hyperparameters were tuned using three-fold cross-validation, and the combination of hyperparameters that yielded the best performance across the folds (i.e., the lowest mean absolute percentage error [MAPE]) was selected. RFE and Randomized Search were performed on 20% of the dataset, while the remaining 80% was used to test the different models using five-fold cross-validation. All analyses were performed using the Anaconda distribution (Version 3.9.12, Anaconda Inc., New York, NY, USA) and Python libraries.
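The data partition described above can be sketched as follows. This is a minimal illustration that assumes a simple random shuffle; the source does not state whether the split was stratified or grouped by player, and the function names `split_tuning_evaluation` and `k_fold` are hypothetical.

```python
import random


def split_tuning_evaluation(n_samples, tuning_fraction=0.2, seed=0):
    """Shuffle sample indices and reserve a fraction for feature selection and tuning."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_tuning = int(round(n_samples * tuning_fraction))
    return indices[:n_tuning], indices[n_tuning:]


def k_fold(indices, k=5):
    """Yield (fitting, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fitting = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fitting, held_out


# 20% of the observations for RFE and Randomized Search, 80% for model evaluation.
tuning_idx, evaluation_idx = split_tuning_evaluation(100)

# Five-fold cross-validation restricted to the evaluation subset.
splits = list(k_fold(evaluation_idx, k=5))
```

Keeping the tuning subset disjoint from the evaluation subset means that the features and hyperparameters are not chosen on the same observations used to report the error.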

3.4. Model Evaluation

The suitability of each model was assessed using the root mean squared error (RMSE) and MAPE. Low RMSE and MAPE values indicate that a model has good predictive abilities. In particular, a MAPE below 10% indicates very accurate forecasting, between 10% and 20% indicates good forecasting, between 20% and 50% indicates reasonable forecasting, and higher than 50% indicates inaccurate forecasting [21]. In addition, the performance of the models was compared with two baselines: baseline B1 returned the most frequent class label in the observed training set; baseline B2 generated predictions uniformly at random from the classes observed in the training set. The model which produced the best performance was subjected to feature importance analysis. Moreover, to understand the applicability of the model in a real-world scenario, we simulated the course of the season in a similar manner to that suggested by Rossi et al. [20]. At each day d_i, the training set T_i consisted of all the training sessions collected up to that day, and the model built was used to predict the PL on the following day, d_{i+1}. At day d_{i+1}, we assessed the accuracy of the model using MAPE and RMSE.
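The two error metrics and the day-by-day simulation can be sketched in a few lines of Python. The sketch below is illustrative only: MAPE is expressed as a fraction (0.10 corresponds to 10%), the sample values are invented, and `mean_predictor` is a deliberately naive stand-in for the trained ML model, which in the study predicts PL from external load features rather than from past PL values.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target variable."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def interpret_mape(value):
    """Map a MAPE fraction to the forecasting bands quoted in the text."""
    if value < 0.10:
        return "very accurate"
    if value <= 0.20:
        return "good"
    if value <= 0.50:
        return "reasonable"
    return "inaccurate"


def walk_forward(daily_pl, fit_predict):
    """At each day i, fit on days 0..i and score the prediction for day i + 1."""
    errors = []
    for i in range(len(daily_pl) - 1):
        history = daily_pl[: i + 1]
        prediction = fit_predict(history)
        actual = daily_pl[i + 1]
        errors.append(abs((actual - prediction) / actual))
    return errors


def mean_predictor(history):
    """Naive placeholder model: predict the mean of all past values."""
    return sum(history) / len(history)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
score = mape(actual, predicted)
error = rmse(actual, predicted)
errors = walk_forward([100.0, 120.0, 110.0], mean_predictor)
```

Tracking the per-day errors returned by `walk_forward` over a season is one way to see how many weeks of data a model needs before its accuracy stabilizes.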

3.5. Calculation of Locomotor Efficiency Index

The differences between the PL values predicted by the ML model and the real PL values (ΔPL) were calculated. We assumed that, if the real PL value is lower than the predicted value (positive ΔPL), the player showed an ability to maximize locomotor activity and minimize the load imposed on the body compared to the value predicted by the model. In contrast, a negative ΔPL indicates that the player cumulated a higher PL than the expected value; therefore, this was interpreted as a decrease in the player’s locomotor efficiency [22]. Given the high individual variability of PL [23], the ΔPL was reported in relation to the individual’s absolute average and normal variation throughout the entire season using a z-score transformation. Individual z-scores were used to define our Locomotor Efficiency Index (LEI) and were calculated as follows:

\mathrm{LEI} = \frac{\text{individual } \Delta PL - \text{individual } \Delta PL \text{ average}}{\text{individual } \Delta PL \text{ standard deviation}}

Also, in this case, a decrease in z-score values (LEI) was interpreted as an individual reduction in locomotor efficiency, whereas an increase in z-score values was interpreted as an individual increase in locomotor efficiency.
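The calculation can be illustrated with a short Python sketch for a single player. The PL values are invented for the example, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def locomotor_efficiency_index(predicted_pl, real_pl):
    """Return one LEI value per session for a single player.

    Delta PL is predicted minus real PL, so a positive value means the player
    accumulated less PL than the model expected.
    """
    delta_pl = [p - r for p, r in zip(predicted_pl, real_pl)]
    average = mean(delta_pl)
    spread = stdev(delta_pl)  # sample standard deviation (assumption)
    return [(d - average) / spread for d in delta_pl]


# Hypothetical predicted and real PL values for four sessions of one player.
predicted = [500.0, 520.0, 480.0, 510.0]
real = [490.0, 530.0, 480.0, 500.0]
lei = locomotor_efficiency_index(predicted, real)
```

In this example the second session, where the real PL exceeds the prediction, receives the lowest LEI, which the index interprets as reduced locomotor efficiency relative to that player's own seasonal norm.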

3.6. Analysis of the Relationship between LEI and Weekly Training Load

One of the aims of the current study was to understand whether the locomotor index was sensitive to variations in the weekly training load. For this purpose, the weekly training load was calculated as the rolling sum of the 7 previous days for the following external load variables: total distance, distance > 19.8 km/h, distance > 25.2 km/h, number of accelerations > 3.5 m/s², number of decelerations < −3.5 m/s². Since weekly training load can be influenced by a player’s position and profile, weekly load was also individualized in relation to the players’ season average and standard deviation during the season by adopting z-score transformation. Considering the z-score variations, the following cutoffs were identified [24], and different classifications were made:

Decrease in weekly training load (D-WL): the player registered a z-score < −1.
Stability in the weekly training load (S-WL): the player registered a z-score between −1 and 1.
Increase in weekly training load (I-WL): the player registered a z-score > 1.
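The weekly load classification can be sketched as follows for one player and one external load variable. The daily distances are invented, the function names are hypothetical, and the sketch assumes that each rolling sum covers seven consecutive days and that the sample standard deviation is used; the text does not specify either detail.

```python
from statistics import mean, stdev


def rolling_sum(daily_values, window=7):
    """Sum each run of `window` consecutive days (one value per complete window)."""
    return [
        sum(daily_values[end - window:end])
        for end in range(window, len(daily_values) + 1)
    ]


def classify_weekly_load(z_score):
    """Apply the cutoffs listed above to an individualized weekly load z-score."""
    if z_score < -1:
        return "D-WL"
    if z_score > 1:
        return "I-WL"
    return "S-WL"


# Hypothetical daily total distance (m) for one player over two weeks.
daily_distance = [5000, 6000, 0, 7000, 4000, 3000, 10000,
                  0, 5500, 6500, 0, 9000, 9000, 11000]

weekly = rolling_sum(daily_distance)

# Individualize against the player's own average and standard deviation.
average, spread = mean(weekly), stdev(weekly)
labels = [classify_weekly_load((w - average) / spread) for w in weekly]
```

In the study, the average and standard deviation are computed over each player's full season rather than over a two-week window as in this toy example.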

All of the steps previously performed are summarized in Figure 1.

4. Statistical Analysis

A within-subject linear mixed model was employed to analyze mean differences in LEI, with the respective 95% confidence intervals, in relation to the period of the season (first part = July, August; second part = September, October; third part = November, December; fourth part = January, February; fifth part = March, April). The linear mixed model was also used to analyze the variation in LEI in relation to the day of the week. The days of the week were classified according to the days preceding a match (MD-4; MD-3; MD-2; MD-1; MD). To avoid confounding factors with respect to MD, only players who played 90 min were considered. In addition, we examined how LEI changed as a function of variation in workload (D-WL; S-WL; I-WL) cumulated in the previous 7 days (total distance, distance > 19.8 km/h, distance > 25.2 km/h, number of accelerations, and number of decelerations). Considering that load management and load distribution over the week changed in relation to the team’s/coach’s philosophy, we limited these analyses to the first team only. First team weekly load distribution is presented in Table 2. When statistically significant differences were found, the least significant difference approach to multiple comparisons was adopted (as suggested in [25]). Standardized effect sizes, estimated from the ratio of the mean difference to the pooled standard deviation, were also calculated. Effect size (ES) values of 0.2, 0.5, and 0.8 were interpreted as small, moderate, and large differences, respectively [26]. The statistical analyses were performed using the Statistical Package for the Social Sciences, version 28.0 (SPSS Inc., Chicago, IL, USA). The threshold for statistical significance was set at p < 0.05.
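The standardized effect size described above (mean difference divided by the pooled standard deviation) can be sketched as follows. The LEI values are invented, and this simple two-group calculation does not reproduce the linear mixed model itself, which was fitted in SPSS. The "trivial" label for values below 0.2 is an assumption added for completeness; the text only names the small, moderate, and large thresholds.

```python
import math
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return math.sqrt(pooled_variance)


def effect_size(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


def interpret_effect_size(es):
    """Label an effect size using the 0.2 / 0.5 / 0.8 thresholds."""
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "trivial"  # label not used in the source (assumption)


# Hypothetical LEI values for two periods of the season.
mid_season = [0.5, 0.7, 0.9]
pre_season = [-0.1, 0.1, 0.3]
es = effect_size(mid_season, pre_season)
label = interpret_effect_size(es)
```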

5. Results

Among the selected algorithms, RF exhibited the best performance (MAPE = 0.10 ± 0.01; RMSE = 14.90 ± 5.18). The results of the other models are presented in Table 3. Having shown the best performance in five-fold cross-validation, RF was employed in further analyses. After RFE, seven external load metrics were selected and subjected to feature importance analysis (Figure 2). The evolution of the quality of the model during the season is presented in Figure 3.

Evolution of LEI during the season: the Nov–Dec period showed the highest LEI value, while the Jul–Aug period showed the lowest. The 95% CI for this change was 0.48–0.67 (ES = 0.92; p < 0.01) (Figure 4).

Relationship between LEI and day of the week: the highest value was recorded on MD-4, while the lowest on MD-2 and MD-1. The 95% CI for this change was 0.36–0.56 (ES = 0.81; p < 0.01) and 0.25–0.54 (ES = 0.42) (Figure 5).

Relationship between LEI and weekly workload: Among the different workload metrics considered, only total distance, distance > 25.2 km/h, and number of accelerations (> 3.5 m/s²) showed significant differences. For the total distance, the condition D-WL produced a higher LEI value than the condition I-WL. The 95% CI for this change was 0.38–0.77 (ES = 0.63; p < 0.01) (Figure 6a). Similarly, for the distance > 25.2 km/h, the condition D-WL produced a higher LEI value than the condition I-WL. The 95% CI for this change was 0.05–0.27 (ES = 0.82; p < 0.01) (Figure 6c). For the number of accelerations, the condition D-WL also produced a higher LEI value than the condition I-WL. The 95% CI for this change was 0.04–0.39 (ES = 0.32; p < 0.01) (Figure 6d).

6. Discussion

The main objective of this study was to develop a novel method to assess player neuromuscular status using machine learning techniques (ML) by comparing the PL predicted from external load with the real PL. The second aim of this paper was to assess whether this index was sensitive to the day of the week, period of the season, and/or variations in training load. In line with our hypothesis, uncoupling between real and predicted PL could be used to detect the neuromuscular readiness of players. Specifically, the LEI developed in this study proved to be sensitive to the period of the season, the day of the week, and the load cumulated in the previous week.

6.1. ML Model Development

External loads collected during training and matches were used to predict players’ PL. Among the different models selected, RF produced the best performance (Table 3), outperforming the two baselines (B1 and B2). RF is an ensemble of decision trees and, unlike the individual learners, it is able to reduce overfitting problems in decision trees, reduce bias and variance, and consequently improve predictive ability [27]. Moreover, RF is able to detect non-linear relationships in high-dimensional data [14]. Since it showed the best performance, RF was used for further analyses. Feature importance analysis revealed that, although total distance was identified as the most important feature for predicting PL (in agreement with previous studies [28]), it was not the only feature (Figure 2). This result confirmed that using only total distance could be reductive since other activities (e.g., accelerations, decelerations) could also have an impact on PL. To understand the utility of the model in a real-world scenario, we simulated the course of the season, as suggested by Rossi et al. [20]. Starting data collection from the first day of the season, the ML model minimized the error and reached stable performance after 23 weeks (Figure 3). This implies that adequate data collection must be ensured in the development of the model before this approach can be used with good accuracy.

6.2. Relationship between LEI and Period of the Season and Day of the Week

The LEI was calculated as the difference between the PL value predicted by the ML model and the real PL value (ΔPL). Similar to the concept presented in previous studies [9,22,29], comparing players’ locomotor activities (external load) with internal load (e.g., HR) or whole-body load (e.g., PL) can be used to track changes in fitness or locomotor efficiency. In this paper, a positive difference between the PL predicted from locomotor activity and the actual PL was interpreted as an improvement in locomotor efficiency. In contrast, a negative value was interpreted as a condition in which the player was subjected to a greater body load than expected from the cumulated external load. Considering the high individual variability of PL [23], the ΔPL was individualized using z-score transformation (LEI). The LEI was analyzed in relation to the period of the season, the day of the week, and the variation in load in the previous week. We were aware that load management could change in relation to the different styles of the teams and coaches; thus, to remove any confounding factors, we limited these analyses to the first team. Therefore, U19 and U18 data were only used to increase the size of the dataset and the quality of the ML predictions.

Regarding the trend of LEI during the season (Figure 4), our results revealed that this index changes according to the period of the season, showing a small to large effect (ES 0.18–0.92). In particular, the lowest values were observed during pre-season (July–August), while the highest values were observed in the November–December period (95% CI 0.48–0.67; ES = 0.92). The most interesting finding is that the LEI showed a significant reduction at the end of the season (March–April). To the best of our knowledge, few studies have examined the progression of fatigue over an entire season. Similar to our study, Nobari et al. [30,31] found lower values of fatigue and delayed-onset muscle soreness (DOMS) in the mid-season period than in the pre- and end-season periods. In this regard, we can speculate that the low LEI value in the pre-season period is related to the poor physical condition that characterizes the beginning of the season. During this period, players are exposed to sudden and severe increases in training load, which could lead to acute fatigue [24]. Subsequently, the LEI increases following the players’ adaptation to training. Therefore, in the middle of the season, players increase their tolerance to a high workload due to improved physical capacity. In contrast, the decrease in LEI at the end of the season might be related to the cumulation of training load, stress, and fatigue over the course of the season and could be used to fine-tune the workload of players in the second part of the season.

Changes in the LEI were also investigated with respect to the day of the week (Figure 5). The focus of the first days of the weekly microcycle is generally to determine player overload (Table 2). In particular, strength-oriented sessions were completed on MD-4, during which a high number of accelerations and decelerations were encouraged, while extensive, metabolic-oriented sessions were carried out on MD-3, with a focus on high-intensity running (Table 2). In contrast, on MD-2, the main focus is on active recovery and regeneration, while MD-1 is designed to further favor recovery (using a combination of recovery methods and training load management) and maximize neuromuscular freshness (e.g., explosive work in the gym, agility on the pitch, short training session with intensity). It is worth noting that the LEI follows the trend of the weekly training load distribution (ES 0.17–0.81) (Figure 5). The highest LEI values were registered on MD-4, MD-3, and MD-1. On MD-4, the players returned to training after having a recovery session (MD+1) and a day off (MD+2). MD-1 is the day on which recovery strategies (e.g., reduction in training load) intended to maximize players’ neuromuscular freshness are deployed. In contrast, the lowest LEI value was registered on MD-2, after two hard training sessions, and on MD, when players were exerting maximal effort (Table 2). To avoid confounding factors with respect to MD, only players who played the entire game were considered for analysis.

6.3. Effect of Training Load Variations on LEI

To better understand the effects of training load, we examined the relationship between LEI and weekly load variation. In particular, we considered the workload accumulated in the previous 7 days (total distance, distance > 19.8 km/h, distance > 25.2 km/h, number of accelerations, and number of decelerations). Considering that workload can change from player to player depending on different factors (e.g., playing position), it was individualized using z-score transformation. Three different conditions were identified: (D-WL), the player experienced a decrease in weekly load compared to normal (z-score < −1); (S-WL), the load remained within a normal range (z-score between −1 and 1); and (I-WL), the player experienced an increase in weekly load compared to normal (z-score > 1). When players registered an increase in weekly load (I-WL) for total distance (ES 0.20–0.63), distance > 25.2 km/h (ES 0.56–0.82), and number of accelerations (ES 0.10–0.32), a significant decrease in LEI was observed (Figure 6). A similar approach was adopted in a previous study, wherein the authors found an association between a higher acute load and an increase in fatigue [24]. As stated by Hader et al. [32], a high volume of total distance, very high-intensity running distance, and number of accelerations could be associated with repeated eccentric muscle contractions, muscle damage, and disturbances in the biochemical milieu, consequently increasing the fatigue state. As such, the LEI could support practitioners on a daily basis in identifying players who are responding properly to variations in training load and those who present decreased neuromuscular status; in turn, this would support better workload management for elite soccer players.

Although the approach developed in the current study showed encouraging results, some limitations must be addressed. First, although the LEI showed an interesting association with the period of the season, day of the week, and weekly load variation, this metric was not corroborated by any objective fatigue markers (e.g., creatine kinase, testosterone–cortisol ratio, Countermovement Jump outputs such as flight time–contraction time ratio). For this reason, future studies should further investigate the relationship between the LEI and objective markers of neuromuscular fatigue. In addition, future research should investigate how the interaction of different external load parameters affects the LEI. Second, the ML model showed good accuracy only with at least 23 weeks of data within the training dataset, implying that predictions made at the beginning of the season could present a certain degree of error. Therefore, future studies could extend the analysis to multiple seasons and increase the size of the dataset.

7. Practical Applications

ML techniques could be used to calculate the LEI for each player and quantify their neuromuscular readiness. This approach could have several advantages: (1) assessment of neuromuscular freshness on a daily basis; (2) the possibility to detect fatigue through an “invisible” approach without having to subject players to testing. Knowing the daily readiness of players could help coaches and physical trainers to improve the design of recovery strategies and optimize training load management.

8. Conclusions

The present study aimed to develop a new locomotor efficiency index based on a machine learning approach (ML). Considering the inherent challenges associated with monitoring player status in elite soccer, the present study opens up new possibilities. Indeed, the initial findings of this study showed clear fluctuations in the locomotor efficiency index according to the period of the season, the day within the weekly microcycle, and variations in training load. However, future studies are needed to further validate this approach against gold-standard measures of neuromuscular fatigue.

Author Contributions

M.M. (Mauro Mandorino), A.T., C.L. and M.L. were responsible for the design of the study. M.M. (Mauro Mandorino) and M.L. conducted the analyses. M.M. (Mauro Mandorino), M.M. (Manuel Morabito) and V.P. were responsible for data collection. All authors contributed to the interpretation of the findings and had full access to all data. M.M. (Mauro Mandorino) and M.L. produced the first draft of the paper, which was critically revised by C.L. and A.T. The final manuscript was approved by all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Approval for data collection was obtained from the club (as players’ data were routinely collected over the course of the season). The study was conducted in accordance with the Declaration of Helsinki (2013).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are available upon reasonable request.

Acknowledgments

The authors would like to thank the club Parma Calcio 1913 (medical staff, coaching staff, and all players) for their participation in the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Barnes, C.; Archer, D.T.; Hogg, B.; Bush, M.; Bradley, P. The Evolution of Physical and Technical Performance Parameters in the English Premier League. Int. J. Sports Med. 2014, 35, 1095–1100.
Carling, C.; Lacome, M.; McCall, A.; Dupont, G.; Le Gall, F.; Simpson, B.; Buchheit, M. Monitoring of Post-Match Fatigue in Professional Soccer: Welcome to the Real World. Sports Med. 2018, 48, 2695–2702.
Silva, J.R.; Rumpf, M.C.; Hertzog, M.; Castagna, C.; Farooq, A.; Girard, O.; Hader, K. Acute and Residual Soccer Match-Related Fatigue: A Systematic Review and Meta-Analysis. Sports Med. 2018, 48, 539–583.
Leduc, C.; Tee, J.; Lacome, M.; Weakley, J.; Cheradame, J.; Ramirez, C.; Jones, B. Convergent Validity, Reliability, and Sensitivity of a Running Test to Monitor Neuromuscular Fatigue. Int. J. Sports Physiol. Perform. 2020, 15, 1067–1073.
Cormack, S.J.; Mooney, M.G.; Morgan, W.; McGuigan, M.R. Influence of Neuromuscular Fatigue on Accelerometer Load in Elite Australian Football Players. Int. J. Sports Physiol. Perform. 2013, 8, 373–378.
Fitzpatrick, J.F.; Hicks, K.M.; Russell, M.; Hayes, P.R. The Reliability of Potential Fatigue-Monitoring Measures in Elite Youth Soccer Players. J. Strength Cond. Res. 2021, 35, 3448–3452.
Garrett, J.; Graham, S.R.; Eston, R.G.; Burgess, D.J.; Garrett, L.J.; Jakeman, J.; Norton, K. A Novel Method of Assessment for Monitoring Neuromuscular Fatigue in Australian Rules Football Players. Int. J. Sports Physiol. Perform. 2019, 14, 598–605.
Rowell, A.E.; Aughey, R.J.; Clubb, J.; Cormack, S.J. A Standardized Small Sided Game Can Be Used to Monitor Neuromuscular Fatigue in Professional A-League Football Players. Front. Physiol. 2018, 9, 1011.
Lacome, M.; Simpson, B.; Broad, N.; Buchheit, M. Monitoring Players’ Readiness Using Predicted Heart-Rate Responses to Soccer Drills. Int. J. Sports Physiol. Perform. 2018, 13, 1273–1280.
Winter, E.M.; Maughan, R.J. Requirements for Ethics Approvals. J. Sports Sci. 2009, 27, 985.
Gómez-Carmona, C.D.; Pino-Ortega, J.; Sánchez-Ureña, B.; Ibáñez, S.J.; Rojas-Valverde, D. Accelerometry-Based External Load Indicators in Sport: Too Many Options, Same Practical Outcome? Int. J. Environ. Res. Public Health 2019, 16, 5101.
Gómez-Carmona, C.D.; Bastida-Castillo, A.; García-Rubio, J.; Ibáñez, S.J.; Pino-Ortega, J. Static and Dynamic Reliability of WIMU PRO™ Accelerometers According to Anatomical Placement. Proc. Inst. Mech. Eng. Part P J. Sports Eng. Technol. 2019, 233, 238–248.
Muñoz-López, A.; Granero-Gil, P.; Pino-Ortega, J.; De Hoyo, M. The Validity and Reliability of a 5-Hz GPS Device for Quantifying Athletes’ Sprints and Movement Demands Specific to Team Sports. J. Hum. Sport Exerc. 2017, 12, 156–166.
Kensert, A.; Alvarsson, J.; Norinder, U.; Spjuth, O. Evaluating Parameters for Ligand-Based Modeling with Random Forest on Sparse Data Sets. J. Cheminformatics 2018, 10, 49.
Kiangala, S.K.; Wang, Z. An Effective Adaptive Customization Framework for Small Manufacturing Plants Using Extreme Gradient Boosting-XGBoost and Random Forest Ensemble Learning Algorithms in an Industry 4.0 Environment. Mach. Learn. Appl. 2021, 4, 100024.
Mandorino, M.; Figueiredo, A.J.; Cima, G.; Tessitore, A. Predictive Analytic Techniques to Identify Hidden Relationships between Training Load, Fatigue and Muscle Strains in Young Soccer Players. Sports 2021, 10, 3.
Mandorino, M.; Figueiredo, A.J.; Cima, G.; Tessitore, A. Analysis of Relationship between Training Load and Recovery Status in Adult Soccer Players: A Machine Learning Approach. Int. J. Comput. Sci. Sport 2022, 21, 1–16.
Singh, D.; Singh, B. Investigating the Impact of Data Normalization on Classification Performance. Appl. Soft Comput. 2020, 97, 105524.
Mandorino, M.; Figueiredo, A.J.; Cima, G.; Tessitore, A. A Data Mining Approach to Predict Non-Contact Injuries in Young Soccer Players. Int. J. Comput. Sci. Sport 2021, 20, 147–163.
Rossi, A.; Pappalardo, L.; Cintia, P.; Iaia, F.M.; Fernández, J.; Medina, D. Effective Injury Forecasting in Soccer with GPS Training Data and Machine Learning. PLoS ONE 2018, 13, e0201264.
Pao, H.-T. Forecasting Energy Consumption in Taiwan Using Hybrid Nonlinear Models. Energy 2009, 34, 1438–1446.
Buchheit, M.; Lacome, M.; Cholley, Y.; Simpson, B.M. Neuromuscular Responses to Conditioned Soccer Sessions Assessed via GPS-Embedded Accelerometers: Insights into Tactical Periodization. Int. J. Sports Physiol. Perform. 2018, 13, 577–583.
Barrett, S.; Midgley, A.W.; Towlson, C.; Garrett, A.; Portas, M.; Lovell, R. Within-Match PlayerLoad™ Patterns during a Simulated Soccer Match: Potential Implications for Unit Positioning and Fatigue Management. Int. J. Sports Physiol. Perform. 2016, 11, 135–140.
Hulin, B.T.; Gabbett, T.J.; Lawson, D.W.; Caputi, P.; Sampson, J.A. The Acute:Chronic Workload Ratio Predicts Injury: High Chronic Workload May Decrease Injury Risk in Elite Rugby League Players. Br. J. Sports Med. 2016, 50, 231–236.
Thorpe, R.T.; Strudwick, A.J.; Buchheit, M.; Atkinson, G.; Drust, B.; Gregson, W. Tracking Morning Fatigue Status across In-Season Training Weeks in Elite Soccer Players. Int. J. Sports Physiol. Perform. 2016, 11, 947–952.
Cohen, J. Quantitative Methods in Psychology: A Power Primer. Psychol. Bull. 1992, 112, 1155–1159.
Zhang, C.-X.; Wang, G.-W.; Zhang, J.-S. An Empirical Bias–Variance Analysis of DECORATE Ensemble Method at Different Training Sample Sizes. J. Appl. Stat. 2012, 39, 829–850.
Scott, B.R.; Lockie, R.G.; Knight, T.J.; Clark, A.C.; de Jonge, X.A.J. A Comparison of Methods to Quantify the In-Season Training Load of Professional Soccer Players. Int. J. Sports Physiol. Perform. 2013, 8, 195–202.
Buchheit, M.; Simpson, B.M. Player-Tracking Technology: Half-Full or Half-Empty Glass? Int. J. Sports Physiol. Perform. 2017, 12, S2-35–S2-41.
Nobari, H.; Aquino, R.; Clemente, F.M.; Khalafi, M.; Adsuar, J.C.; Pérez-Gómez, J. Description of Acute and Chronic Load, Training Monotony and Strain over a Season and Its Relationships with Well-Being Status: A Study in Elite under-16 Soccer Players. Physiol. Behav. 2020, 225, 113117.
Nobari, H.; Fani, M.; Clemente, F.M.; Carlos-Vivas, J.; Pérez-Gómez, J.; Ardigò, L.P. Intra- and Inter-Week Variations of Well-Being across a Season: A Cohort Study in Elite Youth Male Soccer Players. Front. Psychol. 2021, 12, 671072.
Hader, K.; Rumpf, M.C.; Hertzog, M.; Kilduff, L.P.; Girard, O.; Silva, J.R. Monitoring the Athlete Match Response: Can External Load Variables Predict Post-Match Acute and Residual Fatigue in Soccer? A Systematic Review with Meta-Analysis. Sports Med. Open 2019, 5, 48.

Figure 1. Flow chart of steps performed to calculate the LEI.
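As a rough illustration of the final step of this flow chart, the sketch below turns per-session predicted and real PL values into a per-player z-score. It is a minimal sketch, not the authors' implementation: the function name, the tuple layout, the sign convention (predicted minus real), and the use of the sample standard deviation are all assumptions made here for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def locomotor_efficiency_index(records):
    """Return one LEI value per record, in input order.

    records: iterable of (player_id, predicted_pl, real_pl) tuples.
    Assumption: the raw difference is predicted minus real, so that a
    real PL above the prediction yields a negative value.
    """
    records = list(records)

    # Collect each player's raw differences across all sessions.
    diffs_by_player = defaultdict(list)
    for player, predicted, real in records:
        diffs_by_player[player].append(predicted - real)

    # Per-player mean and sample SD (assumed; SD is 0 with one session).
    stats = {}
    for player, diffs in diffs_by_player.items():
        sd = stdev(diffs) if len(diffs) > 1 else 0.0
        stats[player] = (mean(diffs), sd)

    # z-score each session against the player's own distribution.
    lei = []
    for player, predicted, real in records:
        mu, sd = stats[player]
        lei.append((predicted - real - mu) / sd if sd > 0 else 0.0)
    return lei
```

Individualizing the z-score per player means each session is compared with that player's own typical prediction error rather than with the squad average.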

Figure 2. Features selected after recursive feature elimination (RFE), ranked by the feature importance analysis performed during five-fold cross-validation.

Figure 3. Evolution of the quality of RF model (RMSE, MAPE) over the course of the season. w = week; RF = random forest regression; RMSE = root mean squared error; MAPE = mean absolute percentage error.

Figure 4. Analysis of LEI in relation to the period of the season. Data are presented as mean ± SD (standard deviation). 1 denotes sig. difference vs. Sep−Oct; 2 denotes sig. difference vs. Nov−Dec; 3 denotes sig. difference vs. Jan−Feb; 4 denotes sig. difference vs. Mar−Apr.

Figure 5. Analysis of LEI in relation to the day of the week. Data are presented as mean ± SD (standard deviation). 3 denotes sig. difference vs. MD; 4 denotes sig. difference vs. MD−1; 5 denotes sig. difference vs. MD−2.

Figure 6. Analysis of the relationship between variation in weekly training load (D−WL, S−WL, I−WL) and LEI. Data are presented as mean ± SD (standard deviation). (a) Weekly total distance; 2 denotes sig. difference vs. S−WL; 3 denotes sig. difference vs. I−WL. (b) Weekly distance > 19.8 km/h. (c) Weekly distance > 25.2 km/h; 2 denotes sig. difference vs. S−WL; 3 denotes sig. difference vs. I−WL. (d) Weekly number of accelerations > 3.5 m/s²; 3 denotes sig. difference vs. I−WL. (e) Weekly number of decelerations < −3.5 m/s².

Table 1. Features inserted in the ML models as predictors.

External Load Data	Training Duration (minutes), Total Distance (m), Distance > 7.2 km/h (m), Distance > 14.4 km/h (m), Distance > 19.8 km/h (m), Distance > 25.2 km/h (m), Max Speed (km/h), Average Speed (km/h), Number of accelerations > 2.5 m/s², Number of decelerations < −2.5 m/s², Number of accelerations > 3.5 m/s², Number of decelerations < −3.5 m/s², Number of accelerations > 4.5 m/s², Number of decelerations < −4.5 m/s², Max Acceleration (m/s²), Max Deceleration (m/s²), Number of Sprints (count)
Additional Information	Playing Position (center-back, full-back, midfielder, winger, forward); Type of Session (training, match)

Table 2. Teams’ weekly load distribution.

Team	Day of the Week	Total Distance (m)	Distance > 19.8 km/h (m)	Distance > 25.2 km/h (m)	N° Accelerations > 3.5 m/s² (cnt)	N° Decelerations < −3.5 m/s² (cnt)
First Team	MD−4	5450 ± 1413	360 ± 235	45 ± 62	22 ± 11	24 ± 12
	MD−3	5504 ± 2073	464 ± 314	98 ± 100	17 ± 9	19 ± 11
	MD−2	3333 ± 1294	68 ± 143	11 ± 36	9 ± 7	9 ± 8
	MD−1	4026 ± 782	107 ± 95	10 ± 18	12 ± 6	13 ± 6
	MD	10591 ± 1343	630 ± 186	112 ± 72	26 ± 7	46 ± 11

Table 3. Performance of the ML models.

ML Model	MAPE (%)	RMSE (a.u.)
RF	0.10 ± 0.01	14.90 ± 5.18
XGBoost	0.14 ± 0.03	15.63 ± 4.75
LR	0.24 ± 0.03	17.36 ± 4.01
B₁	0.98 ± 0.25	53.54 ± 30.59
B₂	0.79 ± 0.09	52.04 ± 6.77

RF = Random Forest Regression; XGBoost = Extreme Gradient Boosting; LR = Linear Regression; B₁ = first baseline model; B₂ = second baseline model.
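For readers unfamiliar with the two metrics in Table 3, the following is a minimal sketch of how MAPE and RMSE are conventionally computed from real and predicted PL values. The function names are hypothetical and the code is not taken from the study. The sketch returns MAPE as a fraction (0.10 corresponds to 10%); whether the table's values are fractions or percentages is not stated here.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction.

    Assumes no actual value is zero (PL for a recorded session is > 0).
    """
    pairs = list(zip(actual, predicted))
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target (a.u.)."""
    pairs = list(zip(actual, predicted))
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# Illustrative values only, not data from the study.
real_pl = [100.0, 200.0]
predicted_pl = [110.0, 180.0]
print(mape(real_pl, predicted_pl), rmse(real_pl, predicted_pl))
```

MAPE is scale-free, which makes it comparable across players with different typical PL, whereas RMSE stays in PL units and penalizes large errors more heavily.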

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

