# Tracking Health, Performance and Recovery in Athletes Using Machine Learning


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Ethics Statement

#### 2.2. Subjects

#### 2.3. Analysis of Blood Parameters

#### 2.4. Urine Analysis

#### 2.5. Statistical Analysis

#### 2.6. Data Preparation

- All string data, such as “Gender,” “Type of load,” and “Type of sport,” were converted into categorical features.
- Missing values in numerical features were replaced by the mean of the feature, and missing values in categorical features were replaced by the mode (most frequent value) of that feature.
- All numerical features were normalized by Z-scaling based on the mean $\mu$ and standard deviation $\sigma$ of each feature (Equation (1)):$$z=\frac{x-\mu}{\sigma}$$
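The three preparation steps can be illustrated with a minimal, standard-library Python sketch. It assumes records are dictionaries in which missing values are `None`, reads "average" as the mean of the observed values, uses the population standard deviation for Z-scaling, and represents categorical features as integer codes. The function name `prepare` and the column names are hypothetical; this is not the authors' pipeline.

```python
from statistics import mean, mode, pstdev


def prepare(rows, numeric, categorical):
    """Impute missing values and Z-scale numeric columns.

    rows: list of dicts (one per record); missing values are None.
    numeric, categorical: lists of column names (hypothetical).
    """
    out = [dict(r) for r in rows]  # leave the input untouched

    for col in categorical:
        observed = [r[col] for r in out if r[col] is not None]
        fill = mode(observed)  # most frequent category
        # Integer codes stand in for a categorical type (assumption).
        codes = {v: i for i, v in enumerate(sorted(set(observed)))}
        for r in out:
            r[col] = codes[fill if r[col] is None else r[col]]

    for col in numeric:
        observed = [r[col] for r in out if r[col] is not None]
        fill = mean(observed)  # mean of the observed values
        values = [fill if r[col] is None else r[col] for r in out]
        mu, sigma = mean(values), pstdev(values)  # population SD (assumption)
        for r, x in zip(out, values):
            # Z-scaling: z = (x - mu) / sigma; a constant column becomes 0.
            r[col] = (x - mu) / sigma if sigma > 0 else 0.0
    return out


rows = [
    {"Gender": "M", "CK": 100.0},
    {"Gender": "F", "CK": None},
    {"Gender": None, "CK": 300.0},
    {"Gender": "M", "CK": 200.0},
]
print(prepare(rows, numeric=["CK"], categorical=["Gender"]))
```

For instance, the creatine kinase column `[100.0, None, 300.0, 200.0]` is imputed to `[100.0, 200.0, 300.0, 200.0]` and scaled to roughly `[-1.41, 0.0, 1.41, 0.0]`, while the missing gender is filled with the mode, "M".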

#### 2.7. Data Classification Using the Random Forest Algorithm

- Overall accuracy (Equation (2)) is defined as the number of correctly predicted items (true positives, TP, and true negatives, TN) divided by the total number of items to predict (TP, TN, false positives, FP, and false negatives, FN):$$\text{Overall accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$

- Average accuracy is defined as the arithmetic mean of the per-class accuracy scores over all $C$ classes (Equation (3)):$$\text{Average accuracy}=\frac{1}{C}\sum_{c=1}^{C}\frac{TP_c+TN_c}{TP_c+TN_c+FP_c+FN_c}$$

- Macro-average precision is defined as the arithmetic mean of the per-class precision scores (Equation (5)):$$P_{\text{macro}}=\frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c+FP_c}$$

- Macro-average recall is defined as the arithmetic mean of the per-class recall scores (Equation (7)):$$R_{\text{macro}}=\frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c+FN_c}$$

- Micro-average recall is defined as the sum of true positives over all classes divided by the sum of actual positives over all classes (true positives plus false negatives), rather than by the predicted positives, as described in Equation (9):$$R_{\text{micro}}=\frac{\sum_{c=1}^{C}TP_c}{\sum_{c=1}^{C}\left(TP_c+FN_c\right)}$$
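As a rough illustration, the metrics above can be computed from a multi-class confusion matrix via per-class one-vs-rest counts. This sketch assumes rows are actual classes and columns are predicted classes, computes overall accuracy as correct predictions over all items (the usual multi-class reading of Equation (2)), and scores an empty class as 0; the function names are hypothetical.

```python
def per_class_counts(cm):
    """One-vs-rest (TP, FP, FN, TN) for each class.

    cm[a][p] = number of items of actual class a predicted as class p.
    """
    n_classes = len(cm)
    total = sum(map(sum, cm))
    counts = []
    for c in range(n_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                               # actual c, predicted other
        fp = sum(cm[a][c] for a in range(n_classes)) - tp  # predicted c, actual other
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def safe_div(a, b):
    return a / b if b else 0.0  # convention for empty classes (assumption)


def classification_metrics(cm):
    counts = per_class_counts(cm)
    n_classes = len(counts)
    total = sum(map(sum, cm))
    correct = sum(cm[c][c] for c in range(n_classes))
    return {
        "overall_accuracy": correct / total,
        "average_accuracy": sum(safe_div(tp + tn, tp + tn + fp + fn)
                                for tp, fp, fn, tn in counts) / n_classes,
        "macro_precision": sum(safe_div(tp, tp + fp)
                               for tp, fp, fn, tn in counts) / n_classes,
        "macro_recall": sum(safe_div(tp, tp + fn)
                            for tp, fp, fn, tn in counts) / n_classes,
        "micro_recall": safe_div(sum(tp for tp, _, _, _ in counts),
                                 sum(tp + fn for tp, _, fn, _ in counts)),
    }


cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 0, 4]]
print(classification_metrics(cm))
```

For this example matrix, overall accuracy is 12/16 = 0.75 and macro-average recall is (5/6 + 3/6 + 4/4)/3 ≈ 0.78. For single-label multi-class data every misclassified item is a false negative for exactly one class, so the micro-average recall of Equation (9) always equals overall accuracy; the macro averages weight each class equally and are therefore more sensitive to small classes.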

#### 2.8. Classification Using the Multinomial Logistic Regression Algorithm

- Compute the reference score $s$ (accuracy) of the fitted model on the original data $D$.
- For each feature (column) $j$ of $D$:
    - For each repetition $k$ in $1,\dots,K$:
        - Randomly shuffle column $j$ of the dataset to generate a corrupted version of the data named $\tilde{D}_{k,j}$.
        - Compute the score $s_{k,j}$ (accuracy) of the model on the corrupted data $\tilde{D}_{k,j}$.
    - Compute the importance score $i_j$ for feature $f_j$, defined as (Equation (12)):$$i_j=s-\frac{1}{K}\sum_{k=1}^{K}s_{k,j}$$
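A minimal sketch of the permutation procedure follows. It assumes `model` is any fitted classifier wrapped as a Python callable that maps one feature vector to a label, and that the score is accuracy, as stated above. `permutation_importance`, `toy_model`, and the arguments are hypothetical names, and the fixed random seed is only for reproducibility.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which model(x) equals the true label."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Return [i_j] with i_j = s - (1/K) * sum_k s_{k,j} (K = n_repeats)."""
    rng = random.Random(seed)
    s = accuracy(model, X, y)  # reference score on the original data D
    importances = []
    for j in range(len(X[0])):  # each feature (column) j
        scores = []
        for _ in range(n_repeats):  # each repetition k
            column = [row[j] for row in X]
            rng.shuffle(column)  # corrupt column j only
            X_tilde = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            scores.append(accuracy(model, X_tilde, y))  # s_{k,j}
        importances.append(s - sum(scores) / n_repeats)
    return importances


def toy_model(x):
    """Hypothetical classifier that looks only at feature 0."""
    return 1 if x[0] > 0 else 0


X = [[-2, 5], [-1, 3], [1, 7], [2, 1]]
y = [0, 0, 1, 1]
print(permutation_importance(toy_model, X, y, n_repeats=10))
```

A feature the model never uses gets an importance of exactly zero, because shuffling it leaves every prediction unchanged; a negative value would mean the shuffled data happened to score better than the original.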

The following hyperparameters were set for the decision forest model:

- Number of decision trees;
- Maximum depth of the tree;
- Number of random splits;
- Minimum number of samples per leaf.
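The source does not describe how the trees of the forest are grown. Assuming that "number of random splits" means the number of randomly drawn (feature, threshold) candidates evaluated at each node, as in extremely randomized trees, and that a candidate is rejected if it would leave fewer than the minimum number of samples in a leaf, node splitting could look like the following sketch. The Gini criterion and the function names are assumptions, not the authors' implementation.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for t in labels:
        counts[t] = counts.get(t, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def best_random_split(X, y, n_random_splits, min_samples_leaf, rng):
    """Try n_random_splits random (feature, threshold) candidates at one node.

    Candidates leaving fewer than min_samples_leaf samples on either side are
    rejected. Returns (feature, threshold, weighted_gini), or None if no
    candidate is valid (the node would then become a leaf).
    """
    n, n_features = len(X), len(X[0])
    best = None
    for _ in range(n_random_splits):
        f = rng.randrange(n_features)
        lo = min(row[f] for row in X)
        hi = max(row[f] for row in X)
        if lo == hi:
            continue  # a constant feature cannot split this node
        thr = rng.uniform(lo, hi)
        left = [t for row, t in zip(X, y) if row[f] <= thr]
        right = [t for row, t in zip(X, y) if row[f] > thr]
        if min(len(left), len(right)) < min_samples_leaf:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[2]:
            best = (f, thr, score)
    return best


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
print(best_random_split(X, y, n_random_splits=200, min_samples_leaf=2,
                        rng=random.Random(1)))
```

Raising the number of random splits makes each node more likely to find a good threshold at a higher cost per node, while a larger minimum leaf size forbids small leaves that fit individual samples.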

The following hyperparameters were set for the multinomial regression model:

- Optimization tolerance;
- L1 (Lasso) regularization weight;
- L2 (Ridge) regularization weight.
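The optimizer behind these hyperparameters is not specified. The sketch below assumes the common penalized objective, mean cross-entropy plus $\lambda_1\lVert W\rVert_1+\lambda_2\lVert W\rVert_2^2$ with unpenalized intercepts, minimizes it by proximal gradient descent (a swapped-in choice), and treats the optimization tolerance as a stopping threshold on the change in the objective between iterations. The function names, learning rate, and iteration cap are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def scores(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bc for row, bc in zip(W, b)]


def predict(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


def fit_multinomial(X, y, n_classes, l1=0.0, l2=0.0, tol=1e-7, lr=0.1,
                    max_iter=5000):
    """Minimize mean cross-entropy + l1*sum|W| + l2*sum W^2 (biases unpenalized)."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes

    def objective():
        nll = -sum(math.log(softmax(scores(W, b, x))[t]) for x, t in zip(X, y)) / n
        flat = [w for row in W for w in row]
        return nll + l1 * sum(abs(w) for w in flat) + l2 * sum(w * w for w in flat)

    prev = objective()
    for _ in range(max_iter):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            p = softmax(scores(W, b, x))
            for c in range(n_classes):
                err = (p[c] - (c == t)) / n  # d(mean NLL) / d(score_c)
                gb[c] += err
                for k in range(d):
                    gW[c][k] += err * x[k]
        for c in range(n_classes):
            b[c] -= lr * gb[c]
            for k in range(d):
                # Gradient step on the smooth part (cross-entropy + L2) ...
                w = W[c][k] - lr * (gW[c][k] + 2 * l2 * W[c][k])
                # ... then soft-thresholding, the proximal step for L1.
                W[c][k] = math.copysign(max(abs(w) - lr * l1, 0.0), w)
        cur = objective()
        if abs(prev - cur) < tol:  # "optimization tolerance" (assumed meaning)
            break
        prev = cur
    return W, b


X = [[0.0, 0.0], [0.2, 0.1], [3.0, 0.0], [3.2, 0.1], [0.0, 3.0], [0.1, 3.2]]
y = [0, 0, 1, 1, 2, 2]
W, b = fit_multinomial(X, y, n_classes=3, l2=0.01)
print([predict(W, b, x) for x in X])
```

The L1 weight can drive coefficients exactly to zero through the soft-thresholding step, whereas the L2 weight only shrinks them toward zero.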

## 3. Results

#### 3.1. Decision Forest and Multinomial Regression Data Classification Models

#### 3.2. Parameters of Blood and Urine Biochemistry as Predictors for the Classification of Phenotypes “Catabolism” and “Anabolism”

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 1.** Structure of comparison groups. To classify study participants into the anabolism and catabolism phenotypes, the metadata were divided into four classes: classes 1 and 2 characterize catabolism and classes 3 and 4 characterize anabolism, for both muscle and liver metabolism.

**Figure 2.** Confusion matrix of decision forest predictive results obtained on the testing dataset (**a**) for muscle metabolism: Groups 1 and 2 (catabolism), Groups 3 and 4 (anabolism); and (**b**) for liver metabolism: Groups 1 and 2 (catabolism), Groups 3 and 4 (anabolism). The percentage of recognized files is indicated at the intersection of the different classes, unless the number is at the intersection between different phenotypes.

**Figure 3.** Confusion matrix of multinomial regression predictive results obtained on the testing dataset (**a**) for muscle metabolism: Groups 1 and 2 (catabolism), Groups 3 and 4 (anabolism); and (**b**) for liver metabolism: Groups 1 and 2 (catabolism), Groups 3 and 4 (anabolism). The percentage of recognized files is indicated at the intersection of the different classes, unless the number is at the intersection between different phenotypes.

**Figure 4.** Similarity graph between the studied classes, constructed from the results of the decision forest (**a**) and the multinomial regression (**b**). The distance between nodes indicates similarity, measured as the amount of overlap between classes.

**Figure 5.** Distribution histograms of urine and blood biochemistry indicators for muscle metabolism (**a**–**e**) and liver metabolism (**f**–**h**) by class (classes 1 and 2 correspond to the “catabolism” phenotype; classes 3 and 4 correspond to the “anabolism” phenotype), shown for the indicators with the greatest impact on the result of the multinomial regression model: aspartate aminotransferase, AST (**a**), creatine kinase (**b**), lactate dehydrogenase, LDH (**c**), myoglobin (**d**), uric acid (**e**,**f**), urea (**g**) and creatinine (**h**). The orange background shows the reference ranges of the presented indicators.

Kind of Sport | Number of Athletes | Gender Ratio, % (M/F) | Mean Age, Years | Age SD, Years | Phenotype “Anabolism”, Muscle (Classes 3 and 4) | Phenotype “Anabolism”, Liver (Classes 3 and 4) | Phenotype “Catabolism”, Muscle (Classes 1 and 2) | Phenotype “Catabolism”, Liver (Classes 1 and 2) |
---|---|---|---|---|---|---|---|---|
endurance | 823 | 59/41 | 21.35 | 6.73 | 763 | 663 | 60 | 160 |
psycho-emotional | 137 | 57/43 | 23.61 | 10.99 | 136 | 102 | 1 | 35 |
strength endurance | 479 | 63/37 | 20.21 | 5.55 | 431 | 356 | 48 | 123 |
speed endurance | 225 | 60/40 | 21.22 | 5.82 | 209 | 162 | 16 | 63 |
speed-power | 479 | 60/40 | 21.14 | 6.60 | 442 | 327 | 37 | 152 |
difficult coordination | 165 | 45/55 | 19.04 | 4.86 | 138 | 142 | 27 | 23 |
technical | 1353 | 56/44 | 21.35 | 7.08 | 1261 | 1059 | 92 | 294 |
Total | 3661 | 58/42 | 21.15 | 6.83 | 3380 | 2811 | 281 | 850 |
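In each row of the table above, the anabolism and catabolism counts for a given organ should add up to the number of athletes. The short check below, with values transcribed from the table, verifies this and computes the overall share of the catabolism phenotype; the variable and function names are illustrative.

```python
# Columns: sport, athletes, anabolism (muscle, liver), catabolism (muscle, liver)
ROWS = [
    ("endurance", 823, 763, 663, 60, 160),
    ("psycho-emotional", 137, 136, 102, 1, 35),
    ("strength endurance", 479, 431, 356, 48, 123),
    ("speed endurance", 225, 209, 162, 16, 63),
    ("speed-power", 479, 442, 327, 37, 152),
    ("difficult coordination", 165, 138, 142, 27, 23),
    ("technical", 1353, 1261, 1059, 92, 294),
]
TOTAL = ("Total", 3661, 3380, 2811, 281, 850)


def inconsistent_rows(rows):
    """Names of rows where a phenotype split does not sum to the athlete count."""
    return [name for name, n, ana_m, ana_l, cat_m, cat_l in rows
            if ana_m + cat_m != n or ana_l + cat_l != n]


def column_sums(rows):
    """Sum each numeric column over the per-sport rows."""
    return tuple(sum(r[i] for r in rows) for i in range(1, 6))


def catabolism_share(row):
    """Fraction of athletes with the catabolism phenotype (muscle, liver)."""
    _, n, _, _, cat_m, cat_l = row
    return cat_m / n, cat_l / n


print(inconsistent_rows(ROWS + [TOTAL]))               # []
print(column_sums(ROWS) == TOTAL[1:])                  # True
print([round(s, 3) for s in catabolism_share(TOTAL)])  # [0.077, 0.232]
```

Catabolism is the minority phenotype (about 7.7% of athletes for muscle and 23.2% for liver), so the classes are imbalanced, a situation in which macro-averaged metrics are more informative than overall accuracy alone.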

Number of Decision Trees | Maximum Depth of the Tree | Number of Random Splits | Minimum Number of Samples per Leaf |
---|---|---|---|
64 | 128 | 128 | 128 |

Feature (Metabolism in Muscles) | Score | Feature (Metabolism in the Liver) | Score |
---|---|---|---|
Creatine phosphokinase (CPK) | 0.95 | Creatinine | 0.35 |
Lactate dehydrogenase (LDH) | 0.74 | Uric acid | 0.32 |
Aspartate aminotransferase (AST) | 0.63 | Urea | 0.22 |

**Table 4.** Features with the highest impact in the multinomial regression model (importance score > 0.1).

Feature (Metabolism in Muscles) | Score | Feature (Metabolism in the Liver) | Score |
---|---|---|---|
Aspartate aminotransferase (AST) | 0.57 | Uric acid | 0.46 |
Creatine kinase (CK) | 0.50 | Urea | 0.32 |
Lactate dehydrogenase (LDH) | 0.43 | Creatinine | 0.19 |
Alanine aminotransferase (ALT) | 0.29 | Sex (F) | 0.16 |
Myoglobin | 0.29 | | |
Uric acid | 0.14 | | |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Petrovsky, D.V.; Pustovoyt, V.I.; Nikolsky, K.S.; Malsagova, K.A.; Kopylov, A.T.; Stepanov, A.A.; Rudnev, V.R.; Balakin, E.I.; Kaysheva, A.L.
Tracking Health, Performance and Recovery in Athletes Using Machine Learning. *Sports* **2022**, *10*, 160.
https://doi.org/10.3390/sports10100160

**AMA Style**

Petrovsky DV, Pustovoyt VI, Nikolsky KS, Malsagova KA, Kopylov AT, Stepanov AA, Rudnev VR, Balakin EI, Kaysheva AL.
Tracking Health, Performance and Recovery in Athletes Using Machine Learning. *Sports*. 2022; 10(10):160.
https://doi.org/10.3390/sports10100160

**Chicago/Turabian Style**

Petrovsky, Denis V., Vasiliy I. Pustovoyt, Kirill S. Nikolsky, Kristina A. Malsagova, Arthur T. Kopylov, Alexander A. Stepanov, Vladimir R. Rudnev, Evgenii I. Balakin, and Anna L. Kaysheva.
2022. "Tracking Health, Performance and Recovery in Athletes Using Machine Learning" *Sports* 10, no. 10: 160.
https://doi.org/10.3390/sports10100160