Article

Estimation of Functional Fitness of Korean Older Adults Using Machine Learning Techniques: The National Fitness Award 2015–2019

1 Division of Mechanical and Aerospace Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea
2 Physical Activity and Performance Institute, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea
3 Department of Sports Medicine and Science, Graduate School, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea
4 Department of Physical Education, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2022, 19(15), 9754; https://doi.org/10.3390/ijerph19159754
Submission received: 12 July 2022 / Revised: 3 August 2022 / Accepted: 5 August 2022 / Published: 8 August 2022

Abstract

Measuring functional fitness (FF) to track the decline in physical abilities is important for maintaining a healthy life in old age. This paper aims to develop an estimation model of FF variables, which represent strength, flexibility, and aerobic endurance, using easy-to-measure physical parameters for Korean older adults aged over 65 years. The estimation models were developed using various machine learning techniques and were trained with the National Fitness Award datasets from 2015 to 2019 compiled by the Korea Sports Promotion Foundation. Machine-learning-based nonlinear regression models were employed to improve on the performance of the previous linear regression models. To derive the estimation model with the best accuracy, we developed five different machine-learning-based estimation models and compared their estimation accuracy, not only with each other but also with the previous linear regression model. The coefficient of determination of the FF variables was used to compare the performance of each model; the mean absolute percentage error (MAPE) and standard error of estimation (SEE) were used to evaluate the model performance. The deep neural network (DNN) model presented the best performance among the regression models for the estimation of all of the FF variables. The coefficient of determination in the hand grip strength (HGS) test was 0.784, while those of the other tests were less than 0.5, meaning that the HGS of older adults can be reliably estimated using easy-to-measure independent variables.

1. Introduction

The number of elderly people older than 65 years of age has increased rapidly in recent decades, and has been a critical social issue in many countries [1]. Human aging leads to the degradation of physical functionality, such as weakening of the muscle forces, and can cause serious diseases and impairments [2,3]. The degradation of physical functionality is critically correlated with diseases that arise in old age. For example, the timed up-and-go (TUG) test results of elderly people show a correlation with their mental and physical health [4]. Health-related quality of life is associated with physical functionality, and it is recommended that physical functionality is maintained in old age in order to ensure a healthy life without illness [5].
Habitual physical activity and proper nutrition are the top priorities for delaying the loss of physical functionality [6]. The World Health Organization (WHO) recommends that elderly individuals perform moderate-intensity aerobic exercises for at least 150–300 min per week, vigorous-intensity aerobic exercise for at least 75–150 min, or a combination of both. The WHO also recommends strength training for more than 2 days a week and multicomponent physical activities more than 3 days a week [7]. Although regular physical activity is important to preserve a healthy life in old age, <40% of the elderly population exercise regularly in their daily life [8]. Insufficient physical activity primarily causes functional disability and limited mobility, and these cascades result in more critical illnesses [9]. Smart fitness services that provide personalized workout programs can be an effective solution to encourage the elderly population to exercise regularly. A personalized workout program that fits an individual’s physical ability prevents over- or under-exercise and cramping during exercise. Estimation of physical fitness levels, including the muscle strength, flexibility, coordination, agility/dynamic balance, and aerobic endurance of elderly persons, is important to construct personalized workout programs, and functional fitness (FF) assessment tests have been used to evaluate individual physical fitness levels [10].
However, performing the FF assessment and monitoring the physical abilities periodically in older adults are associated with many difficulties. In addition, measuring FF variables requires a sophisticated device and is costly. To address these inconveniences, several previous studies have proposed methods for estimating an individual’s FF variables with simple physical parameters using multiple linear regression (MLR). Nevertheless, the MLR model has a critical limitation in that it can only represent the linear relationship between the inputs and outputs. In previous studies, machine-learning-based prediction models, such as support vector machine (SVM) and random forest (RF), have been used to consider nonlinear relations in FF prediction. Mahajan et al. reported that the RF model improved the estimation accuracy of 231 divers’ physical fitness levels compared with the linear regression model [11]. Akay et al. predicted the hamstring and quadriceps strength of athletes using an SVM estimation model [12]. Zhu et al. used SVM to predict athletes’ performance [13], Taha et al. estimated archers’ physical fitness level using the k-nearest neighbors and SVM [14], and Matteo et al. proposed nearest neighbor models to predict athlete performance in team sports [15].
Nevertheless, these nonlinear prediction models focused on being trained with individuals who have superior physical functionality, which is not the general population. Previous studies have measured athletes’ physical information to train a prediction model with complicated equipment, which is not adequate for ordinary applications. Lee et al. presented an artificial neural network-based regression model for Korean adults aged <65 years [16]. However, they did not consider older adults whose variables in the FF test were different. In this paper, we propose a machine-learning-based estimation model for FF variables with easy-to-measure physical variables in Korean older adults. To derive the optimal estimation model that shows the best estimation performance, we constructed various machine-learning-based estimation models and evaluated the estimation performance of each model.
The main contributions of this study are as follows:
  • This study proposed the FF variable estimation model for evaluating the physical fitness level of elderly adults using easy-to-measure independent variables. The proposed model can be used as an effective tool to evaluate the personal fitness level in smart fitness services.
  • Various nonlinear machine learning regression models were constructed and evaluated to compare the accuracy with the previous linear model and to derive the optimal estimation model presenting the best estimation performance.

2. Materials and Methods

2.1. Ethics Statement

The study was conducted in accordance with the guidelines of the Declaration of Helsinki and was approved by the Institutional Review Board of Konkuk University (7001355-202101-E-132).

2.2. Dataset

The National Fitness Award (NFA) is a program run by the Ministry of Culture, Sports, and Tourism (MCST) and the Korea Sports Promotion Foundation (KSPF) that measures the physical fitness levels of the general Korean population, with the aim of helping people live healthier lives. This paper employs the NFA datasets of older adults (age: ≥65 years), gathered by the KSPF, to train the machine-learning-based estimation models. The NFA dataset includes the physical fitness levels of individuals, measured under strict measurement protocols at 75 sites throughout the Republic of Korea. The data were collected between 2015 and 2019 from older adults (total = 210,490). We excluded records with missing values in the independent variables or FF variables, resulting in 178,960 adults in the regression model datasets (men: n = 61,465, women: n = 117,495). The regression models in the study used independent variables (e.g., sex, age, body mass index (BMI), and percent body fat) as the inputs and predicted the FF variables, including hand grip strength (HGS), lower body strength (30 s chair stand test), lower body flexibility (chair sit-and-reach test), coordination (figure-of-eight walk test), agility/dynamic balance (TUG test), and aerobic endurance (2 min step test), as the outputs; 70% of the data (total: n = 125,272, men: n = 42,991, women: n = 82,281) were used as the training dataset, and 30% of the data (total: n = 53,688; men: n = 18,474; women: n = 35,214) were used as the validation dataset. A summary of the NFA datasets is presented in Table 1.
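The 70/30 partition described above can be illustrated with a minimal sketch. It assumes one independent Bernoulli draw per record (Section 2.5 mentions a Bernoulli trial but does not give the exact procedure); the function name `bernoulli_split`, the seed, and the integer stand-in records are hypothetical and are not taken from the study.

```python
import random


def bernoulli_split(records, train_prob=0.7, seed=0):
    """Send each record to the training set with probability train_prob.

    Assumption: one independent Bernoulli draw per record, so the split
    is only approximately 70/30 rather than exact.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for rec in records:
        if rng.random() < train_prob:
            train.append(rec)
        else:
            valid.append(rec)
    return train, valid


# Hypothetical stand-in for the NFA rows (integers instead of real records).
records = list(range(10_000))
train, valid = bernoulli_split(records)
print(len(train), len(valid))
```

Because each record is drawn independently, the realized training share fluctuates around 70%; an exact split would instead shuffle the records and cut at a fixed index.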
Measurement of physical independent variables and FF variables: The measurements of independent variables and FF variables followed the NFA guidelines, as presented in a previous study [17].

2.3. Data Pre-Processing

Pearson’s correlation analysis was used to assess the linear relationship between the independent variables and FF variables. Table 2 shows the degree of correlation between the input and output variables. HGS had a positive linear correlation with height and weight and a negative correlation with sex and percent body fat. Figure-of-eight walk and TUG had a positive correlation with age, and the 30 s chair stand had a negative correlation with age. The chair sit-and-reach showed a positive correlation with sex and a negative correlation with height. As most of the FF variables showed a weak linear relation with the independent variables, it is possible to consider using nonlinear prediction models rather than linear prediction models to improve the prediction accuracy.
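As a brief illustration of the correlation analysis, the sketch below computes Pearson's correlation coefficient from its textbook definition. The helper name `pearson_r` and the sample heights and grip strengths are invented for illustration; they are not values from the NFA dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up example values: height (cm) and hand grip strength (kg).
heights = [150.0, 155.0, 160.0, 165.0, 170.0]
grip = [18.0, 21.0, 20.0, 27.0, 30.0]
print(round(pearson_r(heights, grip), 3))
```

A value near +1 or -1 indicates a strong linear relation, while a value near 0 indicates that a linear model captures little of the relationship.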
Standardization: Standardization, which is a feature scaling technique, was used for the input variables to avoid data redundancy and dependency caused by feature scale differences (Equation (1)). All data, except sex, were centered around the mean of 0 with a standard deviation of 1.
$$\hat{x}_i = \frac{x_i - \mu}{\sigma} \tag{1}$$
where $x_i$ and $\hat{x}_i$ denote the values of the input and standardized input, respectively, and $\mu$ and $\sigma$ are the average and standard deviation of the input variable $x_i$, respectively. For standardization, sex was expressed as 1 (male) or 2 (female).
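Equation (1) can be sketched in a few lines. The example assumes the population standard deviation (the text does not say whether the population or sample form was used), and the column names and values are hypothetical.

```python
import statistics


def standardize(values):
    """Apply Equation (1): subtract the mean and divide by the standard deviation.

    Assumption: the population standard deviation is used.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical columns; sex stays coded as 1 (male) or 2 (female), unscaled.
columns = {
    "age": [66.0, 70.0, 74.0, 81.0],
    "bmi": [22.5, 24.1, 26.8, 23.0],
    "sex": [1, 2, 2, 1],
}
scaled = {
    name: (col if name == "sex" else standardize(col))
    for name, col in columns.items()
}
print(scaled["age"])
```

In a train/validation setting, the mean and standard deviation would normally be computed on the training set only and then reused for the validation set.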
Outlier removal: Outliers are data points whose values deviate abnormally from the rest of the data; they can distort statistical analyses and degrade the resulting prediction models. To manage outliers in the training dataset, the studentized residual (SRE) was used, and data points were removed when the absolute value of the SRE was >2 [17].
Feature selection: Feature selection methods were used to increase the estimation performance and shorten the training time of the regression models. The p-value was used to validate the relationship between the independent variables and FF variables. Independent variables with a p-value >0.05 were removed to reduce dimensionality and improve estimation accuracy. In addition, the Boruta algorithm was used to identify variables that could decrease the performance of the regression model and cause overfitting. In the Boruta ranking of the features, 1 means confirmed, 2 means tentative, and 3 means rejected. The p-values for each variable and the Boruta algorithm ranking are listed in Table 3. Based on these feature selection results, we selected the input variables that maximized the estimation performance.

2.4. Machine Learning-Based Estimation Models

Various machine-learning-based regression models were used to predict the FF variable with independent variables. Each model was evaluated using R2 and SEE values and was compared with the other models. A summary of this method is shown in Figure 1.

2.4.1. Support Vector Regression

SVM, which predicts the optimal hyperplane generated in an n-dimensional feature space, is a supervised learning algorithm for classification and regression. SVR is specifically used for regression, and Equation (2) represents the linear approximation function [18].
$$y = \omega \cdot x + b \tag{2}$$
where $\omega$ is the weight vector of the function. Equations (3) and (4) represent the objective function of SVR, as follows:
$$L_{svr} = \min \frac{1}{2}\lVert\omega\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \tag{3}$$
$$\text{s.t.}\quad \begin{cases} \omega^T x_i + b - y_i \le \epsilon + \xi_i \\ y_i - \omega^T x_i - b \le \epsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \tag{4}$$
where the positive constant, $C$, which is the regularization parameter, determines the flatness of the approximation function. $x_i$ and $y_i$ are the input and output variables of the i-th instance, respectively. $\epsilon$ is the error tolerance margin of the approximation function, and $\xi_i$ and $\xi_i^*$ are slack variables for measuring the distance to the points outside the margin. The SVR input space computation can be performed using the kernel function, which returns the inner product of the input feature vectors, to solve the nonlinear problem by mapping lower-dimensional data into higher-dimensional data. This study used kernels in SVR, as follows (Equation (5)):
$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \tag{5}$$
where $\Phi(x_i)$ and $\Phi(x_j)$ are feature space mapping functions.
Using the Lagrangian dual problem and kernel trick, SVR can be expressed as follows (Equation (6)):
$$y = \sum_{i=1}^{n}\left(\alpha_i^* - \alpha_i\right)K(x_i, x_j) + b \tag{6}$$
where $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers.
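To make Equations (3) to (6) more concrete, the sketch below evaluates the kernel form of the SVR prediction and the epsilon-insensitive error that the slack variables penalize. It assumes a radial basis function (RBF) kernel, which the text does not specify; the support vectors, dual coefficients, and bias are hypothetical numbers rather than fitted values.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel, one possible choice of K in Equation (5) (an assumption here)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svr_predict(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """Equation (6): sum over i of (alpha_i^* - alpha_i) * K(x_i, x), plus b.

    dual_coefs holds the differences (alpha_i^* - alpha_i), one per support vector.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Error beyond the epsilon margin; zero inside the margin."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical fitted quantities for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
bias = 0.5
print(svr_predict([0.5, 0.5], support_vectors, dual_coefs, bias))
```

Only the training points with a nonzero coefficient (the support vectors) contribute to the sum, which is why the prediction depends on a subset of the training data.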

2.4.2. Decision Tree

A decision tree is a decision-support-tree-like model formed of nodes and edges [19]. In the tree structure, class labels are represented by leaves and feature combinations are represented by branches. A decision tree splits nodes based on the result of the Gini impurity, which is a measure of diversity in a dataset (Equation (7)).
$$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2 \tag{7}$$
where $p_{i,k}$ is the proportion of samples belonging to class k for the i-th node.
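Equation (7) translates directly into code. The following sketch computes the Gini impurity of the labels that reach one node; the function name and the example labels are illustrative only.

```python
from collections import Counter


def gini_impurity(labels):
    """Equation (7): one minus the sum of squared class proportions in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_impurity(["low", "low", "high", "high"]))
```

A pure node (a single class) has impurity 0, and the impurity grows as the classes in the node become more evenly mixed.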

2.4.3. Random Forest Regression

Random forest regression is a bagging ensemble method of decision tree regression that is trained using the classification and regression tree (CART) algorithm. The objective function of CART is as follows (Equation (8)) [20]:
$$J(k, t_k) = \frac{m_{left}}{m}G_{left} + \frac{m_{right}}{m}G_{right} \tag{8}$$
where $k$ and $t_k$ are the single feature and threshold, respectively; $G_{left/right}$ is the impurity of the subset; and $m_{left/right}$ is the number of samples of the subset. The ensemble method constructs multiple decision trees using a bagging algorithm, also known as bootstrap aggregating. Each decision tree is trained on a dataset sampled with replacement, and the ensemble output is the average of the regression outcomes of the individual trees. Compared with a single decision tree, RF reduces the prediction variance while keeping the bias low.
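The CART cost in Equation (8) can be sketched for a single feature. The example assumes that the subset impurity is the variance of the targets, which is a common choice for regression trees; the text does not state which impurity was used for regression. All names and data are hypothetical.

```python
def variance(values):
    """Mean squared deviation from the subset mean (assumed impurity measure)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def cart_cost(xs, ys, threshold, impurity=variance):
    """Equation (8): sample-weighted impurity of the left and right subsets."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    m = len(ys)
    return len(left) / m * impurity(left) + len(right) / m * impurity(right)


def best_threshold(xs, ys, impurity=variance):
    """Try the midpoints between sorted unique feature values; keep the cheapest."""
    values = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(midpoints, key=lambda t: cart_cost(xs, ys, t, impurity))


# Toy data with an obvious break between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_threshold(xs, ys))
```

A full tree repeats this search over every feature at every node, and a random forest averages many such trees, each grown on a bootstrap sample of the training data.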

2.4.4. EXtreme Gradient Boost (XGBoost)

XGBoost is an ensemble algorithm that implements gradient-boosted decision trees [21]. Gradient boosting trains weak learners to create a strong ensemble model. It sequentially adds new decision tree models, each of which corrects the predictor built so far: every decision tree is trained on the residual errors of the preceding trees. The ensemble prediction is the sum of the prediction outcomes of all of the trees.
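The residual-fitting idea can be shown with a minimal sketch of plain gradient boosting under squared-error loss, using one-split trees (stumps) on a single feature. This is not the XGBoost implementation, which adds regularization and second-order gradient information; the function names, learning rate, and toy data are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each trained on the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        # Ensemble output: base value plus the (shrunken) sum of all stumps.
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Toy nonlinear target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
print(model(3), model(7))
```

The learning rate shrinks each tree's contribution, trading more boosting rounds for a smoother fit.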

2.4.5. Deep Neural Network (DNN)

The DNN, which is composed of node layers, consists of an input layer, hidden layer, and output layer. Each node has a weight and threshold and is activated when the output of the node is above the specified threshold when using the activation function [22]. Batch normalization was used for each layer to avoid gradient vanishing or exploding. The model hyperparameters (the number of hidden layers and number of nodes in each layer) were determined by a grid search, and we determined the number of nodes and layers for the best estimation performance. The hidden layers were composed of three layers with 32, 64, and 32 nodes, respectively. A rectified linear unit was used for the activation function, the mean square error was the loss function used in the training, and Adam was used as an optimizer.
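The layer layout described above (three hidden layers with 32, 64, and 32 nodes and ReLU activations) can be sketched as a forward pass. The example is untrained and uses random weights; it omits batch normalization, the Adam optimizer, and the training loop, and it assumes six inputs and a single output. All function names are hypothetical.

```python
import random


def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in vector]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random Gaussian weights scaled by fan-in, zero biases (illustrative)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_inputs=6, hidden=(32, 64, 32), seed=0):
    """Assumed shape: n_inputs -> 32 -> 64 -> 32 -> 1 output."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, x):
    """ReLU on the hidden layers, linear output for regression."""
    for weights, biases in network[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = network[-1]
    return dense(x, weights, biases)[0]


def mean_squared_error(y_true, y_pred):
    """Loss function named in the text."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


network = build_network()
print(forward(network, [0.1, -0.3, 0.5, 0.0, 1.2, -0.7]))
```

In practice a deep learning framework would supply these layers together with batch normalization and the optimizer; the sketch only shows how the layer sizes chain together.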

2.4.6. Mixture Density Network (MDN)

The MDN, which combines a neural network with a mixture density model, models a mixture of parametric distributions, as shown in Equations (9) and (10) [23].
$$p(y \mid x) = \sum_{i=1}^{n}\alpha_i(x)\,\Phi(y \mid \theta_i) \tag{9}$$
$$\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i(x) = 1,\qquad \Phi(y \mid \theta_i) = \mathcal{N}(\mu_i, \sigma_i^2) \tag{10}$$
where $x$ and $y$ are the input and output variables, respectively; $n$ is the number of mixture components; and $\alpha_i(x)$ are mixing coefficients, which are prior probabilities (conditioned on $x$) corresponding to the mixture weight. $\Phi(y \mid \theta_i)$ is the conditional density composed of the mean ($\mu_i$) and variance ($\sigma_i^2$).
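Equations (9) and (10) can be evaluated with a short sketch, assuming Gaussian components. In an MDN the mixing coefficients, means, and standard deviations would be outputs of the network for a given input; here they are hypothetical fixed numbers, and the softmax is shown only as one common way to make the mixing coefficients sum to 1.

```python
import math


def softmax(logits):
    """Map raw scores to positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(y, alphas, mus, sigmas):
    """Equation (9): weighted sum of the component densities."""
    return sum(a * gaussian_pdf(y, m, s) for a, m, s in zip(alphas, mus, sigmas))


def negative_log_likelihood(y, alphas, mus, sigmas):
    """Typical MDN training loss for one observation."""
    return -math.log(mixture_density(y, alphas, mus, sigmas))


# Hypothetical network outputs for a single input x.
alphas = softmax([0.2, 1.0, -0.5])
mus = [20.0, 25.0, 32.0]
sigmas = [2.0, 3.0, 4.0]
print(mixture_density(24.0, alphas, mus, sigmas))
```

A point estimate can then be taken, for example, as the mixture mean, the sum of each mixing coefficient times its component mean.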

2.5. Model Evaluation

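The regression models were validated on the 30% of the total data that were held out using a Bernoulli trial, with the mean absolute percentage error (MAPE) and SEE as the evaluation metrics, as shown in Equations (11) and (12).
$$\mathrm{MAPE}\ (\%) = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{11}$$
$$\mathrm{SEE} = \sqrt{\frac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{N - 2}} \tag{12}$$
where $y_i$ and $\hat{y}_i$ are the measured and estimated values, respectively, and $N$ is the number of test samples.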
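Equations (11) and (12) can be written as two small functions. The sketch follows the definitions above, including the N - 2 denominator in the SEE; the function names and the sample values are illustrative only.

```python
import math


def mape(y_true, y_pred):
    """Equation (11): mean absolute percentage error, in percent."""
    n = len(y_true)
    return 100.0 / n * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))


def see(y_true, y_pred):
    """Equation (12): standard error of estimation with N - 2 in the denominator."""
    n = len(y_true)
    squared_error = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_error / (n - 2))


# Made-up measured and estimated hand grip strengths (kg).
measured = [22.0, 30.0, 18.0, 26.0]
estimated = [24.0, 27.0, 19.0, 26.0]
print(mape(measured, estimated), see(measured, estimated))
```

MAPE is undefined when a measured value is 0, so it is best suited to strictly positive quantities, while the SEE is expressed in the unit of the measured variable.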

3. Results

Detailed results of the regression model analysis are presented in Table 4 and Table 5. For each trained regression model, the coefficients of determination (R2), adjusted coefficients of determination, and SEE were used to analyze the estimated explanatory power of the regression models.
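For reference, the coefficient of determination and its adjusted form can be computed as in the sketch below, using the standard textbook definitions. The function names and the example numbers are hypothetical and are not results from Tables 4 to 6.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R2 for the number of input variables used by the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Made-up measured and estimated values.
measured = [20.0, 24.0, 28.0, 32.0]
estimated = [21.0, 23.0, 29.0, 31.0]
r2 = r_squared(measured, estimated)
print(r2, adjusted_r_squared(r2, n_samples=len(measured), n_features=1))
```

With very large samples and only a handful of inputs, the adjusted value is almost identical to the unadjusted one.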

3.1. Performance Evaluation of the Regression Models

Table 4 presents a comparison of the FF variable estimation performance in the machine learning models. The DNN models presented the best performance with respect to R2 for estimation of the HGS (R2 = 0.622) and 30 s chair stand (R2 = 0.175), while the random forest model showed the best performance in the estimation of the chair sit-and-reach (R2 = 0.279), figure-of-eight walk (R2 = 0.381), and TUG (R2 = 0.212). For the estimation of the 2 min step test, the MDN model showed the most accurate estimation results (R2 = 0.119). Compared with the linear regression model [17], with the DNN model, R2 was improved by 3.7% and 1.2% in the HGS and 30 s chair stand estimation, respectively. It was also improved by 0.4% and 15.3% in estimation of the chair sit-and-reach and figure-of-eight walk, respectively, with the random forest model.

3.2. Performance Evaluation of the Regression Models without Outlier Data

Table 5 shows a comparison of the FF variable estimation performance of the machine learning models without outlier data. In this performance evaluation, the outliers in the NFA datasets were removed using SRE to improve the training performance. Additionally, the Boruta algorithm and p-value were applied for feature selection of the input variables, as mentioned in Section 2.3. The input variables with a rank higher than 1 in the Boruta algorithm were excluded from the training (BMI and weight in TUG estimation). Furthermore, the input variables with a p-value higher than 0.05 (sex in the 2 min step test) were also excluded from the model training. The DNN-based regression model showed the best performance with respect to the R2 values for all FF variable estimations. Compared with the previous linear regression model [17], with the DNN model, R2 was improved by 1.1%, 0.6%, 1.1%, 0.6%, 1%, and 1.4% for the HGS, 30 s chair stand, chair sit-and-reach, figure-of-eight walk, TUG, and 2 min step tests, respectively.

3.3. Regression Model Validity

Table 6 shows a comparison of the regression models’ validity with the test data, which is 30% of the total data. The mean absolute percentage error ranged from 0.084% to 22.68% in the regression models (DNN model, HGS: MAPE = 0.16% and SEE = 4.135 kg, 30 s chair stand test: MAPE = 0.205% and SEE = 4.169 times, chair sit-and-reach test: MAPE = 20.92% and SEE = 6.228 cm, figure-of-eight walk test: MAPE = 0.097% and SEE = 3.546 s, TUG test: MAPE = 0.084% and SEE = 0.805 s, and 2 min step test: MAPE = 0.099% and SEE = 13.00 times). Figure 2 shows the relationship between the measured and predicted FF variables using scatter plots.

4. Discussion

FF variables, which can be used as an index of healthcare, have been used to assess the health conditions of older adults, and several researchers have studied the correlation between independent variables and FF variables. In previous studies, MLR was used to develop a prediction model for the FF variables. However, MLR, which cannot represent the nonlinearity of data, has limitations in estimating FF variables. In addition, prior studies focused on predicting a specific group’s superior physical functionality, such as that of athletes, which is not appropriate for the prediction of FF variables in older adults. The present study focused on developing a regression model for estimating the FF variables of older adults in Korea with easy-to-measure independent variables. To obtain an accurate regression model, we compared various machine learning and deep learning regression models. This study demonstrated the highest performance of the DNN model in FF variable estimation compared with the other regression models. With the developed regression model, it would be helpful to monitor the FF in older adults in daily life.
The correlation coefficient shown in Table 2 represents the strength and direction of the linear relation between the input and output variables. In a previous study, height, weight, and BMI were significantly correlated with HGS for older adults [24]. In this study, HGS had a higher correlation coefficient with these independent variables, and presented the most accurate estimation results compared with the other FF variables. From these results, we can infer that it is important to select input variables with a strong correlation in order to obtain higher estimation results.
Using nonlinear regression models, we focused on predicting the FF variables of older adults using independent variables. The mean explanatory power for HGS was highest in the DNN regression model (MLR: 61.4%, SVM: 62.1%, RF: 61.9%, XGBoost: 62.0%, DNN: 62.2%, and MDN: 61.7%). In this study, outlier removal and feature selection were also conducted. Without outlier data, the mean explanatory power for HGS reached 78.4% in the DNN model, the highest value among the models (MLR: 77.3%, SVM: 78.4%, RF: 74.2%, XGBoost: 78.3%, DNN: 78.4%, MDN: 78.3%). The explanatory power of our proposed regression model for HGS was improved by approximately 25% compared with previous studies [25,26]. In our previous study, we developed a linear regression model for predicting FF variables of South Korean older adults [17]. However, that study did not cover the nonlinearity of the dataset and used only multiple linear regression models, without considering other regression models that might improve the prediction accuracy. Hence, we tested various regression models that can capture data nonlinearity and proposed the best-performing regression model. The DNN-based regression model performed better than the linear regression model. In the model validation, SEE was improved by 16.6% in HGS, 28.2% in the 30 s chair stand, 25.9% in the chair sit-and-reach, 50.1% in the figure-of-eight walk, 56.7% in TUG, and 48.5% in the 2 min step test.
The coefficient of determination of the proposed model was too low for practical applications, except for predicting HGS. The coefficients of determination in the 30 s chair stand (adjusted R2 = 0.300), chair sit-and-reach (adjusted R2 = 0.441), figure-of-eight walk (adjusted R2 = 0.395), TUG (adjusted R2 = 0.389), and 2 min step tests (adjusted R2 = 0.207) were in the mid-range. From this, we inferred that more input variables are required to capture the relationship with the FF variables. Hence, additional variables, such as the individual physical activity level or nutrition, which are correlated with the FF variables [27], are needed to improve the prediction accuracy. Moreover, the independent variables of the general older adult population used here did not contain health status, such as personal physical illness/disease information, even though this might be correlated with the FF variables. Chronic diseases, such as cardiovascular disease and type 2 diabetes, cause mortality in older adults [28]. Information obtained from blood pressure measurements and blood glucose tests could be used as input variables to predict the FF variables. These parameters may also be used as indicators to isolate the effects of physical illness/disease information. The DNN-based regression model showed the highest performance for most of the FF variables, but the improvement over the other regression models in validation was <1.6%. Given this small margin, selecting a computationally efficient machine learning model is a practical choice for predicting HGS.

5. Conclusions

Herein, we proposed an FF variable prediction model based on machine learning and deep learning regression with easy-to-measure independent variables, and compared the performance of each model. This study demonstrated a correlation between older adults’ independent variables and the FF variables, especially HGS. However, the estimation results of the FF variables, except for HGS, were unsatisfactory for monitoring older adults’ physical functionality and providing personalized workout programs. The results showed the difficulty in predicting the FF variables using six independent variables (age, sex, height, weight, percent body fat, and BMI), which were insufficient for representing the correlation of FF variables. In future research, additional variables, including the physical activity level and nutritional status, will be used to enhance the accuracy of the estimation results.

Author Contributions

Data curation, S.-H.L. (Sang-Hun Lee), S.-H.L. (Seung-Hun Lee) and S.-W.K.; formal analysis, S.-H.L. (Sang-Hun Lee), S.-W.K. and H.-Y.P.; funding acquisition, K.L. and H.J.; investigation, H.J.; methodology, S.-W.K. and H.-Y.P.; project administration, H.J.; software, S.-H.L. (Sang-Hun Lee); supervision, K.L. and H.J.; validation, S.-H.L. (Sang-Hun Lee), S.-H.L. (Seung-Hun Lee) and S.-W.K.; writing (original draft), S.-H.L. (Sang-Hun Lee); writing (review and editing), H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2020 (project name: Development of customized smart fitness service to support the personal life span, project number: SR202006002, contribution rate: 100%).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and was approved by the Institutional Review Board of Konkuk University (7001355-202101-E-132).

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of data. Data were obtained from the Korea Sports Promotion Foundation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. United Nations Department of Economic and Social Affairs, Population Division. World Population Prospects 2022: Summary of Results; UN DESA/POP/2022/TR/NO. 3; United Nations Department of Economic and Social Affairs, Population Division: New York, NY, USA, 2022.
  2. Grimmer, M.; Riener, R.; Walsh, C.J.; Seyfarth, A. Mobility Related Physical and Functional Losses Due to Aging and Disease-a Motivation for Lower Limb Exoskeletons. J. Neuroeng. Rehabil. 2019, 16, 2.
  3. Hurst, C.; Weston, K.L.; McLaren, S.J.; Weston, M. The Effects of Same-Session Combined Exercise Training on Cardiorespiratory and Functional Fitness in Older Adults: A Systematic Review and Meta-Analysis. Aging Clin. Exp. Res. 2019, 31, 1701–1717.
  4. Garber, C.E.; Greaney, M.L.; Riebe, D.; Nigg, C.R.; Burbank, P.A.; Clark, P.G. Physical and Mental Health-Related Correlates of Physical Function in Community Dwelling Older Adults: A Cross Sectional Study. BMC Geriatr. 2010, 10, 6.
  5. Zhao, Y.; Chung, P.-K. Differences in Functional Fitness among Older Adults with and without Risk of Falling. Asian Nurs. Res. 2016, 10, 51–55.
  6. Santos, D.A.; Silva, A.M.; Baptista, F.; Santos, R.; Vale, S.; Mota, J.; Sardinha, L.B. Sedentary Behavior and Physical Activity Are Independently Related to Functional Fitness in Older Adults. Exp. Gerontol. 2012, 47, 908–912.
  7. World Health Organization. WHO Guidelines on Physical Activity and Sedentary Behaviour; World Health Organization: Geneva, Switzerland, 2020; ISBN 978-92-4-001512-8.
  8. Brown, D.R.; Yore, M.M.; Ham, S.A.; Macera, C.A. Physical Activity among Adults > Or = 50 Yr with and without Disabilities, BRFSS 2001. Med. Sci. Sports Exerc. 2005, 37, 620–629.
  9. Rikli, R.E.; Jones, C.J. Senior Fitness Test Manual; Human kinetics: Champaign, IL, USA, 2013; ISBN 1-4504-1118-5.
  10. Church, T.S.; Earnest, C.P.; Skinner, J.S.; Blair, S.N. Effects of Different Doses of Physical Activity on Cardiorespiratory Fitness among Sedentary, Overweight or Obese Postmenopausal Women with Elevated Blood Pressure: A Randomized Controlled Trial. JAMA 2007, 297, 2081–2091.
  11. Mahajan, U.; Krishnan, A.; Malhotra, V.; Sharma, D.; Gore, S. Predicting Fitness and Performance of Diving Using Machine Learning Algorithms. In Proceedings of the 2019 IEEE Pune Section International Conference (PuneCon), Pune, India, 18–20 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5.
  12. Akay, M.F.; Abut, F.; Cetin, E.; Yarim, I.; Sow, B. Support Vector Machines for Predicting the Hamstring and Quadriceps Muscle Strength of College-Aged Athletes. Turk. J. Electr. Eng. Comput. Sci. 2017, 25, 2567–2582.
  13. Zhu, P.; Sun, F. Sports Athletes’ Performance Prediction Model Based on Machine Learning Algorithm. In Proceedings of the International Conference on Applications and Techniques in Cyber Security and Intelligence, Huainan, China, 22–24 June 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 498–505.
  14. Taha, Z.; Musa, R.M.; Majeed, A.P.A.; Alim, M.M.; Abdullah, M.R. The Identification of High Potential Archers Based on Fitness and Motor Ability Variables: A Support Vector Machine Approach. Hum. Mov. Sci. 2018, 57, 184–193.
  15. Matteo, D.; Gastin, P.; Suppiah, H.; Carey, D. Predicting Athlete Performance in Team Sports Using Nearest Neighbour Modelling. In Proceedings of the International Conference on Security, Privacy, and Anonymity in Computation, Communication, and Storage, Nanjing, China, 18 December 2020; Springer: Berlin/Heidelberg, Germany, 2022; pp. 101–108.
  16. Lee, S.-H.; Ju, H.-S.; Lee, S.-H.; Kim, S.-W.; Park, H.-Y.; Kang, S.-W.; Song, Y.-E.; Lim, K.; Jung, H. Estimation of Health-Related Physical Fitness (HRPF) Levels of the General Public Using Artificial Neural Network with the National Fitness Award (NFA) Datasets. Int. J. Environ. Res. Public Health 2021, 18, 10391.
  17. Kim, S.-W.; Park, H.-Y.; Jung, H.; Lee, J.; Lim, K. Estimation of Health-Related Physical Fitness Using Multiple Linear Regression in Korean Adults: National Fitness Award 2015–2019. Front. Physiol. 2021, 12, 668055.
  18. Smola, A.J.; Schölkopf, B. A Tutorial on Support Vector Regression. Stat. Comput. 2004, 14, 199–222.
  19. Myles, A.J.; Feudale, R.N.; Liu, Y.; Woody, N.A.; Brown, S.D. An Introduction to Decision Tree Modeling. J. Chemom. A J. Chemom. Soc. 2004, 18, 275–285.
  20. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2019; ISBN 1-4920-3259-X.
  21. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme Gradient Boosting. R Package Version 0.4-2 2015, 1, 1–4.
  22. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117.
  23. Bishop, C.M. Mixture Density Networks; Aston University: Birmingham, UK, 1994.
  24. Kim, C.R.; Jeon, Y.-J.; Kim, M.C.; Jeong, T.; Koo, W.R. Reference Values for Hand Grip Strength in the South Korean Population. PLoS ONE 2018, 13, e0195485.
  25. Mukherjee, S.; Mishra, D.; Satapathy, S. Prediction of Hand Grip Strength among Elderly Farmers of Odisha in India. Mater. Today Proc. 2020, 24, 318–325.
  26. Pan, P.-J.; Lin, C.-H.; Yang, N.-P.; Chen, H.-C.; Tsao, H.-M.; Chou, P.; Hsu, N.-W. Normative Data and Associated Factors of Hand Grip Strength among Elderly Individuals: The Yilan Study, Taiwan. Sci. Rep. 2020, 10, 6611.
  27. Clegg, M.E.; Williams, E.A. Optimizing Nutrition in Older People. Maturitas 2018, 112, 34–38.
  28. Mcleod, J.C.; Stokes, T.; Phillips, S.M. Resistance Exercise Training as a Primary Countermeasure to Age-Related Chronic Disease. Front. Physiol. 2019, 10, 645. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Summary of various regression models to predict FF variables with independent variables. Estimation models predicted each output variable. FF, functional fitness; BMI, body mass index; HGS, hand grip strength; XGBoost, extreme gradient boosting.
Figure 2. Relationship between the measured and predicted FF variables using scatter plots. (A) HGS, (B) 30 s chair stand test, (C) chair sit-and-reach test, (D) figure-of-eight walk test, (E) timed up-and-go test, and (F) 2 min step test results.
Table 1. Summary of the NFA dataset used to train the FF variable estimation model.
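| Data type | Variable | Training dataset, men (n = 42,991) | Training dataset, women (n = 82,281) | Validation dataset, men (n = 18,474) | Validation dataset, women (n = 35,214) |
|---|---|---|---|---|---|
| Independent variable (input) | Age (year) | 73.26 ± 5.45 | 72.55 ± 5.62 | 73.31 ± 5.45 | 72.51 ± 5.59 |
| Independent variable (input) | Height (cm) | 165.10 ± 5.86 | 152.37 ± 5.54 | 165.10 ± 5.79 | 152.38 ± 5.52 |
| Independent variable (input) | Weight (kg) | 66.33 ± 8.87 | 57.54 ± 8.00 | 66.39 ± 8.84 | 57.49 ± 7.93 |
| Independent variable (input) | Percent body fat (%) | 24.31 ± 2.80 | 24.77 ± 3.13 | 24.33 ± 2.79 | 24.75 ± 3.09 |
| Independent variable (input) | BMI (kg/m²) | 26.01 ± 6.39 | 34.97 ± 6.42 | 26.06 ± 6.42 | 34.93 ± 6.38 |
| Functional fitness variable (output) | HGS (kg) | 30.76 ± 6.66 | 19.45 ± 4.84 | 30.75 ± 6.65 | 19.49 ± 4.82 |
| Functional fitness variable (output) | 30-s chair stand (n) | 20.58 ± 6.40 | 18.23 ± 6.05 | 20.53 ± 6.40 | 18.27 ± 6.05 |
| Functional fitness variable (output) | Chair sit-and-reach (cm) | 3.87 ± 9.70 | 13.06 ± 8.06 | 3.79 ± 9.64 | 13.12 ± 8.02 |
| Functional fitness variable (output) | Figure of 8 walk (s) | 26.04 ± 7.01 | 28.00 ± 7.96 | 26.13 ± 7.16 | 27.95 ± 7.90 |
| Functional fitness variable (output) | Timed up-and-go (s) | 6.20 ± 1.81 | 6.79 ± 2.07 | 6.20 ± 1.78 | 6.77 ± 2.02 |
| Functional fitness variable (output) | 2-min step test (n) | 107.20 ± 24.90 | 100.31 ± 27.49 | 107.02 ± 24.50 | 100.58 ± 27.18 |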
NFA, National Fitness Award; BMI, body mass index; HGS, hand grip strength.
Table 2. Pearson’s correlation analysis between the independent variables and FF variables.
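| Independent variable | HGS | 30-s Chair Stand | Chair Sit-and-Reach | Figure of 8 Walk | Timed Up-and-Go | 2-min Step Test |
|---|---|---|---|---|---|---|
| Age | −0.223 | −0.317 | −0.250 | 0.429 | 0.397 | −0.306 |
| Sex | −0.693 | −0.172 | 0.454 | 0.122 | 0.144 | −0.126 |
| Height | 0.688 | 0.169 | −0.307 | −0.223 | −0.239 | 0.196 |
| Weight | 0.492 | 0.029 | −0.229 | −0.070 | −0.082 | 0.074 |
| Percent body fat | −0.466 | −0.246 | 0.153 | 0.212 | 0.212 | −0.189 |
| BMI | 0.013 | −0.111 | −0.016 | 0.106 | 0.104 | −0.079 |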
FF, functional fitness; Sex, male is expressed as 1 and female is expressed as 2.
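Because sex is coded 1 for men and 2 for women, a negative coefficient in the Sex row means the variable tends to be lower in women. The sketch below illustrates that reading with made-up toy values (not NFA data); the helper name `pearson_r` is hypothetical and is not taken from the paper.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both variables
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Toy data only: sex coded as in the Table 2 footnote (1 = male, 2 = female).
sex = [1, 1, 1, 2, 2, 2]
hgs = [31.0, 29.5, 33.0, 19.0, 21.5, 18.0]  # illustrative grip strengths in kg

r = pearson_r(sex, hgs)
print(f"r(sex, HGS) = {r:.3f}")  # negative: the group coded 2 has the lower values
```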
Table 3. p-values and Boruta feature selection of each independent variable.
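| Independent variable | HGS p-value | HGS rank | 30 s Chair Stand p-value | 30 s Chair Stand rank | Chair Sit-and-Reach p-value | Chair Sit-and-Reach rank | Figure of 8 Walk p-value | Figure of 8 Walk rank | Timed Up-and-Go p-value | Timed Up-and-Go rank | 2 min Step Test p-value | 2 min Step Test rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 |
| Sex | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.554 | 1 |
| Height | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 |
| Weight | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 3 | 0.000 | 1 |
| Percent body fat | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 |
| BMI | 0.000 | 1 | 0.000 | 1 | 0.000 | 1 | 0.034 | 1 | 0.059 | 2 | 0.000 | 1 |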
Table 4. Comparison of the estimated regression models predicting the FF variables.
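| Model | FF variable | R | R² | Adjusted R² | SEE |
|---|---|---|---|---|---|
| Support Vector Regression | HGS | 0.788 | 0.621 | 0.621 | 4.750 kg |
| Support Vector Regression | 30 s chair stand | 0.417 | 0.174 | 0.174 | 5.635 n |
| Support Vector Regression | Chair sit-and-reach | 0.515 | 0.265 | 0.265 | 8.395 cm |
| Support Vector Regression | Figure-of-eight walk | 0.423 | 0.179 | 0.179 | 6.784 s |
| Support Vector Regression | Timed up-and-go | 0.391 | 0.153 | 0.153 | 1.846 s |
| Support Vector Regression | 2 min step test | 0.313 | 0.098 | 0.098 | 25.73 n |
| Random Forest | HGS | 0.787 | 0.619 | 0.619 | 4.681 kg |
| Random Forest | 30 s chair stand | 0.406 | 0.165 | 0.165 | 5.293 n |
| Random Forest | Chair sit-and-reach | 0.528 | 0.279 | 0.279 | 8.124 cm |
| Random Forest | Figure-of-eight walk | 0.617 | 0.381 | 0.381 | 3.116 s |
| Random Forest | Timed up-and-go | 0.460 | 0.212 | 0.212 | 1.198 s |
| Random Forest | 2 min step test | 0.310 | 0.096 | 0.096 | 22.63 n |
| XGBoost | HGS | 0.787 | 0.620 | 0.620 | 4.755 kg |
| XGBoost | 30 s chair stand | 0.409 | 0.167 | 0.167 | 5.660 n |
| XGBoost | Chair sit-and-reach | 0.521 | 0.272 | 0.272 | 8.355 cm |
| XGBoost | Figure-of-eight walk | 0.442 | 0.195 | 0.195 | 6.716 s |
| XGBoost | Timed up-and-go | 0.416 | 0.173 | 0.173 | 1.823 s |
| XGBoost | 2 min step test | 0.321 | 0.103 | 0.103 | 25.66 n |
| DNN | HGS | 0.789 | 0.622 | 0.622 | 4.741 kg |
| DNN | 30 s chair stand | 0.418 | 0.175 | 0.175 | 5.640 n |
| DNN | Chair sit-and-reach | 0.523 | 0.274 | 0.274 | 8.347 cm |
| DNN | Figure-of-eight walk | 0.449 | 0.202 | 0.202 | 6.688 s |
| DNN | Timed up-and-go | 0.423 | 0.179 | 0.179 | 1.817 s |
| DNN | 2 min step test | 0.329 | 0.108 | 0.108 | 25.59 n |
| MDN | HGS | 0.785 | 0.617 | 0.617 | 4.771 kg |
| MDN | 30 s chair stand | 0.394 | 0.155 | 0.155 | 5.700 n |
| MDN | Chair sit-and-reach | 0.522 | 0.273 | 0.273 | 8.349 cm |
| MDN | Figure-of-eight walk | 0.448 | 0.201 | 0.201 | 6.693 s |
| MDN | Timed up-and-go | 0.436 | 0.190 | 0.190 | 1.804 s |
| MDN | 2 min step test | 0.345 | 0.119 | 0.119 | 25.44 n |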
FF, functional fitness; SEE, standard error of estimation; XGBoost, extreme gradient boosting; DNN, deep neural network; MDN, mixture density network.
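In Table 4 the adjusted R² equals R² to three decimals in every row. One plausible reason is sample size: with the standard adjustment formula, the penalty for six predictors is negligible at n above 100,000. The sketch below assumes that formula and assumes the statistics were computed on the full training sample of Table 1 with all six inputs; the table itself states neither, and `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Standard adjusted R^2 for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Assumption: training sample size from Table 1 (men + women) and six inputs.
n_train = 42_991 + 82_281
n_inputs = 6

# R^2 values taken from the DNN rows of Table 4.
for r2 in (0.622, 0.175, 0.108):
    adj = adjusted_r2(r2, n_train, n_inputs)
    print(f"R2 = {r2:.3f} -> adjusted R2 = {adj:.6f}")
```

Under these assumptions the adjustment changes R² only from the fifth decimal place onward, which is consistent with the identical columns.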
Table 5. Comparison of the estimated regression models predicting the FF variables without outlier data.
| FF variable | SRE | Independent variables (input variables) |
|---|---|---|
| HGS | SRE 32: n = 101,438 | Age, Sex, Height, Weight, Percent body fat, BMI |
| 30 s chair stand | SRE 39: n = 102,726 | Age, Sex, Height, Weight, Percent body fat, BMI |
| Chair sit-and-reach | SRE 35: n = 102,640 | Age, Sex, Height, Weight, Percent body fat, BMI |
| Figure-of-eight walk | SRE 22: n = 79,724 | Age, Sex, Height, Weight, Percent body fat, BMI |
| Timed up-and-go | SRE 36: n = 94,621 | Age, Sex, Height, Percent body fat |
| 2-min step test | SRE 28: n = 91,420 | Age, Height, Weight, Percent body fat, BMI |

| Model | FF variable | R | R² | Adjusted R² | SEE |
|---|---|---|---|---|---|
| Support Vector Regression | HGS | 0.885 | 0.784 | 0.784 | 3.069 kg |
| Support Vector Regression | 30-s chair stand | 0.548 | 0.300 | 0.300 | 3.800 n |
| Support Vector Regression | Chair sit-and-reach | 0.664 | 0.441 | 0.441 | 5.436 cm |
| Support Vector Regression | Figure-of-eight walk | 0.628 | 0.395 | 0.395 | 3.083 s |
| Support Vector Regression | Timed up-and-go | 0.624 | 0.389 | 0.389 | 0.705 s |
| Support Vector Regression | 2-min step test | 0.455 | 0.207 | 0.207 | 12.46 n |
| Random Forest | HGS | 0.861 | 0.742 | 0.742 | 3.336 kg |
| Random Forest | 30-s chair stand | 0.514 | 0.264 | 0.264 | 5.492 n |
| Random Forest | Chair sit-and-reach | 0.655 | 0.429 | 0.429 | 3.197 cm |
| Random Forest | Figure-of-eight walk | 0.590 | 0.348 | 0.348 | 3.197 s |
| Random Forest | Timed up-and-go | 0.588 | 0.346 | 0.346 | 0.729 s |
| Random Forest | 2-min step test | 0.438 | 0.192 | 0.192 | 12.57 n |
| XGBoost | HGS | 0.885 | 0.783 | 0.783 | 3.069 kg |
| XGBoost | 30-s chair stand | 0.548 | 0.301 | 0.301 | 3.800 n |
| XGBoost | Chair sit-and-reach | 0.662 | 0.438 | 0.438 | 5.448 cm |
| XGBoost | Figure-of-eight walk | 0.628 | 0.395 | 0.395 | 3.080 s |
| XGBoost | Timed up-and-go | 0.626 | 0.392 | 0.392 | 0.704 s |
| XGBoost | 2-min step test | 0.453 | 0.205 | 0.205 | 12.48 n |
| DNN | HGS | 0.885 | 0.784 | 0.784 | 3.054 kg |
| DNN | 30-s chair stand | 0.550 | 0.302 | 0.302 | 3.794 n |
| DNN | Chair sit-and-reach | 0.668 | 0.446 | 0.446 | 5.418 cm |
| DNN | Figure-of-eight walk | 0.629 | 0.396 | 0.396 | 3.078 s |
| DNN | Timed up-and-go | 0.628 | 0.394 | 0.394 | 0.702 s |
| DNN | 2-min step test | 0.458 | 0.210 | 0.210 | 12.44 n |
| MDN | HGS | 0.885 | 0.783 | 0.783 | 3.069 kg |
| MDN | 30-s chair stand | 0.529 | 0.280 | 0.280 | 3.852 n |
| MDN | Chair sit-and-reach | 0.646 | 0.417 | 0.417 | 5.552 cm |
| MDN | Figure-of-eight walk | 0.628 | 0.394 | 0.394 | 3.083 s |
| MDN | Timed up-and-go | 0.622 | 0.387 | 0.387 | 0.707 s |
| MDN | 2-min step test | 0.451 | 0.203 | 0.203 | 12.49 n |
SRE, studentized residual; SEE, standard error of estimation.
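Table 5 reports results after removing outliers identified by studentized residuals. The sketch below shows one common way such a filter can be built: fit an ordinary least-squares model, compute internally studentized residuals, and drop rows beyond a cutoff. The cutoff of 2.0, the synthetic data, and the helper name `studentized_residuals` are all assumptions for illustration; the table does not give the procedure the authors actually used.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals from an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    q, _ = np.linalg.qr(X)                 # reduced QR; rows of q give leverage
    leverage = (q ** 2).sum(axis=1)        # diagonal of the hat matrix
    dof = X.shape[0] - X.shape[1]
    s = np.sqrt((resid ** 2).sum() / dof)  # residual standard error
    return resid / (s * np.sqrt(1.0 - leverage))

# Synthetic data, not NFA data: two inputs, linear signal, Gaussian noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = 3.0 + x @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=200)
y[0] += 10.0  # plant one gross outlier

sre = studentized_residuals(x, y)
keep = np.abs(sre) < 2.0  # hypothetical cutoff
print(f"kept {keep.sum()} of {len(y)} rows")
```

Only the rows flagged by `keep` would then be passed on to model training.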
Table 6. Validation of estimating accuracy.
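| Model | HGS MAPE (%) | HGS SEE (kg) | 30 s Chair Stand MAPE (%) | 30 s Chair Stand SEE (n) | Chair Sit-and-Reach MAPE (%) | Chair Sit-and-Reach SEE (cm) | Figure of 8 Walk MAPE (%) | Figure of 8 Walk SEE (s) | Timed Up-and-Go MAPE (%) | Timed Up-and-Go SEE (s) | 2 min Step Test MAPE (%) | 2 min Step Test SEE (n) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLR | 0.160 | 4.216 | 0.206 | 4.214 | 20.12 | 6.315 | 0.100 | 3.565 | 0.089 | 0.822 | 0.100 | 13.13 |
| SVR | 0.157 | 4.147 | 0.202 | 4.183 | 19.81 | 6.274 | 0.098 | 3.549 | 0.087 | 0.817 | 0.099 | 13.03 |
| RF | 0.160 | 4.258 | 0.210 | 4.307 | 20.75 | 6.455 | 0.102 | 3.649 | 0.091 | 0.840 | 0.103 | 13.56 |
| XGBoost | 0.158 | 4.160 | 0.205 | 4.178 | 19.96 | 6.252 | 0.098 | 3.529 | 0.087 | 0.814 | 0.099 | 13.06 |
| DNN | 0.157 | 4.135 | 0.205 | 4.169 | 20.92 | 6.228 | 0.097 | 3.546 | 0.084 | 0.805 | 0.099 | 13.00 |
| MDN | 0.158 | 4.141 | 0.214 | 4.228 | 22.68 | 6.313 | 0.096 | 3.517 | 0.086 | 0.831 | 0.099 | 13.03 |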
MAPE, mean absolute percentage error; MLR, multiple linear regression; SVR, support vector regression; RF, random forest; SEE, standard error of estimation.
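The two validation metrics in Table 6 can be sketched as follows. The definitions are the textbook ones and are an assumption here: MAPE as the mean of |measured − predicted| / |measured| expressed in percent, and SEE as the root of the residual sum of squares divided by n − p − 1. The table does not state the exact formulas or scaling the authors used, and the numbers below are toy values, not NFA data.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any y_true is 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100.0)

def see(y_true, y_pred, n_predictors):
    """Standard error of estimation with n - p - 1 degrees of freedom."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dof = len(y_true) - n_predictors - 1
    return float(np.sqrt(((y_true - y_pred) ** 2).sum() / dof))

# Toy values only.
measured = [20.0, 25.0, 40.0, 50.0]
predicted = [22.0, 24.0, 36.0, 55.0]

print(f"MAPE = {mape(measured, predicted):.2f} %")
print(f"SEE  = {see(measured, predicted, n_predictors=1):.3f}")
```

Because MAPE divides by the measured value, measurements close to zero inflate it. That may be related to the much larger chair sit-and-reach values in Table 6, where scores near zero are common, although the table itself does not say so.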
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lee, S.-H.; Lee, S.-H.; Kim, S.-W.; Park, H.-Y.; Lim, K.; Jung, H. Estimation of Functional Fitness of Korean Older Adults Using Machine Learning Techniques: The National Fitness Award 2015–2019. Int. J. Environ. Res. Public Health 2022, 19, 9754. https://doi.org/10.3390/ijerph19159754

AMA Style

Lee S-H, Lee S-H, Kim S-W, Park H-Y, Lim K, Jung H. Estimation of Functional Fitness of Korean Older Adults Using Machine Learning Techniques: The National Fitness Award 2015–2019. International Journal of Environmental Research and Public Health. 2022; 19(15):9754. https://doi.org/10.3390/ijerph19159754

Chicago/Turabian Style

Lee, Sang-Hun, Seung-Hun Lee, Sung-Woo Kim, Hun-Young Park, Kiwon Lim, and Hoeryong Jung. 2022. "Estimation of Functional Fitness of Korean Older Adults Using Machine Learning Techniques: The National Fitness Award 2015–2019" International Journal of Environmental Research and Public Health 19, no. 15: 9754. https://doi.org/10.3390/ijerph19159754
