Article

Lowering Barriers to Health Risk Assessments in Promoting Personalized Health Management

1
KakaoHealthCare Corp., Seongnam-si 13529, Gyeonggi-do, Republic of Korea
2
Department of Digital Healthcare, Seoul National University Bundang Hospital, Seongnam-si 13620, Gyeonggi-do, Republic of Korea
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Pers. Med. 2024, 14(3), 316; https://doi.org/10.3390/jpm14030316
Submission received: 28 February 2024 / Revised: 12 March 2024 / Accepted: 14 March 2024 / Published: 18 March 2024
(This article belongs to the Special Issue Precision Medicine for Epidemiology and Public Health)

Abstract

This study investigates the feasibility of accurately predicting adverse health events without relying on costly data acquisition methods, such as laboratory tests, in the era of shifting healthcare paradigms towards community-based health promotion and personalized preventive healthcare through individual health risk assessments (HRAs). We assessed the incremental predictive value of four categories of predictor variables—demographic, lifestyle and family history, personal health device, and laboratory data—organized by data acquisition costs in the prediction of the risks of mortality and five chronic diseases. Machine learning methodologies were employed to develop risk prediction models, assess their predictive performance, and determine feature importance. Using data from the National Sample Cohort of the Korean National Health Insurance Service (NHIS), which includes eligibility, medical check-up, healthcare utilization, and mortality data from 2002 to 2019, our study involved 425,148 NHIS members who underwent medical check-ups between 2009 and 2012. Models using demographic, lifestyle, family history, and personal health device data, with or without laboratory data, showed comparable performance. A feature importance analysis in models excluding laboratory data highlighted modifiable lifestyle factors, which are a superior set of variables for developing health guidelines. Our findings support the practicality of precise HRAs using demographic, lifestyle, family history, and personal health device data. This approach addresses HRA barriers, particularly for healthy individuals, by eliminating the need for costly and inconvenient laboratory data collection, advancing accessible preventive health management strategies.

1. Introduction

Recent advancements in biomedicine and information technology have catalyzed a paradigm shift in healthcare, moving from treating the sick in healthcare facilities to preventing illness in healthy individuals through personalized health management in communities, a concept central to P4 (predictive, preventive, personalized, and participatory) medicine [1]. This approach aims to preemptively identify disease susceptibility and prevent progression through tailored healthcare interventions [2]. The success of P4 medicine increasingly relies on precise health risk assessments (HRAs), leveraging data science, wearable technology, and the Internet of Things (IoT) to predict individual health risks and potential mortality [2,3].
Originally developed in the late 1940s and evolving significantly since the mid-2000s, HRAs have transitioned from clinical settings to community health promotion programs [4,5,6,7,8,9]. They involve questionnaires on demographic details, lifestyle factors, medical history, and physiological data to gauge individual health risks [10,11]. Nonetheless, evidence substantiating the predictive accuracy of HRA instruments has remained limited, largely because data linking assessment inputs to health outcomes over extended time frames are scarce. In addition, data collection efforts must be cost-effective relative to the prediction precision they deliver if the instruments’ utility is to expand [12].
HRA constitutes a systematic process involving the evaluation of an individual’s health risks based on factors including lifestyle, medical history, and biomarkers [13]. However, the challenge of obtaining these data, especially from healthy individuals, is significant, as evidenced by low participation rates in wellness programs, like the 24% participation in the Annual Wellness Visit by Medicare fee-for-service beneficiaries in 2017 [14].
The objective of this study was to investigate the feasibility of conducting HRAs without relying on high-cost data such as laboratory tests, which often necessitate visits to healthcare facilities. By leveraging machine learning methods, the predictive performance of HRA models with and without laboratory data was compared and the feature importance of the models was analyzed to gain insights useful for developing personalized health management guidelines. The study results indicated that the predictive performances of the models utilizing demographic, lifestyle, family history, and personal health device data, with or without laboratory data, were comparable. Moreover, the models without laboratory data identified important features that were more valuable in developing health guidelines, thus emphasizing modifiable lifestyle factors. These findings could facilitate easier access to personalized health management for healthy individuals, thereby supporting the broader implementation of P4 medicine.

2. Materials and Methods

2.1. Data

Our study utilized the National Health Insurance Service (NHIS)-National Sample Cohort (NSC) provided by the NHIS of Korea, covering nearly all residents except those under a medical aid program funded by general taxation [15]. The NHIS-NSC, a population-based cohort, integrates four key datasets: insurance eligibility, medical check-ups, insurance claims, and death registry data. This cohort represents a 2.2% sample (1 million individuals) of NHIS members from 2002 to 2003, carefully stratified to mirror age, sex, and income distributions in Korea, with data updated through 2019. Our research primarily utilized medical check-up data (2009–2015), insurance claims data (2002–2019), and death registry data (2009–2019) from this cohort.
The NHIS administers biennial medical check-ups for beneficiaries aged 40 and above through the National Health Screening Program (NHSP). This program also includes younger blue-collar workers and household heads. Those in high-risk work environments are eligible for annual check-ups. The NHSP involves laboratory tests and self-reported health behavior and medical history questionnaires.
The insurance claims dataset, processed by the Health Insurance Review and Assessment Service (HIRA), includes details on patient identification, provider information, service descriptions, diagnoses (ICD-10 codes), and total charges. The death registry dataset, sourced from Statistics Korea [16], records the date and cause of death. We excluded deaths due to external causes such as accidents or suicides from our study, in line with Kwon et al.’s criteria [17].
Our initial dataset comprised 489,461 records from the cohort that had medical check-ups conducted between 2009 and 2012. After applying various exclusion criteria, the final dataset included 425,148 records. Exclusions were made for individuals under 30 years old (as the NHSP primarily targets adults over 40), records with character values in birth year fields, missing data, and records with extreme values indicating probable typographical errors. Figure 1 illustrates the schematic diagram of the study dataset.

2.2. Variables

In this study, we evaluated the health risk of individuals by quantifying the likelihood of future adverse health events, such as mortality and chronic diseases, within a predefined time frame. We accomplished this by utilizing machine learning models to predict the incidence of these events.
We categorized predictor variables into five distinct groups: demographic variables (DEMO), lifestyle variables encompassing health behaviors and body measurements (LS), family history variables (FH), personal health device variables (PHD), and laboratory variables (LAB). In Table 1, we present the predictor variables’ definitions, notations and descriptions, as well as the descriptive statistics and frequency distributions for both male and female datasets.
In recent studies, lifestyle variables, which hold the potential to guide personal lifestyle interventions to prevent or treat adverse health events, encompass multiple interconnected aspects such as body weight, body mass index (BMI), and waist circumference [18,19]. Previous studies addressing the global burden of disease have considered smoking, alcohol intake, and substance use as behavioral risk factors in efforts to mitigate health-related losses [20]. Our study incorporates lifestyle variables, including body measurements and health behavior variables; the former are measured during medical check-ups and the latter are derived from self-reported survey responses to NHIS-NSC questionnaires administered during medical check-ups.
We defined health behavior variables across three domains: smoking, alcohol intake, and physical activity; all of these directly influence an individual’s health status [21]. Smoking amount (SMK) is quantified as the cumulative amount of smoking undertaken over an individual’s lifetime in pack-years using Equation (1) [22]. Alcohol intake (DRK) is calculated as the amount of alcohol consumed weekly in bottles using Equation (2) [23]. Physical activity (PA) is computed based on parameters from NHIS-NSC questionnaires, considering light activity, moderate activity, and vigorous activity, and converting them into metabolic equivalents (METs) using Equation (3) [24].
SMK (pack-years) = number of cigarettes per day × 0.05 × number of years smoked (1)
DRK (bottles/week) = mean alcohol intake per day (g) × 0.02 × number of days drinking per week (2)
PA (metabolic equivalents) = number of light activity days per week × 2.9 × 30 + number of moderate activity days per week × 4 × 30 + number of vigorous activity days per week × 7 × 20 (3)
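As a minimal sketch, Equations (1)–(3) can be written as three small Python functions. The function and argument names are illustrative assumptions and are not NHIS-NSC field names; the original analysis was carried out in R. The constant 0.05 equals 1/20, which matches a pack of 20 cigarettes. Reading 0.02 as roughly 50 g of alcohol per bottle, and the factors 30 and 20 as minutes per session, are interpretations rather than definitions given in the text.

```python
def smoking_pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Equation (1): lifetime smoking in pack-years (0.05 = 1/20 cigarettes per pack)."""
    return cigarettes_per_day * 0.05 * years_smoked


def alcohol_bottles_per_week(grams_per_day: float, drinking_days_per_week: float) -> float:
    """Equation (2): weekly alcohol intake in bottles.

    Assumption: the factor 0.02 corresponds to about 50 g of alcohol per bottle.
    """
    return grams_per_day * 0.02 * drinking_days_per_week


def physical_activity_mets(light_days: float, moderate_days: float, vigorous_days: float) -> float:
    """Equation (3): weekly physical activity score.

    The weights 2.9, 4 and 7 are the intensity factors from the equation;
    the factors 30 and 20 are assumed here to be session lengths in minutes.
    """
    return light_days * 2.9 * 30 + moderate_days * 4 * 30 + vigorous_days * 7 * 20
```

For example, one pack a day for ten years gives 10 pack-years, and three light, two moderate, and one vigorous activity day per week give a PA score of 641.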
Family history variables are represented as (0, 1) indicator variables across four areas of adverse health events: heart disease, stroke, hypertension, and diabetes. These variables are computed using self-reported survey responses obtained during medical check-ups.
With the proliferation of technology and the increased accessibility of medical wearable devices, a growing reservoir of clinical data is now available outside traditional clinical settings. Such data are employed by both patients and healthy individuals to manage their health from the comfort of their homes. We refer to this subset of variables as personal health device variables (PHD); this study included blood pressure (BP) and fasting blood sugar (FBS). Our study utilizes medical check-up data to compute PHD variables.
On the other hand, we define a category for laboratory variables (LAB), encompassing measurements obtained from blood and urine samples analyzed in clinical laboratories. This includes biomarkers such as cholesterol, aspartate aminotransferase (AST), and hemoglobin (HGB). Data from medical check-ups were employed to calculate the LAB variables used in our study.
Our study focused on predicting mortality and the incidence of five major chronic diseases: heart disease, stroke, cancer, hypertension, and diabetes. These diseases are prominent contributors to global morbidity, disability, and mortality, and pose substantial individual and socioeconomic burdens due to their prolonged management and associated costs [25,26]. By predicting these adverse health events, our models aim to facilitate early detection and personalized risk management, thereby improving public health outcomes and healthcare system cost-effectiveness [27].
The study dataset, compiled from medical check-up data (2009–2012) and claims data (2002–2019), was analyzed to identify these adverse health events. Our approach involved assessing three prediction timeframes, namely three, five, and ten years, starting from the year following the medical check-up. This analysis prioritized data free from recorded health issues up to the year of the check-up, excluding records of adverse health events in or before the check-up year and those indicating mortality within the prediction timeframe.
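The labeling rules above can be sketched as a single function. This is an illustration under stated assumptions, not the authors' implementation: the function name, its arguments, and the order in which the rules are applied are hypothetical, and the requirement that the whole prediction window fall within the follow-up period (claims data end in 2019) is our reading of the design rather than a rule stated in the text.

```python
from typing import Optional


def label_record(checkup_year: int,
                 horizon_years: int,
                 event_year: Optional[int],
                 death_year: Optional[int],
                 follow_up_end: int = 2019) -> Optional[int]:
    """Return 1 (event in window), 0 (no event in window) or None (record excluded).

    For disease models, event_year is the first diagnosis year.
    For the mortality model, pass the death year as event_year and death_year=None.
    """
    window_start = checkup_year + 1              # window starts the year after the check-up
    window_end = checkup_year + horizon_years

    # Exclude records with the event in or before the check-up year.
    if event_year is not None and event_year <= checkup_year:
        return None
    # Assumption: the full window must be observable in the follow-up data.
    if window_end > follow_up_end:
        return None
    # Event observed inside the prediction window.
    if event_year is not None and window_start <= event_year <= window_end:
        return 1
    # Exclude records with death inside the window (no event was observed first).
    if death_year is not None and death_year <= window_end:
        return None
    return 0
```

Under these assumptions, a 2010 check-up with a first diagnosis in 2013 is a positive record for the five-year window, while a 2012 check-up cannot contribute to the ten-year window.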
Chronic disease incidences were determined based on the ICD-10 diagnosis codes present in the claims data and laboratory test outcomes obtained from medical check-ups. Heart disease was identified when ICD-10 codes I20–I25 were recorded as a principal or a secondary diagnosis in the claims data, and ICD-10 codes I60–I69 were associated with stroke, as used in prior studies [28,29,30,31]. Similarly, cancer incidences were detected based on principal or secondary diagnoses in the claims data, focusing on the five most common cancer types by gender [32]: lung (C33, C34), gastric (C16), colorectal (C18, C19, C20), prostate (C61), and liver (C22) cancer for males, and breast (C50, D05), colorectal (C18, C19, C20), gastric (C16), lung (C33, C34), and liver (C22) cancer for females. Thyroid cancer was excluded from the list of adverse health events in this study because the five-year survival rate in Korea is over 99% [33]. Hypertension incidence was established when BP ≥ 140/90 mmHg was recorded in a medical check-up or ICD-10 codes I10–I15 were recorded as a principal or a secondary diagnosis in the claims data during the data search period. Diabetes incidence was established when fasting glucose ≥ 126 mg/dL was recorded in a medical check-up or ICD-10 codes R81 or E10–E14 were recorded as a principal or a secondary diagnosis in the claims data. Details of the number of records and the prevalence of adverse health events in the male and female datasets are presented in Table 2.
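The code-based definitions above can be illustrated with a small lookup. The ICD-10 ranges and thresholds are taken from the text; the function names, the event labels, and the handling of malformed codes are assumptions made for the sketch. Treating BP ≥ 140/90 mmHg as "systolic ≥ 140 or diastolic ≥ 90" is also an assumption, since the text does not say how the two components are combined.

```python
# ICD-10 three-character ranges per event, as listed in the text.
ICD10_RANGES = {
    "heart_disease": [("I", 20, 25)],
    "stroke": [("I", 60, 69)],
    "hypertension": [("I", 10, 15)],
    "diabetes": [("E", 10, 14), ("R", 81, 81)],
}

# Five most common cancer types by gender, as listed in the text.
CANCER_CODES = {
    "male": {"C33", "C34", "C16", "C18", "C19", "C20", "C61", "C22"},
    "female": {"C50", "D05", "C18", "C19", "C20", "C16", "C33", "C34", "C22"},
}


def events_for_code(code: str, sex: str) -> set:
    """Map one principal or secondary ICD-10 diagnosis code to adverse health events."""
    code = code.strip().upper().replace(".", "")
    if len(code) < 3 or not code[1:3].isdigit():
        return set()  # assumption: malformed codes match nothing
    letter, number, category = code[0], int(code[1:3]), code[:3]
    events = {
        event
        for event, ranges in ICD10_RANGES.items()
        for (prefix, low, high) in ranges
        if letter == prefix and low <= number <= high
    }
    if category in CANCER_CODES.get(sex, set()):
        events.add("cancer")
    return events


def events_for_checkup(sbp: float, dbp: float, fasting_glucose: float) -> set:
    """Map check-up measurements to events using the thresholds in the text."""
    events = set()
    if sbp >= 140 or dbp >= 90:  # assumption: either component suffices
        events.add("hypertension")
    if fasting_glucose >= 126:
        events.add("diabetes")
    return events
```

Here a claim coded I21.0 maps to heart disease, and C61 counts as cancer only in the male dataset.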

2.3. Analytical Models

We designed this study to evaluate whether including predictor variables with higher acquisition costs improves the predictive accuracy of HRAs for the personalized prediction and prevention of adverse health events. We systematically introduced groups of variables one at a time (Models 1–4 in Figure 1) to assess the incremental predictive accuracy gained by adding the groups of variables to the models. Conceptually, the data acquisition costs reflect the financial and logistical burden associated with obtaining the data, as well as the discomfort and inconvenience experienced by individuals during the acquisition process. We posited that acquiring laboratory data would be the most resource-intensive and cumbersome process due to the need for individuals to undergo procedures involving needles and blood extraction [34]. Considering the significantly different characteristics between male and female datasets (Table 1 and Table 2), we conducted separate analyses for each gender.
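The stepwise design can be sketched as cumulative feature sets. The group order follows the text (Model 1: DEMO; Model 2: adds LS and FH; Model 3: adds PHD; Model 4: adds LAB). The column names are partly hypothetical: AGE, BMI, WC, WT, SMK, DRK, PA, SBP, DBP, FBS, AST, and HGB appear in the paper, whereas the family history names and TCHOL are placeholders, and the lists are not the full variable set of Table 1.

```python
# Variable groups ordered by assumed data acquisition cost (lowest first).
VARIABLE_GROUPS = {
    "DEMO": ["AGE"],
    "LS": ["BMI", "WC", "WT", "SMK", "DRK", "PA"],
    "FH": ["FH_HEART", "FH_STROKE", "FH_HTN", "FH_DM"],  # hypothetical column names
    "PHD": ["SBP", "DBP", "FBS"],
    "LAB": ["TCHOL", "AST", "HGB"],  # partial list; TCHOL is a placeholder name
}

# Groups introduced by each model on top of the previous one.
MODEL_ADDITIONS = [
    ("Model 1", ["DEMO"]),
    ("Model 2", ["LS", "FH"]),
    ("Model 3", ["PHD"]),
    ("Model 4", ["LAB"]),
]


def features_for(model_name: str) -> list:
    """Return the cumulative feature list used by the named model."""
    features = []
    for name, groups in MODEL_ADDITIONS:
        for group in groups:
            features.extend(VARIABLE_GROUPS[group])
        if name == model_name:
            return features
    raise KeyError(f"unknown model: {model_name}")
```

Comparing a metric between `features_for("Model 3")` and `features_for("Model 4")` then isolates the contribution of the laboratory group.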
A comparative analysis of the models enabled us to examine the incremental predictive accuracy introduced by each group of variables. The models were trained on 70% of the dataset and tested on the remaining 30%. The evaluation metrics included the area under the curve (AUC), accuracy, and F1-score [35]. We utilized Youden’s J statistic to determine the optimal threshold for maximizing the accuracy and F1-score. The significance levels of the AUC differences for each model were assessed using DeLong’s method [36,37].
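Youden's J statistic is defined as sensitivity + specificity − 1, and the threshold that maximizes it can be found by scanning the predicted scores. The sketch below is a plain-Python illustration with a made-up toy dataset; the function name is an assumption and the code is not the authors' R implementation. Accuracy and the F1-score would then be computed at the selected threshold.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) where J = sensitivity + specificity - 1 is largest.

    A record is predicted positive when its score is >= threshold.
    Both classes must be present in y_true.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        true_neg = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # keep the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy example (illustrative values only).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
threshold, j_value = youden_threshold(labels, predicted)
```

In the toy data the selected threshold is 0.4, where sensitivity is 1.0 and specificity is 0.8.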
Our primary analytical tool was the XGBoost model, known for its exceptional predictive capabilities [38,39,40,41,42]. To validate the XGBoost results, we also applied logistic regression with stepwise variable selection. Hyperparameter optimization was performed using the grid search method [43,44,45], and multiple hyperparameter combinations were evaluated to compare their predictive performance [46]. We enhanced this process through 10-fold cross-validation. To evaluate the importance of each predictor variable, we conducted a gain analysis using XGBoost’s feature importance algorithm. All computations were performed in R version 4.3.0.
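The tuning procedure (a grid of hyperparameter combinations, each scored by k-fold cross-validation) can be sketched independently of any modeling library. Everything named here is an assumption for illustration: the grid values are invented, `score_fn` stands in for "train on the training folds and return a validation metric such as AUC", and the folds are built by simple interleaving, whereas a real analysis would typically shuffle or stratify the records first.

```python
from itertools import product


def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, valid_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [idx for f, fold in enumerate(folds) if f != i for idx in fold]
        yield train, folds[i]


def grid_search(param_grid: dict, n_samples: int, k: int, score_fn):
    """Return (best_params, best_mean_score) over all grid combinations.

    score_fn(params, train_indices, valid_indices) must return a number
    where larger is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical grid; the values used in the study are not reported in this section.
example_grid = {"max_depth": [2, 4, 6], "eta": [0.05, 0.1, 0.3]}
```

With k = 10, each combination is fitted ten times, so the cost of the search grows with the product of the grid sizes.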

3. Results

Table 1 presents the categories and definitions of the predictor variables, along with the descriptive statistics for continuous variables and the frequency distributions for binary categorical variables, for both males and females. All differences in the statistics between the male and female data were statistically significant at a significance level of α = 1%. The average age for male records was 48.8, while for female records, it was 51.7. Table 2 presents the number of records used in the prediction models and the prevalence of adverse health events for each prediction timeframe, with the significance levels of the differences in prevalence between the male and female data. The lowest prevalence was observed for cancer in the three-year prediction period (1.09% for males and 0.62% for females), while the highest prevalence was noticed for hypertension in the ten-year prediction period (28.02% for males and 23.57% for females).

3.1. Incremental Predictive Performance Achieved by the Inclusion of Groups of Predictor Variables

Our study presents an in-depth analysis of the predictive efficacy of four models (Models 1–4), as detailed in Figure 2 and Table A1 in Appendix A. We evaluated these models based on their area under the curve (AUC), accuracy, and F1-score in the testing datasets. To assess the impact of incorporating different groups of predictor variables, we measured changes in the model performance before and after their addition. DeLong’s test results for the significance of AUC differences are provided in Table A1, with logistic regression results for comparison in Table A2 in Appendix A.
The AUC, a measure of a model’s ability to distinguish between records with and without adverse health event incidences, showed a range of 0.623 (three-year hypertension prediction for males, Model 1) to 0.897 (five-year mortality prediction for males, Model 4). Models 3 and 4 consistently achieved AUCs above 0.7 in all predictions except for female cancer predictions. Accuracy, representing the percentage of correct predictions, varied notably from 0.482 (ten-year cancer prediction for females, Model 4) to 0.830 (ten-year mortality prediction for males, Model 4). The F1-score, indicating the balance between precision and recall, ranged from a low of 0.020 (three-year cancer prediction for females, Model 1) to a high of 0.533 (ten-year hypertension prediction for males, Model 4). An interesting pattern observed in Figure 2 is the improvement in F1-scores with longer prediction timeframes, especially evident in the hypertension and diabetes predictions, as opposed to mortality and cancer.
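To make the three measures concrete, the sketch below computes the AUC as the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative record (ties count as one half), along with accuracy and the F1-score from a confusion matrix. The function names and the toy numbers are assumptions for illustration and are not taken from the study data.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, f1


# Rare-event toy case: 2 positives in 100 records, 20 records flagged.
rare_true = [1] * 2 + [0] * 98
rare_pred = [1] * 2 + [1] * 18 + [0] * 80
rare_accuracy, rare_f1 = accuracy_and_f1(rare_true, rare_pred)
```

In the rare-event toy case both positives are caught and accuracy is 0.82, yet the F1-score is only about 0.18 because 18 of the 20 flagged records are false positives. This is one way in which low-prevalence events can yield small F1-scores even when the ranking quality is reasonable.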
Upon adding LS and FH variables to Model 2, the AUC values improved for most adverse health event predictions, especially for hypertension and diabetes, demonstrating the value of these variables in enhancing the prediction accuracy. Model 3, which incorporated PHD variables alongside DEMO, LS, and FH variables, showed an increase in AUC values across most predictions, with marked improvements in hypertension and diabetes predictions for both genders. However, the transition from Model 3 to Model 4, involving the addition of LAB variables, resulted in relatively modest improvements in AUC values, with limited gains in accuracy and F1-scores. These findings suggest that the inclusion of LAB variables, despite their high acquisition cost, contributed only marginally to the overall predictive performance for most health risks.
In summary, our analysis demonstrates that while the addition of LS, FH, and PHD variables significantly enhanced the predictive efficacy of our models, the incremental gain from incorporating LAB variables was limited, indicating a nuanced balance between data acquisition costs and the predictive performance in health risk assessments.

3.2. Feature Importance

Table 3 presents the top five significant features in our prediction models, highlighting the proportion of variance each feature explained. This analysis is crucial for developing personalized health promotion guidelines based on individual HRA results. Notably, we focused on the impact of introducing laboratory variables by comparing the key features in Model 3 (without LAB variables) and Model 4 (with LAB variables).
AGE consistently emerges as a dominant feature in most predictions. However, other variables such as SBP and FBS were more significant in predicting hypertension and diabetes, respectively. Body measurements like WC and BMI were prominent predictors for heart disease and stroke, while WT was significant for mortality. Health behavior variables like SMK (for heart disease), PA (for females in Model 2), and DRK (for three-year female cancer) were notable predictors for certain health risks. LAB variables showed varying levels of significance, with their overall contribution to predictive performance being limited when combined with other variable groups. Family history variables consistently appeared among the top five features predicting various health risks, albeit with lower rankings.

4. Discussion

In our study, we categorized the predictor variables of Health Risk Assessment (HRA) models into four tiers based on acquisition costs: demographic (DEMO), lifestyle (LS) and family history (FH), personal health device (PHD), and laboratory (LAB) variables. This categorization enabled us to evaluate the incremental predictive performance of each tier, balancing data acquisition costs against the predictive effectiveness of HRA models.
Our results demonstrated that Model 3, incorporating DEMO, LS, FH, and PHD variables, had a predictive performance comparable to Model 4, which added LAB variables, across various adverse health events and prediction timeframes. Interestingly, even Model 2, which included only DEMO, LS, and FH variables, performed effectively in most predictions. However, the accuracy measures, especially for stroke predictions in females, tended to decrease with the addition of PHD and LAB variables. These findings suggest that excluding costly and inconvenient LAB variables from HRAs does not significantly impair the predictive efficacy, potentially enhancing the accessibility and widespread adoption of personalized healthcare.
Feature importance analyses reinforced the well-established connections between health behavior and outcomes [20,47,48]. Notably, the significant features from Models 2 and 3, encompassing modifiable factors like WC and BMI, provided more valuable insights for health guideline development than those from Model 4. The associations between PHD variables (SBP, DBP, FBS) and LS variables imply that lifestyle changes can influence PHD variables.
All the models in our study predicted the incidences fairly accurately, with varying degrees of accuracy depending on the type of incidence and prediction timeframe. These findings highlight the effectiveness of our assessment models in formulating personalized health promotion strategies. Notably, the first and third-ranked diseases that incurred the highest expenditures for the National Health Insurance Service of Korea in 2021 were hypertension and type 2 diabetes [49]. While the predictive performance of the models for heart diseases, stroke, and cancer can be deemed decent, further improvements may be necessary depending on the assessment’s purpose.
Additionally, these study findings bear significant implications in the era of technological advancements that enable individuals to access their health data through personal health devices without visiting healthcare facilities [50]. As the availability and reliability of personal health device data continue to improve, the depth of person-generated health data will increase, offering detailed and continuous information [51,52,53]. Our study has shown that increasing the accessibility of health data from personal health devices can be a key factor in HRA, potentially replacing data from clinical settings and expanding the market potential of HRAs.
We conducted an examination of three evaluation measures, namely AUC, accuracy, and F1-score, in the context of risk predictions for six adverse health events across three different prediction timeframes. While our overall findings align with the major trends observed, we did encounter irregular results for a few specific target risks and evaluation measures, particularly the accuracy and F1-scores. These findings underscore the importance of making judicious selections when choosing an appropriate measure in the evaluation of HRA models, depending on the specific target event to be predicted and the intended use of the assessment results. For instance, F1-scores are designed to address issues related to measuring the predictive performance in imbalanced data scenarios, such as the incidences of mortality and cancer. Additionally, when the cost associated with a false negative (missing the incidence of events in the prediction) outweighs the cost of a false positive (predicting negatives as positives), it becomes evident that sensitivity and specificity should not be given equal weight in the evaluation.
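One simple way to give false negatives and false positives unequal weight is to choose the threshold that minimizes a total misclassification cost. The sketch below illustrates that idea only; it is not a method used in the study, and the function name, the cost values, and the toy data are assumptions.

```python
def min_cost_threshold(y_true, scores, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimizing cost_fn * FN + cost_fp * FP.

    A record is predicted positive when its score is >= threshold.
    On ties, the lowest threshold is kept.
    """
    best_threshold, best_cost = None, float("inf")
    for threshold in sorted(set(scores)):
        false_neg = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
        false_pos = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        cost = cost_fn * false_neg + cost_fp * false_pos
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Toy example (illustrative values only).
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
screening = min_cost_threshold(toy_labels, toy_scores, cost_fn=5, cost_fp=1)
confirmatory = min_cost_threshold(toy_labels, toy_scores, cost_fn=1, cost_fp=5)
```

When a missed event is five times as costly as a false alarm, the selected threshold drops to 0.3 so that no positive record is missed; with the costs reversed it rises to 0.7 so that no negative record is flagged.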
This study has limitations. Firstly, the predominance of NHSP participants over 40 years old in our database may limit its generalizability across different age groups and countries. Secondly, disease prevalence may be underestimated, particularly for individuals who had a disease but did not notice symptoms and therefore did not seek care at healthcare facilities. This is particularly relevant for diseases such as diabetes and hypertension, which are known to affect a large proportion of individuals who are unaware of their condition [54,55,56]. Thirdly, the reliance on self-reported questionnaire data for lifestyle and family history variables introduces the potential for omission and recall errors [57,58,59]. We anticipate that this limitation will be addressed as the accessibility and accuracy of wearable and IoT data continue to improve [60,61]. Lastly, we employed body measurements and personal health device data collected in clinical settings for our analyses and assumed their equivalence when measured in non-clinical settings. We expect this assumption would not impact the study’s overall implications, as our focus was on future risk assessment rather than immediate disease diagnosis.
Future research should explore the potential of wearables and IoT data beyond blood pressure and blood sugar measurements in HRAs. These sources of data offer accuracy, automatic reporting, non-invasiveness, and continuous monitoring. Leveraging such high-quality data has the potential to significantly enhance HRAs and contribute to more effective lifestyle modification and health promotion efforts.

5. Conclusions

This study aimed to assess the incremental predictive performance of four tiers of predictor variables—demographic, lifestyle and family history, personal health device, and laboratory—in predicting mortality and five chronic diseases across different timeframes. Our primary goal was to strike a balance between data acquisition costs and prediction accuracy to facilitate the widespread implementation of personalized health promotion strategies through HRAs. Our research yields three significant contributions.
Firstly, our findings indicate that the addition of laboratory variables beyond demographic, lifestyle, family history, and personal health device variables did not significantly improve model performance across all examined health events. This insight suggests that removing the need for costly and inconvenient laboratory data acquisition could lower barriers to HRAs, especially for healthy individuals, thereby enhancing accessibility to personalized health promotion.
Secondly, the models incorporating lifestyle and family history variables alongside demographic variables demonstrated comparable performance to full models when assessing the risk of heart diseases, stroke, and cancer. Notably, for certain assessments like cancer in females, the inclusion of further variables resulted in decreased accuracy and F1-scores. This underscores the reliability of Model 2 when aiming to perform accurate risk assessments for these health events.
Lastly, our analysis of important features from Models 2 and 3, which include modifiable body measurements and health behavior variables, suggests that they are better suited to designing health guidelines than the features from models incorporating laboratory data. This implies that guidance from Models 2 and 3 is more relevant and practical for health practitioners and policymakers in shaping effective personalized health management strategies.
In conclusion, our study’s findings offer valuable insights for healthcare practitioners and policymakers, aiding in the formulation of personalized health promotion strategies without significant data acquisition costs. By extending these strategies to a broader population, including those with limited access to healthcare facilities, we can foster a new era of more accessible and effective personalized health management. As we continue to navigate the delicate balance between data acquisition costs and prediction performance, we anticipate further advancements in personalized healthcare.

Author Contributions

Conceptualization, H.P., S.Y.J. and M.K.H.; methodology, H.P.; software and formal analysis, Y.J., Y.R.M. and T.K.; resources, S.Y.J. and H.H.; data curation, T.K.; writing—original draft preparation, H.P., M.K.H., Y.J., T.K. and S.-Y.S.; writing—review and editing, S.Y.J. and H.H.; visualization, H.P., M.K.H. and Y.J.; supervision, H.P. and S.-Y.S.; project administration, M.K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Institutional Review Board (IRB) exemption for this study was granted by Seoul National University Bundang Hospital (No. X-2201-7732-902) on 30 December 2021, as the data used were anonymized.

Informed Consent Statement

Informed consent is not applicable to this study as it utilized the National Health Insurance Service (NHIS)-National Sample Cohort (NSC) data, a population-based cohort, which was constructed by the NHIS of Korea using information from insurance eligibility, medical check-ups, insurance claims, and death registry data. It was anonymized and made available to researchers upon request and approval.

Data Availability Statement

The authors do not have the authority to provide access to the NHIS-NSC data to other researchers. Access to the study data is strictly regulated by the NHIS of Korea. Acquiring access entails applying for an account to log in to their server, working exclusively within their server environment, and securing their approval to retrieve results from it.

Conflicts of Interest

M.K.H., Y.R.M., T.K. and S.-Y.S. are/were affiliated with the Technology Lab of KakaoHealthCare (KHC), while H.P. serves as an Executive Adviser and H.H. as the CEO of KHC. However, the business activities of KHC are unrelated to this study. S.Y.J. and Y.J. have received grant support from KHC for endeavors not related to the current research.

Appendix A

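Table A1. Predictive performance of Models 1–4.

| Adverse health event | Gender | Prediction time (years) | n | AUC, Model 1 | AUC, Model 2 | AUC, Model 3 | AUC, Model 4 | Accuracy, Model 1 | Accuracy, Model 2 | Accuracy, Model 3 | Accuracy, Model 4 | F1-score, Model 1 | F1-score, Model 2 | F1-score, Model 3 | F1-score, Model 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mortality | Male | 3 | 62,860 | 0.867 | 0.877 *** | 0.880 | 0.894 *** | 0.799 | 0.775 | 0.763 | 0.814 | 0.095 | 0.090 | 0.088 | 0.107 |
| | | 5 | 62,860 | 0.872 | 0.882 *** | 0.885 | 0.897 *** | 0.803 | 0.820 | 0.824 | 0.827 | 0.166 | 0.179 | 0.183 | 0.191 |
| | | 10 | 29,006 | 0.871 | 0.877 | 0.881 * | 0.889 *** | 0.762 | 0.805 | 0.787 | 0.830 | 0.271 | 0.303 | 0.290 | 0.331 |
| | Female | 3 | 62,277 | 0.858 | 0.865 *** | 0.867 | 0.878 * | 0.815 | 0.813 | 0.783 | 0.815 | 0.062 | 0.062 | 0.056 | 0.064 |
| | | 5 | 62,277 | 0.863 | 0.868 ** | 0.869 | 0.879 *** | 0.809 | 0.815 | 0.827 | 0.791 | 0.111 | 0.114 | 0.119 | 0.108 |
| | | 10 | 24,400 | 0.867 | 0.870 *** | 0.872 * | 0.875 * | 0.813 | 0.804 | 0.805 | 0.805 | 0.247 | 0.242 | 0.242 | 0.244 |
| Heart disease | Male | 3 | 56,906 | 0.700 | 0.707 *** | 0.710 | 0.711 ** | 0.554 | 0.563 | 0.589 | 0.606 | 0.097 | 0.099 | 0.102 | 0.104 |
| | | 5 | 56,265 | 0.696 | 0.708 *** | 0.710 | 0.711 | 0.618 | 0.611 | 0.606 | 0.604 | 0.154 | 0.156 | 0.157 | 0.157 |
| | | 10 | 25,101 | 0.687 | 0.698 *** | 0.701 ** | 0.703 | 0.590 | 0.614 | 0.605 | 0.634 | 0.253 | 0.260 | 0.261 | 0.265 |
| | Female | 3 | 55,703 | 0.719 | 0.728 *** | 0.731 | 0.733 * | 0.571 | 0.558 | 0.623 | 0.619 | 0.099 | 0.100 | 0.106 | 0.106 |
| | | 5 | 55,345 | 0.714 | 0.723 *** | 0.726 | 0.727 * | 0.583 | 0.569 | 0.605 | 0.584 | 0.151 | 0.152 | 0.157 | 0.155 |
| | | 10 | 21,029 | 0.696 | 0.703 * | 0.705 *** | 0.706 | 0.544 | 0.576 | 0.606 | 0.586 | 0.253 | 0.259 | 0.264 | 0.261 |
| Stroke | Male | 3 | 59,362 | 0.772 | 0.774 | 0.775 ** | 0.776 | 0.736 | 0.685 | 0.648 | 0.647 | 0.122 | 0.113 | 0.107 | 0.107 |
| | | 5 | 58,733 | 0.769 | 0.772 *** | 0.773 | 0.774 * | 0.667 | 0.679 | 0.689 | 0.678 | 0.165 | 0.169 | 0.171 | 0.169 |
| | | 10 | 26,204 | 0.749 | 0.753 *** | 0.754 | 0.755 | 0.692 | 0.668 | 0.679 | 0.676 | 0.283 | 0.278 | 0.282 | 0.281 |
| | Female | 3 | 57,408 | 0.752 | 0.756 *** | 0.758 *** | 0.759 | 0.691 | 0.663 | 0.649 | 0.644 | 0.133 | 0.128 | 0.126 | 0.126 |
| | | 5 | 58,733 | 0.752 | 0.755 *** | 0.757 | 0.757 *** | 0.701 | 0.671 | 0.675 | 0.647 | 0.199 | 0.192 | 0.194 | 0.188 |
| | | 10 | 21,704 | 0.727 | 0.729 * | 0.730 ** | 0.730 | 0.646 | 0.641 | 0.616 | 0.612 | 0.326 | 0.325 | 0.321 | 0.320 |
| Cancer | Male | 3 | 61,515 | 0.753 | 0.761 *** | 0.760 | 0.766 | 0.680 | 0.679 | 0.669 | 0.681 | 0.048 | 0.048 | 0.047 | 0.048 |
| | | 5 | 60,773 | 0.748 | 0.753 | 0.753 | 0.766 ** | 0.688 | 0.660 | 0.658 | 0.658 | 0.072 | 0.070 | 0.069 | 0.070 |
| | | 10 | 26,877 | 0.729 | 0.735 ** | 0.735 * | 0.740 | 0.661 | 0.643 | 0.658 | 0.625 | 0.117 | 0.115 | 0.117 | 0.113 |
| | Female | 3 | 61,363 | 0.682 | 0.680 * | 0.681 | 0.684 | 0.574 | 0.614 | 0.576 | 0.582 | 0.020 | 0.021 | 0.020 | 0.020 |
| | | 5 | 60,897 | 0.679 | 0.679 | 0.679 | 0.683 ** | 0.588 | 0.589 | 0.587 | 0.566 | 0.032 | 0.032 | 0.032 | 0.032 |
| | | 10 | 23,027 | 0.653 | 0.651 ** | 0.651 | 0.652 ** | 0.583 | 0.534 | 0.554 | 0.482 | 0.063 | 0.061 | 0.062 | 0.059 |
| Hypertension | Male | 3 | 40,557 | 0.623 | 0.661 *** | 0.721 *** | 0.723 | 0.562 | 0.561 | 0.650 | 0.636 | 0.264 | 0.281 | 0.326 | 0.325 |
| | | 5 | 40,279 | 0.635 | 0.676 *** | 0.729 *** | 0.732 *** | 0.630 | 0.599 | 0.648 | 0.639 | 0.316 | 0.340 | 0.382 | 0.382 |
| | | 10 | 22,869 | 0.624 | 0.675 *** | 0.726 *** | 0.729 | 0.620 | 0.616 | 0.650 | 0.656 | 0.442 | 0.489 | 0.531 | 0.533 |
| | Female | 3 | 40,905 | 0.711 | 0.739 *** | 0.790 *** | 0.789 | 0.630 | 0.627 | 0.706 | 0.678 | 0.234 | 0.245 | 0.294 | 0.282 |
| | | 5 | 40,781 | 0.711 | 0.741 *** | 0.788 *** | 0.789 | 0.640 | 0.657 | 0.697 | 0.707 | 0.298 | 0.318 | 0.361 | 0.365 |
| | | 10 | 15,368 | 0.697 | 0.728 *** | 0.775 *** | 0.777 *** | 0.640 | 0.653 | 0.698 | 0.688 | 0.459 | 0.486 | 0.530 | 0.530 |
| Diabetes | Male | 3 | 51,056 | 0.656 | 0.686 *** | 0.733 *** | 0.738 | 0.602 | 0.625 | 0.663 | 0.651 | 0.168 | 0.180 | 0.203 | 0.202 |
| | | 5 | 50,585 | 0.666 | 0.698 *** | 0.736 *** | 0.742 ** | 0.586 | 0.627 | 0.668 | 0.653 | 0.231 | 0.249 | 0.275 | 0.274 |
| | | 10 | 22,869 | 0.648 | 0.689 *** | 0.720 *** | 0.730 | 0.597 | 0.606 | 0.633 | 0.679 | 0.364 | 0.391 | 0.411 | 0.423 |
| | Female | 3 | 50,809 | 0.691 | 0.710 *** | 0.740 *** | 0.744 | 0.618 | 0.609 | 0.680 | 0.678 | 0.170 | 0.174 | 0.197 | 0.198 |
| | | 5 | 50,533 | 0.691 | 0.713 *** | 0.739 *** | 0.743 | 0.629 | 0.594 | 0.642 | 0.640 | 0.248 | 0.251 | 0.271 | 0.271 |
| | | 10 | 19,268 | 0.674 | 0.700 *** | 0.725 *** | 0.732 *** | 0.578 | 0.612 | 0.665 | 0.673 | 0.398 | 0.416 | 0.436 | 0.442 |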
* significant at α = 5%, ** at α = 1%, *** at α = 0.1%; the significance of the additional AUC was tested by DeLong’s method.
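The significance marks in Tables A1 and A2 come from DeLong's test for two correlated AUCs. The following is a minimal pure-Python sketch of that test in its plain O(mn) form, not the fast algorithm of reference [37] and not the authors' code. The function names `delong_test`, `_placements`, and `_cov` and the toy scores are illustrative assumptions, and the sketch assumes both models are scored on the same test records with at least two records per class.

```python
import math


def _placements(pos, neg):
    """Placement values: for each positive, the share of negatives it outranks
    (ties count 0.5), and for each negative, the share of positives above it."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two_sided_p) for models A and B scored on the
    same records; y_true holds 1 for events and 0 for non-events."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    a10, a01 = _placements([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    b10, b01 = _placements([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a, auc_b = sum(a10) / len(a10), sum(b10) / len(b10)
    # Variance of the AUC difference from the two placement covariance blocks.
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(pos)
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(neg))
    diff = auc_b - auc_a
    if var == 0:
        # Degenerate case: no estimated variability in the difference.
        return auc_a, auc_b, 0.0, 1.0 if diff == 0 else 0.0
    z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


if __name__ == "__main__":
    # Toy data only: eight records, two hypothetical score vectors.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    model_a = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9]
    model_b = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    print(delong_test(y, model_a, model_b))
```

Under this reading, a marked entry would correspond to a small p-value when a model is compared with a reference model on the same test set; which pair of models each mark compares is not restated in the appendix, so that pairing is left to the caller.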
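Table A2. Predictive performance of Models 1–4 assessed by logistic regressions.

| Adverse health event | Gender | Prediction time (years) | n | AUC, Model 1 | AUC, Model 2 | AUC, Model 3 | AUC, Model 4 |
|---|---|---|---|---|---|---|---|
| Mortality | Male | 3 | 63,084 | 0.873 | 0.880 *** | 0.882 | 0.896 *** |
| | | 5 | 63,084 | 0.876 | 0.880 *** | 0.882 ** | 0.892 *** |
| | | 10 | 29,091 | 0.875 | 0.880 *** | 0.882 * | 0.889 *** |
| | Female | 3 | 62,053 | 0.867 | 0.869 | 0.870 | 0.881 ** |
| | | 5 | 62,053 | 0.867 | 0.868 | 0.869 * | 0.875 ** |
| | | 10 | 24,315 | 0.866 | 0.867 | 0.868 * | 0.870 |
| Heart disease | Male | 3 | 56,825 | 0.704 | 0.714 *** | 0.716 | 0.715 |
| | | 5 | 56,189 | 0.702 | 0.712 *** | 0.715 ** | 0.714 |
| | | 10 | 25,099 | 0.691 | 0.698 ** | 0.701 ** | 0.700 |
| | Female | 3 | 55,784 | 0.721 | 0.726 * | 0.728 * | 0.728 |
| | | 5 | 55,423 | 0.714 | 0.721 *** | 0.725 *** | 0.724 |
| | | 10 | 21,032 | 0.705 | 0.709 * | 0.711 * | 0.709 |
| Stroke | Male | 3 | 59,209 | 0.772 | 0.776 *** | 0.778 ** | 0.776 |
| | | 5 | 58,727 | 0.768 | 0.771 *** | 0.773 ** | 0.773 |
| | | 10 | 26,284 | 0.747 | 0.752 *** | 0.753 ** | 0.753 |
| | Female | 3 | 57,561 | 0.752 | 0.755 ** | 0.756 * | 0.754 * |
| | | 5 | 57,050 | 0.759 | 0.760 * | 0.761 | 0.761 |
| | | 10 | 21,625 | 0.730 | 0.732 * | 0.733 | 0.733 |
| Cancer | Male | 3 | 61,500 | 0.757 | 0.757 | 0.757 | 0.760 |
| | | 5 | 60,801 | 0.753 | 0.758 ** | 0.758 | 0.763 ** |
| | | 10 | 26,760 | 0.728 | 0.730 | 0.730 | 0.734 * |
| | Female | 3 | 61,378 | 0.696 | 0.695 | 0.697 | 0.698 |
| | | 5 | 60,869 | 0.678 | 0.679 | 0.679 | 0.683 |
| | | 10 | 23,145 | 0.663 | 0.661 | 0.661 | 0.662 |
| Hypertension | Male | 3 | 40,419 | 0.613 | 0.651 *** | 0.709 *** | 0.658 *** |
| | | 5 | 40,125 | 0.626 | 0.667 *** | 0.718 *** | 0.672 *** |
| | | 10 | 18,404 | 0.627 | 0.673 *** | 0.719 *** | 0.680 *** |
| | Female | 3 | 41,043 | 0.717 | 0.743 *** | 0.785 *** | 0.745 *** |
| | | 5 | 40,936 | 0.714 | 0.740 *** | 0.782 *** | 0.743 *** |
| | | 10 | 15,353 | 0.700 | 0.732 *** | 0.776 *** | 0.734 *** |
| Diabetes | Male | 3 | 51,048 | 0.653 | 0.682 *** | 0.721 *** | 0.693 *** |
| | | 5 | 50,726 | 0.668 | 0.697 *** | 0.730 *** | 0.707 *** |
| | | 10 | 22,795 | 0.649 | 0.688 *** | 0.716 *** | 0.700 *** |
| | Female | 3 | 50,818 | 0.690 | 0.707 *** | 0.736 *** | 0.716 *** |
| | | 5 | 50,392 | 0.693 | 0.711 *** | 0.734 *** | 0.722 *** |
| | | 10 | 19,342 | 0.673 | 0.698 *** | 0.718 *** | 0.707 |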
* significant at α = 5%, ** at α = 1%, *** at α = 0.1%; the significance of the additional AUC was tested by DeLong’s method.

References

  1. Alonso, S.G.; de la Torre Díez, I.; Zapiraín, B.G. Predictive, Personalized, Preventive and Participatory (4P) Medicine Applied to Telemedicine and eHealth in the Literature. J. Med. Syst. 2019, 43, 140.
  2. Flores, M.; Glusman, G.; Brogaard, K.; Price, N.D.; Hood, L. P4 medicine: How systems medicine will transform the healthcare sector and society. Pers. Med. 2013, 10, 565–576.
  3. Sobradillo, P.; Pozo, F.; Agustí, A. P4 medicine: The future around the corner. Arch. Bronconeumol. 2011, 47, 35–40.
  4. Sadusk, J.F., Jr.; Robbins, L.C. Proposal for health-hazard appraisal in comprehensive health care. JAMA 1968, 203, 1108–1112.
  5. Charlson, M.E.; Pompei, P.; Ales, K.L.; MacKenzie, C.R. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J. Chronic Dis. 1987, 40, 373–383.
  6. Lindström, J.; Tuomilehto, J. The diabetes risk score: A practical tool to predict type 2 diabetes risk. Diabetes Care 2003, 26, 725–731.
  7. Conroy, R.M.; Pyörälä, K.; Fitzgerald, A.P.; Sans, S.; Menotti, A.; De Backer, G.; De Bacquer, D.; Ducimetière, P.; Jousilahti, P.; Keil, U.; et al. Estimation of ten-year risk of fatal cardiovascular disease in Europe: The SCORE project. Eur. Heart J. 2003, 24, 987–1003.
  8. D’Agostino, R.B., Sr.; Vasan, R.S.; Pencina, M.J.; Wolf, P.A.; Cobain, M.; Massaro, J.M.; Kannel, W.B. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation 2008, 117, 743–753.
  9. Perk, J.; De Backer, G.; Gohlke, H.; Graham, I.; Reiner, Z.; Verschuren, M.; Albus, C.; Benlian, P.; Boysen, G.; et al. European Guidelines on cardiovascular disease prevention in clinical practice (version 2012). The Fifth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice (constituted by representatives of nine societies and by invited experts). Eur. Heart J. 2012, 33, 1635–1701.
  10. Schoenbach, V.J. Appraising health risk appraisal. Am. J. Public Health 1987, 77, 409–411.
  11. Goetzel, R.Z.; Staley, P.; Ogden, L.; Stange, P.V.; Fox, J.; Spangler, J.; Tabrizi, M.; Beckowski, M.; Kowlessar, N.; Glasgow, R.E.; et al. A Framework for Patient-Centered Health Risk Assessments—Providing Health Promotion and Disease Prevention Services to Medicare Beneficiaries. US Department of Health and Human Services, Centers for Disease Control and Prevention: Atlanta, GA, USA, 2011. Available online: http://www.cdc.gov/policy/paeo/hra/frameworkforhra.pdf (accessed on 13 March 2024).
  12. Elstad, M.; Ahmed, S.; Røislien, J.; Douiri, A. Evaluation of the reported data linkage process and associated quality issues for linked routinely collected healthcare data in multimorbidity research: A systematic methodology review. BMJ Open 2023, 13, e069212.
  13. Fletcher, D.J.; Smith, G.L. Health-risk appraisal. Helping patients predict and prevent health problems. Postgrad. Med. 1986, 80, 69–83.
  14. Misra, A.; Lloyd, J.T. Hospital utilization and expenditures among a nationally representative sample of Medicare fee-for-service beneficiaries 2 years after receipt of an Annual Wellness Visit. Prev. Med. 2019, 129, 105850.
  15. Shin, D.W.; Cho, B.; Guallar, E. Korean National Health Insurance Database. JAMA Intern. Med. 2016, 176, 138.
  16. Lee, J.; Lee, J.S.; Park, S.H.; Shin, S.A.; Kim, K. Cohort Profile: The National Health Insurance Service-National Sample Cohort (NHIS-NSC), South Korea. Int. J. Epidemiol. 2017, 46, e15.
  17. Kwon, H.-S.; Oh, P.-J.; Kang, M.-Y.; Woo, K.-S. A study on insurance premium rate differentiation by simplified issue insurance product type using the health level scoring model based on national health insurance data. J. Risk Manag. 2021, 32, 99–147.
  18. Galani, C.; Schneider, H. Prevention and treatment of obesity with lifestyle interventions: Review and meta-analysis. Int. J. Public Health 2007, 52, 348–359.
  19. Kyrou, I.; Tsigos, C.; Mavrogianni, C.; Cardon, G.; Van Stappen, V.; Latomme, J.; Tsochev, K.; Nanasi, A.; Semanova, C.; Lamiquiz-Moneo, I.; et al. Sociodemographic and lifestyle-related risk factors for identifying vulnerable groups for type 2 diabetes: A narrative review with emphasis on data from Europe. BMC Endocr. Disord. 2020, 20 (Suppl. S1), 134.
  20. GBD 2017 Risk Factor Collaborators. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 2018, 392, 1923–1994.
  21. Grembowski, D.; Patrick, D.; Diehr, P.; Durham, M.; Beresford, S.; Kay, E.; Hecht, J. Self-efficacy and health behavior among older adults. J. Health Soc. Behav. 1993, 34, 89–104.
  22. Wood, D.M.; Mould, M.G.; Ong, S.B.; Baker, E.H. “Pack year” smoking histories: What about patients who use loose tobacco? Tob. Control. 2005, 14, 141–142.
  23. Kim, Y.G.; Han, K.D.; Choi, J.I.; Boo, K.Y.; Kim, D.Y.; Lee, K.N.; Shim, J.; Kim, J.S.; Kim, Y.H. Frequent drinking is a more important risk factor for new-onset atrial fibrillation than binge drinking: A nationwide population-based study. EP Europace 2020, 22, 216–224.
  24. Chung, A.E.; Skinner, A.C.; Steiner, M.J.; Perrin, E.M. Physical activity and BMI in a nationally representative sample of children and adolescents. Clin. Pediatr. 2012, 51, 122–129.
  25. Yach, D.; Hawkes, C.; Gould, C.L.; Hofman, K.J. The global burden of chronic diseases: Overcoming impediments to prevention and control. JAMA 2004, 291, 2616–2622.
  26. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. Lancet 2020, 396, 1204–1222.
  27. Sagner, M.; McNeil, A.; Puska, P.; Auffray, C.; Price, N.D.; Hood, L.; Lavie, C.J.; Han, Z.-G.; Chen, Z.; Brahmachari, S.K.; et al. The P4 health spectrum—A predictive, preventive, personalized and participatory continuum for promoting healthspan. Prog. Cardiovasc. Dis. 2017, 59, 506–521.
  28. Lee, S.; Lee, H.; Kim, H.S.; Koh, S.B. Incidence, Risk factors, and prediction of myocardial infarction and stroke in farmers: A Korean nationwide population-based study. J. Prev. Med. Public Health 2020, 53, 313–322.
  29. Kim, K.I.; Ji, E.; Choi, J.Y.; Kim, S.W.; Ahn, S.; Kim, C.H. Ten-year trends of hypertension treatment and control rate in Korea. Sci. Rep. 2021, 11, 6966.
  30. Yeo, Y.; Shin, D.W.; Han, K.; Park, S.H.; Jeon, K.-H.; Lee, J.; Kim, J.; Shin, A. Individual 5-year lung cancer risk prediction model in Korea using a nationwide representative database. Cancers 2021, 13, 3496.
  31. Lee, J.; Choi, E.; Choo, E.; Linda, S.; Jang, E.J.; Lee, I.H. Relationship between continuity of care and clinical outcomes in patients with dyslipidemia in Korea: A real world claims database study. Sci. Rep. 2022, 12, 3062.
  32. Jung, K.W.; Won, Y.J.; Kong, H.J.; Lee, E.S. Prediction of cancer incidence and mortality in Korea, 2019. Cancer Res. Treat. 2019, 51, 431–437.
  33. Yi, K.H. The revised 2016 Korean Thyroid Association guidelines for thyroid nodules and cancers: Differences from the 2015 American Thyroid Association guidelines. Endocrinol. Metab. 2016, 31, 373–378.
  34. Grannemann, T.W.; Brown, R.S.; Pauly, M.V. Estimating hospital costs. A multiple-output analysis. J. Health Econ. 1986, 5, 107–127.
  35. Youden, W.J. Index for rating diagnostic tests. Cancer 1950, 3, 32–35.
  36. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77.
  37. Sun, X.; Xu, W. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process. Lett. 2014, 21, 1389–1393.
  38. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  39. Ye, C.; Fu, T.; Hao, S.; Zhang, Y.; Wang, O.; Jin, B.; Xia, M.; Liu, M.; Zhou, X.; Wu, Q.; et al. Prediction of incident hypertension within the next year: Prospective study using statewide electronic health records and machine learning. J. Med. Internet Res. 2018, 20, e22.
  40. Taninaga, J.; Nishiyama, Y.; Fujibayashi, K.; Gunji, T.; Sasabe, N.; Iijima, K.; Naito, T. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci. Rep. 2019, 9, 12384.
  41. Inoue, T.; Ichikawa, D.; Ueno, T.; Cheong, M.; Inoue, T.; Whetstone, W.D.; Endo, T.; Nizuma, K.; Tominaga, T. XGBoost, a machine learning method, predicts neurological recovery in patients with cervical spinal cord injury. Neurotrauma Rep. 2020, 1, 8–16.
  42. Binson, V.; Subramoniam, M.; Sunny, Y.; Mathew, L. Prediction of pulmonary diseases with electronic nose using SVM and XGBoost. IEEE Sens. J. 2021, 21, 20886–20895.
  43. Ogunleye, A.; Wang, Q.G. XGBoost model for chronic kidney disease diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 2131–2140.
  44. Budholiya, K.; Shrivastava, S.K.; Sharma, V. An optimized XGBoost based diagnostic system for effective prediction of heart disease. J. King Saud. Univ. Comput. Info Sci. 2020, 34, 4514–4523.
  45. Li, Q.; Yang, H.; Wang, P.; Liu, X.; Lv, K.; Ye, M. XGBoost-based and tumor-immune characterized gene signature for the prediction of metastatic status in breast cancer. J. Transl. Med. 2022, 20, 177.
  46. Putatunda, S.; Rama, K. A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost. In Proceedings of the 2018 International Conference on Signal Processing and Machine Learning, Shanghai, China, 28–30 November 2018; pp. 6–10.
  47. Gong, Q.; Zhang, P.; Wang, J.; Ma, J.; An, Y.; Chen, Y.; Zhang, B.; Feng, X.; Li, H.; Chen, X.; et al. Morbidity and mortality after lifestyle intervention for people with impaired glucose tolerance: 30-year results of the Da Qing Diabetes Prevention Outcome Study. Lancet Diabetes Endocrinol. 2019, 7, 452–461.
  48. Valenzuela, P.L.; Carrera-Bastos, P.; Gálvez, B.G.; Ruiz-Hurtado, G.; Ordovas, J.M.; Ruilope, L.M.; Lucia, A. Lifestyle interventions for the prevention and treatment of hypertension. Nat. Rev. Cardiol. 2021, 18, 251–275.
  49. Korea National Health Insurance Services. 2021 National Health Insurance Statistical Yearbook. Available online: https://www.hira.or.kr/bbsDummy.do?pgmid=HIRAA020045020000&brdScnBltNo=4&brdBltNo=2314&pageIndex=1&pageIndex2=1#none (accessed on 6 March 2023).
  50. Roehrs, A.; da Costa, C.A.; Righi, R.D.; de Oliveira, K.S. Personal health records: A systematic literature review. J. Med. Internet Res. 2017, 19, e13.
  51. Vuong, A.M.; Huber, J.C., Jr.; Bolin, J.N.; Ory, M.G.; Moudouni, D.M.; Helduser, J.; Begaye, D.; Bonner, T.J.; Forjuoh, S.N. Factors affecting acceptability and usability of technological approaches to diabetes self-management: A case study. Diabetes Technol. Ther. 2012, 14, 1178–1182.
  52. Sora, N.D.; Shashpal, F.; Bond, E.A.; Jenkins, A.J. Insulin pumps: Review of technological advancement in diabetes management. Am. J. Med. Sci. 2019, 358, 326–331.
  53. Grant, A.K.; Golden, L. Technological advancements in the management of type 2 diabetes. Curr. Diabetes Rep. 2019, 19, 163.
  54. Shin, J.Y. Trends in the prevalence and management of diabetes in Korea: 2007–2017. Epidemiol. Health 2019, 41, e2019029.
  55. Kim, D.H.; Kim, B.; Han, K.; Kim, S.W. The relationship between metabolic syndrome and obstructive sleep apnea syndrome: A nationwide population-based study. Sci. Rep. 2021, 11, 8751.
  56. Korea Diabetes Association. Diabetes Fact Sheet in Korea. Available online: https://www.diabetes.or.kr/bbs/?code=fact_sheet&mode=view&number=2390&page=1&code=fact_sheet (accessed on 6 March 2023).
  57. Smith, K.W.; McKinlay, S.M.; Thorington, B.D. The validity of health risk appraisal instruments for assessing coronary heart disease risk. Am. J. Public Health 1987, 77, 419–424.
  58. Lemmens, P.H.; Volovics, L.; Haan, Y.D. Measurement of lifetime exposure to alcohol: Data quality of a self-administered questionnaire and impact on risk assessment. Contemp. Drug Prob. 1997, 24, 581–600.
  59. Gallagher, K.A.; Sonneville, K.R.; Hazzard, V.M.; Carson, T.L.; Needham, B.L. Evaluating gender bias in an eating disorder risk assessment questionnaire for athletes. Eat. Disord. 2021, 29, 29–41.
  60. Brown, J.; Cug, J.; Kolencik, J. Internet of things-based smart healthcare systems: Real-time patient-generated medical data from networked wearable devices. Am. J. Med. Res. 2020, 7, 21–26.
  61. Miyaji, T.; Kawaguchi, T.; Azuma, K.; Suzuki, S.; Sano, Y.; Akatsu, M.; Torii, A.; Kamimura, T.; Ozawa, Y.; Tsuchida, A.; et al. Patient-generated health data collection using a wearable activity tracker in cancer patients—A feasibility study. Support. Care Cancer 2020, 28, 5953–5961.
Figure 1. The schematic diagram of the study dataset and analytical models.
Figure 2. Predictive performance of Models 1–4.
Table 1. Definition of predictor variables with descriptive statistics/frequency distribution for male and female datasets (N = 425,148). Values are mean ± STD or frequency (%).

| Category | Variable | Definition | Male (n = 214,613) | Female (n = 210,535) | p-Value |
|---|---|---|---|---|---|
| Demographic (DEMO) | AGE | Age (years) | 48.8 ± 12.9 | 51.7 ± 12.6 | <0.001 |
| Health behavior (LS) | SMK | Smoking amount (pack-year) | 11.9 ± 14.0 | 0.4 ± 2.6 | <0.001 |
| | DRK | Alcohol intake (bottle/week) | 1.6 ± 2.2 | 0.2 ± 0.7 | <0.001 |
| | PA | Physical activity (MET-minute scores, IPAQ analysis) | 540.3 ± 528.7 | 457.1 ± 498.0 | <0.001 |
| Body measurement (LS) | HT | Height (cm) | 169.6 ± 6.4 | 156.1 ± 6.1 | <0.001 |
| | WT | Weight (kg) | 69.9 ± 10.5 | 57.2 ± 8.5 | <0.001 |
| | WC | Waist circumference (cm) | 83.9 ± 7.5 | 77.2 ± 8.7 | <0.001 |
| | BMI | Body mass index (kg/m²) | 24.3 ± 3.0 | 23.5 ± 3.3 | <0.001 |
| Family history (FH) | FH_HT | Family history of heart diseases | 3.4% | 3.7% | <0.001 |
| | FH_STR | Family history of stroke | 6.3% | 6.6% | 0.004 |
| | FH_HTN | Family history of hypertension | 10.6% | 13.7% | <0.001 |
| | FH_DM | Family history of diabetes | 8.8% | 10.0% | <0.001 |
| Personal health device (PHD) | SBP | Systolic blood pressure (mmHg) | 125.0 ± 14.3 | 120.8 ± 16.0 | <0.001 |
| | DBP | Diastolic blood pressure (mmHg) | 78.2 ± 9.9 | 74.7 ± 10.2 | <0.001 |
| | FBS | Fasting blood sugar (mg/dL) | 100.4 ± 25.3 | 96.3 ± 21.3 | <0.001 |
| Laboratory (LAB) | TCHOL | Total cholesterol (mg/dL) | 194.9 ± 35.6 | 197.9 ± 37.0 | <0.001 |
| | HDL | High density lipoprotein (mg/dL) | 51.9 ± 12.9 | 57.8 ± 13.9 | <0.001 |
| | LDL | Low density lipoprotein (mg/dL) | 113.6 ± 32.6 | 117.2 ± 33.7 | <0.001 |
| | TG | Triglycerides (mg/dL) | 147.3 ± 82.4 | 113.6 ± 64.8 | <0.001 |
| | HGB | Hemoglobin (g/dL) | 14.9 ± 1.2 | 12.8 ± 1.2 | <0.001 |
| | SCR | Creatinine (mg/dL) | 1.0 ± 0.2 | 0.8 ± 0.2 | <0.001 |
| | EGFR 1 | Glomerular filtration rate (GFR) ≥ 90 | 45.7% | 48.2% | <0.001 |
| | EGFR 2 | 60 ≤ GFR < 90 | 50.3% | 46.0% | |
| | EGFR 3 | 30 ≤ GFR < 60 | 3.9% | 5.7% | |
| | EGFR 4 | 15 ≤ GFR < 30 | 0.0% | 0.1% | |
| | AST | Aspartate aminotransferase (U/L) | 26.7 ± 11.7 | 23.1 ± 9.8 | <0.001 |
| | ALT | Alanine aminotransferase (U/L) | 28.8 ± 18.5 | 20.2 ± 13.1 | <0.001 |
| | GGT | Gamma glutamyl transferase (U/L) | 45.4 ± 37.5 | 22.3 ± 18.9 | <0.001 |
| | UPROT 0 | Urine protein 0 g/day | 94.9% | 95.3% | <0.001 |
| | UPROT 1 | <0.5 | 2.4% | 2.3% | |
| | UPROT 2 | 0.5 ≤ UPROT < 1 | 1.8% | 1.6% | |
| | UPROT 3 | 1 ≤ UPROT < 2 | 0.7% | 0.6% | |
| | UPROT 4 | 2 ≤ UPROT | 0.2% | 0.1% | |
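Two of the predictors in Table 1 are derived rather than measured directly: BMI follows from HT and WT, and the EGFR 1 to EGFR 4 indicators are bins of the glomerular filtration rate. As a small illustration, assuming the standard BMI definition and the bin edges printed in the table, the derivation might look like the sketch below. The function names `bmi` and `egfr_category` are hypothetical, the GFR unit is whatever the source data use (Table 1 does not state it), and values below 15 return `None` because the table lists no such category.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def egfr_category(gfr):
    """Map a GFR value to the Table 1 category label, using the table's bin edges."""
    if gfr >= 90:
        return "EGFR 1"
    if gfr >= 60:
        return "EGFR 2"
    if gfr >= 30:
        return "EGFR 3"
    if gfr >= 15:
        return "EGFR 4"
    return None  # Table 1 lists no category below 15


if __name__ == "__main__":
    # Plugging in the male mean weight and height from Table 1 as a rough check.
    print(round(bmi(69.9, 169.6), 1))
    print(egfr_category(72.0))
```

Feeding the table's mean weight and height into `bmi` is only a sanity check, since the mean of individual BMI values need not equal the BMI of the mean weight and height.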
Table 2. Number of records (n) and prevalence (%) of adverse health events in the prediction models.

| Adverse health event | Prediction period | Male n | Male prev. (%) | Female n | Female prev. (%) | p-Value |
|---|---|---|---|---|---|---|
| Mortality | Three year | 209,532 | 1.37 | 207,589 | 0.81 | <0.001 |
| | Five year | 212,522 | 2.48 | 209,740 | 1.52 | <0.001 |
| | Ten year | 96,687 | 5.32 | 81,332 | 3.91 | <0.001 |
| Heart diseases | Three year | 189,688 | 3.16 | 185,675 | 3.06 | 0.079 |
| | Five year | 187,552 | 5.08 | 184,485 | 4.95 | 0.088 |
| | Ten year | 83,670 | 9.72 | 70,097 | 9.92 | 0.239 |
| Stroke | Three year | 197,874 | 2.73 | 191,359 | 3.43 | <0.001 |
| | Five year | 195,776 | 4.42 | 190,146 | 5.52 | <0.001 |
| | Ten year | 87,349 | 8.86 | 72,347 | 12.24 | <0.001 |
| Cancer | Three year | 205,050 | 1.09 | 204,543 | 0.62 | <0.001 |
| | Five year | 202,577 | 1.71 | 202,988 | 0.99 | <0.001 |
| | Ten year | 89,590 | 3.16 | 76,757 | 2.12 | <0.001 |
| Hypertension | Three year | 135,188 | 12.41 | 136,351 | 8.25 | <0.001 |
| | Five year | 134,262 | 15.48 | 135,939 | 11.41 | <0.001 |
| | Ten year | 61,298 | 28.02 | 51,226 | 23.57 | <0.001 |
| Diabetes | Three year | 170,187 | 6.27 | 169,365 | 5.77 | <0.001 |
| | Five year | 168,615 | 9.17 | 168,442 | 9.15 | 0.900 |
| | Ten year | 76,231 | 18.01 | 64,225 | 19.37 | <0.001 |
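The source does not state which test produced the p-values in Table 2. As a rough plausibility check, and purely as an assumption, a pooled two-proportion z-test on the reported counts and prevalences can be sketched as follows. The function name `two_proportion_test` is hypothetical, and because the prevalences are rounded to two decimals the result can only approximate the published values.

```python
import math


def two_proportion_test(n1, prev1_pct, n2, prev2_pct):
    """Pooled two-proportion z-test; prevalences are given in percent.
    Returns (z, two_sided_p)."""
    p1, p2 = prev1_pct / 100.0, prev2_pct / 100.0
    # Pooled event proportion under the null hypothesis of equal prevalence.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Three-year heart disease row of Table 2: male vs. female.
    print(two_proportion_test(189_688, 3.16, 185_675, 3.06))
    # Five-year diabetes row of Table 2: male vs. female.
    print(two_proportion_test(168_615, 9.17, 168_442, 9.15))
```

For the three-year heart disease row this gives a p-value of about 0.08, in line with the reported 0.079, which suggests (but does not confirm) that a test of this kind was used.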
Table 3. The top five important features and the proportion of model variance explained by each feature.

| Adverse health event | Gender | Feature rank | Three year, Model 2 | Three year, Model 3 | Three year, Model 4 | Five year, Model 2 | Five year, Model 3 | Five year, Model 4 | Ten year, Model 2 | Ten year, Model 3 | Ten year, Model 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mortality | Male | 1 | AGE (0.816) | AGE (0.780) | AGE (0.646) | AGE (0.863) | AGE (0.832) | AGE (0.729) | AGE (0.910) | AGE (0.850) | AGE (0.769) |
| | | 2 | WT (0.056) | WT (0.051) | HGB (0.050) | WT (0.046) | WT (0.043) | HGB (0.045) | SMK (0.022) | WT (0.030) | HGB (0.027) |
| | | 3 | BMI (0.036) | BMI (0.032) | AST (0.035) | SMK (0.024) | FBS (0.028) | GGT (0.034) | WT (0.022) | FBS (0.025) | GGT (0.024) |
| | | 4 | PA (0.026) | FBS (0.030) | GGT (0.032) | BMI (0.022) | SMK (0.023) | WT (0.029) | BMI (0.021) | SMK (0.022) | WT (0.023) |
| | | 5 | SMK (0.024) | PA (0.022) | WT (0.031) | PA (0.019) | BMI (0.021) | FBS (0.020) | PA (0.010) | BMI (0.020) | AST (0.019) |
| | Female | 1 | AGE (0.875) | AGE (0.840) | AGE (0.723) | AGE (0.906) | AGE (0.873) | AGE (0.706) | AGE (0.853) | AGE (0.858) | AGE (0.776) |
| | | 2 | WT (0.036) | WT (0.035) | HGB (0.051) | WT (0.025) | WT (0.025) | HGB (0.039) | BMI (0.039) | FBS (0.030) | HGB (0.020) |
| | | 3 | BMI (0.030) | FBS (0.029) | GGT (0.046) | BMI (0.021) | FBS (0.023) | GGT (0.038) | WT (0.025) | WT (0.024) | FBS (0.018) |
| | | 4 | PA (0.027) | BMI (0.026) | WT (0.030) | PA (0.016) | HT (0.016) | FBS (0.024) | WC (0.024) | BMI (0.019) | GGT (0.018) |
| | | 5 | WC (0.010) | PA (0.025) | FBS (0.024) | WC (0.013) | DBP (0.015) | WT (0.024) | PA (0.021) | WC (0.014) | WT (0.017) |
| Heart disease | Male | 1 | AGE (0.739) | AGE (0.683) | AGE (0.617) | AGE (0.869) | AGE (0.835) | AGE (0.819) | AGE (0.809) | AGE (0.767) | AGE (0.734) |
| | | 2 | BMI (0.071) | BMI (0.045) | WC (0.042) | WC (0.060) | WC (0.047) | WC (0.029) | WC (0.067) | BMI (0.037) | WC (0.054) |
| | | 3 | WC (0.061) | SBP (0.041) | BMI (0.038) | BMI (0.045) | BMI (0.036) | FH_HT (0.029) | BMI (0.053) | WT (0.033) | BMI (0.033) |
| | | 4 | PA (0.027) | WC (0.039) | SBP (0.030) | SMK (0.009) | SBP (0.029) | BMI (0.027) | SMK (0.030) | FBS (0.033) | SBP (0.029) |
| | | 5 | SMK (0.026) | WT (0.034) | FBS (0.028) | FH_HT (0.008) | FBS (0.018) | SBP (0.024) | HT (0.011) | WC (0.033) | FBS (0.029) |
| | Female | 1 | AGE (0.890) | AGE (0.851) | AGE (0.824) | AGE (0.890) | AGE (0.851) | AGE (0.824) | AGE (0.857) | AGE (0.819) | AGE (0.786) |
| | | 2 | SMK (0.184) | WC (0.044) | WC (0.045) | SMK (0.184) | WC (0.044) | WC (0.045) | BMI (0.043) | SBP (0.035) | SBP (0.029) |
| | | 3 | WC (0.051) | SBP (0.039) | DBP (0.021) | WC (0.051) | SBP (0.039) | DBP (0.021) | WC (0.037) | WC (0.033) | BMI (0.024) |
| | | 4 | BMI (0.021) | BMI (0.020) | SBP (0.021) | BMI (0.021) | BMI (0.020) | SBP (0.021) | WT (0.021) | BMI (0.031) | WC (0.023) |
| | | 5 | WT (0.016) | FBS (0.014) | BMI (0.017) | WT (0.016) | FBS (0.014) | BMI (0.017) | SMK (0.015) | FBS (0.021) | GGT (0.020) |
| Stroke | Male | 1 | AGE (0.973) | AGE (0.957) | AGE (0.948) | AGE (0.923) | AGE (0.896) | AGE (0.866) | AGE (0.934) | AGE (0.915) | AGE (0.892) |
| | | 2 | WC (0.014) | SBP (0.011) | SBP (0.009) | WC (0.023) | FBS (0.019) | FBS (0.014) | WC (0.024) | WC (0.014) | WC (0.016) |
| | | 3 | SMK (0.004) | FBS (0.008) | FBS (0.008) | SMK (0.011) | SBP (0.016) | SBP (0.013) | SMK (0.010) | SBP (0.011) | SBP (0.010) |
| | | 4 | BMI (0.004) | DBP (0.007) | WC (0.006) | PA (0.011) | WC (0.015) | WC (0.012) | DRK (0.007) | FBS (0.011) | FBS (0.009) |
| | | 5 | PA (0.003) | PA (0.005) | DBP (0.005) | BMI (0.011) | SMK (0.010) | GGT (0.010) | FH_STR (0.007) | HT (0.011) | DBP (0.008) |
| | Female | 1 | AGE (0.929) | AGE (0.901) | AGE (0.877) | AGE (0.917) | AGE (0.889) | AGE (0.860) | AGE (0.973) | AGE (0.960) | AGE (0.950) |
| | | 2 | WC (0.022) | SBP (0.019) | SBP (0.017) | WC (0.021) | SBP (0.019) | SBP (0.014) | WC (0.009) | SBP (0.009) | SBP (0.008) |
| | | 3 | BMI (0.015) | WC (0.013) | WC (0.011) | BMI (0.018) | WC (0.017) | TG (0.013) | BMI (0.008) | FBS (0.008) | WC (0.007) |
| | | 4 | PA (0.010) | FBS (0.012) | DBP (0.010) | PA (0.011) | FBS (0.015) | WC (0.010) | PA (0.002) | WT (0.005) | TG (0.006) |
| | | 5 | HT (0.008) | DBP (0.012) | TG (0.009) | HT (0.010) | BMI (0.013) | FBS (0.010) | WT (0.002) | DBP (0.005) | FBS (0.005) |
| Cancer | Male | 1 | AGE (0.869) | AGE (0.855) | AGE (0.782) | AGE (0.916) | AGE (0.895) | AGE (0.821) | AGE (0.872) | AGE (0.847) | AGE (0.776) |
| | | 2 | SMK (0.029) | SMK (0.026) | AST (0.028) | SMK (0.030) | SMK (0.029) | LDL (0.029) | SMK (0.041) | SMK (0.038) | LDL (0.041) |
| | | 3 | BMI (0.029) | DBP (0.019) | SMK (0.021) | WT (0.011) | FBS (0.013) | AST (0.026) | WC (0.022) | FBS (0.021) | SMK (0.032) |
| | | 4 | WC (0.021) | WC (0.017) | HGB (0.020) | DRK (0.011) | SBP (0.012) | SMK (0.025) | BMI (0.021) | BMI (0.017) | FBS (0.019) |
| | | 5 | WT (0.016) | BMI (0.016) | LDL (0.020) | WC (0.011) | DRK (0.010) | TG (0.014) | PA (0.015) | WC (0.017) | HGB (0.016) |
| | Female | 1 | AGE (0.908) | AGE (0.882) | AGE (0.777) | AGE (0.962) | AGE (0.954) | AGE (0.886) | AGE (0.772) | AGE (0.711) | AGE (0.589) |
| | | 2 | DRK (0.025) | FBS (0.029) | AST (0.050) | PA (0.099) | SMK (0.112) | AST (0.048) | WC (0.042) | DBP (0.048) | LDL (0.046) |
| | | 3 | WC (0.016) | DRK (0.021) | ALT (0.031) | SMK (0.018) | FBS (0.011) | HGB (0.013) | PA (0.041) | BMI (0.040) | TG (0.045) |
| | | 4 | BMI (0.015) | SBP (0.015) | TCHOL (0.026) | WC (0.007) | SBP (0.005) | LDL (0.013) | BMI (0.040) | SBP (0.034) | AST (0.044) |
| | | 5 | SMK (0.014) | SMK (0.011) | HGB (0.023) | BMI (0.006) | PA (0.004) | SMK (0.012) | HT (0.039) | WC (0.030) | DBP (0.030) |
| Hypertension | Male | 1 | AGE (0.546) | SBP (0.350) | AGE (0.286) | AGE (0.574) | AGE (0.318) | AGE (0.277) | AGE (0.513) | SBP (0.330) | AGE (0.304) |
| | | 2 | BMI (0.200) | AGE (0.299) | FBS (0.270) | BMI (0.191) | SBP (0.315) | SBP (0.215) | BMI (0.227) | AGE (0.312) | DBP (0.222) |
| | | 3 | WC (0.124) | DBP (0.165) | SBP (0.189) | WC (0.124) | DBP (0.137) | DBP (0.163) | WC (0.139) | DBP (0.139) | SBP (0.170) |
| | | 4 | DRK (0.074) | WC (0.047) | DBP (0.056) | DRK (0.061) | BMI (0.053) | BMI (0.068) | DRK (0.057) | WC (0.073) | BMI (0.092) |
| | | 5 | HT (0.021) | BMI (0.042) | WC (0.046) | HT (0.018) | FBS (0.047) | FBS (0.043) | FH_HTN (0.019) | BMI (0.066) | WC (0.053) |
| | Female | 1 | AGE (0.709) | SBP (0.378) | SBP (0.401) | AGE (0.711) | SBP (0.378) | SBP (0.369) | AGE (0.484) | SBP (0.349) | SBP (0.303) |
| | | 2 | BMI (0.161) | AGE (0.352) | AGE (0.345) | BMI (0.160) | AGE (0.366) | AGE (0.355) | BMI (0.192) | AGE (0.263) | AGE (0.229) |
| | | 3 | WC (0.060) | DBP (0.121) | DBP (0.097) | WC (0.082) | DBP (0.107) | DBP (0.103) | WC (0.095) | BMI (0.086) | BMI (0.064) |
| | | 4 | HT (0.019) | BMI (0.051) | BMI (0.048) | FH_HTN (0.016) | BMI (0.054) | BMI (0.049) | PA (0.070) | WC (0.059) | DBP (0.046) |
| | | 5 | FH_HTN (0.017) | WC (0.041) | WC (0.028) | HT (0.013) | WC (0.048) | WC (0.043) | HT (0.051) | DBP (0.059) | WC (0.043) |
| Diabetes | Male | 1 | AGE (0.657) | FBS (0.468) | FBS (0.381) | AGE (0.668) | AGE (0.409) | AGE (0.389) | AGE (0.536) | AGE (0.320) | AGE (0.311) |
| | | 2 | WC (0.129) | AGE (0.318) | AGE (0.307) | WC (0.129) | FBS (0.363) | FBS (0.364) | BMI (0.185) | FBS (0.314) | FBS (0.244) |
| | | 3 | BMI (0.118) | WC (0.061) | SBP (0.054) | BMI (0.124) | WC (0.064) | WC (0.052) | WC (0.142) | BMI (0.113) | GGT (0.086) |
| | | 4 | FH_DM (0.028) | BMI (0.050) | WC (0.042) | FH_DM (0.022) | BMI (0.058) | GGT (0.045) | SMK (0.036) | WC (0.079) | WC (0.063) |
| | | 5 | SMK (0.027) | SMK (0.025) | GGT (0.040) | SMK (0.020) | SBP (0.056) | ALT (0.041) | FH_DM (0.029) | SMK (0.033) | SBP (0.049) |
| | Female | 1 | AGE (0.731) | AGE (0.500) | AGE (0.461) | AGE (0.739) | AGE (0.535) | AGE (0.494) | AGE (0.650) | AGE (0.463) | AGE (0.404) |
| | | 2 | BMI (0.113) | FBS (0.232) | FBS (0.265) | BMI (0.117) | FBS (0.200) | FBS (0.253) | BMI (0.157) | FBS (0.160) | FBS (0.225) |
| | | 3 | WC (0.095) | HT (0.102) | GGT (0.061) | WC (0.099) | HT (0.090) | WC (0.051) | WC (0.115) | SBP (0.122) | WC (0.074) |
| | | 4 | FH_DM (0.018) | WC (0.048) | WC (0.051) | FH_DM (0.019) | BMI (0.045) | BMI (0.044) | FH_DM (0.022) | WC (0.087) | BMI (0.064) |
| | | 5 | HT (0.012) | BMI (0.033) | BMI (0.029) | HT (0.009) | WC (0.033) | GGT (0.042) | PA (0.015) | BMI (0.058) | TG (0.059) |
Feature names: AGE: Age; ALT: Alanine aminotransferase; BMI: Body mass index; DBP: Diastolic blood pressure; DRK: Alcohol intake; EGFR: Estimated glomerular filtration rate; FBS: Fasting blood sugar; FH_DM: Family history of diabetes; FH_HTN: Family history of hypertension; GGT: Gamma glutamyl transferase; HGB: Hemoglobin; HT: Height; PA: Physical activity; SBP: Systolic blood pressure; SCR: Creatinine; SMK: Smoking amount; TCHOL: Total cholesterol; TG: Triglycerides; WC: Waist circumference; WT: Weight.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Park, H.; Jung, S.Y.; Han, M.K.; Jang, Y.; Moon, Y.R.; Kim, T.; Shin, S.-Y.; Hwang, H. Lowering Barriers to Health Risk Assessments in Promoting Personalized Health Management. J. Pers. Med. 2024, 14, 316. https://doi.org/10.3390/jpm14030316
