Article

Machine-Learning Techniques for Feature Selection and Prediction of Mortality in Elderly CABG Patients

1
Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
2
Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
3
Cardiovascular Research Center, Wan Fang Hospital, Taipei Medical University, Taipei 242, Taiwan
4
Taipei Heart Institute, Taipei Medical University, Taipei 242, Taiwan
5
Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei 242, Taiwan
6
Division of Cardiovascular Surgery, Department of Surgery, Wan Fang Hospital, Taipei Medical University, Taipei 242, Taiwan
7
Master Program of Big Data Analysis in Biomedicine, College of Medicine, Fu Jen Catholic University, New Taipei City 242062, Taiwan
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work and share first authorship.
Healthcare 2021, 9(5), 547; https://doi.org/10.3390/healthcare9050547
Submission received: 21 March 2021 / Revised: 24 April 2021 / Accepted: 26 April 2021 / Published: 7 May 2021

Abstract

Coronary artery bypass grafting (CABG) is a common and effective treatment for patients with coronary artery disease. Although underlying disease and advancing age are known to be related to survival, no research has used factors from the year before surgery together with operation-associated factors as predictors. This research used different machine-learning methods to select features and predict the survival of older adults (≥65 years old). This nationwide population-based cohort study used the National Health Insurance Research Database (NHIRD), the largest and most complete dataset in Taiwan. We extracted the data of older patients who had received their first CABG surgery between January 2008 and December 2009 (n = 3728), and we used five different machine-learning methods to select the features and predict survival. The results show that, without variable selection, XGBoost had the best predictive ability. When the variables selected by XGBoost were combined with the CHA2DS2-VASc score, acute pancreatitis, and acute kidney failure for further predictive analysis, MARS had the best prediction performance and needed only 10 variables. This study's advantages are that it is innovative and useful for clinical decision making, and that machine learning could achieve better prediction with fewer variables. If patients' survival risk could be predicted before a CABG operation, early prevention and disease management would be possible.

1. Introduction

Advancing age leads to a marked increase in coronary artery disease (CAD), a common heart disease and the leading global cause of mortality [1], which significantly increases the global healthcare burden [2]. Coronary artery bypass grafting (CABG) is an efficient myocardial revascularization treatment for patients with CAD [3]. The risk of CABG surgery is approximately 1–3%. CABG is also a high-cost surgery [4]. In recent years, various studies have evaluated CABG risk in terms of survival rate, medical cost, and follow-up of different CAD treatment strategies [3,4,5,6,7,8].
However, no complete study has used an extensive database to build an integrated machine-learning model for predicting and evaluating which preoperative risk factors affect older adults' survival rate. Thus, this research used the National Health Insurance Research Database (NHIRD) of Taiwan, which offers a sufficiently large sample of real-world healthcare data, including patients' original clinical records, treatments, in-hospital expenditures, and diagnosis codes. In addition to the patients' basic characteristics and disease history, we used variables from the year before the operation and from the operation itself as predictive indicators. If patients' mortality risk could be predicted before a CABG operation, early prevention and disease management for high-risk patients would be possible. Our study used multistage selection, which comprises feature-searching methods and prediction-model development based on logistic regression (LGR), random forest (RF), classification and regression tree (CART), extreme gradient boosting (XGBoost), and multivariate adaptive regression splines (MARS). The models receive as input several preoperative medical factors and their characteristics. Because model performance relies on feature selection [12], we applied it to find the factors that affect the outcomes and to reduce distortion.
This retrospective population-based study had three purposes. The first objective was to analyze older adults' survival rate after CABG surgery over a 10-year follow-up. Second, we used different feature-selection methods to investigate which risk factors were crucial variables affecting survival. Lastly, we aimed to determine the best survival prediction model for older adults receiving CABG procedures, and to identify the factors in the prediction model that determine surgical risk.

2. Materials and Methods

2.1. Data Source

There are around 23 million people in Taiwan. The National Health Insurance Research Database (NHIRD) covers nearly 99% of the Taiwanese population, who are enrolled in the National Health Insurance (NHI) program [9]. NHIRD contains the personal information of patients who participate in the NHI program, including outpatient and inpatient information and surgical procedure codes, and it enables the continuous tracking of all claims records for each patient. Diagnoses were coded using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM); the Tenth Revision (ICD-10-CM) was fully adopted in Taiwan from 1 January 2016. Given these advantages, the NHIRD provides complete and comprehensive long-term follow-up for each patient. Demographic ID information in NHIRD was anonymized and deidentified. This study was exempted from a full ethical review by the Fu Jen Catholic University institutional review board in Taiwan (C108121), and the requirement to obtain informed consent was waived.

2.2. Study Population

To understand the important factors that affect older patients' survival rate after CABG surgery, this retrospective cohort study enrolled patients aged 65 years or older from 1 January 2008 to 31 December 2009 from the NHIRD, Taiwan. We selected patients who had undergone their first CABG operation (operation codes 68023A and 68023B for one anastomosis vessel, 68024A and 68024B for two vessels, and 68025A and 68025B for three vessels). The date of the initial CABG surgery was used as the index date. To ensure that this study focused on older individuals, patients under 65 years old (n = 3533) were excluded. We also excluded those who had had CABG surgery before the index year (between 2002 and 2007; n = 39), those who had died in the hospital (n = 434), and those with missing information (n = 5). According to these criteria, a total of 3728 patients aged ≥65 years who underwent CABG surgery between 1 January 2008 and 31 December 2009 were divided into two groups, dead and alive (Figure 1).

2.3. Comorbidities and Variable Definitions

In this research, the baseline characteristic variables were sex, Charlson comorbidity index (CCI) score, number of anastomosis vessels, and patient comorbidities (Supplementary Materials), including hypertension, hyperlipidemia, diabetes mellitus (DM), congestive heart failure (CHF), peripheral vascular disease (PVD), coronary artery disease (CAD), chronic obstructive pulmonary disease (COPD), myocardial infarction (MI), chronic kidney disease (CKD), end-stage renal disease (ESRD), and stroke. Blood transfusion (94001C, 94002C, 4013C, 94015C, 94003C) and mechanical ventilation (57001B, 57002B, 57003B) in the preoperative year, and the CHA2DS2-VASc score [10,11], were also included. The CHA2DS2-VASc score was calculated for each patient as follows. One point each was assigned for a history of hypertension, diabetes mellitus, congestive heart failure, and vascular disease, for age between 65 and 74 years, and for female sex. Two points were assigned for a history of ischemic stroke or transient ischemic attack (ICD-9-CM codes: 433–438; ICD-10-CM: I63.0–9, G45.9) and for age ≥ 75 years.
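The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the point weights given in the text; the `Patient` record and its field names are hypothetical, and how each history flag is derived from claims codes is not shown.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative, not NHIRD columns."""
    age: int
    female: bool
    hypertension: bool = False
    diabetes: bool = False
    chf: bool = False
    vascular_disease: bool = False
    stroke_or_tia: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Sum the CHA2DS2-VASc points using the weights stated in the text."""
    # One point for each of the four comorbidity histories.
    score = sum([p.hypertension, p.diabetes, p.chf, p.vascular_disease])
    # Age: two points for >= 75 years, one point for 65-74 years.
    if p.age >= 75:
        score += 2
    elif p.age >= 65:
        score += 1
    # One point for female sex.
    if p.female:
        score += 1
    # Two points for a history of ischemic stroke or TIA.
    if p.stroke_or_tia:
        score += 2
    return score


example = Patient(age=78, female=True, diabetes=True, chf=True)
print(cha2ds2_vasc(example))  # 2 (age) + 1 (female) + 1 (DM) + 1 (CHF) = 5
```

Because every patient in this cohort is at least 65 years old, each score under this rule includes at least one age point.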
Comorbidities were identified from records dated before the index date, traced back to 2002–2007. The primary outcome was the overall survival rate of older adults after the CABG procedure, and cause of death was provided by the NHIRD death registry data. All patients in this study were followed up from the index date until the date of death or the end of the study (31 December 2018).

2.4. Feature-Selection and Machine-Learning Prediction Models

Hospitals update each patient's information daily, so a large volume of medical information accumulates over time. We used the NHIRD to determine key factors that affect the survival of older adults after their first CABG surgery. The medical records contained numerous items. Therefore, before making predictions, features were reduced through feature selection (FS), an essential preprocessing step [12].
However, models differ in their ability to predict survival. Some studies used machine-learning methods for the early diagnosis of bipolar disorder and for predicting prostate-cancer-specific survival, erectile dysfunction, CKD, and medical cost [13,14,15,16,17]. This research used multiple-stage selection methods to uncover potential collinearity among variable subsets and to evaluate predictive performance for the response variable. After that, we used a fivefold cross-validation process to verify the LGR, RF, CART, XGBoost, and MARS models (for classification or continuous variables), comparing the predictive performance obtained with all variables against the classification results after feature selection for each classification method [18,19]. The classification model's performance indicators were mean accuracy, kappa, sensitivity, specificity, and area under the ROC curve (AUC). AUC values were interpreted as defined by Hosmer et al. [17]: AUC ≥ 0.9, outstanding discrimination; 0.8 ≤ AUC < 0.9, good discrimination; 0.7 ≤ AUC < 0.8, acceptable/fair discrimination; 0.6 ≤ AUC < 0.7, poor discrimination; and AUC < 0.6, no discrimination [13]. The greater the accuracy, sensitivity, specificity, and kappa values are, the better the model is.
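As a minimal sketch of the evaluation described above, the following Python code computes the four confusion-matrix indicators, maps an AUC value to the discrimination bands listed in the text, and generates fivefold train/test index splits. The confusion-matrix counts are invented for illustration, and the unshuffled contiguous folds are a simplifying assumption; the study's actual R workflow is not given in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


def auc_band(auc):
    """Discrimination label for an AUC value, using the bands quoted in the text."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "acceptable/fair"
    if auc >= 0.6:
        return "poor"
    return "none"


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=150, fp=60, fn=50, tn=140)
print(metrics)
print(auc_band(0.75))
fold_sizes = [len(test) for _, test in kfold_indices(3728, k=5)]
print(fold_sizes)
```

In practice the rows would usually be shuffled (and often stratified by outcome) before being split into folds.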
In this research, we used five different machine-learning methods to construct predictive models and conducted the best feature selection for evaluating the mortality of the CABG patients.

2.4.1. LGR

Logistic regression is a classical prediction method suitable for general binary classification problems. The central concept of LGR is the logit, the natural logarithm of the odds [20]. It is used to analyze the relationship between dependent and independent variables. The predicted variable Y has only two possible values: yes (1) and no (0).
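The relationship between the logit and the predicted probability can be illustrated with a short Python sketch. The intercept, coefficients, and inputs below are hypothetical values chosen for illustration and are not estimates from this study.

```python
import math


def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


def predict_probability(intercept, coefs, x):
    """Inverse logit of the linear predictor: P(Y = 1 | x)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1 / (1 + math.exp(-z))


# Hypothetical model with two predictors (for example, age and CCI score).
p = predict_probability(-3.0, [0.03, 0.2], [75, 4])
print(round(p, 3))         # probability of the event (Y = 1)
print(round(logit(p), 3))  # recovers the linear predictor z
```

Applying `logit` to the predicted probability returns the linear predictor, which is why each coefficient is read as a change in log-odds per unit of its variable.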

2.4.2. RF

Random forest (RF) is an ensemble method based on bagging (bootstrap aggregation); the base classifier in the original RF algorithm is a classification and regression tree (CART). It randomly selects the candidate variables for each split as a CART tree grows [21]. The out-of-bag (OOB) error of random forest is the average error on the samples left out of each tree's bootstrap sample, and it serves as an approximate test error to measure performance [22]. Lastly, the decrease in node impurity across the trees of the forest is used to determine the importance of variables.
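The bootstrap and out-of-bag ideas can be sketched as follows. This is an illustration of the sampling scheme only, not a full random forest; the square-root rule for the number of candidate variables per split is a common default and is an assumption here, as the text does not state the setting used.

```python
import math
import random


def bootstrap_split(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_features(n_features, rng):
    """Random subset of variables considered at one split (sqrt rule assumed)."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)
in_bag, oob = bootstrap_split(3728, rng)   # cohort size from the text
features = candidate_features(72, rng)     # 72 candidate variables in the text

# Roughly 1/e (about 37%) of rows are expected to be out-of-bag for each tree.
print(len(oob) / 3728)
print(features)
```

Each tree is evaluated on its own out-of-bag rows, and averaging those errors gives the OOB estimate without a separate test set.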

2.4.3. MARS

MARS is a nonparametric statistical method developed by Friedman (1991) [23]. It is a flexible regression procedure that automatically builds a model from separate linear-regression slopes, which allows it to process complex data and establish prediction models.
Nonlinearity is approximated using separate linear-regression slopes in different intervals of the independent-variable space. To obtain the best MARS model, the first stage uses a forward algorithm to construct many possible basis functions and corresponding knots, initially overfitting the data. In the second stage, the generalized cross-validation (GCV) criterion is used to select the best combination of basis functions [22].
MARS can also use dummy variables to deal with missing values, and it does not need to assume the distribution of demand functions and errors.
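The piecewise-linear building blocks of MARS are hinge functions of the form max(0, x − t) and max(0, t − x), where t is a knot. A minimal Python sketch of a one-variable MARS-style predictor follows; the knot, coefficients, and intercept are hypothetical and are not fitted to the study data.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) if direction is +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Additive model over hinge terms; each term is (coefficient, knot, direction)."""
    return intercept + sum(c * hinge(x, knot, d) for c, knot, d in terms)


# Hypothetical model with a single knot at age 75: the slope differs on each side.
terms = [(0.08, 75, 1), (-0.02, 75, -1)]
print(mars_predict(80, 0.5, terms))  # 0.5 + 0.08 * 5
print(mars_predict(70, 0.5, terms))  # 0.5 - 0.02 * 5
```

The forward stage adds such hinge pairs greedily, and the pruning stage removes those that do not improve the GCV criterion.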

2.4.4. CART

Breiman et al. developed the classification and regression-tree algorithm in 1984 [24]. The CART algorithm generates a series of rules through recursion. First, CART builds a maximal tree by dividing the data into left and right subsets through binary splits, calculating the impurity of each candidate attribute split with the Gini index. Analysis starts from the root node and proceeds through internal nodes to leaf nodes. The split with the smallest Gini index determines the splitting attribute and value. Each parent node is then divided into two mutually exclusive child nodes, and the calculation is repeated until the decision tree stops growing and is fully constructed [22].
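The Gini-based split search described above can be sketched for a single numeric attribute. The ages and outcomes below are made-up toy data, and the exhaustive threshold scan is a simplification of what a CART implementation does across all attributes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the two children of a binary split."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Threshold with the smallest weighted Gini impurity."""
    candidates = sorted(set(values))[:-1]  # the largest value leaves the right child empty
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Toy data: age and a binary outcome (1 = died); values are illustrative only.
ages = [66, 68, 70, 74, 77, 80, 83, 85]
died = [0, 0, 0, 0, 1, 0, 1, 1]

t = best_threshold(ages, died)
print(t, split_gini(ages, died, t))  # the split "age <= 74" gives the purest children
```

Applying the same search recursively to each child node, over every attribute, yields the full tree.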

2.4.5. XGBoost

The algorithm applied by XGBoost is a gradient-boosting decision tree (GBDT) that can be used for both classification and regression problems [25]. The greedy method optimizes the maximal gain of the objective function during the construction of each tree layer. The idea of the algorithm is to continuously add trees and perform feature splitting to grow a tree. Each time a tree is added, it learns a new function to fit the residual of the last prediction.
Lastly, multiple learners are added together to make the final prediction, and the accuracy is higher than that of a single learner. To counter overfitting, XGBoost controls the complexity of the model by using regularization terms, and objective-function optimization uses a second-order Taylor expansion of the loss function to compute pseudoresiduals [22].
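The second-order approximation and the regularized split gain can be made concrete with a short sketch based on the formulas in the XGBoost paper [25]: with G and H the sums of first and second derivatives in a node, the optimal leaf weight is −G/(H + λ), and a split's gain is half the increase in G²/(H + λ) minus the penalty γ. The four-sample dataset and the values of λ and γ below are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def grad_hess(y_true, raw_score):
    """First and second derivatives of the logistic loss with respect to the raw score."""
    p = sigmoid(raw_score)
    return p - y_true, p * (1 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value under L2 regularization lam."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Reduction in the regularized objective from splitting a node into left and right."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Toy node: four samples, all starting from a raw score of 0 (predicted p = 0.5).
labels = [0, 0, 1, 1]
stats = [grad_hess(y, 0.0) for y in labels]

# Candidate split: first two samples go left, last two go right.
GL = sum(g for g, _ in stats[:2])
HL = sum(h for _, h in stats[:2])
GR = sum(g for g, _ in stats[2:])
HR = sum(h for _, h in stats[2:])

gain = split_gain(GL, HL, GR, HR)
print(gain, leaf_weight(GL, HL), leaf_weight(GR, HR))
```

The left leaf receives a negative weight and the right leaf a positive one, moving the raw scores toward the observed labels; larger λ shrinks both weights and the gain.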

2.5. Statistical Analysis

The cohort was stratified into two groups (dead and alive), which were compared using Pearson's chi-squared tests for categorical variables. Baseline demographic data are presented as numbers and percentages, n (%). Continuous variables are presented as means and standard deviations (mean ± SD) and were compared using independent-sample t-tests. Statistical significance was defined as a 2-tailed p value < 0.05. Data extraction was performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). Variable selection and model establishment were carried out with R statistical software (R studio 3.5.1; http://www.r-project.org (accessed on 12 January 2021)).
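As a worked illustration of the Pearson chi-squared test, the following Python sketch applies it to the sex-by-outcome counts reported in Table 1. It uses no continuity correction, which is an assumption, since the text does not say whether one was applied.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: female, male; columns: dead, alive (counts from Table 1).
stat, p_value = chi2_2x2(682, 421, 1590, 1035)
print(stat, p_value)  # p is approximately 0.47, in line with the 0.471 in Table 1
```

The resulting p value is well above 0.05, so the sex distribution does not differ significantly between the two groups.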

3. Results

3.1. Demographic Characteristics of Study Population

The demographic data and comorbidities of the patients who underwent their first CABG surgery are listed in Table 1. We included adults aged ≥65 years who fulfilled the criteria from 1 January 2008 to 31 December 2009 in the Taiwan NHIRD. The dead group comprised 2272 patients (69.98% male), and the alive group comprised 1456 patients (71.09% male). The sex distribution did not differ significantly between the two groups (p = 0.471).
Several variables differed significantly between the dead and alive groups. The mean follow-up periods were 4.42 ± 3.14 and 10.05 ± 0.57 years (p < 0.001), respectively. Other significant differences (dead vs. alive) were as follows: CHA2DS2-VASc score (4.21 ± 1.67 vs. 3.30 ± 1.57, p < 0.001), diabetes (65.01% vs. 50.76%, p < 0.001), myocardial infarction (52.02% vs. 38.46%, p < 0.001), liver cirrhosis (2.2% vs. 0.69%, p < 0.001), peripheral vascular disease (PVD; 23.81% vs. 17.03%, p < 0.001), congestive heart failure (CHF; 60.96% vs. 38.67%, p < 0.001), intracranial bleeding (2.33% vs. 0.96%, p = 0.002), atrial fibrillation (AF; 15.32% vs. 10.92%, p < 0.001), transient ischemic attack (TIA; 41.86% vs. 29.12%, p < 0.001), chronic kidney disease (CKD; 25.18% vs. 8.86%, p < 0.001), acute coronary syndrome (ACS; 65.58% vs. 55.63%, p < 0.001), chronic obstructive pulmonary disease (COPD; 45.91% vs. 38.32%, p < 0.001), stroke (41.68% vs. 29.05%, p < 0.001), cancer (7.22% vs. 4.53%, p < 0.001), and CCI score (3.86 ± 2.40 vs. 2.59 ± 1.93, p < 0.001).
The surgical variables differed significantly in terms of cost (TWD 611,701 ± 488,753 vs. TWD 394,843 ± 165,389, p < 0.001), the mean number of anastomosis vessels (2.64 ± 0.72 vs. 2.79 ± 0.77, p < 0.0001), the length of stay (25.59 ± 14.77 vs. 18.29 ± 9.15 days, p < 0.001), blood transfusion (10.89 ± 14.68 vs. 7.23 ± 5.31 bags, p < 0.001), and mechanical ventilation (7.16 ± 13.90 vs. 2.76 ± 3.09 days, p < 0.001). In addition, variables from the year before surgery, such as the mean number of outpatient department visits (37.70 ± 23.34 vs. 32.36 ± 20.13, p < 0.001), emergency department visits (2.55 vs. 0.96, p = 0.0006), hospitalizations (1.91 ± 1.34 vs. 1.45 ± 0.82, p < 0.0001), the mean number of blood transfusion bags (13.34 vs. 4.60, p = 0.0006), the length of mechanical ventilation (11.09 vs. 3.85, p < 0.001), and medical cost (155,186 ± 197,087 vs. 91,439 ± 98,235, p < 0.001), also differed significantly between the dead and alive groups of older adults who had undergone their first CABG surgery.

3.2. Results of Feature Selection on CABG

To determine which risk factors could predict survival among older CABG patients, we applied different feature-selection methods and ranked the variables, with rank 1 being the most important. A total of 72 variables were included in this study, and each variable was ranked by each of the 5 methods after filtering (Table 2). The studied characteristics included surgical variables, variables from the preceding year, and the patient's baseline characteristics. LGR selected 17 variables, RF selected 11 variables, and CART selected 9 variables. XGBoost and MARS both selected seven variables. Among those methods, LOS, CHA2DS2-VASc score, and CKD were selected only by CART. CART, XGBoost, and MARS all selected surgical cost, patient's age, renal disease, and CCI score as essential variables.
Through different variable-selection algorithm methods, we could make predictions with these variable combinations.

3.3. Performance of Different Prediction Models

Lastly, we used the results of the different feature-selection methods, as well as no feature selection, to produce five different prediction models: LGR, RF, CART, MARS, and XGBoost. The ability of each model to predict survival was evaluated on an independent validation dataset. The results showed that, without variable selection (72 variables), the predictive ability of XGBoost was the best (accuracy: 0.7225) among the five models (as shown in Table 3). LGR, RF, and CART used 17, 11, and 9 variables, respectively. XGBoost had the best predictive ability (accuracy: 0.7131) and only required seven variables. The best forecasting ability among these five methods was logistic regression (accuracy: 0.7184). We also added three risk factors to the variable selections of XGBoost and MARS for further predictive analysis: CHA2DS2-VASc score, acute pancreatitis, and acute kidney failure (AKF). Adding these three variables can improve the ability of the prediction models. Overall, the feature-selection method opted for was XGBoost, with surgical cost, CCI score, age, renal disease, diabetes, CHF, ulcer disease, and the three added risk factors (AKF, acute pancreatitis, and CHA2DS2-VASc score). The average accuracy for MARS was 0.7225; MARS was ranked as the best and only needed ten variables.

4. Discussion

This population-based cohort study was based on NHIRD, which is the largest observational database in Taiwan. The strengths of using NHIRD are as follows: (1) it includes a wide range of individual medical information; (2) each patient can be tracked over long-term follow-up; (3) it reflects current diagnostic and therapeutic practice in the real world. The purpose of the research was to find the risk factors that could predict survival using different combinations of feature-selection methods and prediction models. We evaluated the survival and risk factors of older adults after their first CABG from 2008 to 2009, with follow-up of up to 10 years. Our study showed that, without variable selection, XGBoost had the best predictive ability. When the variables selected by XGBoost were combined with the CHA2DS2-VASc score, acute pancreatitis, and acute kidney failure for further predictive analysis, MARS had the best prediction performance and only needed 10 variables.
Previously, most studies focused on chronic or vascular diseases that had been acquired before the CABG surgery [26]. No known study investigated using preoperative and perioperative variables as predictors of long-term survival probability. A previous history of DM and CKD is a decisive risk factor for cardiovascular diseases, such as CAD and CHF, and these conditions are in part attributable to aging [5,27,28]. Patients with MI, AF, chronic renal failure, abnormal renal function, and renal failure have higher mortality after CABG [6,26,29,30,31]. Liu et al. found that age ≥65 years, female sex, diabetes, congenital heart disease, hypertension of Levels 2 and 3, and using private insurance contributed to a higher risk of readmission [1]. The CHA2DS2-VASc score is employed as a risk-measurement tool; it is included in treatment guidelines for stroke prevention and is a factor for predicting stroke. Tian et al. suggest that the CHA2DS2-VASc score should be applied clinically [10]. This study demonstrated two significant findings. First, variables from the preoperative year and perioperative variables are significant predictors. Second, after applying machine-learning variable screening and prediction methods, it is easier to identify which variables affect survival. Furthermore, fewer factors could be used to achieve good predictive ability. Our study's limitations are the lack of clinical lab data, family history, and detailed health-check values.

5. Conclusions

On the basis of our research, we developed multiple-stage frameworks to build a survival model for predicting the mortality of older adults who had undergone their first CABG. The advantages of this study are that it is innovative and practical in clinical research. Furthermore, we could achieve better prediction with only 10 variables. This could help clinicians make decisions more quickly and encourage patients towards earlier healthcare management.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/healthcare9050547/s1, ICD-9-CM and ICD-10-CM codes used for diagnosis in this study.

Author Contributions

Conceptualization, T.-S.L., S.-J.L. and M.C.; data curation, Y.-C.H.; formal analysis, Y.-C.H.; methodology, Y.-C.H. and M.C.; project administration, T.-S.L., S.-J.L. and M.C.; software, Y.-C.H.; supervision, T.-S.L.; validation, T.-S.L., S.-J.L. and M.C.; writing—original draft, Y.-C.H.; writing—review and editing, Y.-C.H., S.-J.L. and Y.-N.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Fu Jen Catholic University (protocol code C108121; date of approval, 5 March 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data presented in this study are not available on request from the corresponding author. Due to the General Data Protection Regulation, the data presented in this research are not publicly available.

Acknowledgments

The authors would like to thank the editor and the reviewers for their valuable comments. The authors sincerely appreciate NHIRD, which was provided by the Ministry of Health and Welfare.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, G.; Zhang, Y.; Zhang, W.; Hu, L.; Lv, T.; Cheng, H.; Hu, Y.; Huang, J. Risk Prediction Model of Readmission after Coronary Artery Bypass Grafting (CABG) in China. Res. Sq. 2020.
  2. Malmberg, M.; Gunn, J.; Rautava, P.; Sipilä, J.; Kytö, V. Outcome of Acute Myocardial Infarction Versus Stable Coronary Artery Disease Patients Treated with Coronary Bypass Surgery. Ann. Med. 2021, 53, 70–77.
  3. Chang, Y.-C.; Chiang, J.-H.; Lay, I.-S.; Lee, Y.-C. Increased Risk of Coronary Artery Disease in People with a Previous Diagnosis of Carpal Tunnel Syndrome: A Nationwide Retrospective Population-Based Case-Control Study. BioMed Res. Int. 2019, 2019, 1–8.
  4. Lee, T.-S.; Li, S.-J.; Jiang, Y.; Shia, B.-C.; Chen, M. Cost Analysis of Coronary Artery Bypass Grafting Surgery under Single-Payer Reimbursement in Taiwan. Int. J. Appl. Sci. Eng. 2020, 17, 419–428.
  5. Chen, S.-W.; Chang, C.-H.; Lin, Y.-S.; Wu, V.C.-C.; Chen, D.-Y.; Tsai, F.-C.; Hung, M.-J.; Chu, P.-H.; Lin, P.-J.; Chen, T.-H. Effect of Dialysis Dependence and Duration on Post-Coronary Artery Bypass Grafting Outcomes in Patients with Chronic Kidney Disease: A Nationwide Cohort Study in Asia. Int. J. Cardiol. 2016, 223, 65–71.
  6. Chou, C.-L.; Hsieh, T.-C.; Wang, C.-H.; Hung, T.-H.; Lai, Y.-H.; Chen, Y.-Y.; Lin, Y.-L.; Kuo, C.-H.; Wu, Y.-J.; Fang, T.-C. Long-term Outcomes of Dialysis Patients After Coronary Revascularization: A Population-based Cohort Study in Taiwan. Arch. Med. Res. 2014, 45, 188–194.
  7. Milojevic, M.; Head, S.J.; Parasca, C.A.; Serruys, P.W.; Mohr, F.W.; Morice, M.-C.; Mack, M.J.; Ståhle, E.; Feldman, T.E.; Dawkins, K.D.; et al. Causes of Death Following PCI Versus CABG in Complex CAD. J. Am. Coll. Cardiol. 2016, 67, 42–55.
  8. Zhang, Z.; Kolm, P.; Grau-Sepulveda, M.V.; Ponirakis, A.; O’Brien, S.M.; Klein, L.W.; Shaw, R.E.; McKay, C.; Shahian, D.M.; Grover, F.L.; et al. Cost-Effectiveness of Revascularization Strategies. J. Am. Coll. Cardiol. 2015, 65, 1–11.
  9. Kuo, C.-S.; Lu, C.-W.; Chang, Y.-K.; Yang, K.-C.; Hung, S.-H.; Yang, M.-C.; Chang, H.-H.; Huang, C.-T.; Hsu, C.-C.; Huang, K.-C. Effectiveness of 23-Valent Pneumococcal Polysaccharide Vaccine on Diabetic Elderly. Medicine 2016, 95, e4064.
  10. Tian, Y.; Yang, C.; Liu, H. CHA2DS2-VASc Score as Predictor of Ischemic Stroke in Patients Undergoing Coronary Artery Bypass Grafting and Percutaneous Coronary Intervention. Sci. Rep. 2017, 7, 1–7.
  11. Yin, L.; Ling, X.; Zhang, Y.; Shen, H.; Min, J.; Xi, W.; Wang, J.; Wang, Z. CHADS2 and CHA2DS2-VASc Scoring Systems for Predicting Atrial Fibrillation following Cardiac Valve Surgery. PLoS ONE 2015, 10, e0123858.
  12. Nguyen, H.T.; Petrović, S.; Franke, K. A Comparison of Feature-Selection Methods for Intrusion Detection. In Proceedings of the International Conference on Mathematical Methods, Models, and Architectures for Computer Network Security, St. Petersburg, Russia, 8–10 September 2010; pp. 242–255.
  13. Hu, Y.-H.; Chen, K.; Chang, I.-C.; Shen, C.-C. Critical Predictors for the Early Detection of Conversion from Unipolar Major Depressive Disorder to Bipolar Disorder: Nationwide Population-Based Retrospective Cohort Study. JMIR Med. Inform. 2020, 8, e14278.
  14. Lin, Y.-T.; Lee, M.T.-S.; Huang, Y.-C.; Liu, C.-K.; Li, Y.-T.; Chen, M. Prediction of Recurrence-Associated Death from Localized Prostate Cancer with a Charlson Comorbidity Index–Reinforced Machine Learning Model. Open Med. 2019, 14, 593–606.
  15. Chen, Y.-F.; Lin, C.-S.; Hong, C.-F.; Lee, D.-J.; Sun, C.; Lin, H.-H. Design of a Clinical Decision Support System for Predicting Erectile Dysfunction in Men Using NHIRD Dataset. IEEE J. Biomed. Health Inform. 2018, 23, 2127–2137.
  16. Krishnamurthy, S.; Kapeleshh, K.S.; Dovgan, E.; Luštrek, M.; Gradišek Piletič, B.; Srinivasan, K.; Li, Y.-C.; Gradišek, A.; Syed-Abdul, S. Machine Learning Prediction Models for Chronic Kidney Disease using National Health Insurance Claim Data in Taiwan. medRxiv 2020.
  17. Hosmer, J.D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  18. Almustafa, K.M. Prediction of Heart Disease and Classifiers’ Sensitivity Analysis. BMC Bioinform. 2020, 21, 1–18.
  19. Austin, P.C.; Ghali, W.A.; Tu, J.V. A Comparison of Several Regression Models for Analysing Cost of CABG Surgery. Stat. Med. 2003, 22, 2799–2815.
  20. Peng, C.-Y.J.; Lee, K.L.; Ingersoll, G.M. An Introduction to Logistic Regression Analysis and Reporting. J. Educ. Res. 2002, 96, 3–14.
  21. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  22. Wu, T.-E.; Chen, H.-A.; Jhou, M.-J.; Chen, Y.-N.; Chang, T.-J.; Lu, C.-J. Evaluating the Effect of Topical Atropine Use for Myopia Control on Intraocular Pressure by Using Machine Learning. J. Clin. Med. 2020, 10, 111.
  23. Friedman, J.H. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1–67.
  24. Breiman, L.F.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall: Pacific Grove, CA, USA, 1984.
  25. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  26. Carr, B.M.; Romeiser, J.; Ruan, J.; Gupta, S.; Seifert, F.C.; Zhu, W.; Shroyer, A.L. Long-Term Post-CABG Survival: Performance of Clinical Risk Models Versus Actuarial Predictions. J. Card. Surg. 2015, 31, 23–30.
  27. Feng, W.-H.; Chu, C.-Y.; Hsu, P.-C.; Lee, W.-H.; Su, H.-M.; Lin, T.-H.; Yen, H.-W.; Voon, W.-C.; Lai, W.-T.; Sheu, S.-H. The Effects of Secondary Prevention after Coronary Revascularization in Taiwan. PLoS ONE 2019, 14, e0215811.
  28. Raza, S.; Sabik, J.F.; Ainkaran, P.; Blackstone, E.H. Coronary Artery Bypass Grafting in Diabetics: A Growing Health Care Cost Crisis. J. Thorac. Cardiovasc. Surg. 2015, 150, 304–312.
  29. Liao, K.-M.; Kuo, L.-T.; Lu, H.-Y. Hospital Costs and Prognosis in End-Stage Renal Disease Patients Receiving Coronary Artery Bypass Grafting. BMC Nephrol. 2020, 21, 1–9.
  30. Fengsrud, E.; Englund, A.; Ahlsson, A. Pre- and Postoperative Atrial Fibrillation in CABG Patients have Similar Prognostic Impact. Scand. Cardiovasc. J. 2016, 51, 21–27.
  31. Pollock, B.D.; Filardo, G.; Da Graca, B.; Phan, T.K.; Ailawadi, G.; Thourani, V.; Damiano, J.R.J.; Edgerton, J.R. Predicting New-Onset Post-Coronary Artery Bypass Graft Atrial Fibrillation with Existing Risk Scores. Ann. Thorac. Surg. 2018, 105, 115–121.
Figure 1. Patient selection and further analysis of 3728 older adult patients who had undergone first-time coronary artery bypass surgery grafting (CABG) between 2008 and 2009.
Table 1. Demographic features of older CABG adults in Taiwan from 2008 to 2009.

| Variables | ≥65 Dead (n = 2272), n | ≥65 Dead, % | ≥65 Alive (n = 1456), n | ≥65 Alive, % | p-Value |
|---|---|---|---|---|---|
| Sex: Female | 682 | 30.02 | 421 | 28.91 | 0.471 |
| Sex: Male | 1590 | 69.98 | 1035 | 71.09 | |
| Age, mean (SD), y | 74.30 (5.60) | | 71.27 (4.78) | | <0.001 |
| Follow up years, mean (SD) | 4.42 (3.14) | | 10.05 (0.57) | | <0.001 |
| Follow up years, median | 4.22 | | 10.02 | | - |
| CHA2DS score, mean (SD) | 4.21 (1.67) | | 3.30 (1.57) | | <0.001 |
| Comorbidities | | | | | |
| DM | 1477 | 65.01 | 739 | 50.76 | <0.0001 |
| Hypertension | 624 | 27.46 | 379 | 26.03 | 0.335 |
| Hyperlipidemia | 1522 | 66.99 | 1056 | 72.53 | <0.001 |
| MI | 1182 | 52.02 | 560 | 38.46 | <0.001 |
| Liver cirrhosis | 50 | 2.2 | 10 | 0.69 | <0.001 |
| CHF | 1385 | 60.96 | 563 | 38.67 | <0.001 |
| CAD | 2222 | 97.8 | 1435 | 98.56 | 0.098 |
| PVD | 541 | 23.81 | 248 | 17.03 | <0.0001 |
| Acute pancreatitis | 43 | 1.89 | 21 | 1.44 | 0.301 |
| Malignant dysrhythmia | 104 | 4.58 | 58 | 3.98 | 0.385 |
| Intracranial bleeding | 53 | 2.33 | 14 | 0.96 | 0.002 |
| AF | 348 | 15.32 | 159 | 10.92 | <0.001 |
| TIA | 951 | 41.86 | 424 | 29.12 | <0.0001 |
| CKD | 572 | 25.18 | 129 | 8.86 | <0.0001 |
| ACS | 1490 | 65.58 | 810 | 55.63 | <0.0001 |
| COPD | 1043 | 45.91 | 558 | 38.32 | <0.0001 |
| Stroke | 947 | 41.68 | 423 | 29.05 | <0.0001 |
| Cancer | 164 | 7.22 | 66 | 4.53 | <0.001 |
| CCIS score: 0 | 75 | 3.3 | 139 | 9.55 | <0.0001 |
| CCIS score: 1 | 269 | 11.84 | 330 | 22.66 | |
| CCIS score: 2 | 383 | 16.86 | 362 | 24.86 | |
| CCIS score: 3 | 424 | 18.66 | 239 | 16.41 | |
| CCIS score: 4 | 341 | 15.01 | 165 | 11.33 | |
| CCIS score: 5 | 275 | 12.1 | 115 | 7.9 | |
| CCIS score: 6+ | 505 | 22.23 | 106 | 7.28 | |
| CCIS score, mean (SD) | 3.86 (2.40) | | 2.59 (1.93) | | <0.0001 |
| Surgical Variables | | | | | |
| Anastomosis vessels, mean (SD) | 2.64 (0.72) | | 2.79 (0.77) | | <0.001 |
| Length of stay (LOS), mean (SD) | 25.59 (14.77) | | 18.29 (9.15) | | <0.001 |
| Blood transfusion, (Bag), mean (SD) | 10.89 (14.68) | | 7.23 (5.31) | | <0.001 |
| Mechanical ventilation, (Day), mean (SD) | 7.16 (13.90) | | 2.76 (3.09) | | <0.001 |
| Surgical cost | 611,701 (488,753) | | 394,843 (165,389) | | <0.001 |
| One Year Before Surgery | | | | | |
| Outpatient visits, mean (SD) | 37.70 (23.34) | | 32.36 (20.13) | | <0.001 |
| Hospitalization, mean (SD) | 1.91 (1.34) | | 1.45 (0.82) | | <0.001 |
| ED visits, mean (SD) | 58 | 2.55 | 14 | 0.96 | <0.001 |
| Blood transfusion, (Bag), mean (SD) | 3.83 (3.69) | | 4.09 (4.87) | | 0.636 |
| Mechanical ventilation, (Day), mean (SD) | 5.55 (13.48) | | 3.93 (4.05) | | 0.373 |
| Medical cost (related cardiology department), mean (SD) (thousand NT$) | 81,957 (107,098) | | 60,969 (80,674) | | <0.0001 |
| Medical cost (thousand NT$) | 155,186 (197,087) | | 91,439 (98,235) | | <0.0001 |
CCIS: Charlson comorbidity index score; SD: standard deviation; ED: emergency department; MI: myocardial infarction; CHF: congestive heart failure; CAD: coronary artery disease; PVD: peripheral vascular disease; AF: atrial fibrillation; TIA: transient ischemic attack; CKD: chronic kidney disease; ACS: acute coronary syndrome; COPD: chronic obstructive pulmonary disease; AKF: acute kidney failure; DM: diabetes mellitus.
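The percentages and p-values in Table 1 can be sanity-checked directly from the reported counts. The sketch below is only an illustration for the sex row: the article section shown here does not say which test produced the p-values, so the Pearson chi-square test without continuity correction is an assumption, and the helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumption: this is an illustrative check, not necessarily the test the
    authors used for Table 1.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Sex counts from Table 1: rows are dead / alive, columns are female / male.
dead_female, dead_male = 682, 1590
alive_female, alive_male = 421, 1035

# Column percentage as reported in the table (share of women among the deceased).
pct_dead_female = 100 * dead_female / (dead_female + dead_male)

chi2, p_value = chi_square_2x2(dead_female, dead_male, alive_female, alive_male)
print(f"female share among deceased: {pct_dead_female:.2f}%")
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.47, in line with the 0.471 reported for sex, which suggests the survival groups do not differ meaningfully in sex distribution.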
Table 2. Ranking of essential variables of older CABG adults.
Variables | LGR (17 Variables) | RF (11 Variables) | CART (9 Variables) | MARS (7 Variables) | XGBoost (7 Variables)
Surgical Variables
Blood transfusion, (Bag), mean1
Length of stay (LOS), mean 4
Surgical cost 311
One Year Before Surgery
ED visits, mean46
Outpatient visits, mean15
Hospitalization, mean 3
Mechanical ventilation, (Day), mean167 7
Blood transfusion, (Bag), mean 1
Medical cost 8 6
Baseline
Age 11532
CHF74 65
CKD 7
ACS12
CAD2
CCI score 923
COPD11
PVD14
Diabetes mellitus 5 5
Renal disease 144
Major illness8
Ischemic stroke3
CHA2DS2 scores 2
Ulcer disease17 7
Hypertension6
Hyperlipidemia 2
AKF13
Acute pancreatitis10
Connective tissue disease98
Moderate or severe renal disease596
Moderate or severe liver disease 10
Table 3. Performance evaluation of prediction models on nonselection and after feature selection.

| Feature set | Method | Accuracy | Kappa | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|
| Overall (72 variables) | LGR | 0.7198 | 0.4427 | 0.6711 | 0.7939 | 0.7926 |
| | RF | 0.7077 | 0.3965 | 0.7355 | 0.6655 | 0.7784 |
| | MARS | 0.7104 | 0.4294 | 0.6444 | 0.8108 | 0.7890 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.7225 | 0.4394 | 0.7044 | 0.7500 | 0.7934 |
| LGR selection (17 variables) | LGR | 0.6179 | 0.2752 | 0.4888 | 0.8141 | 0.6981 |
| | RF | 0.6260 | 0.2829 | 0.5177 | 0.7905 | 0.6912 |
| | MARS | 0.6219 | 0.2771 | 0.5088 | 0.7939 | 0.6917 |
| | CART | 0.5911 | 0.2292 | 0.4533 | 0.8006 | 0.6576 |
| | XGBoost | 0.6246 | 0.2845 | 0.5044 | 0.8074 | 0.6977 |
| RF selection (11 variables) | LGR | 0.6876 | 0.3960 | 0.5866 | 0.8412 | 0.7784 |
| | RF | 0.6916 | 0.3937 | 0.6244 | 0.7939 | 0.7637 |
| | MARS | 0.6890 | 0.3817 | 0.6444 | 0.7567 | 0.7675 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.6983 | 0.4161 | 0.5977 | 0.8513 | 0.7790 |
| CART selection (9 variables) | LGR | 0.7091 | 0.4009 | 0.7311 | 0.6756 | 0.7624 |
| | RF | 0.6554 | 0.3464 | 0.5200 | 0.8614 | 0.7557 |
| | MARS | 0.7091 | 0.3954 | 0.7488 | 0.6486 | 0.7653 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.7131 | 0.4062 | 0.7444 | 0.6655 | 0.7652 |
| MARS selection (7 variables) | LGR | 0.6876 | 0.3960 | 0.5866 | 0.8412 | 0.7784 |
| | RF | 0.6916 | 0.3937 | 0.6244 | 0.7939 | 0.7637 |
| | MARS | 0.6890 | 0.3817 | 0.6444 | 0.7567 | 0.7675 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.6983 | 0.4161 | 0.5977 | 0.8513 | 0.7790 |
| XGBoost selection (7 variables) | LGR | 0.7184 | 0.4186 | 0.7444 | 0.6790 | 0.7739 |
| | RF | 0.6903 | 0.3800 | 0.6600 | 0.7364 | 0.7453 |
| | MARS | 0.7131 | 0.4096 | 0.7333 | 0.6824 | 0.7683 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.7104 | 0.4212 | 0.6733 | 0.7668 | 0.7763 |
| XGBoost selection and 3 risk factors (10 variables) | LGR | 0.6890 | 0.3937 | 0.6044 | 0.8175 | 0.7807 |
| | RF | 0.7037 | 0.4008 | 0.6911 | 0.7229 | 0.7727 |
| | MARS | 0.7225 | 0.4233 | 0.7600 | 0.6665 | 0.7831 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.6970 | 0.4069 | 0.6200 | 0.8141 | 0.7845 |
| MARS selection and 3 risk factors (10 variables) | LGR | 0.6916 | 0.3964 | 0.6155 | 0.8074 | 0.7780 |
| | RF | 0.6836 | 0.3806 | 0.6088 | 0.7972 | 0.7629 |
| | MARS | 0.7024 | 0.3998 | 0.6844 | 0.7297 | 0.7722 |
| | CART | 0.6930 | 0.3360 | 0.8111 | 0.5135 | 0.7031 |
| | XGBoost | 0.7077 | 0.4190 | 0.6600 | 0.7804 | 0.7806 |
Abbreviations: LGR: logistic regression; RF: random forest; CART: classification and regression tree; MARS: multivariate adaptive regression splines; AUC: area under the curve; XGBoost: extreme gradient boosting.
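For readers less familiar with the five columns of Table 3, the following minimal sketch shows how each metric is conventionally derived from a binary confusion matrix and from predicted scores. The function names, the counts, and the scores are hypothetical illustrations; they are not taken from the study, and which outcome the authors treated as the positive class is not stated in this section.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, Cohen's kappa, sensitivity, and specificity from confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    # Agreement expected by chance, from the marginal totals of truth and prediction.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = binary_metrics(tp=40, fn=10, fp=20, tn=30)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, auc)
```

Kappa corrects accuracy for chance agreement, which is why models with similar accuracy in Table 3 can still differ noticeably in kappa when their sensitivity and specificity are unbalanced.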
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Huang, Y.-C.; Li, S.-J.; Chen, M.; Lee, T.-S.; Chien, Y.-N. Machine-Learning Techniques for Feature Selection and Prediction of Mortality in Elderly CABG Patients. Healthcare 2021, 9, 547. https://doi.org/10.3390/healthcare9050547

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
