Article

XGBoost-Based Simple Three-Item Model Accurately Predicts Outcomes of Acute Ischemic Stroke

1 Department of Neurology, Taipei Medical University-Shuang Ho Hospital, New Taipei City 235, Taiwan
2 Department of Neurology, School of Medicine, College of Medicine, Taipei Medical University, Taipei City 110, Taiwan
3 Taipei Neuroscience Institute, Taipei Medical University-Shuang Ho Hospital, New Taipei City 235, Taiwan
4 Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei City 110, Taiwan
5 Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei City 110, Taiwan
6 Smart Healthcare Interdisciplinary College, National Taipei University of Nursing and Health Sciences, Taipei City 112, Taiwan
7 Department of Health Care Management, College of Health Technology, National Taipei University of Nursing and Health Sciences, Taipei City 112, Taiwan
8 Department of Education and Research, Taipei City Hospital, Taipei City 103, Taiwan
* Author to whom correspondence should be addressed.
Diagnostics 2023, 13(5), 842; https://doi.org/10.3390/diagnostics13050842
Submission received: 5 February 2023 / Revised: 19 February 2023 / Accepted: 21 February 2023 / Published: 22 February 2023
(This article belongs to the Special Issue Risk Factors for Acute Ischemic Stroke)

Abstract

An all-inclusive and accurate prediction of outcomes for patients with acute ischemic stroke (AIS) is crucial for clinical decision-making. This study developed extreme gradient boosting (XGBoost)-based models using three simple factors—age, fasting glucose, and National Institutes of Health Stroke Scale (NIHSS) scores—to predict the three-month functional outcomes after AIS. We retrieved the medical records of 1848 patients diagnosed with AIS and managed at a single medical center between 2016 and 2020. We developed and validated the predictions and ranked the importance of each variable. The XGBoost model achieved notable performance, with an area under the curve of 0.8595. As predicted by the model, the patients with initial NIHSS score > 5, aged over 64 years, and fasting blood glucose > 86 mg/dL were associated with unfavorable prognoses. For patients receiving endovascular therapy, fasting glucose was the most important predictor. The NIHSS score at admission was the most significant predictor for those who received other treatments. Our proposed XGBoost model showed a reliable predictive power of AIS outcomes using readily available and simple predictors and also demonstrated the validity of the model for application in patients receiving different AIS treatments, providing clinical evidence for future optimization of AIS treatment strategies.

1. Introduction

Stroke is the second leading cause of death and the third leading cause of disability worldwide [1,2]. The global burden of stroke has increased substantially over the past 30 years owing to population growth, aging, and exposure to risk factors such as high plasma levels of fasting glucose. Among all stroke types, ischemic strokes account for the largest proportion of all new strokes. In 2019, there were an estimated 7.6 million new ischemic strokes globally, and the economic burden of stroke is estimated to be more than US$ 721 billion [1]. In addition to the high incidence of stroke occurrence, recurrence, and death, ischemic stroke may cause various functional impairments, cognitive impairment, or post-stroke depression, contributing to the high prevalence of disability in stroke survivors [1,2,3,4,5]. In the report by Yao et al. [4], two-thirds of patients with ischemic stroke had sequelae symptoms, and nearly one-fourth had varying degrees of disability. Consequently, stroke has a wide range of negative physical and economic impacts on patients and their families. Metabolic factors such as high blood pressure, high blood sugar, and high cholesterol are strongly linked to ischemic stroke. These potentially modifiable risks are associated with outcomes after the episode of ischemic stroke [1,2,3].
Various scoring systems have been developed to predict the prognosis and treatment outcomes after acute ischemic stroke (AIS) [6]. However, the clinical application of these models is often limited due to the complexity of the scoring methods and their mild-to-moderate prediction accuracy [6].
Machine learning methods provide a means to process large amounts of data and enhance data interpretation [7,8,9,10]. In recent years, exponential advances in computing power and improvements in algorithms have led to the successful application of machine learning algorithms in developing clinical prediction and decision systems for AIS [9,10,11,12,13,14,15,16]. Gradient tree boosting is a widely used machine learning technique that has been shown to provide state-of-the-art classification performance in clinical practice [17].
Extreme gradient boosting (XGBoost) is a tree-boosting ensemble algorithm that builds decision trees sequentially as weak learners, with each new tree fitted to reduce the loss left by the trees before it [18,19]. With a predictive performance that is often higher than that of other algorithms, XGBoost has been applied, together with model interpretation, in many fields, including medical and environmental research [18,19,20,21]. Previously, Patel et al. used XGBoost to predict the mortality of patients in the intensive care unit using a multi-center database [18]. Additionally, Zhang et al. indicated that the predictive performance of XGBoost was better than that of logistic regression (LR) in distinguishing between patients with and without a response to fluid intake in urine output [20].
In stroke-related studies, XGBoost performed better at predicting 30-day readmissions than other machine learning algorithms [22]. When used to predict 90-day readmissions for patients with AIS, XGBoost outperformed the LR model [23]. In assessing patient outcomes following stroke, XGBoost showed the best performance among all classification models, achieving an accuracy of 82% based on ten activities of daily living tasks, measured using wearable technologies [24].
Although machine learning approaches have expanded the field of diagnostic and predictive tool design, many models require embedding of overly complex variables, making data collection and organization complicated, thus potentially limiting their clinical applicability [9,10,13,25]. A previous study applied machine learning models for ischemic stroke using data from 35,798 patients with 17 predictive features [13]; in addition, the machine learning-based models developed by Heo et al. used 38 variables to predict AIS outcomes [10]. Both studies also included different predictors, such as clinical variables and brain images, but without a universal modality [9,10,13,16,25].
Although challenging, an all-inclusive predictive model that utilizes simple, easily accessible predictors for patients with AIS with different clinical characteristics and receiving different treatments will be crucial for clinical decision-making and treatment strategies. In previously proposed prediction models, factors, such as age, fasting glucose, and stroke severity (as represented by NIHSS scores), served as general, and often overlapping, independent predictors of AIS functional outcomes [6,26,27]. The current study hypothesized that age, fasting glucose, and NIHSS score could serve as simple, generic, and readily available factors and provide sufficient predictive power to predict outcomes after stroke. Thus, the study aimed to develop predictive models using these most general predictors, utilizing the advantages of machine learning algorithms in analyzing large amounts of nonlinear data.
The present study (i) provided accurate outcome predictions for AIS by applying XGBoost-based models using three simple parameters: age, fasting glucose, and NIHSS score, and (ii) explored the clinical validity of the current model in patients with AIS who received different treatments, including intravenous thrombolysis (IVT), endovascular therapy (EVT), and non-thrombolytic treatments.

2. Materials and Methods

2.1. Participants

In the present study, we retrospectively reviewed the medical records of patients diagnosed and managed at Shuang-Ho Hospital between January 2016 and March 2020. Patient records were retrieved from the Taiwan Stroke Registry, a multi-center database that collected clinical data from patients with AIS across major hospitals and health facilities in Taiwan [28,29].
The inclusion criteria for this study were patients aged ≥18 years who were admitted to the hospital for the diagnosis and treatment of AIS. AIS was defined as the acute onset of neurological deficits with signs or symptoms and presenting to the hospital within 10 days of onset. In all patients, either a non-contrast computed tomography (CT) scan of the head or magnetic resonance imaging (MRI) of the brain was performed at the time of admission. The exclusion criteria were as follows: patients presenting with acute intracranial hemorrhage or subarachnoid hemorrhage on admission, lack of fasting glucose or NIHSS records on admission, no modified Rankin Scale (mRS) score recorded at three months, and patients transferred to other hospitals. These were excluded to ensure availability and consistency of records.
Demographic data were collected at admission, including age, sex, NIHSS score, and treatment for AIS (i.e., IVT, EVT, and non-thrombolysis). IVT and EVT treatments followed the American Heart Association/American Stroke Association guidelines [30]. The patients’ vascular risks and comorbidities, including hypertension, diabetes mellitus, hyperlipidemia, atrial fibrillation, previous stroke or transient ischemic attack (TIA), and ischemic heart disease, were also documented.
Fasting glucose levels were measured within 72 h of admission. The mRS scores were used to assess functional outcomes after AIS at three months. NIHSS scores at admission and mRS scores at 3 months were assessed by a certified stroke specialist. For patients treated with IVT and EVT, head CT/MRI was repeated within 24 h to rule out treatment-related intracranial hemorrhage [30]. Two independent neurologists and a radiologist interpreted all the CT/MRI findings. A favorable outcome was defined as an mRS score of 0 or 1, whereas an unfavorable outcome was defined as an mRS score ≥2 at the 3-month follow-up [31,32].
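The outcome definition above amounts to a simple labeling rule. The following is a minimal sketch of that rule, not the authors' code: the function name `is_unfavorable` is hypothetical, and the only logic encoded is the cutoff stated in the text (mRS 0 or 1 favorable, mRS ≥2 unfavorable), together with the standard 0 to 6 range of the mRS.

```python
def is_unfavorable(mrs_3_month: int) -> bool:
    """Return True when the 3-month mRS score counts as an unfavorable outcome."""
    if not 0 <= mrs_3_month <= 6:
        raise ValueError("mRS scores range from 0 to 6")
    # Cutoff taken from the text: 0 or 1 is favorable, 2 or more is unfavorable.
    return mrs_3_month >= 2


# Binary labels (1 = unfavorable) for a few illustrative scores.
labels = [int(is_unfavorable(score)) for score in (0, 1, 2, 5)]
```

Labels of this kind would serve as the binary target for the prediction model.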

2.2. Statistical Analyses

All statistical analyses were performed using SAS 9.4 software (SAS Institute Inc., Cary, NC, USA). The variables were summarized using descriptive statistics. All continuous variables were expressed as mean and standard deviation. Fisher’s exact test was used to determine associations between categorical variables in patients with favorable and unfavorable outcomes. Student’s t-test was used to compare the means of continuous variables between the two outcome groups. We used receiver operating characteristic (ROC) curves to assess model performance and calculated each predictive model’s accuracy, sensitivity, and specificity, as well as the area under the ROC curve (AUC). Statistical significance was set at a two-tailed p-value of < 0.05.
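For readers less familiar with the performance measures named above, the sketch below shows how accuracy, sensitivity, specificity, and AUC can be computed from labels and predicted scores. It is an illustration only: the study used SAS and R, the function names are hypothetical, the 0.5 cutoff is an assumption, and the six data points are invented.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = unfavorable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: true labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 classification cutoff

metrics = confusion_metrics(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

Accuracy, sensitivity, and specificity depend on the chosen cutoff, whereas the AUC summarizes ranking quality across all cutoffs.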

2.3. Application of Extreme Gradient Boosting (XGBoost) Modeling

All XGBoost models were developed using the “caret” and “xgboost” packages in RStudio software (version 1.3.1073; 2009–2020 RStudio, PBC). The XGBoost algorithm was used to train the models and predict 3-month unfavorable outcomes.
$$\text{Predicted outcome} = \hat{y}_i = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in \mathcal{F}$$
In the process of training the prediction model, K trees were generated. In XGBoost, the set of functions used in the model was learned by minimizing the following regularized objective:
$$L(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k)$$
where $\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$, and $l$ is a differentiable convex loss function measuring the difference between the predicted outcome $\hat{y}_i$ and the real outcome $y_i$. Here, $T$ is the number of leaves in a tree and $\omega$ is the vector of its leaf weights. The regularization term $\Omega$ penalized model complexity and smoothed the final learned weights, and was used to avoid overfitting.
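To make the additive prediction and the regularized objective concrete, the following sketch evaluates both for a tiny ensemble of depth-1 trees. It is not the fitted model: the tree thresholds, leaf weights, and patient rows are invented, the logistic loss is assumed as the choice of l for a binary outcome, and λ = 1 and γ = 0 are assumed values.

```python
import math

# Each depth-1 tree f_k is (feature, threshold, left_weight, right_weight).
# All numbers are invented for illustration.
TREES = [("nihss", 6.0, -0.4, 0.6), ("age", 65.0, -0.2, 0.3)]


def tree_output(tree, x):
    """f_k(x): the weight of the leaf that x falls into."""
    feature, threshold, w_left, w_right = tree
    return w_left if x[feature] < threshold else w_right


def predict_margin(trees, x):
    """y_hat = sum over k of f_k(x), on the log-odds scale."""
    return sum(tree_output(t, x) for t in trees)


def omega(tree, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * ||w||^2, with T = 2 leaves per stump."""
    _, _, w_left, w_right = tree
    return gamma * 2 + 0.5 * lam * (w_left ** 2 + w_right ** 2)


def logistic_loss(margin, y):
    """Assumed loss l(y_hat, y) for a binary outcome."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def objective(trees, rows, labels, gamma=0.0, lam=1.0):
    """L(phi) = sum_i l(y_hat_i, y_i) + sum_k Omega(f_k)."""
    data_term = sum(logistic_loss(predict_margin(trees, x), y)
                    for x, y in zip(rows, labels))
    penalty = sum(omega(t, gamma, lam) for t in trees)
    return data_term + penalty


rows = [{"nihss": 10, "age": 70}, {"nihss": 2, "age": 50}]
labels = [1, 0]
margin = predict_margin(TREES, rows[0])  # 0.6 + 0.3
total = objective(TREES, rows, labels)
```

Larger leaf weights or more leaves raise the penalty term, so a tree must reduce the loss by more than it adds in complexity to lower the objective.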
XGBoost did not generate all trees at once; it built them sequentially, beginning from the 0-th tree. Within each tree, nodes were split recursively into left and right child nodes. For each candidate split, it was essential to check whether the split would yield a Gain, that is, a reduction in the objective function. The variable and split point with the maximum value of Gain were selected.
The final information, Gain, of the objective function after each split was:
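$$\text{Gain} = Obj_{L+R} - (Obj_L + Obj_R) = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma$$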
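where $I_L$ and $I_R$ are the instance subsets of the left and right nodes after splitting the instance set $I$, $g_i$ and $h_i$ are the first- and second-order gradient statistics of the loss function for instance $i$, and $\gamma$ is a splitting threshold for suppressing the growth of the tree [17,33]. Gain was an important metric, used to rank the importance of variables in this study.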
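The Gain expression can be checked numerically with a few lines of code. The sketch below is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the gradients are those of the logistic loss at an initial predicted probability of 0.5 (g = p − y, h = p(1 − p)), and λ = 1 is an assumed value.

```python
def leaf_score(g_sum, h_sum, lam):
    """(sum of g)^2 / (sum of h + lambda) for one node."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting a node into left and right children."""
    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)
    return 0.5 * (leaf_score(gl, hl, lam)
                  + leaf_score(gr, hr, lam)
                  - leaf_score(gl + gr, hl + hr, lam)) - gamma


# Four toy instances at p = 0.5: two with y = 0 go left, two with y = 1 go right.
g_left, h_left = [0.5, 0.5], [0.25, 0.25]      # g = p - y with y = 0
g_right, h_right = [-0.5, -0.5], [0.25, 0.25]  # g = p - y with y = 1

gain_no_penalty = split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0)
gain_with_penalty = split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=1.0)
```

With γ = 0 this perfectly separating split has a positive Gain; with γ = 1 the same split has a negative Gain and would not be made, which is how γ suppresses tree growth.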

2.4. Ten-Fold Cross-Validation

Ten-fold cross-validation was applied in this study, using the “caret” package to perform the cross-validation and generate the performance measures. The dataset was randomly divided into ten subsets; in each round, nine subsets were used as training data to develop the prediction model, and the remaining subset was used to validate the model. Furthermore, optimal hyperparameters were selected using ten-fold cross-validation of the training data. The hyperparameters tuned for the final prediction model were as follows: “nrounds,” “max_depth,” “eta,” “gamma,” “colsample_bytree,” “min_child_weight,” and “subsample.”
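The splitting scheme described above can be sketched in a few lines. This is a generic illustration of ten-fold partitioning, not the internal procedure of “caret”: the function name `k_fold_indices` and the random seed are assumptions, and only the cohort size of 1848 is taken from the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal subsets
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(1848, k=10))
fold_sizes = sorted(len(validation) for _, validation in splits)
```

Every patient appears in exactly one validation subset, so each record is used nine times for training and once for validation.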

2.5. Ranking the Importance of Variables

The metric Gain was used to evaluate the importance of each variable in the binary prediction model based on XGBoost. Gain measures the improvement in accuracy that a variable brings to the branches on which it is used: adding a split on that variable decreases classification errors, and the two new branches become more accurate. A higher percentage indicated that the variable was more important than the others [34].
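One way to read the percentage mentioned above is as each variable's share of the total Gain over all splits in the model. The sketch below illustrates that normalization; it is not the “xgboost” package's code, the function name is hypothetical, and the per-split Gain values are invented.

```python
from collections import defaultdict


def gain_importance(splits):
    """Map each variable to its fraction of the total Gain, largest first."""
    totals = defaultdict(float)
    for variable, gain in splits:
        totals[variable] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: -item[1])
    return {variable: gain / grand_total for variable, gain in ranked}


# Invented (variable, Gain) pairs, one per split in a toy model.
toy_splits = [
    ("nihss", 30.0), ("age", 10.0), ("nihss", 20.0),
    ("glucose", 10.0), ("age", 10.0), ("glucose", 20.0),
]
importance = gain_importance(toy_splits)
```

Because the shares sum to 1, the ranking describes relative contribution within one model and is not directly comparable across models fitted to different subgroups.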

2.6. Partial Dependence Plots (PDPs)

Visualization is a useful interpretation method for black-box machine learning algorithms, illustrating the relationship between the outcomes and variables selected in the prediction model [35]. PDPs represent one such method and were used to depict how the predicted outcome of interest changed (the marginal effect) as a specific variable changed [36]. The “pdp” package was used to generate PDPs for each variable in the R environment.
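The idea behind a partial dependence curve can be shown with a small example: fix one variable at each value of a grid for every patient, keep the other variables as observed, and average the predictions. The sketch below is not the “pdp” package's implementation; `toy_predict` is a stand-in with invented thresholds and risks rather than the fitted XGBoost model, and the two patient rows are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_predict(row):
    """Stand-in risk model with invented step effects (not the study's model)."""
    risk = 0.2
    if row["nihss"] >= 8:
        risk += 0.4
    if row["age"] >= 70:
        risk += 0.2
    return risk


rows = [{"nihss": 3, "age": 60}, {"nihss": 12, "age": 75}]
curve = partial_dependence(toy_predict, rows, "nihss", grid=[0, 8, 16])
```

The resulting curve shows the marginal effect of the chosen variable on the average predicted risk, which is how a threshold-like jump becomes visible in a PDP.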

3. Results

All analyses in this section were run with the packages described above in the R environment. Ten-fold cross-validation, as implemented in the “caret” package, was used to find the hyperparameters that maximized the AUC of the prediction model. These optimal hyperparameters were then incorporated when building the final prediction model based on the XGBoost algorithm. We used the Gain indicator to rank the importance of the variables under different treatments. To examine the relationship between 3-month unfavorable outcomes and each variable, partial dependence plots were used to depict the trend for every variable.

3.1. The Structure of Study

The analytical flowchart of this study is shown in Figure 1. Data from 2384 participants were collected. After data screening, the records of 1848 patients were analyzed. Several hyperparameters needed to be controlled, and they had to be tuned to achieve the best predictive performance. Ten-fold cross-validation was used to find the optimal hyperparameters and improve the prediction model. Finally, the variable importance ranking and PDPs were generated from the final prediction model to uncover the relationship between 3-month unfavorable outcomes and the variables.

3.2. Demographical and Baseline Characteristics of the AIS Cohort

The cohort included 706 females and 1142 males, with a mean age of 68.29 (±13.66) years and a mean baseline NIHSS score on admission of 7.58 (±7.85) (Table 1). On admission, 213 patients (11.53%) underwent IVT, with or without EVT. Among these, 180 patients (84.51%) who received IVT only were termed the IVT group. Among the entire cohort, 100 patients (5.41%) who received EVT were designated as the EVT group, including 33 patients who received EVT following IVT and 67 patients who received EVT only. Three months after AIS, 52.38% of the patients had an unfavorable outcome. Patients with unfavorable outcomes were significantly older, had higher NIHSS scores, and had higher fasting glucose levels than those with favorable outcomes. Furthermore, among vascular risk factors, Fisher’s exact test showed that diabetes mellitus, hyperlipidemia, atrial fibrillation, previous stroke or TIA, and ischemic heart disease were associated with 3-month outcomes, as was patient sex (Table 1).
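Because the treatment groups overlap and the percentages use different denominators, a short arithmetic check may help. The snippet below only recomputes figures already given in the paragraph above; the variable names and the helper `pct` are hypothetical.

```python
cohort = 706 + 1142          # females + males
ivt_any = 213                # IVT, with or without EVT
ivt_only = 180               # the IVT group
evt_total = 100              # the EVT group

evt_after_ivt = ivt_any - ivt_only       # received both IVT and EVT
evt_only = evt_total - evt_after_ivt     # received EVT without IVT


def pct(part, whole):
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


ivt_any_pct = pct(ivt_any, cohort)       # denominator: whole cohort
ivt_only_pct = pct(ivt_only, ivt_any)    # denominator: patients who had IVT
evt_pct = pct(evt_total, cohort)         # denominator: whole cohort
```

Note that the 84.51% figure is relative to the 213 patients who received IVT, whereas 11.53% and 5.41% are relative to the whole cohort.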

3.3. Model Performance

In the current study, we incorporated fasting glucose levels, age, and NIHSS as inputs to predict patients’ potential unfavorable outcomes three months post-stroke using the XGBoost algorithm.
The mean and standard deviation for accuracy, sensitivity, specificity, and AUC of the predictive model, based on XGBoost with ten-fold cross-validation, are shown in Tables S1–S4. The accuracy, sensitivity, and specificity were 0.7654, 0.7886, and 0.7644, respectively, with an AUC of 0.8595. Figure 2 shows the ROC curve of the validation sets during ten-fold cross-validation. The AUC values of the model, applied in patients who received non-thrombolytic, IVT, and EVT treatments, were 0.8615, 0.8423, and 0.7733, respectively.
The optimal hyperparameters were also obtained via ten-fold cross-validation. The following values were selected to train the final model: “nrounds = 50”, “max_depth = 1”, “eta = 0.3”, “gamma = 0”, “colsample_bytree = 0.8”, “min_child_weight = 1”, and “subsample = 0.5”.
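For readers who want to reproduce a comparable setup, the reported values can be collected in a parameter dictionary. This is a sketch only: the study used R, the entries marked as assumptions are not stated in the text, and the mapping of “nrounds” to the `num_boost_round` argument of `xgb.train` in the Python `xgboost` package is offered as the usual equivalent rather than as the authors' configuration.

```python
# Values reported in the text for the final model.
final_params = {
    "max_depth": 1,
    "eta": 0.3,
    "gamma": 0,
    "colsample_bytree": 0.8,
    "min_child_weight": 1,
    "subsample": 0.5,
    # Assumptions (not stated in the text): a binary logistic objective
    # evaluated by AUC, matching the binary outcome and the reported metric.
    "objective": "binary:logistic",
    "eval_metric": "auc",
}
num_boost_round = 50  # "nrounds" in the R interface

# With the Python xgboost package installed, training would look like:
#   booster = xgb.train(final_params, dtrain, num_boost_round=num_boost_round)
# where dtrain is an xgb.DMatrix holding age, fasting glucose, and NIHSS score.

leaves_per_tree = 2 ** final_params["max_depth"]
```

One consequence of “max_depth = 1” is that every tree is a single split on one variable, so the model is additive in age, fasting glucose, and NIHSS score and does not represent interactions between them.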
Figure 3 illustrates the importance of predictors for the entire cohort and patients receiving different treatments. The importance was measured by Gain, as computed by the “xgboost” package, and the predictors were ranked accordingly. In both the whole cohort and in patients who received IVT or non-thrombolytic treatment, the NIHSS score was the most crucial variable for predicting AIS outcome. In the EVT group, the most important predictor was fasting glucose level, with the NIHSS score being less important than age.
Figure 4 depicts the partial dependence plot of the models, showing the relationship between unfavorable outcomes and each variable. Generally, the three predictors in this model offered a positive nonlinear effect on unfavorable outcomes in AIS. The NIHSS score significantly impacted outcomes: the higher the NIHSS score, the worse the prognosis was at three months after the stroke. As predicted by the XGBoost model, when the initial NIHSS score was >5, patients were likely to have an unfavorable prognosis after AIS (Figure 4a). In addition, as shown in Figure 4b, patients over 64 years of age were more likely to have adverse outcomes after AIS in our model, and fasting blood glucose > 86 mg/dL was also associated with a high risk of adverse outcomes (Figure 4c).

3.4. Comparison of the Patients with and without IVT in the EVT Group

To further interpret the current model for the EVT group, we compared the characteristics of patients in the EVT group with and without IVT and assessed the importance of predictors of the two subgroups separately. When comparing the EVT subgroups with and without IVT, there were no significant differences in age (67.12 (±12.48) vs. 70.28 (±13.75), p = 0.2535), fasting glucose (184.15 (±56.61) vs. 168.60 (±58.16), p = 0.2049), or NIHSS scores (18.15 (±6.60) vs. 19.04 (±7.60), p = 0.5473). There were also no significant differences in the outcomes by mRS assessment between the two subgroups three months after the stroke (p = 1.000).
Figure 5 demonstrates the importance of predictors for the EVT group with and without IVT treatment. Similar to the non-thrombolytic and IVT groups, the NIHSS score was the most important predictor of prognostic relevance in the EVT with IVT subgroup (Figure 5a). However, in the subgroup of EVT without IVT, fasting glucose was determined as the most crucial factor in predicting outcomes in the XGBoost model (Figure 5b).

4. Discussion

In this study, we applied an XGBoost-based model to predict the neurological prognosis of patients with AIS three months after stroke based on three simple predictors: age, NIHSS score on admission, and fasting glucose levels. The generated XGBoost model achieved a good validation performance, with an AUC of 0.8595 and validation accuracy, sensitivity, and specificity values of 0.7862, 0.7886, and 0.7644, respectively. Using the model developed by the XGBoost algorithm, we evaluated the importance of each predicting feature by Gain. In our model, NIHSS was the most critical variable for predicting unfavorable outcomes after AIS, with initial NIHSS scores above 5 eliciting a higher propensity for unfavorable outcomes. For patients treated with non-thrombolytic therapy, IVT, and EVT, our models achieved AUCs of 0.8615, 0.8423, and 0.7733, respectively. Compared to previous studies, our model simplified the variables needed to predict the prognosis of AIS and could be applied to patients receiving different AIS treatments.
Globally, stroke remains a leading cause of death and disability, and the disease and medical burden of stroke have continued to increase over the past three decades [1,2,3]. Therefore, the development of reliable prognostic prediction methods for AIS is necessary to improve disease treatment decisions, allocation of care resources, and predictive treatment strategies.
Our results suggested that aging had a negative impact on long-term treatment outcomes following AIS. Evidence suggested that older patients exhibited more severe AIS symptoms than younger patients and were associated with more unfavorable clinical outcomes, such as higher rates of disability and mortality [37,38]. The strong association between aging and unfavorable AIS outcomes, independent of stroke severity, characteristics, or comorbidities, supported the widespread use of age as a factor in various scoring systems, and its characterization as a highly-weighted covariate [6,26].
Stroke severity is another widely used variable in AIS outcome prediction scoring systems [6,26]. The NIHSS is a 15-item neurological examination scale with higher scores indicating higher stroke severity [39]. Owing to its simplicity, reproducibility, reliability, and validity of estimation scales extracted from medical records, the NIHSS score is a commonly used predictor of AIS outcomes in stroke predictive models [6,26]. Previous studies have reported that the NIHSS score on admission was a significant factor in the development of unfavorable outcomes based on multivariate logistic regression models [40,41,42]. Consistent with Al Khathaami et al. [40], our data showed that patients with NIHSS scores > 5 on admission were more likely to develop an unfavorable prognosis. In addition, our model demonstrated more detailed probabilities using partial dependence plots of NIHSS scores and rationalized the use of the NIHSS score as a crucial predictor in the XGBoost-based prediction model.
Hyperglycemia after stroke may adversely affect clinical outcomes [30]. Previous studies confirmed that hyperglycemia on hospital admission in patients with AIS was associated with greater final infarct size, worse clinical prognosis, and higher mortality rates [15,16,43,44]. Furthermore, a retrospective study conducted by Sung et al. [45] demonstrated that, compared with random glucose or HbA1c levels, fasting glucose was a stronger independent predictor of unfavorable neurological outcomes in patients with AIS. Unlike age and NIHSS scores, which are determined at the time of AIS presentation [46], blood glucose is a modifiable factor. With proper glycemic control, it is possible to improve the prognosis of patients and reduce the negative physical impact of AIS [30]. These findings, coupled with the general applicability of glucose levels in various AIS outcome prediction models, rationalized the decision to use fasting glucose as a predictor in our prediction model [15,16,26,47,48].
Furthermore, in the present XGBoost model, fasting glucose was assessed as the most critical predictor of outcomes for patients who received EVT. The result was in concordance with current evidence showing that impaired fasting glucose after AIS onset is an independent risk factor for 3-month unfavorable outcomes after EVT, especially for patients aged ≥60 years [49,50]. The knowledge gained from applying machine learning to clinical data could also inform future clinical practice, such as the optimal range of glycemic control for patients receiving different AIS treatments.
The complexity of the scoring methods and the low predictive accuracy of the models make it difficult for many scoring systems or models to be used in clinical practice to predict AIS prognosis. In this context, our model used readily available patient information, such as age, NIHSS score, and fasting glucose, providing good predictive accuracy. This gives our model the potential to be suitable to assist in clinical applications. For patients with AIS, age and stroke severity were determined at the time of stroke onset. However, clinically modifiable factors, such as blood glucose, in predictive models, could aid clinical decision-making and improve patient prognosis. For instance, one could try to optimize a patient’s glycemic control and apply the model to evaluate whether it would provide the patient with a better outcome after AIS.
Stroke prognosis is influenced by various clinical factors and treatment approaches [30,32,51]. Moreover, the multifactorial and complex nature of AIS limits the predictive performance and reliability of traditional predictive models and scoring systems to individualize treatment. Our study showed that the application of XGBoost was able to accurately predict AIS outcomes using simple inputs. The predictive power of our model was consistent with the documented application of machine learning algorithms to assist in the diagnosis of stroke and predict its outcome [16,52,53]. For patients with a predicted high risk of unfavorable prognosis, our model could facilitate the timely adoption of personalized treatments, highlighting its potential to help stratify patients, individualize treatment, and optimize the therapeutic management of patients with AIS.
The present study had several limitations. First, the data analyzed in this study were limited to patients with AIS from a single stroke center. Therefore, a larger cohort from multiple centers will be required to validate and generalize our model. Second, the model was based on only three clinical predictors and did not include data obtained from neuroimaging or other neurophysiological examinations [44], thus potentially limiting its predictive accuracy. Third, as the dataset was extracted from clinical registry data, there was a relatively modest proportion of missing data for baseline variables, which may have also affected the accuracy of the model. Finally, as the treatments for AIS continue to advance, the treatment techniques (e.g., EVT) and their respective clinical guidelines continue to be updated. As time progresses, new treatments and technologies could significantly improve the prognosis of AIS patients. Therefore, predictive models based on past patient information may have limited abilities for prospective patients with AIS.

5. Conclusions

In the present study, we developed an accurate and reliable model to predict outcomes at three months following AIS by applying XGBoost-based machine learning algorithms, using three predictors: age, fasting glucose, and NIHSS score. The generated XGBoost model achieved a notable validation performance with an AUC of 0.8595. This model also demonstrated its validity in patients with AIS who received different treatments, including IVT, EVT, and non-thrombolytic therapy. In the whole cohort and patients who received IVT or non-thrombolytic treatment, the NIHSS score was the most crucial predictor of AIS outcome. In the EVT group, the most important predictor was fasting glucose. Our results showed the potential clinical applicability of this model to facilitate individualized risk prediction and aid the formulation of diagnostic decisions for patients with AIS. Based on these results, future work is being shaped to develop predictive models that incorporate a more diverse patient population and can be automatically updated with new patient information and future advancements.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13050842/s1; Table S1: Results of predictive performance based on XGBoost by ten-fold cross-validation; Table S2: Results of predictive performance based on XGBoost by ten-fold cross-validation in the group of non-thrombolytic treatment; Table S3: Results of predictive performance based on XGBoost by ten-fold cross-validation in the group of EVT; Table S4: The results of predictive performance based on XGBoost by ten-fold cross-validation in the group of IVT.

Author Contributions

Conceptualization, C.-C.C. and C.-Y.K.; methodology, E.C.-Y.S. and C.-Y.K.; software, C.-Y.K.; validation, E.C.-Y.S.; formal analysis, C.-Y.K.; investigation, C.-C.C. and J.-H.C.; resources, C.-C.C. and J.-H.C.; data curation, C.-C.C. and J.-H.C.; writing—original draft preparation, C.-C.C. and C.-Y.K.; writing—review and editing, C.-C.C., E.C.-Y.S. and C.-Y.K.; visualization, Y.-T.C. and C.-Y.K.; supervision, C.-C.C. and C.-Y.K.; project administration, C.-C.C.; funding acquisition, C.-C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council, grant number NSTC 111-2314-B-038-132-MY3 and Taipei Medical University, grant number TMU111-AE1-B23 to Chen-Chih Chung.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Taipei Medical University Joint Institutional Review Board (TMU-JIRB No. N202103006 on 19 March 2021).

Informed Consent Statement

The study involving human participants was reviewed and approved by the Joint Institutional Review Board of Taipei Medical University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Data Availability Statement

The data that support the findings of this study are available from Taiwan Stroke Registry (http://taiwanstrokeregistry.org/TSR/; accessed on 1 January 2022), but restrictions apply to the availability of these data, which were used under license for the current research and so are not publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Feigin, V.L.; Stark, B.; Johnson, C.O.; Roth, G.; Bisignano, C.; Abady, G.G.; Abbasifard, M.; Abbasi-Kangevari, M.; Abd-Allah, F.; Abedi, V.; et al. Global, regional, and national burden of stroke and its risk factors, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. Lancet Neurol. 2021, 20, 795–820.
  2. Katan, M.; Luft, A. Global Burden of Stroke. Semin. Neurol. 2018, 38, 208–211.
  3. Winstein, C.J.; Stein, J.; Arena, R.; Bates, B.; Cherney, L.R.; Cramer, S.C.; DeRuyter, F.; Eng, J.J.; Fisher, B.; Harvey, R.L.; et al. Guidelines for Adult Stroke Rehabilitation and Recovery: A Guideline for Healthcare Professionals From the American Heart Association/American Stroke Association. Stroke 2016, 47, e98–e169.
  4. Yao, Y.-Y.; Wei, Z.-J.; Zhang, Y.-C.; Li, X.; Gong, L.; Zhou, J.-W.; Wang, Y.; Zhang, Y.-Y.; Wang, R.-P. Functional Disability after Ischemic Stroke: A Community-Based Cross-Sectional Study in Shanghai, China. Front. Neurol. 2021, 12, 649088.
  5. Chan, L.; Hong, C.-T.; Lee, H.-H.; Chung, C.-C.; Chiu, W.-T.; Lee, T.-Y.; Chen, D.Y.-T.; Huang, L.-K.; Hu, C.-J. Poststroke Cognitive Impairment: A Longitudinal Follow-Up and Pre/Poststroke Mini-Mental State Examination Comparison. Curr. Alzheimer Res. 2022, 19, 716–723.
  6. Rabinstein, A.; Rundek, T. Prediction of outcome after ischemic stroke: The value of clinical scores. Neurology 2013, 80, 15–16.
  7. Shehab, M.; Abualigah, L.; Shambour, Q.; Abu-Hashem, M.A.; Shambour, M.K.Y.; Alsalibi, A.I.; Gandomi, A.H. Machine learning in medical applications: A review of state-of-the-art methods. Comput. Biol. Med. 2022, 145, 105458.
  8. Nave, O.; Elbaz, M. Artificial immune system features added to breast cancer clinical data for machine learning (ML) applications. Biosystems 2021, 202, 104341.
  9. Sirsat, M.S.; Fermé, E.; Câmara, J. Machine Learning for Brain Stroke: A Review. J. Stroke Cerebrovasc. Dis. 2020, 29, 105162.
  10. Heo, J.; Yoon, J.; Park, H.; Kim, Y.D.; Nam, H.S.; Heo, J.H. Machine Learning–Based Model for Prediction of Outcomes in Acute Stroke. Stroke 2019, 50, 1263–1265.
  11. Su, P.-Y.; Wei, Y.-C.; Luo, H.; Liu, C.-H.; Huang, W.-Y.; Chen, K.-F.; Lin, C.-P.; Wei, H.-Y.; Lee, T.-H. Machine Learning Models for Predicting Influential Factors of Early Outcomes in Acute Ischemic Stroke: Registry-Based Study. JMIR Med. Inform. 2022, 10, e32508.
  12. Yang, C.-C.; Bamodu, O.A.; Chan, L.; Chen, J.-H.; Hong, C.-T.; Huang, Y.-T.; Chung, C.-C. Risk factor identification and prediction models for prolonged length of stay in hospital after acute ischemic stroke using artificial neural networks. Front. Neurol. 2023, 14, 1085178.
  13. Lin, C.-H.; Hsu, K.-C.; Johnson, K.R.; Fann, Y.C.; Tsai, C.-H.; Sun, Y.; Lien, L.-M.; Chang, W.-L.; Chen, P.-L.; Hsu, C.Y. Evaluation of machine learning methods to stroke outcome prediction using a nationwide disease registry. Comput. Methods Programs Biomed. 2020, 190, 105381.
  14. Sung, S.M.; Kang, Y.J.; Cho, H.J.; Kim, N.R.; Lee, S.M.; Choi, B.K.; Cho, G. Prediction of early neurological deterioration in acute minor ischemic stroke by machine learning algorithms. Clin. Neurol. Neurosurg. 2020, 195, 105892.
  15. Chung, C.-C.; Chan, L.; Bamodu, O.A.; Hong, C.-T.; Chiu, H.-W. Artificial neural network based prediction of postthrombolysis intracerebral hemorrhage and death. Sci. Rep. 2020, 10, 20501.
  16. Chung, C.-C.; Hong, C.-T.; Huang, Y.-H.; Su, E.C.-Y.; Chan, L.; Hu, C.-J.; Chiu, H.-W. Predicting major neurologic improvement and long-term outcome after thrombolysis using artificial neural networks. J. Neurol. Sci. 2020, 410, 116667.
  17. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  18. Patel, S.; Singh, G.; Zarbiv, S.; Ghiassi, K.; Rachoin, J.-S. Mortality Prediction Using SaO2/FiO2 Ratio Based on eICU Database Analysis. Crit. Care Res. Pract. 2021, 2021, 6672603.
  19. Kuo, C.-Y.; Liu, C.-W.; Lai, C.-H.; Kang, J.-H.; Tseng, S.-H.; Su, E.C.-Y. Prediction of robotic neurorehabilitation functional ambulatory outcome in patients with neurological disorders. J. Neuroeng. Rehabil. 2021, 18, 174.
  20. Zhang, Z.; Ho, K.M.; Hong, Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit. Care 2019, 23, 112.
  21. Zhang, J.; Liu, K.; Wang, M. Downscaling Groundwater Storage Data in China to a 1-km Resolution Using Machine Learning Methods. Remote Sens. 2021, 13, 523.
  22. Darabi, N.; Hosseinichimeh, N.; Noto, A.; Zand, R.; Abedi, V. Machine Learning-Enabled 30-Day Readmission Model for Stroke Patients. Front. Neurol. 2021, 12, 638267.
  23. Xu, Y.; Yang, X.; Huang, H.; Peng, C.; Ge, Y.; Wu, H.; Wang, J.; Xiong, G.; Yi, Y. Extreme Gradient Boosting Model Has a Better Performance in Predicting the Risk of 90-Day Readmissions in Patients with Ischaemic Stroke. J. Stroke Cerebrovasc. Dis. 2019, 28, 104441.
  24. Chen, P.-W.; Baune, N.; Zwir, I.; Wang, J.; Swamidass, V.; Wong, A. Measuring Activities of Daily Living in Stroke Patients with Motion Machine Learning Algorithms: A Pilot Study. Int. J. Environ. Res. Public Health 2021, 18, 1634.
  25. Saber, H.; Somai, M.; Rajah, G.B.; Scalzo, F.; Liebeskind, D.S. Predictive analytics and machine learning in stroke and neurovascular medicine. Neurol. Res. 2019, 41, 681–690.
  26. Rempe, D.A. Predicting Outcomes after Transient Ischemic Attack and Stroke. Contin. Lifelong Learn. Neurol. 2014, 20, 412–428.
  27. Matsumoto, K.; Nohara, Y.; Soejima, H.; Yonehara, T.; Nakashima, N.; Kamouchi, M. Stroke Prognostic Scores and Data-Driven Prediction of Clinical Outcomes after Acute Ischemic Stroke. Stroke 2020, 51, 1477–1483.
  28. Hsieh, F.-I.; Lien, L.-M.; Chen, S.-T.; Bai, C.-H.; Sun, M.-C.; Tseng, H.-P.; Chen, Y.-W.; Chen, C.-H.; Jeng, J.-S.; Tsai, C.-F.; et al. Get with The Guidelines-Stroke Performance Indicators: Surveillance of Stroke Care in the Taiwan Stroke Registry: Get with the Guidelines-Stroke in Taiwan. Circulation 2010, 122, 1116–1123.
  29. Hsieh, C.-Y.; Chen, C.-H.; Li, C.-Y.; Lai, M.-L. Validating the diagnosis of acute ischemic stroke in a National Health Insurance claims database. J. Formos. Med. Assoc. 2015, 114, 254–259.
  30. Furie, K.L.; Jayaraman, M. 2018 Guidelines for the Early Management of Patients with Acute Ischemic Stroke. Stroke 2018, 49, 509–510.
  31. Jahan, R.; Saver, J.L.; Schwamm, L.H.; Fonarow, G.C.; Liang, L.; Matsouaka, R.A.; Xian, Y.; Holmes, D.N.; Peterson, E.D.; Yavagal, D.; et al. Association Between Time to Treatment with Endovascular Reperfusion Therapy and Outcomes in Patients with Acute Ischemic Stroke Treated in Clinical Practice. JAMA 2019, 322, 252–263.
  32. Jang, M.U.; Kang, J.; Kim, B.J.; Hong, J.-H.; Yeo, M.J.; Han, M.-K.; Lee, B.-C.; Yu, K.-H.; Oh, M.-S.; Choi, K.-C.; et al. In-Hospital and Post-Discharge Recovery after Acute Ischemic Stroke: A Nationwide Multicenter Stroke Registry-base Study. J. Korean Med. Sci. 2019, 34, e240.
  33. Zhang, C.; Wang, D.; Wang, L.; Guan, L.; Yang, H.; Zhang, Z.; Chen, X.; Zhang, M. Cause-aware failure detection using an interpretable XGBoost for optical networks. Opt. Express 2021, 29, 31974–31992.
  34. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. Package ‘xgboost’. R Version 2021, 90, 1–66.
  35. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  36. Petch, J.; Di, S.; Nelson, W. Opening the Black Box: The Promise and Limitations of Explainable Machine Learning in Cardiology. Can. J. Cardiol. 2022, 38, 204–213.
  37. Bentsen, L.; Christensen, L.; Christensen, A.; Christensen, H. Outcome and Risk Factors Presented in Old Patients above 80 Years of Age versus Younger Patients after Ischemic Stroke. J. Stroke Cerebrovasc. Dis. 2014, 23, 1944–1948.
  38. Veerbeek, J.M.; Kwakkel, G.; van Wegen, E.E.; Ket, J.C.; Heymans, M.W. Early Prediction of Outcome of Activities of Daily Living after Stroke: A systematic review. Stroke 2011, 42, 1482–1488.
  39. Kasner, S. Clinical interpretation and use of stroke scales. Lancet Neurol. 2006, 5, 603–612.
  40. Al Khathaami, A.M.; Al Bdah, B.; Alnosair, A.; Alturki, A.; Alrebdi, R.; Alwayili, S.; Alhamzah, S.; Alotaibi, N.D. Predictors of poor outcome in embolic stroke of undetermined source. Neurosciences 2019, 24, 164–167.
  41. Suda, S.; Shimoyama, T.; Nagai, K.; Arakawa, M.; Aoki, J.; Kanamaru, T.; Suzuki, K.; Sakamoto, Y.; Takeshi, Y.; Matsumoto, N.; et al. Low Free Triiodothyronine Predicts 3-Month Poor Outcome after Acute Stroke. J. Stroke Cerebrovasc. Dis. 2018, 27, 2804–2809.
  42. Suda, S.; Muraga, K.; Kanamaru, T.; Okubo, S.; Abe, A.; Aoki, J.; Suzuki, K.; Sakamoto, Y.; Shimoyama, T.; Nito, C.; et al. Low free triiodothyronine predicts poor functional outcome after acute ischemic stroke. J. Neurol. Sci. 2016, 368, 89–93.
  43. Baird, T.A.; Parsons, M.W.; Phan, T.; Butcher, K.S.; Desmond, P.M.; Tress, B.M.; Colman, P.G.; Chambers, B.R.; Davis, S.M. Persistent Poststroke Hyperglycemia Is Independently Associated with Infarct Expansion and Worse Clinical Outcome. Stroke 2003, 34, 2208–2214.
  44. Masrur, S.; Cox, M.; Bhatt, D.L.; Smith, E.E.; Ellrodt, G.; Fonarow, G.C.; Schwamm, L. Association of Acute and Chronic Hyperglycemia with Acute Ischemic Stroke Outcomes Post-Thrombolysis: Findings from Get With The Guidelines-Stroke. J. Am. Heart Assoc. 2015, 4, e002193.
  45. Sung, J.-Y.; Chen, C.-I.; Hsieh, Y.-C.; Chen, Y.-R.; Wu, H.-C.; Chan, L.; Hu, C.-J.; Hu, H.-H.; Chiou, H.-Y.; Chi, N.-F. Comparison of admission random glucose, fasting glucose, and glycated hemoglobin in predicting the neurological outcome of acute ischemic stroke: A retrospective study. PeerJ 2017, 5, e2948.
  46. Chung, C.-C.; Bamodu, O.A.; Hong, C.-T.; Chan, L.; Chiu, H.-W. Application of machine learning-based models to boost the predictive power of the SPAN index. Int. J. Neurosci. 2021, 133, 26–36.
  47. Ntaios, G.; Faouzi, M.; Ferrari, J.; Lang, W.; Vemmos, K.; Michel, P. An integer-based score to predict functional outcome in acute ischemic stroke: The ASTRAL score. Neurology 2012, 78, 1916–1922.
  48. Saposnik, G.; Kapral, M.K.; Liu, Y.; Hall, R.; O’Donnell, M.; Raptis, S.; Tu, J.; Mamdani, M.; Austin, P. IScore: A risk score to predict death early after hospitalization for an acute ischemic stroke. Circulation 2011, 123, 739–749.
  49. Osei, E.; Hertog, H.D.; Berkhemer, O.; Fransen, P.; Roos, Y.; Beumer, D.; van Oostenbrugge, R.; Schonewille, W.; Boiten, J.; Zandbergen, A.; et al. Increased admission and fasting glucose are associated with unfavorable short-term outcome after intra-arterial treatment of ischemic stroke in the MR CLEAN pretrial cohort. J. Neurol. Sci. 2016, 371, 1–5.
  50. Yuan, L.; Sun, Y.; Huang, X.; Xu, X.; Xu, J.; Xu, Y.; Yang, Q.; Zhu, Y.; Zhou, Z. Fasting Blood-Glucose Level and Clinical Outcome in Anterior Circulation Ischemic Stroke of Different Age Groups after Endovascular Treatment. Neuropsychiatr. Dis. Treat. 2022, 18, 575–583.
  51. El Khoury, R.; Jung, R.; Nanda, A.; Sila, C.; Abraham, M.G.; Castonguay, A.C.; Zaidat, O.O. Overview of key factors in improving access to acute stroke care. Neurology 2012, 79, S26–S34.
  52. Abedi, V.; Goyal, N.; Tsivgoulis, G.; Hosseinichimeh, N.; Hontecillas, R.; Bassaganya-Riera, J.; Elijovich, L.; Metter, J.E.; Alexandrov, A.W.; Liebeskind, D.S.; et al. Novel Screening Tool for Stroke Using Artificial Neural Network. Stroke 2017, 48, 1678–1681.
  53. Chan, K.L.; Leng, X.; Zhang, W.; Dong, W.; Qiu, Q.; Yang, J.; Soo, Y.; Wong, K.S.; Leung, T.W.; Liu, J. Early Identification of High-Risk TIA or Minor Stroke Using Artificial Neural Network. Front. Neurol. 2019, 10, 171.
Figure 1. Analytical flowchart of development of the model for predicting outcome three months after stroke.
Figure 2. ROC curves of ten-fold cross-validation based on XGBoost.
Figure 3. Importance and ranking of the predictors based on Gain in XGBoost. Gain reflects the improvement in accuracy that a predictor brings to the branches it is on; a longer horizontal bar indicates greater importance to the model. (a) Entire cohort. (b) Non-thrombolytic. (c) EVT. (d) IVT.
Figure 4. Partial dependence plot of variables. The x-axis represents the value of a particular variable; values on the y-axis indicate the risk of developing an unfavorable outcome. A positive value indicates that a higher variable value increases the risk of an unfavorable outcome. (a) NIHSS. (b) Age. (c) Fasting glucose.
Figure 5. The importance and the ranking of the predictors of EVT with IVT and EVT without IVT subgroups. EVT, endovascular therapy; IVT, intravenous thrombolysis. (a) EVT with IVT. (b) EVT without IVT.
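The partial dependence curves described in the caption of Figure 4 can be illustrated with a minimal sketch. It is not the authors' code: `toy_risk` is a hypothetical stand-in for a fitted model (its cut-offs echo those reported in the abstract, but the weights are invented), the four patients are synthetic, and centering the curve on the mean prediction is only an assumption about how a plot with positive and negative y-values might be scaled.

```python
from statistics import mean

# Hypothetical stand-in for a fitted model; returns a risk score.
# Cut-offs echo the abstract (NIHSS > 5, age > 64, glucose > 86 mg/dL);
# the weights are invented for illustration only.
def toy_risk(nihss, age, glucose):
    score = 0.1
    if nihss > 5:
        score += 0.5
    if age > 64:
        score += 0.2
    if glucose > 86:
        score += 0.1
    return score

# Synthetic patients: (NIHSS, age in years, fasting glucose in mg/dL).
patients = [
    (2, 55, 80),
    (4, 70, 120),
    (9, 62, 95),
    (15, 81, 160),
]

def partial_dependence(model, rows, feature_index, grid):
    """For each grid value, force one feature to that value in every row
    and average the model output over all rows."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            preds.append(model(*modified))
        curve.append(mean(preds))
    return curve

# Mean prediction on the unmodified data, used to center the curve.
baseline = mean(toy_risk(*p) for p in patients)

nihss_grid = [0, 3, 6, 12]
pd_nihss = partial_dependence(toy_risk, patients, 0, nihss_grid)
centered = [round(v - baseline, 3) for v in pd_nihss]
print(list(zip(nihss_grid, centered)))
```

In this toy setting the centered curve is negative for NIHSS values at or below 5 and positive above 5, which is the kind of sign change the caption refers to when it says a positive value indicates increased risk.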
Table 1. Characteristics of patients with favorable and unfavorable 3-month outcomes after AIS.

| Characteristic | Favorable Outcome | Unfavorable Outcome | p-Value |
|---|---|---|---|
| n (%) | 880 (47.62) | 968 (52.38) | |
| Age (years) | 63.73 (±12.43) | 72.44 (±13.41) | <0.0001 *** |
| Male, n (%) | 609 (32.95) | 533 (28.84) | <0.0001 *** |
| NIHSS score | 3.27 (±3.4) | 11.50 (±8.66) | <0.0001 *** |
| Glycemic metrics | | | |
| Fasting glucose (mg/dL) | 123.69 (±43.68) | 140.70 (±54.80) | <0.0001 *** |
| Glucose at admission (mg/dL) | 158.30 (±71.73) | 163.66 (±80.67) | 0.1746 |
| HbA1c (%) | 6.73 (±1.91) | 6.69 (±1.86) | 0.6767 |
| Vascular risk factors, n (%) | | | |
| Hypertension | 607 (32.85) | 699 (37.82) | 0.1273 |
| Diabetes mellitus | 308 (16.67) | 401 (21.70) | 0.0046 ** |
| Hyperlipidemia | 621 (33.60) | 603 (32.63) | 0.0002 *** |
| Atrial fibrillation | 102 (5.52) | 236 (12.77) | <0.0001 *** |
| Previous stroke or TIA | 83 (4.49) | 188 (10.17) | <0.0001 *** |
| Ischemic heart disease | 80 (4.33) | 117 (6.33) | 0.0371 * |

*** p < 0.001; ** p < 0.01; * p < 0.05. Continuous data are presented as mean (± standard deviation). Abbreviations: HbA1c, glycated hemoglobin; NIHSS, National Institutes of Health Stroke Scale; TIA, transient ischemic attack.
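When reading the n (%) entries in Table 1, it helps to know which denominator was used. The short check below, a sketch using only counts printed in the table, suggests that the percentages are taken over the whole cohort (880 + 968 = 1848 patients) rather than within each outcome group. The helper `pct` and the variable names are illustrative, not from the article.

```python
# Counts taken from Table 1.
favorable_n, unfavorable_n = 880, 968
cohort_n = favorable_n + unfavorable_n  # total number of patients

def pct(count, denominator):
    """Percentage rounded to two decimals, matching the table's precision."""
    return round(100 * count / denominator, 2)

male_fav, male_unfav = 609, 533

# Denominator = whole cohort: this reproduces the values printed in Table 1.
male_fav_of_cohort = pct(male_fav, cohort_n)
male_unfav_of_cohort = pct(male_unfav, cohort_n)

# Denominator = outcome group: the within-group share of male patients.
male_fav_within = pct(male_fav, favorable_n)
male_unfav_within = pct(male_unfav, unfavorable_n)

print(male_fav_of_cohort, male_unfav_of_cohort)
print(male_fav_within, male_unfav_within)
```

Computed this way, male patients make up about 69.2% of the favorable-outcome group and about 55.1% of the unfavorable-outcome group, which is the comparison the p-value in that row most plausibly refers to.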
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chung, C.-C.; Su, E.C.-Y.; Chen, J.-H.; Chen, Y.-T.; Kuo, C.-Y. XGBoost-Based Simple Three-Item Model Accurately Predicts Outcomes of Acute Ischemic Stroke. Diagnostics 2023, 13, 842. https://doi.org/10.3390/diagnostics13050842

Chicago/Turabian Style

Chung, Chen-Chih, Emily Chia-Yu Su, Jia-Hung Chen, Yi-Tui Chen, and Chao-Yang Kuo. 2023. "XGBoost-Based Simple Three-Item Model Accurately Predicts Outcomes of Acute Ischemic Stroke" Diagnostics 13, no. 5: 842. https://doi.org/10.3390/diagnostics13050842