A Machine Learning-Based Applied Prediction Model for Identification of Acute Coronary Syndrome (ACS) Outcomes and Mortality in Patients during the Hospital Stay

Sherazi, Syed Waseem Abbas; Zheng, Huilin; Lee, Jong Yun

doi:10.3390/s23031351


by Syed Waseem Abbas Sherazi, Huilin Zheng and Jong Yun Lee ^*

Department of Computer Science, Chungbuk National University, Cheongju 28644, Chungbuk, Republic of Korea

^* Author to whom correspondence should be addressed.

Sensors 2023, 23(3), 1351; https://doi.org/10.3390/s23031351

Submission received: 3 January 2023 / Revised: 17 January 2023 / Accepted: 20 January 2023 / Published: 25 January 2023

(This article belongs to the Special Issue Sensor Data Fusion Based on Deep Learning for Computer Vision and Medical Applications II)


Abstract:

Nowadays, machine learning (ML) is a cutting-edge technology widely used in the medical domain and health informatics, especially in the diagnosis and prognosis of cardiovascular diseases. Therefore, we propose an ML-based soft-voting ensemble classifier (SVEC) for the predictive modeling of acute coronary syndrome (ACS) outcomes such as STEMI and NSTEMI, discharge reasons for patients admitted to hospital, and death types for affected patients during the hospital stay. We used the Korea Acute Myocardial Infarction Registry (KAMIR-NIH) dataset, which contains the data of 13,104 patients with 551 features. After data extraction and preprocessing, we used 125 useful features and applied the SMOTETomek hybrid sampling technique to balance the minority classes. The proposed SVEC combines three ML algorithms (random forest, extra tree, and gradient boosting machine) for predictive modeling of our target variables, and we compared it with the performance of each base classifier. The experiments showed that the SVEC outperformed the other ML-based predictive models in accuracy (99.0733%), precision (99.0742%), recall (99.0734%), and F1-score (99.9719%), with an area under the ROC curve (AUC) of 99.9702%. Overall, the performance of the SVEC was better than that of the other applied models, although its AUC was slightly lower than that of the extra tree classifier for the predictive modeling of ACS outcomes. Because the proposed predictive model outperformed the other ML-based models, it can be used practically in hospitals for the diagnosis and prediction of heart problems, so that proper treatments can be chosen in time and the occurrence of disease predicted more accurately.

Keywords:

machine learning; predictive model; acute coronary syndrome; soft-voting ensemble classifier; imbalanced data; diagnosis and prognosis

1. Introduction

During the last few decades, machine learning and deep learning have achieved great success in health informatics, disease diagnosis and prediction, risk scoring, and healthcare analysis [1,2], specifically in cardiology and cardiovascular disease. With advanced diagnosis and prediction techniques, patients as well as paramedical staff have benefited from the timely selection of proper treatments and the assessment of the severity of patients [3,4] affected with cardiovascular disease. Because these prediction models depend on critical risk factors and clinical outcomes of the patients, they can predict the occurrence of diseases more accurately and promptly.

1.1. Research Motivation

Researchers continue to develop diagnosis and prognosis models for practical use in clinical practice and routine follow-ups [5]. Nevertheless, previous risk detection studies and existing clinical research either rely on regression-based prediction algorithms or consider only a few risk factors, ignoring the large number of remaining risk factors and the correlations between important ones [6]. According to the Framingham heart study [7], prediction models for heart disease fall into two groups: regression-based risk assessment methods and machine learning-based classification methods. The regression-based risk assessment methods include the Framingham Risk Score (FRS) [8,9,10], QRISK [11,12], Thrombolysis in Myocardial Infarction (TIMI) [13,14], Global Registry of Acute Coronary Events (GRACE) [15,16], and History, Electrocardiogram, Age, Risk factors, and Troponin (HEART) [17,18,19] models, whereas the machine learning-based classification methods for heart disease [20] include Random Forest (RF) [21,22,23], Extra Tree (ET) [20,24], Support Vector Machine (SVM) [25], Gradient Boosting Machine (GBM) [26,27], Neural Networks (NN) [28,29,30], and other ensemble models [6,31]. However, these models have several drawbacks and limitations. Regression-based risk assessment models are outdated and consider only a few risk scores for the early prediction and diagnosis of ACS, whereas existing machine learning-based risk score models have low accuracy, are restricted to a few risk factors, and do not provide accurate predictions for other classes [32]. Furthermore, regression models and decision trees provide limited interpretability, particularly for nonlinear data [33].
Previous prediction models also cannot handle imbalanced data, which degrades prediction performance and can put patients at risk of mortality [34]. Moreover, most previous research studies have focused on the clinical follow-ups of patients while ignoring the hospital stay. Patients are in a more critical and serious condition during the hospital stay, and they are discharged either after recovery or in the case of death or a hopeless situation.

1.2. Research Objectives

Therefore, this paper proposes a machine learning-based predictive model for the prediction of acute coronary syndrome (ACS) outcomes and mortality of patients during the hospital stay. The main goal of this research is to predict ACS outcomes such as ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI), as well as the mortality of in-hospital patients due to heart problems. This research also predicts the discharge status of patients: whether they have been diagnosed with or recovered from ACS, died during the hospital stay, were discharged as hopeless by the hospital, or were referred to other hospitals for medical reasons. Our research content can be summarized as follows. First, we conducted experiments on the Korea Acute Myocardial Infarction Registry (KAMIR-NIH) dataset for the prediction of our target outcome. The dataset was highly imbalanced and redundant, and it contained missing values as well as irrelevant data regarding the patients' medical records. We could not use these data directly for the experimental analysis, so we preprocessed them: we handled missing values and data redundancies, dropped irrelevant data, and applied data-balancing techniques to obtain a dataset suitable for predictive modeling. Second, we applied widely used machine learning models (random forest, extra tree, and gradient boosting machine) and combined them into an ensemble predictive model for the prediction of ACS outcomes and mortality of patients during the hospital stay. Third, we focused on multiple risk factors and designed the model for the prediction of ACS outcomes (e.g., STEMI, NSTEMI), mortality of patients (e.g., cardiac death, non-cardiac death), and discharge of the patients (e.g., recovered, died, hopeless discharge, referred to other hospitals).
Finally, we compared the performances of machine learning-based classification models with our proposed voting ensemble predictive model in terms of performance measures.

The research contributions and practical aspect of the proposed approach can be summarized as follows:

The machine learning-based prediction model is proposed for identification of acute coronary syndrome (ACS) outcomes and mortality in patients during the hospital stay.
For the proposed prediction model, the experimental dataset is preprocessed by applying multiple preprocessing, data cleaning, experimental data extraction, and data sampling methods, resulting in 13,104 patients’ records containing 125 important and useful attributes such as basic medical information, past medical history, family medical history, information about medical diagnosis tools, medical findings, PCI information, and initial diagnosis records, and so forth as the final experimental dataset.
To deal with the data imbalance issue, we applied the SMOTETomek hybrid sampling technique to overcome the imbalanced class distribution in the experimental dataset.
For the experimental analysis, we applied various machine learning algorithms and selected the top three (random forest, extra tree, and gradient boosting machine). We then proposed the soft-voting ensemble classifier that uses these three algorithms as base classifiers, trained it on the training dataset, and evaluated the performance of the applied models using 5-fold cross-validation. After that, we used the test data to test the proposed predictive model and the other machine learning models.
For analyses of our experimental dataset, we applied various statistical methods such as the Chi-Square test and Analysis of Variance (ANOVA) test.
For the evaluation of the proposed soft-voting ensemble classifier and other machine learning algorithms, we applied performance measures such as the AUC, precision, recall, F1-score, accuracy, and confusion matrices.
From the experimental results and performance of the proposed predictive model, we conclude that the proposed model outperformed other models and is effective for timely identification of the ACS, as well as helpful for physicians and patients to identify future cardiac events and select the proper treatment for the patients so that the mortality ratio can be reduced in patients with ACS.

2. Materials and Methods

2.1. Data Source

For the experimental analysis and research studies, we used the Korea Acute Myocardial Infarction Registry (KAMIR) [35] dataset, a nationwide registry of Korean patients affected with heart-related diseases. The registry covers 52 hospitals all over Korea and holds all patients' records registered from November 2005 to December 2019. The KAMIR data registry categorized the whole dataset into 4 groups based on the time of registry of the patients. KAMIR-I has all the patients' data registered from November 2005 to December 2006, KAMIR-II has the patients' records from January 2007 to January 2008, KAMIR-III (also known as KorMI-I) has all the medical information of the patients from February 2008 to March 2012, KAMIR-IV (KorMI-II) has the data from April 2012 to December 2015, and KAMIR-NIH has all the registered patients' records from November 2011 to December 2019. KAMIR-NIH is the latest data of this registry, so we used it for the experimental analysis. KAMIR-NIH consists of 13,104 patients' records with 2-year follow-ups after hospital discharge. The data contain 551 attributes of the registered patients, such as past medical history, basic medical information, drugs prescribed and used, rehospitalization history, and cardiac and non-cardiac disease records. The dataset covers the patients' information during the hospital stay, as well as 6-month, 12-month, and 24-month follow-up records.

2.2. Data Extraction and Data Preprocessing

The dataset provided by KAMIR was in raw form and could not be used directly for the experiments. Therefore, we needed to extract the useful data and preprocess it according to our experiments and predictive targets. The available dataset was inconsistent and redundant, and it contained a large amount of information not relevant to our targets (e.g., follow-ups, date and time, drugs, drug dosage, treatment methods, and stent information) as well as missing values, so data extraction and preprocessing were the most important steps before using these data for the prediction and diagnosis of ACS in in-hospital patients. The overall data extraction process is outlined in Figure 1.

The KAMIR-NIH dataset consists of 13,104 patients' records with 551 medical attributes. It contains basic medical information of the registered patients, their medical history, the drugs used for treatment and their dosage, rehospitalization history, cardiac and non-cardiac disease records, in-hospital medical records, and 6-month, 12-month, and 24-month follow-up records. First, we took the KAMIR-NIH raw data and excluded the attributes containing follow-up information, because our research focuses on patients who were admitted to hospital and stayed for medical examinations, regardless of the 6-month, 12-month, and 24-month follow-ups. This excluded 236 of the 551 attributes. In the next step, we omitted 185 attributes containing unnecessary information, such as date and time attributes, drug attributes, drug dosage attributes, treatment methods used during the hospital stay, and stent attributes such as stent size, diameter, and length. Furthermore, we removed all features that contain the information of our final predictions. Our final target variables were the ACS outcomes, such as ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI); the mortality of patients admitted to hospital due to heart problems, such as cardiac death (CD) and non-cardiac death (NCD); and the discharge status of patients, that is, whether they had been diagnosed with or recovered from ACS, died during the hospital stay, were discharged as hopeless by the hospital, or were referred to other hospitals for some other medical reason. Therefore, we omitted all variables that contained this medical information.
Finally, we had the 13,104 patients’ records containing 125 important and useful attributes, such as basic medical information (e.g., gender, age, height, weight, heart rate), past medical history (e.g., chest pain, diabetes, hypertension, previous heart failure), family medical history, information about medical diagnosis tools (e.g., electrocardiogram, Image Finding for MI, MRI, CT scan, ECHO), medical findings (e.g., white blood cells, neutrophil, hemoglobin, platelets, glucose, creatinine, cholesterol), PCI information, and initial diagnosis records.

In the KAMIR-NIH dataset, the features could be classified as categorical, continuous, or discrete [36]. For each feature type, we used different preprocessing rules to convert the data into a form that machine learning classifiers can use. First, we excluded all unnecessary attributes from the raw dataset and selected the important features for our implementation. The selected attributes contained many missing values, so we implemented multiple missing-value imputation methods, such as mean value imputation, median value imputation, and k-nearest neighbours imputation, and we also used zero-value imputation.


\text{Mean Value} = \frac{\text{Sum of all values in the column}}{\text{Total number of values}}

(1)

\text{Median Value} = \begin{cases} \left(\frac{n + 1}{2}\right)\text{th value} & \text{if the number of values is odd} \\ \frac{\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2} + 1\right)\text{th value}}{2} & \text{if the number of values is even} \end{cases}

(2)

\text{kNN Value} = \text{Mean value of the } k \text{ neighbours of the missing value}

(3)

We compared and confirmed that the zero-value imputation methods worked more efficiently than the mean or median value imputation methods, so we finalized the zero-value imputation method to deal with missing values. For categorical features, we used one-hot encoding [37] and label encoding [38] to convert them into numerical form. For continuous variables and discrete variables, we used the actual values of these attributes so that we could minimize data loss. Furthermore, for the variables containing multiple values, we used the one-hot encoding method so that we could use all the possible outcomes of those attributes. For binary-valued attributes, we simply converted them into 1 and 2, so that 1 represented Yes and 2 represented No, whereas if there were missing values, we denoted them with 0.
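The imputation and encoding rules above can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function names and the sample values are hypothetical, and kNN imputation is omitted because it needs a distance over the other features.

```python
from statistics import mean, median

def impute(column, strategy="zero"):
    """Fill None entries in a list of numbers using one simple strategy."""
    observed = [v for v in column if v is not None]
    if strategy == "zero":
        fill = 0                     # zero-value imputation (the method finally chosen)
    elif strategy == "mean":
        fill = mean(observed)        # Equation (1)
    elif strategy == "median":
        fill = median(observed)      # Equation (2)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]

def encode_binary(value):
    """Map Yes -> 1, No -> 2, and anything else (missing) -> 0."""
    return {"Yes": 1, "No": 2}.get(value, 0)

def one_hot(value, categories):
    """One indicator per category; a missing or unseen value gives all zeros."""
    return [1 if value == c else 0 for c in categories]

glucose = [110, None, 150, 130]      # hypothetical measurements with one gap
print(impute(glucose, "zero"))
print(impute(glucose, "mean"))
print(encode_binary("No"), one_hot("B", ["A", "B", "C"]))
```

Zero-value imputation keeps the missing marker distinct from the codes 1 and 2 used for binary attributes, which matches the encoding described in the text.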

2.3. Data Sampling

Data sampling is an efficient and valuable technique for transforming an imbalanced training dataset into a balanced class distribution, either by increasing the minority class data (oversampling), by decreasing the majority class data (undersampling), or by using both strategies at the same time (hybrid sampling). Previous studies of health informatics research and data sampling techniques [39,40,41,42] have concluded that hybrid sampling techniques are the best data sampling techniques. Our final prediction results combine the ACS outcome with the discharge reason and death type (final diagnosis + discharge result + death type) for the patients admitted to hospital due to heart problems. In our dataset, the final predictions included 10 target classes: "STEMI CD" (N = 290), "STEMI NCD" (N = 38), "STEMI Hopeless Discharge" (N = 24), "STEMI Recovery to Home" (N = 5860), "STEMI Recovery to Other Hospital" (N = 113), "NSTEMI CD" (N = 135), "NSTEMI NCD" (N = 40), "NSTEMI Hopeless Discharge" (N = 25), "NSTEMI Recovery to Home" (N = 6406), and "NSTEMI Recovery to Other Hospital" (N = 173). The data for each target class in the original dataset are highly imbalanced. When we applied the machine learning-based classifiers as well as our proposed predictive model to these targets, the results were highly biased towards the majority classes. We therefore applied the SMOTETomek hybrid sampling technique to balance the class distribution in the experimental dataset. SMOTETomek uses SMOTE for oversampling and Tomek Links for data cleaning. This oversampling method does not generate duplicates of the original data. Instead, it creates synthetic data points which are different from the original ones.
It moves the data points in the direction of their neighbours, so that the synthetic data points are not exactly the same as the original data but are not completely different from them either. Tomek Links in SMOTETomek was used for data cleaning, as it removes the majority class samples that lie closest to minority class samples. This is the reason why we preferred SMOTETomek over the other methods.
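The two ingredients of SMOTETomek described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the paper: it interpolates towards the single nearest minority neighbour (SMOTE normally picks among k neighbours), uses Euclidean distance, and assumes at least two minority samples. The function names are hypothetical.

```python
import math
import random

def nearest(point, candidates):
    """Return the candidate closest to point, skipping point itself."""
    return min((c for c in candidates if c is not point),
               key=lambda c: math.dist(point, c))

def smote_sample(minority, rng):
    """Create one synthetic point on the segment between a random
    minority sample and its nearest minority neighbour."""
    base = rng.choice(minority)
    neighbour = nearest(base, minority)
    gap = rng.random()               # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))

def tomek_links(samples, labels):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and carry different class labels."""
    def nn(i):
        return min((j for j in range(len(samples)) if j != i),
                   key=lambda j: math.dist(samples[i], samples[j]))
    links = []
    for i in range(len(samples)):
        j = nn(i)
        if i < j and nn(j) == i and labels[i] != labels[j]:
            links.append((i, j))
    return links

rng = random.Random(0)
print(smote_sample([(0.0, 0.0), (1.0, 0.0)], rng))
print(tomek_links([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0)],
                  ["majority", "minority", "majority", "majority"]))
```

In practice, the imbalanced-learn library listed in Section 2.9 provides a `SMOTETomek` class in `imblearn.combine`, which would normally be used instead of hand-written code.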

2.4. Architecture of Proposed Predictive Modeling System

The overall architecture of the proposed predictive modeling system for ACS outcomes and discharge prediction is shown in Figure 2. The steps of the predictive modeling system are as follows. In the first step, we took the KAMIR-NIH raw dataset and extracted the useful features from it. In the next step, we applied the preprocessing rules to the extracted data to convert them into a usable form, so that we could apply and evaluate the different machine learning-based classification models. We preprocessed the data by applying missing-value imputation, data transformation, and data encoding techniques. After that, we split the data into two groups, training data (70%) and testing data (30%), and further subdivided the training dataset for validation using 5-fold cross-validation. In 5-fold cross-validation, the training dataset is divided into 5 equal subgroups; the model is trained on 4 subgroups and validated on the fifth. In the next iteration, a different subgroup serves as the validation data and the model is trained on the remaining 4 subgroups. This process is repeated for 5 iterations until every subgroup has been used once as validation data. In the next step, we applied the machine learning-based classifiers (random forest, extra tree, and gradient boosting machine) and the proposed machine learning-based voting ensemble classifier to the training dataset and evaluated their performance using 5-fold cross-validation.
In the next step, we used the test data (30%) to test the machine learning-based classification models and the proposed predictive model, and then evaluated their performance using the area under the ROC curve (AUC), accuracy, precision, recall, F1-score, and the confusion matrix. Finally, the prediction results were extracted for the applied machine learning-based classification models, and we classified the extracted results as the final output of our predictive modeling system.
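The 70/30 split and the 5-fold rotation described above can be sketched on sample indices as follows. This is a minimal, non-stratified illustration in plain Python. The function names, the seed, and the sample count of 100 are hypothetical, and a stratified split would additionally keep class proportions equal across folds.

```python
import random

def train_test_split_indices(n, test_ratio=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]

def k_fold(indices, k=5):
    """Yield (train, validation) index lists; every fold is used
    exactly once as the validation set."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

train_idx, test_idx = train_test_split_indices(100)
for fold_train, fold_val in k_fold(train_idx, k=5):
    print(len(fold_train), len(fold_val))
```

The held-out 30% is never touched inside the loop; it is used only once, after the cross-validated models have been selected.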

2.5. Applied Machine Learning Algorithms

For the experiment, we applied various machine learning algorithms, such as RF, ET, SVM, GBM, GLM, Linear Regression, and Logistic Regression. We compared the results of all applied algorithms and selected the three best-performing ones: RF, ET, and GBM. The accuracy of these selected ML algorithms was comparatively high, and they were better prediction models for the target outcomes. The other algorithms performed worse, so we did not include them as base classifiers in the soft-voting ensemble classifier, because they could decrease the performance of the proposed ensemble classifier. Therefore, we proposed the soft-voting ensemble classifier using RF, ET, and GBM as base classifiers for the identification of acute coronary syndrome (ACS) outcomes and mortality in patients during their hospital stay.

2.5.1. Random Forest

Random forest [21,22,23] is a learning method for classification and regression in which each decision tree depends on the values of an independently sampled random vector, with the same distribution for all trees in the forest. The trees are trained independently, and the final prediction is obtained by averaging [28] the predictions of the individual trees. Once the random forest has been trained, it can predict results for new unlabeled data: each decision tree makes its prediction in the same way, and the forest finalizes the prediction on the basis of the average over all trees.

2.5.2. Extra Tree

Extra tree is also known as extremely randomized trees [20,24] in which feature selection and split selection are performed randomly, and it is less computationally expensive as compared with random forest. The key difference between decision trees, random forests, and extra trees is that extra trees show low variance, whereas decision trees and random forest show high variance and medium variance, respectively. It also minimizes over-learning from datasets and controls overfitting, and hence was used in our experiment. The extra tree algorithm yields state-of-the-art results in complex problems of high dimensionality. The decision stumps for extra tree classifiers are built in such a way that all data used in the training dataset are utilized for building each stump, the maximum depth is one, and the best split to form the root node or any node is determined by inquiring into a subset of randomly selected features.

2.5.3. Gradient Boosting Machine

Gradient boosting [26,27] is a powerful machine learning technique for classification and regression that builds a predictive model as an ensemble of weak prediction models, generally decision trees. It strengthens the model step by step, adding weak learners that correct the errors of the current ensemble so that the loss function is minimized. It involves three elements: a loss function to be optimized, weak learners that make predictions, and an additive model that adds weak learners to minimize the loss function. We used this model because a user-specified cost function can be optimized, rather than a fixed loss function that offers less control and may not correspond to real-world problems and real-time applications.
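The additive idea behind gradient boosting can be illustrated with a small sketch. It makes simplifying assumptions that differ from the paper's setting: it solves a one-feature regression problem with squared loss and uses depth-one stumps as weak learners, whereas the paper applies a gradient boosting classifier. All names and data are hypothetical, and the feature must have at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Fit a depth-one regression stump: choose the threshold that
    minimises the squared error of the two leaf means."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model: start from the mean, then repeatedly fit a stump
    to the residuals (the negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
print(model(2), model(5))
```

Each round shrinks the remaining residual by the learning rate, so the three elements named above (loss function, weak learner, additive model) are all visible in `gradient_boost`.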

2.6. Design of a Machine Learning-Based Soft-Voting Ensemble Classifier for Predictive Modeling

In this section, we describe the framework of the proposed machine learning-based soft-voting ensemble classifier for the predictive modeling of our targets, such as ACS outcomes and discharge reasons. A voting ensemble classifier relies on multiple machine learning-based base classifiers that jointly contribute to the final prediction, which is finalized on the basis of the highest weighted probability. It combines multiple machine learning models to increase predictive accuracy and offset the flaws of individual classifiers. The mathematical equation for the prediction of the soft-voting ensemble classifier is as follows:

P = \arg \max_{i} \sum_{j = 1}^{m} W_{j} f_{c_{A}} (C_{j} (x) = i)

(4)

where C_{j} = classifier; W_{j} = weight associated with the prediction of classifier C_{j}; f_{c_{A}} = characteristic function [C_{j}(x) = i \in A]; and A = set of class labels.

The design steps of the proposed predictive model are as follows. In the first step, we applied multiple machine learning-based models to our preprocessed data and extracted the results. After that, we selected the three classifiers with the best prediction results (random forest, extra tree, and gradient boosting machine) and dropped the other classifiers, whose accuracy and performance were comparatively lower. We then trained the base classifiers and tuned their hyperparameters. In the next step, we assigned weights to these base classifiers to increase the performance of our ensemble model and combined their results using the soft-voting strategy. In the soft-voting ensemble classifier, the final outcome is the class with the highest sum of weighted probabilities over the base classifiers. Using the soft-voting ensemble classifier, we could compensate for the drawbacks and lower performance of the individual base classifiers. The architecture of the machine learning-based soft-voting ensemble classifier for the predictive modeling of ACS outcomes and discharge classification is shown in Figure 3.
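The soft-voting rule described above (the class with the highest sum of weighted probabilities) can be sketched in a few lines of plain Python. The probabilities, weights, and function name below are hypothetical and only illustrate the arithmetic; they are not values from the paper.

```python
def soft_vote(probabilities, weights):
    """Combine per-classifier class probabilities by weighted sum and
    return (winning label, weighted totals per label).

    probabilities: one dict {label: probability} per base classifier
    weights:       one weight per base classifier
    """
    totals = {}
    for probs, weight in zip(probabilities, weights):
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + weight * p
    return max(totals, key=totals.get), totals

# Hypothetical outputs of the three base classifiers for one patient.
rf = {"STEMI": 0.60, "NSTEMI": 0.40}
et = {"STEMI": 0.30, "NSTEMI": 0.70}
gbm = {"STEMI": 0.45, "NSTEMI": 0.55}

print(soft_vote([rf, et, gbm], weights=[1, 1, 1])[0])  # equal weights
print(soft_vote([rf, et, gbm], weights=[3, 1, 1])[0])  # random forest trusted more
```

With equal weights the two classifiers favouring NSTEMI win, whereas giving the first classifier a weight of 3 flips the decision to STEMI, which shows how the weights steer the ensemble. scikit-learn, listed in Section 2.9, offers the same strategy through `VotingClassifier` with `voting="soft"`.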

2.7. Statistical Analysis

For analyses of our experimental dataset, we applied various statistical methods such as the Chi-Square test and Analysis of Variance (ANOVA) test. In our dataset, we had different types of data, such as categorical features and continuous features. To perform the statistical analysis and check their statistical significance, we applied statistical methods and calculated their significance. A chi-square test was used for categorical variables to show their relationship with target variables and interpret the discrepancies between the actual outcomes and expected outcomes. The ANOVA test was used to analyze the mean value differences and influence of independent variables on target variable. We used the ANOVA test because it compares more than two groups to determine the relationship between them. The mathematical representation for the chi-square and ANOVA tests is mentioned below:

\chi^{2} = \sum_{i} \frac{(O_{i} - E_{i})^{2}}{E_{i}}

(5)

F = \frac{\frac{\sum_{i = 1}^{k} \frac{T_{i}^{2}}{n_{i}} - \frac{G^{2}}{n}}{k - 1}}{\frac{\sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} Y_{ij}^{2} - \sum_{i = 1}^{k} \frac{T_{i}^{2}}{n_{i}}}{n - k}}

(6)

In Equation (5), \chi^{2} represents chi-square, O_{i} the observed value, and E_{i} the expected value. In Equation (6), the numerator \frac{\sum_{i = 1}^{k} \frac{T_{i}^{2}}{n_{i}} - \frac{G^{2}}{n}}{k - 1} is the mean square due to treatment (MST), the denominator \frac{\sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} Y_{ij}^{2} - \sum_{i = 1}^{k} \frac{T_{i}^{2}}{n_{i}}}{n - k} is the mean square due to error (MSE), T_{i} is the group total, n_{i} is the number of observations in group i, n is the total number of observations, G is the grand total for all observations, and Y_{ij} is an observation. All statistical analyses were performed using IBM SPSS Statistics 23 and Microsoft Excel 365.
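As a worked illustration of Equations (5) and (6), the following sketch computes both statistics in plain Python. The input numbers are hypothetical toy values, and the function names are not from the paper. The analyses reported here were run in SPSS, so this only shows the arithmetic.

```python
def chi_square(observed, expected):
    """Equation (5): sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def anova_f(groups):
    """Equation (6): F = MST / MSE for a one-way ANOVA.

    groups: list of lists, one list of observations Y_ij per group i.
    """
    k = len(groups)                                        # number of groups
    n = sum(len(g) for g in groups)                        # total observations
    grand_total = sum(sum(g) for g in groups)              # G
    between = sum(sum(g) ** 2 / len(g) for g in groups)    # sum of T_i^2 / n_i
    total_sq = sum(y ** 2 for g in groups for y in g)      # sum of Y_ij^2
    mst = (between - grand_total ** 2 / n) / (k - 1)
    mse = (total_sq - between) / (n - k)
    return mst / mse

print(chi_square([10, 20], [15, 15]))
print(anova_f([[1, 2, 3], [2, 3, 4]]))
```

For the toy groups, the group totals are 6 and 9 and the grand total is 15, so MST = (39 - 37.5) / 1 = 1.5 and MSE = (43 - 39) / 4 = 1, which gives F = 1.5.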

2.8. Evaluation Method and Performance Measures

We split our dataset into two groups, training data (70%) and testing data (30%), and then applied 5-fold cross-validation during the training process. We built each machine learning-based prediction model with its best hyper-parameters, evaluated it by 5-fold stratified cross-validation, and then verified it on the test dataset (30%). For the evaluation of our applied classifiers, we used performance measures such as AUC, precision, recall, F1-score, accuracy, and confusion matrices.

The formulas for the above-mentioned performance measures are as follows:

Precision = \frac{TP}{TP + FP}

(7)

Recall = \frac{TP}{TP + FN}

(8)

F1\text{-}Score = 2 \times \frac{Recall \times Precision}{Recall + Precision}

(9)

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

(10)

where TP, TN, FP, and FN denote True-Positive, True-Negative, False-Positive, and False-Negative.

The confusion matrix shown in Table 1 describes the performances of the machine learning-based predictive models, where each row and column represent the instances in the actual class and predicted class, respectively. The FP is a Type 1 error and FN is a Type 2 error.
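Equations (7) to (10) can be checked on a single confusion matrix with a short sketch. The counts below are hypothetical and the function name is not from the paper. For a multi-class problem such as the 10 class labels here, these values would be computed per class in a one-vs-rest fashion and then averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score, and accuracy from the four
    confusion-matrix counts, following Equations (7)-(10)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (recall * precision) / (recall + precision)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts for one class: 90 TP, 10 FP, 30 FN, 870 TN.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score is the harmonic mean of precision and recall, its value for a single class always lies between the two.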

2.9. Implementation Environments

For statistical analysis and data preprocessing, we used SPSS 18 for Windows (SPSS Inc., Chicago, IL, USA) [43] and MS Excel for Windows (Microsoft Office 365 ProPlus) [44]. For the implementation of our experiments and performance analysis, we used a 64-bit Windows operating system with an Intel(R) Core(TM) i7-4770 CPU @ 3.40 GHz and 24 GB of random access memory (RAM). We also used Jupyter Notebook, an open-source web application, to develop and implement the predictive model in Python (Version 3.7.3) [45] with the TensorFlow [46], Keras (Version 2.3.1), Scikit-learn [47], NumPy [48], pandas [49], and imbalanced-learn [50] libraries.

3. Results

This section presents the results of our experimental analysis for the predictive modeling of the ACS outcomes and the discharge reasons of patients admitted to hospital, together with an analysis of all experimental outcomes. We applied widely used machine learning algorithms (random forest, extra tree, and gradient boosting machine) and then proposed the machine learning-based soft-voting ensemble classifier, which uses these classifiers for predictive modeling of the patients during their hospital stay.

3.1. Baseline Characteristics

The KAMIR-NIH dataset was preprocessed and subdivided into 2 groups: ST-elevation myocardial infarction (STEMI) and non-ST-elevation myocardial infarction (NSTEMI). For categorical variables, we report frequencies and percentages and applied the Chi-Square test; for continuous variables, we report means and standard deviations and applied the ANOVA test. The baseline characteristics of all categorical and continuous features are given in Table 2 and Table A1. Most of the features have a p-value < 0.001 or < 0.05, which indicates that these attributes are statistically significant, whereas some features have a higher p-value and are not statistically significant. In Table 2, features such as Sex, Age, Chest Pain, and so forth have a p-value < 0.001, and features such as hsCRP, ST Type, and so forth have a p-value < 0.05, so these attributes are statistically significant. In contrast, features such as Neutrophil, Troponin T, High Density Lipoprotein, and so forth have a higher p-value, indicating that these attributes are not statistically significant.

3.2. Results of Performance Measures for Applied Machine Learning-Based Predictive Models

To evaluate the applied machine learning-based predictive models for ACS occurrences and discharge results, we compared all models using accuracy, precision, recall, F1-score, and AUC, as shown in Table 3, Table 4, Table 5 and Table 6. Table 3 shows the detailed performance measures of the random forest classifier for all class labels: “STEMI CD”, “STEMI NCD”, “STEMI Hopeless Discharge”, “STEMI Recovery to Home”, “STEMI Recovery to Other Hospital”, “NSTEMI CD”, “NSTEMI NCD”, “NSTEMI Hopeless Discharge”, “NSTEMI Recovery to Home”, and “NSTEMI Recovery to Other Hospital”. Table 4 shows the same measures for the extra tree classifier, Table 5 for the gradient boosting machine classifier, and Table 6 for our proposed soft-voting ensemble classifier.

Table 7 compares the overall performance of all applied machine learning-based models (random forest, extra tree, and the gradient boosting machine) with our proposed soft-voting ensemble classifier for predictive modeling of ACS outcomes and discharge results for the patients admitted to the hospital. Each value in Table 7 is the average over all 10 class labels mentioned above. The average accuracies were 98.9172%, 98.9796%, 78.6517%, and 99.0733% for random forest, extra tree, the gradient boosting machine, and the soft-voting ensemble model, respectively. The corresponding AUC, precision, recall, and F1-score values were (99.9650%, 98.9182%, 98.9172%, 98.9159%) for random forest, (99.9828%, 98.9828%, 98.9797%, 98.9791%) for extra tree, (97.7566%, 78.7202%, 78.6517%, 78.6302%) for the gradient boosting machine, and (99.9702%, 99.0742%, 99.0734%, 99.9719%) for the soft-voting ensemble classifier.
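Averaging per-class measures over all class labels can be sketched as follows. The example assumes an unweighted (macro) average and a confusion matrix whose rows are actual classes and whose columns are predicted classes; the averaging scheme is an assumption of this sketch, and the 3-class matrix is made up for illustration. AUC is omitted because it requires predicted scores rather than a confusion matrix.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1-score.

    cm[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Made-up 3-class confusion matrix (not taken from the experiments).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 1, 9]]
metrics = macro_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```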

The overall results in Table 7 show that the proposed soft-voting ensemble classifier outperformed the other predictive models in accuracy, precision, recall, and F1-score, whereas the extra tree classifier achieved the highest AUC. Overall, the proposed machine learning-based soft-voting ensemble classifier improved the predictive results for our target variables using KAMIR-NIH data for patients admitted to hospitals due to heart problems.

Figure 4 shows the confusion matrix for the performance evaluation of the proposed soft-voting ensemble classifier using in-hospital data. The confusion matrix presents the evaluation in tabular form, in which the x-axis and y-axis represent the class labels “STEMI CD”, “STEMI NCD”, “STEMI Hopeless Discharge”, “STEMI Recovery to Home”, “STEMI Recovery to Other Hospital”, “NSTEMI CD”, “NSTEMI NCD”, “NSTEMI Hopeless Discharge”, “NSTEMI Recovery to Home”, and “NSTEMI Recovery to Other Hospital”. These 10 class labels are denoted by the numbers 0–9, respectively.

Figure 4a shows the actual confusion matrix with the counts of correctly classified and misclassified class labels, whereas Figure 4b shows the normalized confusion matrix. The diagonal values in Figure 4a,b represent the correctly classified results, and the off-diagonal values represent the results misclassified by the machine learning-based soft-voting ensemble classifier.
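The relation between a count-based and a normalized confusion matrix can be sketched as below. The example assumes row-wise normalization, in which each row (actual class) is divided by its total so that the diagonal holds the per-class recall; whether Figure 4b uses this convention is an assumption, and the matrix is made up for illustration.

```python
from typing import List


def normalize_rows(cm: List[List[int]]) -> List[List[float]]:
    """Divide every row of a confusion matrix by its row total.

    Rows are actual classes, so each normalized row sums to 1 and the
    diagonal entry is the recall of that class.
    """
    normalized = []
    for row in cm:
        row_total = sum(row)
        normalized.append([count / row_total if row_total else 0.0 for count in row])
    return normalized


# Made-up 3-class confusion matrix (not taken from Figure 4).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 1, 9]]
normalized = normalize_rows(toy_cm)
for row in normalized:
    print([round(value, 2) for value in row])
```

Scikit-learn offers the same row-wise normalization through the `normalize="true"` option of `sklearn.metrics.confusion_matrix`.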

4. Discussion

With the rapid advance of technology in computer science and health informatics, the practical use of artificial intelligence in cardiology has been increasing, specifically for the diagnosis and prognosis of ACS occurrences in patients with cardiovascular disease. Therefore, our research focused on developing a machine learning-based soft-voting ensemble model for the predictive modeling of ACS outcomes, hospital discharge for patients with ACS, and the type of death in cases where the patient died during the hospital stay. The experimental results of the individual machine learning classifiers showed that the applied algorithms predicted a few class labels accurately, but that the accuracy was low for other class labels. Therefore, we proposed the machine learning-based soft-voting ensemble classifier for predictive modeling of our target variables and compared it with the applied machine learning classifiers on the basis of accuracy, precision, recall, F1-score, and AUC. The results showed that the proposed predictive model outperformed the other machine learning classifiers. The accuracy, precision, recall, F1-score, and AUC of the proposed soft-voting ensemble model were 99.0733%, 99.0742%, 99.0734%, 99.9719%, and 99.9702%, respectively, compared with (98.9172%, 98.9182%, 98.9172%, 98.9159%, 99.9650%) for random forest, (98.9796%, 98.9828%, 98.9797%, 98.9791%, 99.9828%) for extra tree, and (78.6517%, 78.7202%, 78.6517%, 78.6302%, 97.7566%) for the gradient boosting machine. The overall performance of our predictive model was higher than that of the other applied machine learning models in accuracy, precision, recall, and F1-score, but the AUC of the extra tree classifier was slightly higher than that of the proposed predictive model.

5. Conclusions

This paper proposed a machine learning-based soft-voting ensemble classifier for the predictive modeling of ACS occurrences (e.g., STEMI, NSTEMI), hospital discharge reports (e.g., Death, Hopeless Discharge, Recovery to Home, Recovery to Other Hospital), and the death type (e.g., Cardiac Death, Non-Cardiac Death) of patients admitted to hospitals, using three widely used machine-learning classifiers: random forest, extra tree, and the gradient boosting machine. Our main contributions can be summarized as follows. First, this paper developed machine learning-based predictive modeling for the identification of acute coronary syndrome (ACS) outcomes and mortality in patients during their hospital stay. Second, the model can forecast the occurrence of mortality during hospital stays of Korean patients with acute coronary syndrome, because it reflects the demographic characteristics of Koreans well. Third, it was shown that the performance of the machine learning-based mortality prediction model was superior to previous risk assessment approaches. Lastly, it is expected that these results can contribute to the development of a future tool for diagnosing and forecasting the occurrence of major adverse cardiovascular events (MACE) in clinical patients with acute coronary syndrome. The proposed machine learning-based soft-voting ensemble classifier is effective for the timely identification of ACS and can help physicians and patients identify future cardiac events and select the proper treatment, so that the mortality ratio can be reduced in patients with ACS. This research can serve as a foundation for the development of a machine learning-based risk score system for patients admitted to hospitals, as well as for discharged patients, in the near future.

There were some potential limitations to our research. First, we used only 8227 experimental subjects, and thus our dataset is insufficient, because machine learning algorithms need a large-scale dataset in the experiment. Second, our proposed model is limited to diagnosing and forecasting mortality in Korean patients with acute coronary syndrome. Third, it is difficult to explain the prediction results of machine learning-based approaches, because GBM, RF, and ET are non-linear models, whereas in regression-based prediction models it is easy to explain how major prognostic factors are associated with mortality in patients with ACS, because they are based on statistical analysis. Lastly, there was a limit on checking the mortality predictions during hospital stays in our experimental dataset.

Author Contributions

S.W.A.S.: Method design, literature research, data preprocessing, soft-voting ensemble classifier design and model framework implementation, and manuscript preparation. H.Z.: Helped in data collection and analysis and data preprocessing, model framework implementation. J.Y.L.: Method design, framework design, experiment coordination, and manuscript preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2022R1F1A1063003), and by the MSIT (Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center support program (IITP-2023-2020-0-01462) supervised by the IITP (Institute for Information and Communications Technology Planning and Evaluation).

Institutional Review Board Statement

The experimental dataset was provided by the Korea Acute Myocardial Infarction Registry (KAMIR-NIH) for experimental analysis. The dataset included patients’ medical history, follow-up information, medication, and cardiac event information, but it did not include patients’ personal information, hospital details or photographs, and so forth. All participants in this study were guaranteed anonymity. According to Article 13 of the Enforcement Rule of the Bioethics and Safety Act based on Korean law (https://www.irb.or.kr/menu02/commonDeliberation.aspx, accessed on 3 August 2022), IRB approval can be exempted for the experimental dataset. Our experimental datasets and results are authentic and real. The proposed prediction model for the mortality prediction of AMI patients with hypertension was limited to Korean people and cannot be directly applied to other racial groups. The experimental datasets are confidential and available with the permission of the Korea Acute Myocardial Infarction Registry (KAMIR, http://kamir6.kamir.or.kr/) on reasonable request.

Informed Consent Statement

Not applicable, as this research does not include patients’ personal information, hospital details or photographs, and so forth.

Data Availability Statement

The experimental data are available on special request from the Korea Acute Myocardial Infarction Registry (KAMIR) (http://kamir5.kamir.or.kr/).

Acknowledgments

The authors would like to thank the Korea Acute Myocardial Infarction Registry (KAMIR), a nationwide, multicenter data collection registry, for providing us with multicenter data for our experiments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Baseline characteristics of all applied features for all subjects (N = 13,104).

Variable	Descriptive Statistics
Variable	Value	All (N = 13,104)	STEMI (N = 6325)	NSTEMI (N = 6779)	p Value
Sex	Male	73.92 (9686)	77.71 (4915)	70.38 (4771)	<0.001 **
	Female	26.08 (3418)	22.29 (1410)	29.62 (2008)	<0.001 **
Age		63.96925 ± 12.64313	62.745 ± 12.8054	65.112 ± 12.3820	<0.001 **
Chest Pain	Typical	86.19 (11,294)	90.85 (5746)	81.84 (5548)	<0.001 **
Dyspnea	Yes	23.70 (3105)	20.41 (1291)	26.76 (1814)	<0.001 **
Prev. Chest Pain	Yes	25.57 (3351)	21.53 (1362)	29.34 (1989)	<0.001 **
First Medical Center	This Hosp	32.36 (4240)	28.16 (1781)	36.27 (2459)	<0.001 **
SBP		130.1542 ± 30.0512	125.45 ± 31.529	134.54 ± 27.903	<0.001 **
DBP		78.64339 ± 18.34411	76.62 ± 19.870	80.53 ± 16.580	<0.001 **
Heart Rate		78.6606 ± 19.59055	77.11 ± 20.545	80.11 ± 18.542	<0.001 **
Killip Class	III–IV	13.35 (1750)	15.68 (992)	11.18 (758)	<0.001 **
Height		164.2221 ± 11.22179	164.950 ± 12.1186	163.548 ± 10.2780	<0.001 **
Weight		65.28827 ± 12.1705	66.161 ± 12.0610	64.481 ± 12.2165	<0.001 **
AC		87.61039 ± 8.863507	87.786 ± 8.8728	87.475 ± 8.8557	0.244
ECG	Yes	99.78 (13,075)	99.86 (6316)	99.70 (6759)	0.145
Past Medical History	Yes	72.65 (9520)	69.06 (4368)	76.00 (5152)	<0.001 **
HTN	Yes	51.05 (6690)	46.97 (2971)	54.86 (3719)	<0.001 **
DM	Yes	28.63 (3752)	25.03 (1583)	32.00 (2169)	<0.001 **
DL	Yes	11.25 (1474)	10.59 (670)	11.86 (804)	<0.001 **
Prev. MI	Yes	7.85 (1029)	5.98 (378)	9.60 (651)	<0.001 **
Prev. Angina	Yes	9.76 (1279)	6.48 (410)	12.82 (869)	<0.001 **
Prev. Heart Failure	Yes	1.63 (213)	0.89 (56)	2.32 (157)	<0.001 **
Prev. CVA	Infarction	5.10 (668)	4.05 (256)	6.08 (412)	<0.001 **
Smoking	Current	39.02 (5113)	44.08 (2788)	34.30 (2325)	<0.001 **
Family History	Yes	6.33 (830)	6.02 (381)	6.62 (449)	0.194
F_H_o_E_A_I_H_D	Yes	0.76 (100)	0.74 (47)	0.78 (53)	0.361
Menopause	Yes	8.04 (1054)	7.11 (450)	8.91 (604)	<0.001 **
Age of Menopause		48.20648 ± 7.616432	47.93 ± 7.755	48.43 ± 7.523	0.605
Hysterectomy History	Yes	0.67 (88)	0.46 (29)	0.87 (59)	<0.001 **
HRT	Yes	0.06 (8)	0.16 (10)	0.27 (18)	<0.05 *
MI Symptoms	Yes	97.88 (12,826)	98.50 (6230)	97.30 (6596)	<0.001 **
MI ECG	Yes	68.16 (8932)	94.25 (5961)	43.83 (2971)	<0.001 **
MI Imaging	Yes	28.06 (3677)	14.89 (942)	40.35 (2735)	<0.001 **
WBC		10.50721 ± 4.552223	11.529 ± 4.1662	9.553 ± 4.6885	<0.001 **
Neutrophil		66.48509 ± 15.18227	66.324 ± 16.2682	66.635 ± 14.0928	0.242
Lymphocyte		24.63949 ± 13.05661	25.337 ± 14.3136	23.989 ± 11.7264	<0.001 **
Hemoglobin		13.76865 ± 2.142532	14.127 ± 2.0190	13.435 ± 2.1999	<0.001 **
Platelets		232.3399 ± 67.78979	235.89 ± 67.208	229.02 ± 68.166	<0.001 **
Glucose		169.9549 ± 82.53465	179.21 ± 82.991	161.40 ± 81.184	<0.001 **
Creatinine		1.134202 ± 1.2009	1.047 ± 0.8031	1.215 ± 1.4741	<0.001 **
Max. Creatine Kinase		1019.191 ± 1955.354	1429.309 ± 2370.662	625.581 ± 1335.430	<0.001 **
Creatine Kinase MB		110.5776 ± 164.4305	166.2745 ± 189.54721	58.4153 ± 114.40263	<0.001 **
Troponin I		46.83087 ± 105.5838	75.4137 ± 138.28379	21.8091 ± 53.17410	<0.001 **
Troponin T		14.21992 ± 459.5798	4.8386 ± 12.85030	26.5103 ± 698.40787	0.306
Total Cholesterol		177.8673 ± 46.35988	180.84 ± 46.078	175.08 ± 46.454	<0.001 **
Triglyceride		134.5005 ± 120.0775	140.29 ± 125.358	128.98 ± 114.547	<0.001 **
HDL		42.83839 ± 12.51687	42.61 ± 12.289	43.05 ± 12.727	0.053
LDL		111.849 ± 40.58414	114.11 ± 40.616	109.72 ± 40.442	<0.001 **
hsCRP		1.554095 ± 6.201701	1.3933 ± 4.38070	1.7238 ± 7.66500	<0.05 *
NTproBNP		2795.943 ± 9980.085	1872.666 ± 9574.2647	3695.017 ± 10281.7662	<0.001 **
BNP		319.4587 ± 741.2533	222.417 ± 533.8414	431.961 ± 912.7885	<0.001 **
HbA1c		6.486941 ± 1.482016	6.469 ± 1.5192	6.505 ± 1.4450	0.262
ARU		460.0607 ± 73.9739	457.53 ± 75.363	462.30 ± 72.673	0.090
PRU		199.4301 ± 109.9491	181.66 ± 108.618	217.63 ± 108.344	<0.001 **
PCI	Performed	89.55 (11,735)	96.63 (6112)	82.95 (5623)	<0.001 **
Why not PCI	PCI not Indicated	6.34 (831)	1.91 (121)	10.47 (710)	<0.001 **
Puncture Route	Transfemoral	55.21 (7235)	71.38 (4515)	40.12 (2720)	<0.001 **
PCI Reason	pPCI in STEMI	44.44 (5824)	91.81 (5807)	0.25 (17)	<0.001 **
Distal Protection	No use	66.82 (8756)	61.01 (3859)	72.24 (4897)	<0.001 **
PCI Performed	Perform	89.70 (11,754)	96.76 (6120)	83.11 (5634)	<0.001 **
Target Vessel	LAD	41.79 (5476)	49.39 (3124)	34.70 (2352)	<0.001 **
Lesion Type	Type C	44.36 (5813)	52.17 (3300)	37.07 (2513)	<0.001 **
preTIMI
	TIMI 0	42.11 (5518)	61.75 (3906)	23.78 (1612)	<0.001 **
	TIMI I	9.81 (1286)	10.29 (651)	9.37 (635)	<0.001 **
	TIMI II	13.84 (1813)	10.34 (654)	17.10 (1159)	<0.001 **
	TIMI III	23.94 (3137)	14.37 (909)	13.41 (909)	<0.001 **
PCI Treatment	Stent	83.00 (10,876)	90.61 (5731)	75.90 (5145)	<0.001 **
PostTIMI
	TIMI 0	0.37 (49)	0.43 (27)	0.32 (22)	<0.001 **
	TIMI I	0.43 (57)	0.62 (39)	0.27 (18)	<0.001 **
	TIMI II	2.43 (319)	3.49 (221)	1.45 (98)	<0.001 **
	TIMI III	86.45 (11,329)	92.22 (5833)	81.07 (5496)	<0.001 **
PCI 2	Perform	89.61 (11,742)	96.63 (6112)	83.05 (5630)	<0.001 **
PCI Left Main	Yes	4.38 (574)	3.49 (221)	5.21 (353)	<0.001 **
PCI LM Treatment	Stent	2.86 (375)	2.15 (136)	3.53 (239)	<0.001 **
Stents in LM	1–3	2.86 (375)	2.15 (136)	3.53 (239)	<0.001 **
PCI LAD	Yes	62.66 (8211)	69.23 (4379)	56.53 (3832)	<0.001 **
PCI LAD Treatment	Stent	47.86 (6272)	54.04 (3418)	42.10 (2854)	<0.001 **
Stents in LAD	1–4	48.56 (6363)	54.48 (3446)	43.03 (2917)	<0.001 **
PCI LCX	Yes	39.71 (5203)	34.47 (2180)	44.59 (3023)	<0.001 **
PCI LCX Treatment	Stent	20.86 (2733)	14.09 (891)	27.17 (1842)	<0.001 **
Stents in LCX	1–4	20.98 (2749)	14.17 (896)	27.33 (1853)	<0.001 **
PCI RCA	Yes	47.86 (6272)	53.09 (3358)	42.99 (2914)	<0.001 **
PCI RCA Treatment	Stent	33.33 (4368)	38.53 (2437)	28.49 (1931)	<0.001 **
Stents in RCA	1–4	33.34 (4369)	38.55 (2438)	28.49 (1931)	<0.001 **
PCI Result	Successful	88.37 (11,580)	95.07 (6013)	82.12 (5567)	<0.001 **
Revascularization Status	Total	61.53 (8063)	66.21 (4188)	57.16 (3875)	<0.001 **
Index PCI	multivessel	17.10 (2241)	12.32 (779)	21.57 (1462)	<0.001 **
Staged PCI	Stepwise PCI	8.44 (1106)	11.18 (707)	5.89 (399)	<0.001 **
QCA	Performed	80.17 (10,506)	77.57 (4906)	82.61 (5600)	<0.001 **
Initial Diagnosis
	STEMI	48.08 (6300)	98.78 (6248)	0.77 (52)	<0.001 **
	NSTEMI	51.91 (6802)	1.20 (76)	99.22 (6726)	<0.001 **
STEMI Treatment	pPCI	46.55 (6100)	95.72 (6054)	0.68 (46)	<0.001 **
NSTEMI Treatment	Early Invasive	43.89 (5752)	0.98 (62)	83.94 (5690)	<0.001 **
Thrombolysis	Yes	1.00 (131)	2.04 (129)	0.03 (2)	<0.001 **
CAG	Yes	98.41 (12,895)	99.37 (6285)	97.51 (6610)	<0.001 **
2DE	ECG Perform	95.18 (12,472)	94.31 (5965)	95.99 (6507)	<0.001 **
LVEF		51.90744 ± 11.23909	50.12 ± 10.472	53.55 ± 11.660	<0.001 **
RWMI		1.4249 ± 0.388668	1.50 ± 0.373	1.36 ± 0.390	<0.001 **
MR Grade	I–IV	32.77 (4294)	29.03 (1836)	36.26 (2458)	<0.001 **
LVESD		35.02565 ± 8.382741	35.21 ± 8.140	34.86 ± 8.593	<0.05 *
LVEDD		49.70543 ± 6.636215	49.56 ± 6.423	49.84 ± 6.821	<0.05 *
LVESV		47.65738 ± 25.2402	48.35 ± 23.112	46.94 ± 27.238	<0.05 *
LVEDV		95.02556 ± 42.32248	95.27 ± 47.419	94.77 ± 36.358	0.627
Hosp. Complications	Yes	20.70 (2712)	25.61 (1620)	16.11 (1092)	<0.001 **
Cardiogenic Shock	Yes	9.00 (1180)	13.01 (823)	5.27 (357)	<0.001 **
New Heart Failure	Yes	4.49 (588)	4.22 (267)	4.74 (321)	<0.001 **
Recurrent Ischemia	Yes	0.89 (117)	1.09 (69)	0.71 (48)	<0.001 **
Reinfarction	Yes	0.37 (49)	0.57 (36)	0.19 (13)	<0.001 **
ST	Yes	0.33 (43)	0.52 (33)	0.15 (10)	<0.05 *
ST Type	Acute ST	0.18 (23)	0.27 (17)	0.09 (6)	<0.05 *
Cerebral Infarction	Yes	0.64 (84)	0.55 (35)	0.72 (49)	<0.001 **
Cerebral hrr	Yes	0.11 (15)	0.16 (10)	0.07 (5)	<0.001 **
ICH	Yes	0.11 (15)	0.16 (10)	0.07 (5)	<0.001 **
Hb decrease 5	Yes	1.15 (151)	1.23 (78)	1.08 (73)	<0.001 **
Hb decrease 15	Yes	1.24 (162)	1.44 (91)	1.05 (71)	<0.001 **
Minor Bleeding	Yes	3.08 (403)	3.19 (202)	2.97 (201)	<0.001 **
Atrioventricular Block	Yes	2.31 (303)	3.79 (240)	0.93 (63)	<0.001 **
VT Drug	Yes	1.30 (171)	1.96 (124)	0.69 (47)	<0.001 **
VT DC	Yes	2.68 (351)	4.27 (270)	1.19 (81)	<0.001 **
VF	Yes	1.95 (255)	3.10 (196)	0.87 (59)	<0.001 **
AF	Yes	3.41 (447)	3.81 (241)	3.04 (206)	<0.001 **
AKI	Yes	0.98 (129)	1.00 (63)	0.97 (66)	<0.001 **
Sepsis	Yes	0.60 (78)	0.58 (37)	0.60 (41)	<0.001 **
MOF	Yes	0.72 (94)	0.89 (56)	0.56 (38)	<0.001 **
TPM	Yes	4.04 (529)	6.67 (422)	1.58 (107)	<0.001 **
CPR	Yes	6.97 (914)	8.93 (565)	5.15 (349)	<0.001 **
IABP	Yes	3.24 (424)	5.00 (316)	1.59 (108)	<0.001 **
ECMO	Yes	1.13 (148)	1.72 (109)	0.58 (39)	<0.001 **
Defibrillation	Yes	4.17 (546)	6.45 (408)	2.04 (138)	<0.001 **
PPM	Yes	0.12 (16)	0.13 (8)	0.12 (8)	<0.001 **
ICD	Yes	0.12 (16)	0.09 (6)	0.15 (10)	<0.001 **
CABG	Yes	1.40 (184)	0.57 (36)	2.18 (148)	<0.001 **

Note: SBP denotes Systolic Blood Pressure; DBP Diastolic Blood Pressure; AC Abdominal Circumference; ECG Electrocardiogram; HTN Hypertension; DM Diabetes Mellitus; DL Dyslipidemia; CVA Cerebrovascular Accident; F_H_o_E_A_I_H_D Family History of Early Age Ischemic Heart Disease; HRT Hormone Replacement Treatment; WBC White Blood Cells; HDL High Density Lipoprotein; LDL Low Density Lipoprotein; hsCRP High-sensitivity C-reactive protein; NTproBNP N-terminal prohormone of brain natriuretic peptide; BNP B-type natriuretic peptide; HbA1c Hemoglobin A1c; ARU Aspirin Reaction Unit; PRU P2Y12 Reaction Unit; LM Left Main; LAD Left Anterior Descending Artery; LCX Left Circumflex Artery; RCA Right Coronary Artery; QCA Quantitative Coronary Angiography; CAG Coronary Angiogram; 2DE 2D echocardiography; LVEF Left Ventricular Ejection Fraction; RWMI Regional Wall Motion Index; LVESD Left Ventricular End-Systolic Diameter; LVEDD Left Ventricular End-Diastolic Diameter; LVESV Left Ventricular End-Systolic Volume; LVEDV Left Ventricular End-Diastolic Volume; ICH Intracerebral Hemorrhage; Hb Hemoglobin; VT Ventricular Tachycardia; VF Ventricular Fibrillation; AF Atrial Fibrillation; AKI Acute Kidney Injury; MOF Multiple Organ Failure; TPM Temporary Pacemaker; CPR Cardiopulmonary Resuscitation; IABP Intra-Aortic Balloon Pump; ECMO Extracorporeal Membrane Oxygenation; PPM Permanent Pacemaker; ICD Implantable Cardioverter-Defibrillator; CABG Coronary Artery Bypass Grafting. Note: A single asterisk (*) denotes that the difference between the STEMI and NSTEMI groups is statistically significant (p-value < 0.05), and a double asterisk (**) denotes that the difference is highly statistically significant (p-value < 0.001).

References

Bhardwaj, R.; Nambiar, A.R.; Dutta, D. A study of machine learning in healthcare. In Proceedings of the 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), Turin, Italy, 4–8 July 2017; Volume 2, pp. 236–241.
Jafar, A.; Hameed, M.T.; Akram, N.; Waqas, U.; Kim, H.S.; Naqvi, R.A. CardioNet: Automatic Semantic Segmentation to Calculate the Cardiothoracic Ratio for Cardiomegaly and Other Chest Diseases. J. Pers. Med. 2022, 12, 988.
Vollmer, S.; Mateen, B.A.; Bohner, G.; Király, F.J.; Ghani, R.; Jonsson, P.; Cumbers, S.; Jonas, A.; McAllister, K.S.; Myles, P.; et al. Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ 2020, 368, 16927.
Naqvi, R.A.; Hussain, D.; Loh, W.K. Artificial intelligence-based semantic segmentation of ocular regions for biometrics and healthcare applications. CMC Comput. Mater. Contin. 2021, 66, 715–732.
Abraham, W.T.; Fonarow, G.C.; Albert, N.M.; Stough, W.G.; Gheorghiade, M.; Greenberg, B.H.; OPTIMIZE-HF Investigators and Coordinators. Predictors of in-hospital mortality in patients hospitalized for heart failure: Insights from the Organized Program to Initiate Lifesaving Treatment in Hospitalized Patients with Heart Failure (OPTIMIZE-HF). J. Am. Coll. Cardiol. 2008, 52, 347–356.
Sherazi, S.W.A.; Bae, J.W.; Lee, J.Y. A soft voting ensemble classifier for early prediction and diagnosis of occurrences of major adverse cardiovascular events for STEMI and NSTEMI during 2-year follow-up in patients with acute coronary syndrome. PLoS ONE 2021, 16, e0249338.
Kannel, W.B.; Gordon, T. The Framingham Study: An Epidemiological Investigation of Cardiovascular Disease; US Department of Health, Education, and Welfare, National Institutes of Health: Washington, DC, USA, 1970.
Ferket, B.S.; van Kempen, B.J.; Hunink, M.M.; Agarwal, I.; Kavousi, M.; Franco, O.H.; Steyerberg, E.W.; Max, W.; Fleischmann, K.E. Predictive value of updating Framingham risk scores with novel risk markers in the US general population. PLoS ONE 2014, 9, e88312.
D’Agostino, R.B., Sr.; Vasan, R.S.; Pencina, M.J.; Wolf, P.A.; Cobain, M.; Massaro, J.M.; Kannel, W.B. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation 2008, 117, 743–753.
Brindle, P.; Jonathan, E.; Lampe, F.; Walker, M.; Whincup, P.; Fahey, T.; Ebrahim, S. Predictive accuracy of the Framingham coronary risk score in British men: Prospective cohort study. BMJ 2003, 327, 1267.
Hippisley-Cox, J.; Coupland, C.; Vinogradova, Y.; Robson, J.; Minhas, R.; Sheikh, A.; Brindle, P. Predicting cardiovascular risk in England and Wales: Prospective derivation and validation of QRISK2. BMJ 2008, 336, 1475–1482.
Hippisley-Cox, J.; Coupland, C.; Brindle, P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: Prospective cohort study. BMJ 2017, 357, j2099.
Antman, E.M.; Cohen, M.; Bernink, P.J.L.M.; McCabe, C.H.; Horacek, T.; Papuchis, G.; Mautner, B.; Corbalan, R.; Radley, D.; Braunwald, E. The TIMI risk score for unstable angina/non-ST elevation MI: A method for prognostication and therapeutic decision making. JAMA 2000, 284, 835–842.
Amin, S.T.; Morrow, D.A.; Braunwald, E.; Sloan, S.; Contant, C.; Murphy, S.; Antman, E.M. Dynamic TIMI risk score for STEMI. J. Am. Heart Assoc. 2013, 2, e003269.
Elbarouni, B.; Goodman, S.G.; Yan, R.T.; Welsh, R.C.; Kornder, J.M.; DeYoung, J.P.; Wong, G.C.; Rose, B.; Grondin, F.R.; Gallo, R.; et al. Validation of the Global Registry of Acute Coronary Event (GRACE) risk score for in-hospital mortality in patients with acute coronary syndrome in Canada. Am. Heart J. 2009, 158, 392–399.
Huang, W.; FitzGerald, G.; Goldberg, R.J.; Gore, J.; McManus, R.H.; Awad, H.; Waring, M.E.; Allison, J.; Saczynski, J.S.; Kiefe, C.I.; et al. Performance of the GRACE Risk Score 2.0 simplified algorithm for predicting 1-year death after hospitalization for an acute coronary syndrome in a contemporary multiracial cohort. Am. J. Cardiol. 2016, 118, 1105–1110.
Riley, R.F.; Miller, C.D.; Russell, G.B.; Harper, E.N.; Hiestand, B.C.; Hoekstra, J.W.; Lefebvre, C.W.; Nicks, B.A.; Cline, D.M.; Askew, K.L.; et al. Cost analysis of the History, ECG, Age, Risk factors, and initial Troponin (HEART) Pathway randomized control trial. Am. J. Emerg. Med. 2017, 35, 77–81.
Bhattacharya, P.T.; Golamari, R.R.; Vunnam, S.; Moparthi, S.; Venkatappa, N.; Dollard, D.J.; Missri, J.; Yang, W.; Kimmel, S.E. Predictive risk stratification using HEART (history, electrocardiogram, age, risk factors, and initial troponin) and TIMI (thrombolysis in myocardial infarction) scores in non-high risk chest pain patients: An African American urban community based hospital study. Medicine 2019, 98, e16370.
Poldervaart, J.M.; Langedijk, M.; Backus, B.E.; Dekker, I.M.C.; Six, A.J.; Doevendans, P.A.; Hoes, A.W.; Reitsma, J.B. Comparison of the GRACE, HEART and TIMI score to predict major adverse cardiac events in chest pain patients at the emergency department. Int. J. Cardiol. 2017, 227, 656–661.
Lakshmanarao, A.; Swathi, Y.; Sundareswar, P.S.S. Machine learning techniques for heart disease prediction. Forest 2019, 95, 97.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Khalilia, M.; Chakraborty, S.; Popescu, M. Predicting disease risks from highly imbalanced data using random forest. BMC Med. Inform. Decis. Mak. 2011, 11, 51.
Singh, Y.K.; Sinha, N.; Singh, S.K. Heart disease prediction system using random forest. In International Conference on Advances in Computing and Data Sciences; Springer: Singapore, 2016; pp. 613–623.
Shafique, R.; Mehmood, A.; Choi, G.S. Cardiovascular disease prediction system using extra trees classifier. Res. Sq. 2019, 11, 51.
Subha, V.; Revathi, M.; Murugan, D. Comparative analysis of support vector machine ensembles for heart disease prediction. Int. J. Comp. Sci. Comm. Netw. 2015, 5, 386–390.
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Statist. 2001, 29, 1189–1232.
Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21.
Jiang, S.; Chin, K.S.; Tsui, K.L. A universal deep learning approach for modeling the flow of patients under different severities. Comput. Methods Programs Biomed. 2018, 154, 191–203.
Sherazi, S.W.A.; Jeong, Y.J.; Jae, M.H.; Bae, J.-W.; Lee, J.Y. A machine learning-based 1-year mortality prediction model after hospital discharge for clinical patients with acute coronary syndrome. Health Inform. J. 2020, 26, 1289–1304.
Mokashi, A.R.; Tambe, M.N.; Walke, P.T. Heart disease prediction using ANN and improved KMeans. Int. J. Innov. Res. Elect. Electr. Instrum. Contr. Eng. 2016, 4, 221–224.
Latha, C.B.C.; Jeeva, S.C. Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform. Med. Unlocked 2019, 16, 100203.
Goldstein, B.A.; Navar, A.M.; Carter, R.E. Moving beyond regression techniques in cardiovascular risk prediction: Applying machine learning to address analytic challenges. Eur. Heart J. 2017, 38, 1805–1814.
Stiglic, G.; Kocbek, P.; Fijacko, N.; Zitnik, M.; Verbert, K.; Cilar, L. Interpretability of machine learning-based prediction models in healthcare. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1379.
Aldahiri, A.; Alrashed, B.; Hussain, W. Trends in Using IoT with Machine Learning in Health Prediction System. Forecasting 2021, 3, 181–206.
Korea Acute Myocardial Infarction Registry. Available online: http://kamir5.kamir.or.kr/ (accessed on 1 March 2021).
Peat, J.; Barton, B. Medical Statistics: A Guide to Data Analysis and Critical Appraisal; John Wiley & Sons: Hoboken, NJ, USA, 2008.
Potdar, K.; Pardawala, T.S.; Pai, C.D. A comparative study of categorical variable encoding techniques for neural network classifiers. Int. J. Comput. Appl. 2017, 175, 7–9.
Hancock, J.T.; Khoshgoftaar, T.M. Survey on categorical data for neural networks. J. Big Data 2020, 7, 1–41.
Zheng, H.; Sherazi, S.W.A.; Lee, J.Y. A Stacking Ensemble Prediction Model for the Occurrences of Major Adverse Cardiovascular Events in Patients with Acute Coronary Syndrome on Imbalanced Data. IEEE Access 2021, 9, 113692–113704.
Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
Wang, Q. A hybrid sampling SVM approach to imbalanced data classification. Abstr. Appl. Anal. 2014, 2014, 972786.
Batista, G.E.; Bazzan, A.L.; Monard, M.C. Balancing Training Data for Automated Annotation of Keywords: A Case Study. WOB 2003, 10–18.
PASW Statistics. Available online: http://www.spss.com.hk/statistics/ (accessed on 1 March 2021).
Office 365 ProPlus. Available online: https://products.office.com/en-us/business/office-365-proplus-product (accessed on 1 March 2021).
Jupyter.org. Project Jupyter. 2019. Available online: http://jupyter.org/ (accessed on 1 March 2021).
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Harris, C.R.; Millman, K.J.; Van Der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010.
Lemaître, G.; Nogueira, F.; Aridas, C.K. Imbalanced-Learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 2017, 18, 559–563.

Figure 1. Experimental data extraction from the original KAMIR-NIH dataset.

Figure 2. Overall workflow of proposed predictive modeling system.

Figure 3. The soft-voting ensemble classifier for predictive modeling of ACS outcomes and mortality of patients.

Figure 4. Confusion matrix for all class labels using proposed soft-voting ensemble classifier on KAMIR-NIH in-hospital data (a) actual confusion matrix; (b) normalized confusion matrix.

Table 1. Confusion matrix for predictive modeling of target variables.

Actual Value (Confirmed by Experience)	Predicted Positives (Predicted by the Test)	Predicted Negatives (Predicted by the Test)
Positives	TP (True Positive)	FN (False Negative)
Negatives	FP (False Positive)	TN (True Negative)
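
The four cells of Table 1 are enough to derive the usual performance measures. The following is a minimal sketch of that arithmetic, using the standard definitions of accuracy, precision, recall, and F1-score; the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 from the four cells of a confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = binary_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

For the multi-class results reported in the later tables, the same formulas would be applied per class label in a one-vs-rest fashion, treating that label as "positive" and all other labels as "negative".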

Table 2. Baseline characteristics of important features for all subjects (N = 13,104).

Variable	Value	All (N = 13,104)	STEMI (N = 6325)	NSTEMI (N = 6779)	p Value
Sex	Male	73.92 (9686)	77.71 (4915)	70.38 (4771)	<0.001 **
Sex	Female	26.08 (3418)	22.29 (1410)	29.62 (2008)	<0.001 **
Age	Number	63.96925 ± 12.64313	62.745 ± 12.8054	65.112 ± 12.3820	<0.001 **
Chest Pain	Typical	86.19 (11,294)	90.85 (5746)	81.84 (5548)	<0.001 **
Systolic Blood Pressure	Number	130.1542 ± 30.0512	125.45 ± 31.529	134.54 ± 27.903	<0.001 **
Diastolic Blood Pressure	Number	78.64339 ± 18.34411	76.62 ± 19.870	80.53 ± 16.580	<0.001 **
Heart Rate	Number	78.6606 ± 19.59055	77.11 ± 20.545	80.11 ± 18.542	<0.001 **
Killip Class	III–IV	13.35 (1750)	15.68 (992)	11.18 (758)	<0.001 **
Height	Number	164.2221 ± 11.22179	164.950 ± 12.1186	163.548 ± 10.2780	<0.001 **
Weight	Number	65.28827 ± 12.1705	66.161 ± 12.0610	64.481 ± 12.2165	<0.001 **
Prev. MI	Yes	7.85 (1029)	5.98 (378)	9.60 (651)	<0.001 **
Prev. Angina	Yes	9.76 (1279)	6.48 (410)	12.82 (869)	<0.001 **
Prev. Heart Failure	Yes	1.63 (213)	0.89 (56)	2.32 (157)	<0.001 **
Prev. CVA	Infarction	5.10 (668)	4.05 (256)	6.08 (412)	<0.001 **
Smoking	Current	39.02 (5113)	44.08 (2788)	34.30 (2325)	<0.001 **
White Blood Cells	Number	10.50721 ± 4.552223	11.529 ± 4.1662	9.553 ± 4.6885	<0.001 **
Neutrophil	Number	66.48509 ± 15.18227	66.324 ± 16.2682	66.635 ± 14.0928	0.242
Lymphocyte	Number	24.63949 ± 13.05661	25.337 ± 14.3136	23.989 ± 11.7264	<0.001 **
Hemoglobin	Number	13.76865 ± 2.142532	14.127 ± 2.0190	13.435 ± 2.1999	<0.001 **
Platelets	Number	232.3399 ± 67.78979	235.89 ± 67.208	229.02 ± 68.166	<0.001 **
Glucose	Number	169.9549 ± 82.53465	179.21 ± 82.991	161.40 ± 81.184	<0.001 **
Creatinine	Number	1.134202 ± 1.2009	1.047 ± 0.8031	1.215 ± 1.4741	<0.001 **
Max. Creatine Kinase	Number	1019.191 ± 1955.354	1429.309 ± 2370.662	625.581 ± 1335.430	<0.001 **
Creatine Kinase MB	Number	110.5776 ± 164.4305	166.2745 ± 189.54721	58.4153 ± 114.40263	<0.001 **
Troponin I	Number	46.83087 ± 105.5838	75.4137 ± 138.28379	21.8091 ± 53.17410	<0.001 **
Troponin T	Number	14.21992 ± 459.5798	4.8386 ± 12.85030	26.5103 ± 698.40787	0.306
Total Cholesterol	Number	177.8673 ± 46.35988	180.84 ± 46.078	175.08 ± 46.454	<0.001 **
High Density Lipoprotein	Number	42.83839 ± 12.51687	42.61 ± 12.289	43.05 ± 12.727	0.053
Low Density Lipoprotein	Number	111.849 ± 40.58414	114.11 ± 40.616	109.72 ± 40.442	<0.001 **
Triglyceride	Number	134.5005 ± 120.0775	140.29 ± 125.358	128.98 ± 114.547	<0.001 **
hsCRP	Number	1.554095 ± 6.201701	1.3933 ± 4.38070	1.7238 ± 7.66500	<0.05 *
NTproBNP	Number	2795.943 ± 9980.085	1872.666 ± 9574.2647	3695.017 ± 10,281.7662	<0.001 **
BNP	Number	319.4587 ± 741.2533	222.417 ± 533.8414	431.961 ± 912.7885	<0.001 **
HbA1c	Number	6.486941 ± 1.482016	6.469 ± 1.5192	6.505 ± 1.4450	0.262
ARU	Number	460.0607 ± 73.9739	457.53 ± 75.363	462.30 ± 72.673	0.090
PRU	Number	199.4301 ± 109.9491	181.66 ± 108.618	217.63 ± 108.344	<0.001 **
PreTIMI	TIMI 0	42.11 (5518)	61.75 (3906)	23.78 (1612)	<0.001 **
PreTIMI	TIMI I	9.81 (1286)	10.29 (651)	9.37 (635)	<0.001 **
PreTIMI	TIMI II	13.84 (1813)	10.34 (654)	17.10 (1159)	<0.001 **
PreTIMI	TIMI III	23.94 (3137)	14.37 (909)	13.41 (909)	<0.001 **
PostTIMI	TIMI 0	0.37 (49)	0.43 (27)	0.32 (22)	<0.001 **
PostTIMI	TIMI I	0.43 (57)	0.62 (39)	0.27 (18)	<0.001 **
PostTIMI	TIMI II	2.43 (319)	3.49 (221)	1.45 (98)	<0.001 **
PostTIMI	TIMI III	86.45 (11,329)	92.22 (5833)	81.07 (5496)	<0.001 **
Initial Diagnosis	STEMI	48.08 (6300)	98.78 (6248)	0.77 (52)	<0.001 **
Initial Diagnosis	NSTEMI	51.91 (6802)	1.20 (76)	99.22 (6726)	<0.001 **
ST Type	Acute ST	0.18 (23)	0.27 (17)	0.09 (6)	<0.05 *

Note: hsCRP denotes high-sensitivity C-reactive protein; NTproBNP, N-terminal prohormone of brain natriuretic peptide; BNP, B-type natriuretic peptide; HbA1c, hemoglobin A1c; ARU, Aspirin Reaction Unit; PRU, P2Y12 Reaction Unit. A single asterisk (*) marks a statistically significant difference between the STEMI and NSTEMI groups (p < 0.05), and a double asterisk (**) marks a highly significant difference (p < 0.001). A p-value below a given threshold means that, if the two groups did not truly differ, a difference at least as large as the one observed would occur with probability below that threshold.
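
To make the p-value column of Table 2 concrete, the sketch below runs a Pearson chi-square test on the Sex rows (male and female counts in the STEMI and NSTEMI groups). The choice of test is an assumption made for illustration: this section does not state which test produced the reported p-values, and the helper `chi_square_2x2` is a hypothetical name. Only the four counts are taken from the table.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from the Sex rows of Table 2.
# Rows: STEMI, NSTEMI. Columns: male, female.
statistic, p_value = chi_square_2x2(4915, 1410, 4771, 2008)
print(f"chi-square = {statistic:.2f}, p < 0.001: {p_value < 0.001}")
```

Under this assumed test, the sex distribution differs between the two groups at the p < 0.001 level, which is consistent with the double asterisk shown for the Sex rows.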

Table 3. Performance measures of all class labels for the Random Forest Classifier using in-hospital data.

Class Labels for Prediction	Precision	Recall	F1-Score	AUC
STEMI CD	0.9862	0.9928	0.9895	0.9999
STEMI NCD	1.0000	0.9994	0.9997	1.0000
STEMI Hopeless Discharge	1.0000	0.9994	0.9997	1.0000
STEMI Recovery to Home	0.9778	0.9576	0.9676	0.9974
STEMI Recovery to other Hospital	0.9818	0.9890	0.9853	0.9999
NSTEMI CD	0.9852	0.9948	0.9900	0.9999
NSTEMI NCD	1.0000	0.9989	0.9994	1.0000
NSTEMI Hopeless Discharge	1.0000	0.9994	0.9997	1.0000
NSTEMI Recovery to Home	0.9670	0.9743	0.9706	0.9981
NSTEMI Recovery to other Hospital	0.9930	0.9851	0.9890	0.9999

Table 4. Performance measures of all class labels for Extra Tree Classifier using in-hospital data.

Class Labels for Prediction	Precision	Recall	F1-Score	AUC
STEMI CD	0.9860	0.9956	0.9908	0.9999
STEMI NCD	1.0000	0.9963	0.9981	1.0000
STEMI Hopeless Discharge	0.9994	0.9979	0.9986	1.0000
STEMI Recovery to Home	0.9724	0.9567	0.9645	0.9983
STEMI Recovery to other Hospital	0.9789	0.9839	0.9814	0.9998
NSTEMI CD	0.9930	0.9962	0.9946	0.9999
NSTEMI NCD	0.9994	0.9994	0.9994	1.0000
NSTEMI Hopeless Discharge	1.0000	0.9989	0.9994	0.9999
NSTEMI Recovery to Home	0.9709	0.9874	0.9791	0.9992
NSTEMI Recovery to other Hospital	0.9994	0.9867	0.9930	0.9998

Table 5. Performance measures of all class labels for the Gradient Boosting Machine Classifier using in-hospital data.

Class Labels for Prediction	Precision	Recall	F1-Score	AUC
STEMI CD	0.7166	0.7166	0.7166	0.9645
STEMI NCD	0.8263	0.8054	0.8157	0.9881
STEMI Hopeless Discharge	0.7815	0.7150	0.7467	0.9790
STEMI Recovery to Home	0.8504	0.8649	0.8576	0.9853
STEMI Recovery to other Hospital	0.7039	0.7544	0.7283	0.9645
NSTEMI CD	0.7709	0.7442	0.7573	0.9734
NSTEMI NCD	0.8033	0.8922	0.8454	0.9886
NSTEMI Hopeless Discharge	0.8394	0.8045	0.8215	0.9882
NSTEMI Recovery to Home	0.8731	0.8575	0.8653	0.9869
NSTEMI Recovery to other Hospital	0.6996	0.7025	0.7010	0.9525

Table 6. Performance measures of all class labels for the proposed Soft Voting Ensemble Classifier using in-hospital data.

Class Labels for Prediction	Precision	Recall	F1-Score	AUC
STEMI CD	0.9882	0.9948	0.9915	0.9999
STEMI NCD	1.0000	0.9984	0.9992	1.0000
STEMI Hopeless Discharge	1.0000	1.0000	1.0000	1.0000
STEMI Recovery to Home	0.9799	0.9596	0.9696	0.9977
STEMI Recovery to other Hospital	0.9792	0.9860	0.9826	0.9998
NSTEMI CD	0.9879	0.9989	0.9933	0.9999
NSTEMI NCD	0.9989	0.9994	0.9992	1.0000
NSTEMI Hopeless Discharge	1.0000	0.9979	0.9989	1.0000
NSTEMI Recovery to Home	0.9754	0.9826	0.9790	0.9989
NSTEMI Recovery to other Hospital	0.9974	0.9904	0.9939	0.9998
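
The F1-score column in the per-class tables can be cross-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below recomputes F1 for three rows of Table 6; the helper name `f1_from` is hypothetical, and because the published precision and recall are already rounded to four decimals, agreement is only expected to within about one unit in the last decimal place.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of Table 6.
rows = {
    "STEMI CD": (0.9882, 0.9948, 0.9915),
    "STEMI Recovery to Home": (0.9799, 0.9596, 0.9696),
    "NSTEMI Recovery to Home": (0.9754, 0.9826, 0.9790),
}

recomputed = {label: round(f1_from(p, r), 4) for label, (p, r, _) in rows.items()}
for label, (_, _, reported) in rows.items():
    print(f"{label}: reported {reported:.4f}, recomputed {recomputed[label]:.4f}")
```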

Table 7. Overall performance evaluation for all applied predictive models during in-hospital stays using KAMIR-NIH in-hospital data.

Algorithms	Accuracy (%)	AUC (%)	Precision (%)	Recall (%)	F1-Score (%)
Random Forest	98.9172	99.9650	98.9182	98.9172	98.9159
Extra Tree	98.9796	99.9828	98.9828	98.9797	98.9791
Gradient Boosting Machine	78.6517	97.7566	78.7202	78.6517	78.6302
Proposed SVE Classifier	99.0733	99.9702	99.0742	99.0734	99.9719
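
The proposed classifier in Table 7 combines the three base models by soft voting, which averages the class probabilities predicted by each model and selects the class with the highest average. The sketch below shows that rule in plain Python. The probabilities are hypothetical, the function name `soft_vote` is illustrative, and equal weighting of the base models is an assumption that this section does not confirm.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across classifiers and pick the argmax.

    probabilities: one list per classifier, each holding one probability per class.
    weights: optional per-classifier weights (equal weights if omitted).
    """
    n_models = len(probabilities)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[k] for w, p in zip(weights, probabilities)) / total
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical probabilities for one patient over three class labels.
rf = [0.55, 0.45, 0.00]   # random forest
et = [0.55, 0.45, 0.00]   # extra tree
gbm = [0.05, 0.95, 0.00]  # gradient boosting machine

label, averaged = soft_vote([rf, et, gbm])
print(label, [round(p, 4) for p in averaged])
```

In this example a majority (hard) vote would choose class 0, because two of the three models rank it first, whereas soft voting chooses class 1 because the third model is far more confident. In scikit-learn the same behaviour is available through `VotingClassifier` with `voting="soft"`.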

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sherazi, S.W.A.; Zheng, H.; Lee, J.Y. A Machine Learning-Based Applied Prediction Model for Identification of Acute Coronary Syndrome (ACS) Outcomes and Mortality in Patients during the Hospital Stay. Sensors 2023, 23, 1351. https://doi.org/10.3390/s23031351
