Article

Application of a Decision Tree Model to Predict the Outcome of Non-Intensive Inpatients Hospitalized for COVID-19

by Massimo Giotta 1, Paolo Trerotoli 2,*, Vincenzo Ostilio Palmieri 3, Francesca Passerini 3, Piero Portincasa 3, Ilaria Dargenio 1, Jihad Mokhtari 4, Maria Teresa Montagna 2 and Danila De Vito 4

1 School of Specialization in Medical Statistics and Biometry, School of Medicine, University of Bari Aldo Moro, 70121 Bari, Italy
2 Department of Interdisciplinary Medicine, University of Bari Aldo Moro, 70121 Bari, Italy
3 Department of Biomedical Science and Human Oncology, University of Bari Aldo Moro, 70121 Bari, Italy
4 Department of Basic Medical Sciences, Neurosciences, and Sense Organs, Medical School, University of Bari Aldo Moro, 70121 Bari, Italy
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2022, 19(20), 13016; https://doi.org/10.3390/ijerph192013016
Submission received: 30 June 2022 / Revised: 30 September 2022 / Accepted: 3 October 2022 / Published: 11 October 2022
(This article belongs to the Special Issue Data and Methods for Monitoring and Decisions in Public Health)

Abstract

Many studies have identified predictors of outcomes for inpatients with coronavirus disease 2019 (COVID-19), especially in intensive care units. However, most retrospective studies applied regression methods to evaluate the risk of death or worsening health. More recently, studies have applied machine learning methods, again mostly to retrospective data. This study applied a machine learning method based on a decision tree to define predictors of outcomes in an internal medicine unit with a prospective study design. The main result was that the first variable used by the tree to predict the outcome was the international normalized ratio, a measure related to prothrombin time, followed by the immunoglobulin M response. The model determined a threshold for each continuous blood or haematological parameter and drew a path toward the outcome. The model's performance (accuracy, 75.93%; sensitivity, 99.61%; and specificity, 23.43%) was validated with a repeated k-fold cross-validation. The results suggest that a machine learning approach could help clinicians to obtain information that could be useful as an alert for disease progression in patients with COVID-19. Further research should explore the acceptability of these results to physicians in current practice and analyze the impact of machine learning-guided decisions on patient outcomes.

1. Introduction

A new form of pneumonia spread in Wuhan, Hubei Province of the People’s Republic of China, beginning in December 2019, from an unidentified microbiological agent [1] later identified as a novel coronavirus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [2], and the resulting disease was named coronavirus disease 2019 (COVID-19). Since the end of 2019, the outbreak of COVID-19 has spread worldwide. As of 1 January 2021, there had been 2,123,776 confirmed cases and 74,261 deaths.
The initial symptoms of SARS-CoV-2 infection are similar to those of influenza but vary from person to person; they can be asymptomatic, paucisymptomatic, or symptomatic. The symptoms are fever, tiredness, anorexia, headache, diarrhea, sore throat, mild dyspnea, malaise, blocked nose, nausea, and vomiting [3]. People with pre-existing comorbidities (chronic obstructive pulmonary disease (COPD), hypertension, obesity, diabetes, heart disease, liver disease, acquired immunodeficiency syndrome (AIDS), renal disease, and cancer) have an increased risk of death or more critical COVID-19 [4,5]. An increase in the incidence and prevalence of COVID-19 has led to an increase in hospitalizations worldwide, and the most severe cases are characterized by acute respiratory syndrome, requiring hospitalization in intensive care units (ICU) [6,7].
ICU resources have been strained, especially in hospitals lacking adequate facilities and staff [8], as have health services as a whole for both COVID-19-related and unrelated diseases [9]. The mortality of patients with severe SARS-CoV-2 pneumonia has been and is still considerable, especially in patients older than 65 years and with relevant comorbidities [6,8].
The need soon emerged for a model that could detect the main characteristics of severe disease and identify features that predict the outcome of COVID-19, in order to better manage both patients' health and economic resources [10].
Therefore, many studies have been conducted worldwide on patients with SARS-CoV-2 that consider patients' demographics, clinical symptoms and signs, and blood chemistry data to predict the outcomes of death or ICU admission [10,11]. Different study designs were used to evaluate predictors: retrospective studies mainly focused on immunological features [12] or clinical and laboratory values, and descriptive studies were proposed in the initial phase of the pandemic to define signs and symptoms in patients [13].
A meta-analysis published in 2020 summarized the main biomarkers to monitor patients with COVID-19 [14]. Hematological biomarkers included white blood cell, neutrophil, lymphocyte, monocyte, eosinophil, and platelet counts, cluster of differentiation (CD)4 and CD8 percentages, and hemoglobin. Biochemical markers were albumin, alanine aminotransferase, aspartate aminotransferase, total bilirubin, creatinine, creatine kinase, lactate dehydrogenase (LDH), cardiac troponin I, myoglobin, and creatine kinase-MB. The coagulation markers were prothrombin time, activated partial thromboplastin time (APTT), and D-dimer. The inflammatory biomarkers were C-reactive protein (CRP), serum ferritin, procalcitonin (PCT), erythrocyte sedimentation rate, and interleukin and tumor necrosis factor-alpha (TNFα) levels.
Another retrospective study proposed and validated a clinical risk score for critically ill patients with COVID-19 that included clinical symptoms, signs, and laboratory tests [15]. Furthermore, studies of routine blood chemistry tests have yielded valuable results; white blood cells, neutrophils, lymphocytes, monocytes, eosinophils, platelets, CRP, D-dimer, and PCT were analyzed [16,17].
The main objective of this study was to evaluate, in a prospective cohort of COVID-19 patients hospitalized in a non-ICU medicine ward, the application of a decision tree to predict bad outcomes (death or transfer to an intensive care unit) and the relationships between the main demographic, clinical, blood chemistry, and immunological features. The method chosen to determine valuable predictors is a machine learning approach based on a decision tree. The algorithm selects the predictors and, for continuous variables, finds a useful threshold that could guide clinicians in weighting the value of each feature in the evaluation of the clinical course.

2. Materials and Methods

2.1. Participants and Procedures

A prospective observational study was performed during the pandemic period from 2 January to 30 April 2021, including 146 consecutive unvaccinated patients admitted to a dedicated internal medicine COVID unit (COVID-MI) in the large regional hospital Policlinico of Bari, Apulia. One patient was removed from the analysis because of missing data on both outcome and predictor variables; therefore, a total of 145 patients were included in the final analysis. Patients arrived from the emergency room after a diagnosis of SARS-CoV-2 infection confirmed by a positive nasopharyngeal swab. Data were collected upon admission and after 10 days. The outcome was recorded from clinical documentation and was defined as the combined endpoint of death or transfer to the ICU, whichever occurred first.
The main clinical symptoms, anamnestic conditions, blood tests, and immunological panels were recorded: blood pressure, respiratory frequency, cardiac frequency, temperature, O2 saturation, red blood cell count, hemoglobin, neutrophil count, lymphocyte count, platelet count, serum C-reactive protein, procalcitonin, lactate dehydrogenase (LDH), albumin, aspartate aminotransferase (AST), alanine aminotransferase (ALT), bilirubin, alkaline phosphatase (ALP), creatine kinase (CPK), serum sodium (Na), serum potassium (K), serum chloride (Cl), D-dimers, international normalized ratio (INR), interleukin-6 (IL-6), and immunoglobulins M and G (IgM and IgG) against SARS-CoV-2.
Temperature was measured with a clinical ecologic axillary thermometer. Saturation and other clinical parameters were measured with a modular multiparameter monitor (Life Scope VS, Nihon Kohden, Japan). This study was approved by the Interregional Ethical Committee of Azienda Ospedaliero Universitaria Consorziale Policlinico.

2.2. Statistical Analysis

Quantitative data are presented as median and interquartile range (IQR) because the Shapiro–Wilk test showed that they were not normally distributed. Comparisons between independent groups were performed using the Mann–Whitney test.
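As an illustration of the descriptive summary and the rank-based comparison described above, the following minimal Python sketch computes a median with quartiles and the Mann–Whitney U statistic. The function names and the length-of-stay values are hypothetical and are not taken from the study data; the authors' own analysis was run in SAS and R, and the quartile method used here (inclusive) is an assumption.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical length-of-stay values (days), for illustration only.
discharged = [5, 7, 9, 12, 16]
poor_outcome = [3, 4, 6, 13]

summary = median_iqr(discharged)
u_stat = mann_whitney_u(discharged, poor_outcome)
```

The sketch stops at the U statistic; obtaining a p-value requires the exact or normal-approximation distribution of U, which statistical packages provide.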
Categorical variables are summarized as counts and percentages, and comparisons between independent groups were performed by the chi-square test or Fisher exact test as appropriate.
Statistical significance was set at p-value < 0.05.
To identify variables that could be valuable predictors of the outcome, we applied a machine learning (ML) method based on Quinlan's C5.0 algorithm with boosting, which uses a confidence factor (CF), set here at 0.25, to assign an observation to a class.
Other hyperparameters (winnowing, boosting, iteration) were chosen by the application of the “one standard error rule” that consists of selecting variables that define a model within one standard error of the minimum cross-validation error.
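The one standard error rule can be sketched as follows. This is a minimal illustration of the rule in its common form (pick the simplest candidate whose cross-validation error is within one standard error of the best), not the authors' tuning code; the candidate grid, the use of boosting iterations as the complexity measure, and all error values are hypothetical.

```python
def one_standard_error_choice(candidates):
    """Pick the simplest candidate within one SE of the minimum CV error.

    candidates: list of dicts with keys 'name', 'complexity',
    'cv_error' and 'cv_se' (standard error of the CV error).
    """
    best = min(candidates, key=lambda c: c["cv_error"])
    threshold = best["cv_error"] + best["cv_se"]
    eligible = [c for c in candidates if c["cv_error"] <= threshold]
    return min(eligible, key=lambda c: c["complexity"])


# Hypothetical tuning results; complexity = number of boosting iterations.
grid = [
    {"name": "trials=1", "complexity": 1, "cv_error": 0.260, "cv_se": 0.020},
    {"name": "trials=10", "complexity": 10, "cv_error": 0.245, "cv_se": 0.018},
    {"name": "trials=20", "complexity": 20, "cv_error": 0.240, "cv_se": 0.019},
]
chosen = one_standard_error_choice(grid)
```

In this made-up grid the lowest error belongs to the 20-iteration model, but the 10-iteration model is within one standard error of it and is therefore preferred as the simpler choice.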
The variables used in the model for the training set were as follows: sex (male or female); presence of symptoms of infection (nasal congestion, headache, cough, pharyngodynia, dyspnea, fever, and myalgia); presence of chronic pulmonary disease, mainly COPD; diabetes; hypertension; cardiovascular diseases; cerebrovascular diseases; hepatitis B and C; tumors; chronic kidney diseases; and immunopathological diseases. Continuous variables included age, cardiac frequency, oxygen saturation, blood pressure, fraction of inspired oxygen (FiO2), temperature, neutrophils, lymphocytes, platelets, hemoglobin, C-reactive protein (CRP), LDH, albumin, AST, ALT, ALP, bilirubin, creatinine, CPK, Na, K, Cl, D-dimers, INR, and IgM and IgG antibodies against SARS-CoV-2. Data from the second blood collection were analyzed separately using the same method.
To evaluate the accuracy, a repeated k-fold cross-validation method was applied. The data set was split into k subsets, each used in turn as the validation set while the model was trained on the others, and the whole procedure was then repeated with new random splits. We chose the default values k = 10 and repetition = 10.
The measures of accuracy were based on the area under the ROC curve (AUC) and on the indicators suggested by Iwendi [18] and Bottino [19]: sensitivity (true positives divided by all events), specificity (true negatives divided by all non-events), positive predictive value (true positives divided by all cases predicted as events), negative predictive value (true negatives divided by all cases predicted as non-events), global accuracy (the sum of true positives and true negatives divided by the whole sample), F1-score (twice the product of precision and recall divided by the sum of precision and recall), Matthews correlation coefficient (MCC), and balanced accuracy (the mean of sensitivity and specificity). Collinearity was evaluated for the variables entered in the model by determining the variance inflation factor (VIF) and tolerance, using VIF > 5 and tolerance < 0.25 as thresholds.
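The resampling scheme can be sketched in plain Python as below, assuming 145 patients, k = 10 and 10 repetitions as stated in the text. The function name and the seed are hypothetical, and this sketch shuffles indices without stratifying on the outcome, which library implementations may do differently.

```python
import random


def repeated_kfold_indices(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repetition
        folds = [order[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, test_idx


# 10 folds x 10 repetitions = 100 train/validation splits.
splits = list(repeated_kfold_indices(145, k=10, repeats=10))
```

Each patient appears in exactly one validation fold per repetition, so every performance estimate is averaged over 100 held-out folds of 14 or 15 patients.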
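The indicators listed above can all be derived from the four cells of a confusion matrix. The sketch below uses hypothetical counts (not the study's results) and assumes that no row or column of the matrix is empty, so that no denominator is zero except possibly the MCC one, which is guarded.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification indicators from cell counts."""
    sensitivity = tp / (tp + fn)          # recall: true positives / all events
    specificity = tn / (tn + fp)          # true negatives / all non-events
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical confusion matrix, for illustration only.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Unlike global accuracy and the F1-score, the MCC and balanced accuracy take the true negatives into account symmetrically, which is why they are informative when the two outcome classes are unbalanced.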

2.3. Software

The model was run in RStudio version 1.4.1106 [20] using the following packages for data arrangement and description of the output: C50 [21], caret [22] and pROC [23]. Data management and descriptive statistics were performed using SAS/STAT version 9.4 for PC (SAS Institute, Cary, NC, USA).

3. Results

The median age of the patients was 71 years (IQR 58–82), and 54.8% were male. The main characteristics of the patients are summarized in Table 1 and Table 2. The outcome, as a combined endpoint of death and transfer to the ICU, was observed in 22.1% (32/145) of patients. Among these, death occurred in 56.3% (18/32), with a median length of stay (LOS) of 13.5 days (IQR 9–18), and transfer to the ICU occurred in 43.7% (14/32), with a median LOS of 5.5 days (IQR 3.25–12). There were 113 patients discharged alive, with a median LOS of 9 days (IQR 5–16) (Figure 1). The difference in LOS between patients who died and those transferred to the ICU was statistically significant (p = 0.006).
The comparison of blood analysis and other main clinical characteristics measured after admission to the COVID-MI ward between patients discharged alive and those with negative outcomes (death or transfer to the ICU) is shown in Table 2.
The variables used in the model are shown in Figure 2. The most important predictors in the model's training were the IgG and IgM values, which reached 100% attribute usage. Age also reached 100% attribute usage. Sex, CPK, CRP, platelet count, LDH, K, Na, INR, D-dimers, ALT, AST, creatinine, hemoglobin, and neutrophil and lymphocyte counts had an attribute usage of 90–99%.
To predict the outcome (Figure 3), the decision tree started with the value of INR, with a cut-off at 1.11 (INR ≤ 1.11 vs. INR > 1.11). Thus, there were two branches to classify patients: the first uses lymphocytes, IgM, and oxygen saturation; the second uses IgG, IgM, CRP, and creatinine.
The outcome D-ICU (death or ICU transfer) for patients with INR ≤ 1.11 was predicted after three consecutive steps: lymphocytes ≤ 16.8 × 10³, IgM ≤ 0.08 AU/mL, and oxygen saturation ≤ 97%. The prediction for patients with INR > 1.11 was achieved after four steps with IgG ≤ 4.43 AU/mL, IgM ≤ 0.04 AU/mL, CRP ≤ 115 mg%mL and creatinine ≤ 1.17 mg%mL. The error on the training set for all previous classifications was 0%, indicating that the decision tree correctly classified all patients who died or were transferred to the ICU.
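The two paths described above can be read as a pair of nested rules. The sketch below is a literal transcription of the thresholds quoted in the text, not the fitted C5.0 model: the other branches of Figure 3 are not reproduced, the dictionary keys are hypothetical, the lymphocyte threshold is assumed to be 16.8 × 10³ in the units of the source, and the example patients are invented.

```python
def on_described_dicu_path(p):
    """Return True if patient dict p satisfies one of the two D-ICU paths
    quoted in the text (thresholds copied as reported; units as in the source)."""
    if p["inr"] <= 1.11:
        # First branch: lymphocytes, IgM, oxygen saturation.
        return (
            p["lymphocytes"] <= 16.8e3
            and p["igm"] <= 0.08
            and p["spo2"] <= 97
        )
    # Second branch (INR > 1.11): IgG, IgM, CRP, creatinine.
    return (
        p["igg"] <= 4.43
        and p["igm"] <= 0.04
        and p["crp"] <= 115
        and p["creatinine"] <= 1.17
    )


# Invented patients, for illustration only.
low_inr = {"inr": 1.05, "lymphocytes": 12.0e3, "igm": 0.05, "spo2": 95,
           "igg": 10.0, "crp": 50, "creatinine": 0.9}
high_inr = {"inr": 1.30, "lymphocytes": 20.0e3, "igm": 0.03, "spo2": 98,
            "igg": 2.0, "crp": 80, "creatinine": 1.0}
high_inr_high_igg = dict(high_inr, igg=8.0)

flags = [on_described_dicu_path(p) for p in (low_inr, high_inr, high_inr_high_igg)]
```

Writing the tree as explicit rules shows why this kind of model is easy to communicate to clinicians: each prediction is a short chain of threshold checks on routinely measured values.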
The variables entered in the model did not appear to be affected by collinearity: all showed a VIF < 5 (the highest was 1.28, for IgG) and a tolerance > 0.25 (the lowest was 0.78, for IgG).
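For reference, the standard definitions linking the two collinearity indicators are given below, where $R_j^2$ (a symbol not used in the original text) denotes the coefficient of determination obtained by regressing predictor $j$ on all the other predictors. The last line checks that the two values reported for IgG agree with each other.

```latex
\begin{align}
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2}, \\
\mathrm{Tolerance}_j &= 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}, \\
\mathrm{Tolerance}_{\mathrm{IgG}} &= \frac{1}{1.28} \approx 0.78 .
\end{align}
```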
The fit of the model was evaluated on the validation set, which showed a total accuracy of 75.93%, a balanced accuracy of 61.52%, and a sensitivity and specificity of 99.61% and 23.43%, respectively (Table 3). The F1-score was 89.17% and the MCC was 17.94%; the MCC was low, as was the area under the ROC curve (Figure 4), which was 0.61.
The follow-up values of biochemical and hematological variables were available for only 28 patients, and neither the difference between the first and second data collection nor the value of the second data collection yielded a model that could predict the outcome.
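The reported balanced accuracy follows directly from the reported sensitivity and specificity, as this short check shows. The function name is hypothetical; the second call is an illustrative reference point for a classifier that assigns every patient to the positive class.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (both given in percent)."""
    return (sensitivity + specificity) / 2


# Values reported in the text for the validation set.
reported = balanced_accuracy(99.61, 23.43)

# Reference point: a rule that labels every case as positive.
always_positive = balanced_accuracy(100.0, 0.0)
```

The reported value is closer to the 50% reference point than to the sensitivity, which illustrates how the low specificity limits the overall discriminative ability despite the high sensitivity.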

4. Discussion

This prospective study, conducted during the pandemic wave in a COVID-19 medicine ward, has allowed us to identify predictors of outcome using a machine learning algorithm. The decision tree model showed the sequence and threshold for each variable to predict unfavorable outcomes.
Various artificial intelligence algorithms have been used to predict death or hospitalization in the ICU following COVID-19 [24,25]. Among the various machine learning models, the most widely used for classification are XGBoost, linear regression, support vector machine, decision tree, random forest, and convolutional neural network. For an easier understanding and representation of our study, we used a C5.0 decision tree model as an algorithm. This algorithm is one of the most commonly used approaches for representing recursive classifiers [26]. The choice of method has a consequence on the accuracy. The study by De Souza [25] compared the accuracy among models, and the decision tree had the same results as XGBoost, naive Bayes, and support vector machine. Interestingly, in that comparison, machine learning based on logistic regression was more accurate. The decision tree algorithm splits the data recursively, considering symptoms and the many other parameters that predict SARS-CoV-2 infection, until the procedure reaches its maximum depth [25,27].
The indicators used to evaluate the validity of the model appeared promising, given the 99.61% sensitivity and the 75.93% global accuracy. However, the AUC from the ROC analysis was only 0.61. This suggests that our variables could have a low predictive accuracy, even though all clinical parameters were recorded when the patients entered the medicine unit. The clinical condition at the beginning of hospitalization may not be enough to predict the outcome. The search for a model considering clinical characteristics at the entrance to the non-intensive medical unit is an important issue. Our data show that changes in patients happen quickly, given that the median time to ICU transfer was 5.5 days and the median time to death was 13.5 days. Therefore, the search for conditions that could help to understand the course of the disease is a critical issue.
Comparing our model with other decision tree models, it can be observed that our model has an accuracy similar to that of Migriño (81.5%) [28], but lower than those of Altini et al. [29] and Naseem et al. [30], which were both higher than 85%. It is noteworthy that our model's specificity was low when compared to other research [29,30]. On the other hand, the model has an F-score of 89.16%, which is higher than that of another machine learning model that predicted outcomes in patients with COVID-19 [28]. De Souza [25] showed that this score was the lowest for the decision tree among the algorithms compared, at 52% in the training set and 38% in the test set.
According to our model, the INR value is the first criterion used to classify patients. The INR is a standardized expression of the prothrombin time (PT) that eliminates the inherent variability in its measurement between laboratories. It is calculated by dividing the PT of each patient by a standard laboratory parameter [31]. The INR value was lower among participants who survived during hospitalization than among those who died or were transferred to the ICU. The difference was statistically significant (p < 0.001). Other studies have shown an increase in the INR value in participants with poor outcomes during hospitalization compared to those who survived [30,32,33]. INR assumes an important prognostic value as it is representative of the activation of coagulation, and it is now well known that COVID-19 is associated with coagulopathy [34]. Infection with SARS-CoV-2 can damage the vascular endothelium (typically at the pulmonary level and then, in the subsequent phases, at the systemic level) and initiate an inflammatory process that can alter the normal homeostatic balance between procoagulant and anticoagulant pathways [35]. Our cohort had a high percentage of patients older than 70 years with cardiac or cerebrovascular disease who could be treated with anticoagulation therapy. In a practical application of the results of this model, the information coming from INR should be evaluated in light of previous therapies, even though the model did not consider age and comorbidities as valuable help.
A significant difference in lymphocyte and neutrophil counts was not found in our study between patients who survived and those who died or were transferred to the ICU; however, the lymphocyte count emerged as an important variable in the decision tree model. Peripheral lymphocyte count is an early indicator of severe or critical COVID-19 [36], and in a previous study conducted by Naseem et al. [30] using a Cox regression model, an increased risk of death was associated with lymphocyte count reduction.
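For readers unfamiliar with the INR, its standard definition is shown below. The symbols are not used in the original text: $\mathrm{PT}_{\text{patient}}$ is the patient's prothrombin time, $\mathrm{PT}_{\text{normal}}$ is the laboratory's mean normal prothrombin time, and ISI is the International Sensitivity Index of the thromboplastin reagent, which is what makes values comparable across laboratories.

```latex
\begin{equation}
\mathrm{INR} = \left( \frac{\mathrm{PT}_{\text{patient}}}{\mathrm{PT}_{\text{normal}}} \right)^{\mathrm{ISI}}
\end{equation}
```

With an ISI close to 1, the cut-off of 1.11 selected by the tree corresponds roughly to a prothrombin time about 11% longer than the laboratory's normal reference.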
Serum creatinine was found to be a significant variable for predicting a bad outcome; the same parameter was a significant predictor of mortality in a meta-analysis of 19 studies on COVID-19 patients [37]. This sign of kidney damage should be given more attention because it is associated with worse outcomes in both older and younger patients [38].
The comparison of IgM values, IgG levels and age between the patients who died or were transferred to the ICU and the surviving patients showed significantly different median values (p = 0.032 for IgM; p = 0.023 for IgG). Moreover, they are the parameters most used by the machine learning C5.0 algorithm for the construction of the decision tree. To the best of our knowledge, only a few other authors have used blood immunoglobulin levels to predict patient outcomes. In a retrospective study, Suryawanshi et al. [39] found that IgG concentration differed among three groups of patients classified according to severity, whereas no difference was found in IgM concentration. Similarly, Yuan et al. [40] found a difference only in IgG concentrations between non-serious patients and those with severe or critical conditions.
In a recent paper, some of us showed that early antibodies against SARS-CoV-2 were linked with reduced mortality in patients with COVID-19, irrespective of age [41]. That study demonstrated that an efficient immunological response during the early phase of COVID-19 protects from mortality, irrespective of age, even if advanced age is a critical risk factor for a poor outcome in infected subjects. These results are consistent with the data of the present work, since the levels of both IgM and IgG anti-SARS-CoV-2 in our model have a critical role in the prediction of the evolution of the disease. Studies of possible therapeutic interventions able to enhance humoral immunity in elderly patients with a weak antibody response during the early stage of COVID-19 infection are warranted.

5. Study Limitation

A limitation of this study is the intrinsic fault of the decision tree model related to overfitting, which often occurs in complex models for relatively simple data [42]. This could affect our results in which we have found low values of false negatives.
The decision tree model does not generate explicit coefficients, such as those generated using a logistic regression model. Therefore, it is difficult to estimate the impact of each variable on the outcome in terms of risk, but it is possible to have information on useful variables and their decisional values in a way that could be familiar to a clinical audience.
Our model was also trained on the data of a single cohort; therefore, if it is to be used outside of this cohort, it is advisable to carry out independent validation and, possibly, recalibration of the model.
Finally, the effect of sample size on the reliability of the results is unknown: the study was planned for a larger sample size, but hospitalization overload made it difficult to coordinate data collection with clinicians. A simulation to determine sample size and power for a logistic regression to evaluate the effect of a single variable with an odds ratio of 2, and the probability of the event given the presence of the risk factor, requires a sample size of 178 subjects. However, methods to determine sample size for machine learning approaches, which give reliable results on so-called "big data", are still debated, and sample size determination for decision trees or other algorithms requires proper methods.

6. Conclusions

A machine learning approach, the decision tree model, was used to analyze the clinical data of hospitalized COVID-19 patients to establish an efficient prognosis [29]. In this study, we used the clinical, demographic, and blood chemistry parameters of the patients in order to predict two possible outcomes: discharged alive, or transferred to ICU or death, whichever was first.
The results suggest that a machine learning approach could help clinicians evaluate disease conditions in patients with COVID-19, and it could be useful in guiding decisions for therapies or diagnostic procedures. Further research should explore the acceptability of these results to physicians in current practice and analyze the impact of machine learning-guided decisions on patient outcomes, such as the feasibility of using the selected information and cut-offs.
Although the study design is prospective, the machine learning model is a post hoc theoretical application. Therefore, we believe that in future studies, it could be introduced in the management of patients affected by COVID-19 to see how it affects the decision process of clinicians in real practice.
We also believe it is necessary that further studies be conducted to evaluate machine learning models to better understand the identified predictors, especially those of an immunological nature, which are still poorly analyzed. Furthermore, more complex models, such as artificial neural networks and deep learning models, should be implemented together with an easier system to interpret whether the result is useful for clinical decisions [43].

Author Contributions

Conceptualization: P.T. and D.D.V.; methodology: P.T. and M.G.; software: M.G.; formal analysis: M.G. and P.T.; data curation: F.P., I.D. and J.M.; writing—original draft preparation: M.G. and P.T.; writing—review and editing: P.T., M.G., V.O.P. and P.P.; supervision: P.T., V.O.P., D.D.V., P.P. and M.T.M.; project administration: D.D.V. All authors have read and agreed to the published version of the manuscript.

Funding

This study received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki. This study was approved by the Institutional Ethics Committee of Azienda Ospedaliera Consorziale Policlinico (protocol number 6665 and approval no. 0099211, 21 December 2020).

Informed Consent Statement

All subjects gave their informed consent for inclusion before they participated in the study; data were treated according to EU Regulation 2016/679.

Data Availability Statement

Restrictions apply to the availability of these data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lu, R.; Zhao, X.; Li, J.; Niu, P.; Yang, B.; Wu, H.; Wang, W.; Song, H.; Huang, B.; Zhu, N.; et al. Genomic Characterisation and Epidemiology of 2019 Novel Coronavirus: Implications for Virus Origins and Receptor Binding. Lancet 2020, 395, 565–574.
  2. A Novel Coronavirus Genome Identified in a Cluster of Pneumonia Cases—Wuhan, China 2019−2020. Available online: https://weekly.chinacdc.cn/en/article/id/a3907201-f64f-4154-a19e-4253b453d10c (accessed on 7 March 2022).
  3. Iser, B.P.M.; Sliva, I.; Raymundo, V.T.; Poleto, M.B.; Schuelter-Trevisol, F.; Bobinski, F. Suspected COVID-19 Case Definition: A Narrative Review of the Most Frequent Signs and Symptoms among Confirmed Cases. Epidemiol. Serv. Saúde 2020, 29.
  4. Ejaz, H.; Alsrhani, A.; Zafar, A.; Javed, H.; Junaid, K.; Abdalla, A.E.; Abosalif, K.O.A.; Ahmed, Z.; Younas, S. COVID-19 and Comorbidities: Deleterious Impact on Infected Patients. J. Infect. Public Health 2020, 13, 1833–1839.
  5. Alizadehsani, R.; Alizadeh Sani, Z.; Behjati, M.; Roshanzamir, Z.; Hussain, S.; Abedini, N.; Hasanzadeh, F.; Khosravi, A.; Shoeibi, A.; Roshanzamir, M.; et al. Risk Factors Prediction, Clinical Outcomes, and Mortality in COVID-19 Patients. J. Med. Virol. 2021, 93, 2307–2320.
  6. Huang, C.; Wang, Y.; Li, X.; Ren, L.; Zhao, J.; Hu, Y.; Zhang, L.; Fan, G.; Xu, J.; Gu, X.; et al. Clinical Features of Patients Infected with 2019 Novel Coronavirus in Wuhan, China. Lancet 2020, 395, 497–506.
  7. Odone, A.; Delmonte, D.; Scognamiglio, T.; Signorelli, C. COVID-19 Deaths in Lombardy, Italy: Data in Context. Lancet Public Health 2020, 5, e310.
  8. Yang, X.; Yu, Y.; Xu, J.; Shu, H.; Xia, J.; Liu, H.; Wu, Y.; Zhang, L.; Yu, Z.; Fang, M.; et al. Clinical Course and Outcomes of Critically Ill Patients with SARS-CoV-2 Pneumonia in Wuhan, China: A Single-Centered, Retrospective, Observational Study. Lancet Respir. Med. 2020, 8, 475–481.
  9. Bartolomeo, N.; Giotta, M.; Trerotoli, P. In-Hospital Mortality in Non-COVID-19-Related Diseases before and during the Pandemic: A Regional Retrospective Study. Int. J. Environ. Res. Public Health 2021, 18, 10886.
  10. Yadaw, A.S.; Li, Y.-C.; Bose, S.; Iyengar, R.; Bunyavanich, S.; Pandey, G. Clinical Features of COVID-19 Mortality: Development and Validation of a Clinical Prediction Model. Lancet Digit Health 2020, 2, e516–e525.
  11. Tjendra, Y.; Al Mana, A.F.; Espejo, A.P.; Akgun, Y.; Millan, N.C.; Gomez-Fernandez, C.; Cray, C. Predicting Disease Severity and Outcome in COVID-19 Patients: A Review of Multiple Biomarkers. Arch. Pathol. Lab. Med. 2020, 144, 1465–1474.
  12. Chen, G.; Wu, D.; Guo, W.; Cao, Y.; Huang, D.; Wang, H.; Wang, T.; Zhang, X.; Chen, H.; Yu, H.; et al. Clinical and Immunological Features of Severe and Moderate Coronavirus Disease 2019. J. Clin. Investig. 2020, 130, 2620–2629.
  13. Chen, N.; Zhou, M.; Dong, X.; Qu, J.; Gong, F.; Han, Y.; Qiu, Y.; Wang, J.; Liu, Y.; Wei, Y.; et al. Epidemiological and Clinical Characteristics of 99 Cases of 2019 Novel Coronavirus Pneumonia in Wuhan, China: A Descriptive Study. Lancet 2020, 395, 507–513.
  14. Henry, B.M.; de Oliveira, M.H.S.; Benoit, S.; Plebani, M.; Lippi, G. Hematologic, Biochemical and Immune Biomarker Abnormalities Associated with Severe Illness and Mortality in Coronavirus Disease 2019 (COVID-19): A Meta-Analysis. Clin. Chem. Lab. Med. 2020, 58, 1021–1028.
  15. Liang, W.; Liang, H.; Ou, L.; Chen, B.; Chen, A.; Li, C.; Li, Y.; Guan, W.; Sang, L.; Lu, J.; et al. Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19. JAMA Intern. Med. 2020, 180, 1081–1089.
  16. Ertekin, B.; Yortanlı, M.; Özelbaykal, O.; Doğru, A.; Girişgin, A.S.; Acar, T. The Relationship between Routine Blood Parameters and the Prognosis of COVID-19 Patients in the Emergency Department. Emerg. Med. Int. 2021, 2021, 7489675.
  17. Marietta, M.; Ageno, W.; Artoni, A.; De Candia, E.; Gresele, P.; Marchetti, M.; Marcucci, R.; Tripodi, A. COVID-19 and Haemostasis: A Position Paper from Italian Society on Thrombosis and Haemostasis (SISET). Blood Transfus. 2020, 18, 167–169.
  18. Iwendi, C.; Bashir, A.K.; Peshkar, A.; Sujatha, R.; Chatterjee, J.M.; Pasupuleti, S.; Mishra, R.; Pillai, S.; Jo, O. COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm. Front. Public Health 2020, 8, 357.
  19. Bottino, F.; Tagliente, E.; Pasquini, L.; Napoli, A.D.; Lucignani, M.; Figà-Talamanca, L.; Napolitano, A. COVID Mortality Prediction with Machine Learning Methods: A Systematic Review and Critical Appraisal. J. Pers. Med. 2021, 11, 893.
  20. RStudio Team. RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA. 2021. Available online: http://www.rstudio.com/ (accessed on 8 March 2022).
  21. Kuhn, M.; Quinlan, R. C50: C5.0 Decision Trees and Rule-Based Models. R Package Version 0.1.5. 2021. Available online: https://cran.r-project.org/package=c50 (accessed on 8 March 2022).
  22. Warnes, G.R.; Bolker, B.; Lumley, T.; Johnson, R.C. Gmodels: Various R Programming Tools for Model Fitting. Available online: https://CRAN.R-project.org/package=gmodels (accessed on 8 March 2022).
  23. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77.
  24. Adamidi, E.S.; Mitsis, K.; Nikita, K.S. Artificial Intelligence in Clinical Care amidst COVID-19 Pandemic: A Systematic Review. Comput. Struct. Biotechnol. J. 2021, 19, 2833–2850.
  25. De Souza, F.S.H.; Hojo-Souza, N.S.; Dos Santos, E.B.; Da Silva, C.M.; Guidoni, D.L. Predicting the Disease Outcome in COVID-19 Positive Patients Through Machine Learning: A Retrospective Cohort Study With Brazilian Data. Front. Artif. Intell. 2021, 4, 579931.
  26. Rokach, L.; Maimon, O. Decision Trees. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2005; pp. 165–192. ISBN 978-0-387-25465-4.
  27. Rochmawati, N.; Hidayati, H.B.; Yamasari, Y.; Yustanti, W.; Rakhmawati, L.; Tjahyaningtijas, H.P.A.; Anistyasari, Y. Covid Symptom Severity Using Decision Tree. In Proceedings of the 2020 Third International Conference on Vocational Education and Electrical Engineering (ICVEE), Surabaya, Indonesia, 3–4 October 2020; pp. 1–5.
  28. Migriño, J.R.; Batangan, A.R.U. Using Machine Learning to Create a Decision Tree Model to Predict Outcomes of COVID-19 Cases in the Philippines. West. Pac. Surveill. Response J. 2021, 12, 56–64.
  29. Altini, N.; Brunetti, A.; Mazzoleni, S.; Moncelli, F.; Zagaria, I.; Prencipe, B.; Lorusso, E.; Buonamico, E.; Carpagnano, G.E.; Bavaro, D.F.; et al. Predictive Machine Learning Models and Survival Analysis for COVID-19 Prognosis Based on Hematochemical Parameters. Sensors 2021, 21, 8503. [Google Scholar] [CrossRef] [PubMed]
  30. Naseem, M.; Arshad, H.; Hashmi, S.A.; Irfan, F.; Ahmed, F.S. Predicting Mortality in SARS-COV-2 (COVID-19) Positive Patients in the Inpatient Setting Using a Novel Deep Neural Network. Int. J. Med. Inform. 2021, 154, 104556. [Google Scholar] [CrossRef] [PubMed]
  31. Hirsh, J.; Poller, L. The International Normalized Ratio. A Guide to Understanding and Correcting Its Problems. Arch. Intern. Med. 1994, 154, 282–288. [Google Scholar] [CrossRef]
  32. Perera, A.; Chowdary, P.; Johnson, J.; Lamb, L.; Drebes, A.; Mir, N.; Sood, T. A 10-Fold and Greater Increase in D-Dimer at Admission in COVID-19 Patients Is Highly Predictive of Pulmonary Embolism in a Retrospective Cohort Study. Ther. Adv. Hematol. 2021, 12, 20406207211048364. [Google Scholar] [CrossRef]
  33. Li, P.; Zhao, W.; Kaatz, S.; Latack, K.; Schultz, L.; Poisson, L. Factors Associated With Risk of Postdischarge Thrombosis in Patients With COVID-19. JAMA Netw. Open 2021, 4, e2135397. [Google Scholar] [CrossRef]
  34. COVID-19-Associated Coagulopathy—PubMed. Available online: https://pubmed.ncbi.nlm.nih.gov/32683333/ (accessed on 8 March 2022).
  35. Iba, T.; Warkentin, T.E.; Thachil, J.; Levi, M.; Levy, J.H. Proposal of the Definition for COVID-19-Associated Coagulopathy. J. Clin. Med. 2021, 10, 191. [Google Scholar] [CrossRef]
  36. Gao, Y.D.; Ding, M.; Dong, X.; Zhang, J.J.; Kursat Azkur, A.; Azkur, D.; Gan, H.; Sun, Y.L.; Fu, W.; Li, W.; et al. Risk factors for severe and critically ill COVID-19 patients: A review. Allergy 2021, 76, 428–455. [Google Scholar] [CrossRef]
  37. Malik, P.; Patel, U.; Mehta, D.; Patel, N.; Kelkar, R.; Akrmah, M.; Gabrilove, J.L.; Sacks, H. Biomarkers and outcomes of COVID-19 hospitalisations: Systematic review and meta-analysis. BMJ Evid. Based Med. 2021, 26, 107–108. [Google Scholar] [CrossRef]
  38. Mesas, A.E.; Cavero-Redondo, I.; Álvarez-Bueno, C.; Sarriá Cabrera, M.A.; Maffei de Andrade, S.; Sequí-Dominguez, I.; Martínez-Vizcaíno, V. Predictors of in-hospital COVID-19 mortality: A comprehensive systematic review and meta-analysis exploring differences by age, sex and health conditions. PLoS ONE 2020, 15, e0241742. [Google Scholar] [CrossRef]
  39. Suryawanshi, S.Y.; Priya, S.; Sinha, S.S.; Soni, S.; Haidry, N.; Verma, S.; Singh, S. Dynamic Profile and Clinical Implications of Hematological and Immunological Parameters in COVID-19 Patients. A Retrospective Study. J. Fam. Med. Prim. Care 2021, 10, 2518–2523. [Google Scholar] [CrossRef]
  40. Yuan, X.; Huang, W.; Ye, B.; Chen, C.; Huang, R.; Wu, F.; Wei, Q.; Zhang, W.; Hu, J. Changes of Hematological and Immunological Parameters in COVID-19 Patients. Int. J. Hematol. 2020, 112, 553–559. [Google Scholar] [CrossRef] [PubMed]
  41. De Vito, D.; Di Ciaula, A.; Palmieri, V.O.; Trerotoli, P.; Larocca, A.M.V.; Montagna, M.T.; Portincasa, P. Reduced COVID-19 mortality linked with early antibodies against SARS-CoV-2, irrispective of age. Eur. J. Int. Med. 2022, 98, 77–82. [Google Scholar] [CrossRef] [PubMed]
  42. Grokking Machine Learning. Available online: https://www.manning.com/books/grokking-machine-learning (accessed on 8 March 2022).
  43. Sun, C.; Hong, S.; Song, M.; Li, H.; Wang, Z. Predicting COVID-19 Disease Progression and Patient Outcomes Based on Temporal Deep Learning. BMC Med. Inform. Decis. Mak. 2021, 21, 45. [Google Scholar] [CrossRef]
Figure 1. Percentage distribution of patients by length of stay in days and outcome.
Figure 2. Variables’ importance (attribute usage) for the training decision tree model C5.0.
Figure 3. Graphical representation of the decision tree model built on the training dataset, with the predicted outcome and the goodness of fit (probability of the correct outcome and its complementary value).
Figure 4. Receiver operating characteristic curve to evaluate the fitting of the model.
Table 1. Main characteristics at baseline of the patients included in the study, and results of the comparison of percentages between outcomes using the chi-square or Fisher exact test.

| Characteristic | Death or Transferred to Intensive Care Unit (n = 32): N | % | Discharged Alive (n = 113): N | % | p-Value |
|---|---|---|---|---|---|
| Sex | | | | | |
| Male | 18 | 56.25% | 61 | 53.98% | 1.00 |
| Female | 14 | 43.75% | 52 | 46.02% | |
| Symptoms | | | | | |
| Dyspnea | 12 | 37.50% | 52 | 46.02% | 0.999 |
| Cough | 5 | 15.63% | 35 | 30.97% | 1.00 |
| Fatigue | 7 | 21.88% | 30 | 26.55% | 1.00 |
| Headache | 2 | 6.25% | 12 | 10.62% | 1.00 |
| Confusion | 1 | 3.13% | 9 | 7.96% | 1.00 |
| Nausea | 1 | 3.13% | 8 | 7.08% | 1.00 |
| Sick | 1 | 3.13% | 6 | 5.31% | 1.00 |
| Pharyngitis | 1 | 3.13% | 6 | 5.31% | 1.00 |
| Nasal congestion | 1 | 3.13% | 3 | 2.65% | 0.999 |
| Arthralgia | 0 | 0.00% | 3 | 2.65% | 1.00 |
| Myalgia | 1 | 3.13% | 2 | 1.77% | 0.997 |
| Arrhythmia | 3 | 9.38% | 12 | 10.62% | 1.00 |
| Comorbidity | | | | | |
| Hypertension | 12 | 37.50% | 71 | 62.83% | 0.356 |
| Cardiovascular disease | 12 | 37.50% | 43 | 38.05% | 1.00 |
| Diabetes | 11 | 34.38% | 35 | 30.97% | 1.00 |
| Cerebrovascular disease | 9 | 28.13% | 19 | 16.81% | 0.896 |
| Chronic kidney disease | 8 | 25.00% | 14 | 12.39% | 0.585 |
| COPD | 5 | 15.63% | 14 | 12.39% | 0.999 |
| Tumors | 5 | 15.63% | 11 | 9.73% | 0.986 |
| Hepatitis B | 0 | 0.00% | 6 | 5.31% | 0.974 |
| Immunopathological disease | 1 | 3.13% | 5 | 4.42% | 1.00 |
Table 2. Comparison between patients discharged alive and those who died or were transferred to an intensive care unit, by demographic and baseline clinical, hematological, and biochemical values. The p-value refers to the result of the Wilcoxon test for independent groups.

Death/ICU: patients who died or were transferred to the ICU (n = 32). Alive: patients discharged alive (n = 113).

| Variable | Death/ICU Median | Death/ICU Q1 | Death/ICU Q3 | Alive Median | Alive Q1 | Alive Q3 | p-Value |
|---|---|---|---|---|---|---|---|
| Age (years) | 78.0 | 67.0 | 85.75 | 70.0 | 57.0 | 82.0 | 0.011 |
| Temperature (°C) | 36.5 | 36.0 | 36.6 | 36.4 | 36.2 | 36.67 | 0.715 |
| Respiratory rate (rpm) | 20.0 | 18.0 | 20.0 | 18.0 | 15.0 | 20.0 | 0.110 |
| Cardiac frequency (bpm) | 79.0 | 70.0 | 99.0 | 82.5 | 75.0 | 92.5 | 0.515 |
| Systolic blood pressure (mmHg) | 137.5 | 116.0 | 150.0 | 130.0 | 122.5 | 145.0 | 0.947 |
| Diastolic blood pressure (mmHg) | 77.5 | 65.0 | 82.5 | 77.0 | 70.0 | 85.75 | 0.643 |
| Temperature at admission (°C) | 36.5 | 36.0 | 36.6 | 36.4 | 36.2 | 36.67 | 0.715 |
| Percentage of O2 saturation | 97.0 | 94.75 | 97.25 | 97.0 | 96.0 | 99.0 | 0.017 |
| FiO2 (%) | 50.0 | 28.0 | 80.0 | 37.5 | 21.0 | 60.0 | 0.227 |
| Neutrophil count (×10³/µL) | 79.8 | 74.525 | 85.32 | 77.4 | 69.25 | 84.4 | 0.124 |
| Lymphocyte count (×10³/µL) | 13.4 | 8.9 | 16.35 | 14.95 | 9.4 | 21.3 | 0.115 |
| Platelet count (×10³/µL) | 202,000 | 147,250 | 272,250 | 222,500 | 168,500 | 292,000 | 0.212 |
| Hemoglobin level (g/dL) | 127.0 | 116.75 | 143.0 | 123.0 | 108.0 | 137.0 | 0.162 |
| Procalcitonin levels (ng/mL) | 0.11 | 0.09 | 0.27 | 0.12 | 0.06 | 0.23 | 0.712 |
| CRP (mg/mL) | 79.7 | 30.6 | 105.5 | 37.6 | 14.0 | 97.6 | 0.066 |
| LDH (mg/mL) | 307.0 | 258.75 | 368.0 | 256.0 | 207.0 | 309.0 | 0.007 |
| Albumin (mg/mL) | 27.0 | 24.0 | 30.5 | 28.0 | 25.0 | 31.0 | 0.098 |
| ALT (mg/mL) | 23.0 | 14.75 | 52.0 | 27.0 | 20.0 | 47.0 | 0.413 |
| AST (mg/mL) | 30.0 | 22.0 | 48.0 | 28.5 | 21.0 | 38.5 | 0.371 |
| ALP (mg/mL) | 72.0 | 58.5 | 87.0 | 64.0 | 53.0 | 83.0 | 0.441 |
| Direct bilirubin (mg/mL) | 0.01 | 0.009 | 0.0168 | 0.01 | 0.007 | 0.0139 | 0.041 |
| Indirect bilirubin (mg/mL) | 0.015 | 0.012 | 0.022 | 0.015 | 0.01 | 0.02 | 0.900 |
| Total bilirubin (mg/mL) | 0.027 | 0.022 | 0.037 | 0.027 | 0.02 | 0.034 | 0.586 |
| Creatinine (mg/mL) | 1.063 | 0.76 | 1.637 | 0.83 | 0.695 | 1.16 | 0.019 |
| CPK (mg/mL) | 92.0 | 46.0 | 165.5 | 72.5 | 41 | 145.0 | 0.406 |
| Sodium (mg/mL) | 140.0 | 138.0 | 142.0 | 139.0 | 137 | 141.0 | 0.014 |
| Potassium (mg/mL) | 4.0 | 3.6 | 4.275 | 4.1 | 3.8 | 4.5 | 0.104 |
| D-dimers (mg/L) | 0.771 | 0.528 | 2.265 | 0.908 | 0.502 | 1.962 | 0.882 |
| INR | 1.12 | 1.052 | 1.202 | 1.04 | 1.0 | 1.09 | <0.001 |
| IL-6 (pg/mL) | 38.3 | 17.3 | 123.0 | 29.2 | 7.05 | 83.5 | 0.183 |
| IgM (g/L) AU/mL | 0.68 | 0.07 | 8.842 | 4.19 | 0.473 | 13.303 | 0.032 |
| IgG (g/L) AU/mL | 0.3 | 0.065 | 3.955 | 2.51 | 0.185 | 5.74 | 0.023 |
| Length of stay (days) | 11.0 | 5.75 | 15 | 9.0 | 5.0 | 16.0 | 0.837 |
Table 3. Fitting parameters of our decision tree model verified on the validation dataset.

| Parameter | Value (%) |
|---|---|
| Accuracy | 75.93% |
| Sensitivity | 99.61% |
| Specificity | 23.43% |
| PPV | 82.18% |
| NPV | 40.07% |
| F1-score | 89.17% |
| MCC | 17.94% |
| Balanced accuracy | 61.52% |
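
The parameters listed in Table 3 are all functions of the four cells of a binary confusion matrix. The following is a minimal sketch, in Python rather than the R environment used in the study, of how such parameters can be computed. The function name `classification_metrics` and the counts in the example are hypothetical and are not the study's data. Table 3 does not state which outcome was treated as the positive class, so the sketch leaves that choice to the caller.

```python
from math import sqrt


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute Table 3 style parameters as fractions in [0, 1].

    Hypothetical helper for illustration only. It assumes every
    denominator is non-zero (each class is both present and predicted
    at least once); otherwise a ZeroDivisionError is raised.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1-score: harmonic mean of PPV and sensitivity
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Balanced accuracy: mean of sensitivity and specificity
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
        "balanced_accuracy": balanced_accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    example = classification_metrics(tp=90, fp=20, tn=30, fn=10)
    for name, value in example.items():
        print(f"{name}: {value:.2%}")
```

As a consistency check on Table 3 itself, the reported balanced accuracy equals the mean of the reported sensitivity and specificity: (99.61% + 23.43%) / 2 = 61.52%. Parameters averaged over cross-validation folds need not satisfy the other single-matrix identities exactly, for example the F1-score formula applied to the averaged PPV and sensitivity.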
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Giotta, M.; Trerotoli, P.; Palmieri, V.O.; Passerini, F.; Portincasa, P.; Dargenio, I.; Mokhtari, J.; Montagna, M.T.; De Vito, D. Application of a Decision Tree Model to Predict the Outcome of Non-Intensive Inpatients Hospitalized for COVID-19. Int. J. Environ. Res. Public Health 2022, 19, 13016. https://doi.org/10.3390/ijerph192013016