# An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples


## Abstract


## 1. Introduction

- The proposed algorithm effectively provided a preliminary classification of COVID-19 cases using relevant blood-test feature parameters.
- The proposed algorithm has a low computational cost, and detection is completed within a few seconds.
- Given its effectiveness, the proposed model could improve pathologists' efficiency and support effective laboratory examination in pathology departments.

## 2. Materials and Methods

## 3. Proposed Methodology

#### 3.1. Dataset Description

#### 3.2. Data Preprocessing

#### 3.3. Feature Selection

#### 3.4. Cross-Validation Methods

#### 3.5. Ensemble Learning

1. Construct an ensemble:
   - Select base learners $\mathcal{B}$, which must differ from one another;
   - Select a meta-learner $\mathcal{M}$.
2. Train the ensemble:
   - Train each base model on the training dataset $\mathcal{D}$;
   - Cross-validate each base model;
   - Combine the predictions of the base models to form a new training dataset $\widehat{\mathcal{D}}=\left\{X_{tr},\mathcal{B}_{1}\left(X_{tr}\right),\mathcal{B}_{2}\left(X_{tr}\right),\dots,\mathcal{B}_{k}\left(X_{tr}\right)\right\}$, which consists of the training inputs $X_{tr}$ and the corresponding predictions of the $k$ base models $\mathcal{B}_{i}\left(X_{tr}\right)$, $i=1,\dots,k$;
   - Train the meta-learner $\mathcal{M}$ on the new dataset $\widehat{\mathcal{D}}$ so that it generates more accurate predictions on previously unseen data.
3. Test on new data:
   - Record the output decisions of the base models $\mathcal{B}$;
   - Feed the base-model decisions into the meta-learner $\mathcal{M}$ to make the final decision.
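The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learners (a 3-nearest-neighbour learner and a nearest-centroid learner), the 1-nearest-neighbour meta-learner, the 3-fold split by row index, the helper names, and the toy data are all our own choices. We read "cross-validate each base model" as building $\widehat{\mathcal{D}}$ from out-of-fold predictions, which is one common way to keep the meta-learner from seeing predictions that a base model made on its own training rows.

```python
import math
from collections import Counter


def knn_learner(k):
    """Return a fit function for a k-nearest-neighbour classifier (Euclidean distance)."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda row: math.dist(x, row[0]))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit


def centroid_learner(X, y):
    """Fit a nearest-centroid classifier: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def out_of_fold_predictions(fit, X, y, n_folds=3):
    """Cross-validate one base learner: each row is predicted by a model that never saw it."""
    preds = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), n_folds):
            preds[i] = model(X[i])
    return preds


def fit_stacking(base_fits, meta_fit, X, y):
    """Train a stacked ensemble following steps 1-3."""
    # Step 2: D_hat = {X_tr, B_1(X_tr), ..., B_k(X_tr)} built from out-of-fold predictions.
    oof = [out_of_fold_predictions(fit, X, y) for fit in base_fits]
    X_hat = [list(x) + [col[i] for col in oof] for i, x in enumerate(X)]
    meta = meta_fit(X_hat, y)
    # Refit every base learner on the full training set for use at test time.
    bases = [fit(X, y) for fit in base_fits]

    def predict(x):
        # Step 3: feed the base-model decisions (plus the raw input) to the meta-learner.
        return meta(list(x) + [b(x) for b in bases])
    return predict


# Toy two-class data (label 1 = "positive"); purely illustrative, not the study's dataset.
X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8)]
y_train = [0] * 6 + [1] * 6
model = fit_stacking([knn_learner(3), centroid_learner], knn_learner(1), X_train, y_train)
print(model((0.3, 0.3)), model((5.8, 5.1)))  # 0 1
```

Because $\widehat{\mathcal{D}}$ keeps the raw inputs $X_{tr}$ next to the base-model outputs, the meta-learner can learn both how far to trust each base model and how that trust depends on the input.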

#### 3.6. Machine Learning Models

1. The K-Nearest Neighbor (KNN) model has been used effectively in previous studies, especially for solving non-linear problems. It assigns a class label to a target point by majority vote among the $k$ training points closest to it in the feature space. The Euclidean distance (ED) is widely used to measure the distance between the target point $x$ and a training point $y$ described by $n$ features: $$\mathrm{ED}=\sqrt{\sum_{i=1}^{n}\left(x_{i}-y_{i}\right)^{2}}.$$
2. Support Vector Machine (SVM) is a supervised ML technique that has been used effectively in disease detection. It selects the hyperplane (decision boundary), defined by the weight vector $w$, that maximizes the margin between the classes of the training samples; unseen test samples are then classified according to the side of the hyperplane on which they fall. The most popular variants of SVM are linear SVM and nonlinear SVM with a Radial Basis Function (RBF) kernel. The linear SVM binary classifier [64] is expressed in Equation (2). Nonlinear SVM with the RBF kernel (Equation (3)) has shown encouraging results in pattern classification across a wide range of application areas. Given the training samples $\{y_{i},x_{i}\}_{i=1}^{n}$, where the label $y_{i}\in\{-1,+1\}$ gives the class of the feature vector $x_{i}\in\mathbb{R}^{d}$ in $d$ feature dimensions, the hyperplane $H(x)$ is defined as $$H(x)=w^{T}x+b=\sum_{j=1}^{d}w_{j}x_{j}+b,\tag{2}$$ and, with the RBF kernel, the decision function becomes $$H(x)=\sum_{i=1}^{n}\alpha_{i}y_{i}\,k\left(x,x_{i}\right)+b,\qquad k\left(x,x_{i}\right)=\exp\left(-\gamma\left\lVert x-x_{i}\right\rVert^{2}\right),\tag{3}$$ where $\alpha_{i}$ are the learned dual coefficients (nonzero only for the support vectors) and $\gamma$ is the kernel parameter. A test sample is assigned to class $+1$ if $H(x)\ge 0$ and to class $-1$ otherwise.
3. Naive Bayes (NB) is a probabilistic classifier that distinguishes the instances of a dataset using specified features, under strong (naive) independence assumptions between those features. It predicts the class $x$ with the highest posterior probability $P(x\mid t)$ given the feature vector $t$, obtained from Bayes' theorem: $$P\left(x\mid t\right)=\frac{P\left(t\mid x\right)P\left(x\right)}{P\left(t\right)}.$$
4. Logistic Regression (LR): we applied a logistic regression model and searched for the optimal regularization strength to prevent the model from overfitting.
5. Random Forest (RF) is an ensemble algorithm that combines tree predictors, where each tree depends on a random vector sampled independently and with the same distribution for all trees in the forest. Given an ensemble of classifiers $h_{1}(x),h_{2}(x),\dots,h_{k}(x)$ and a training set drawn at random from the distribution of the random vector $(X,Y)$, the margin function is defined in Equation (5): $$mg\left(X,Y\right)=\operatorname{av}_{k}I\left(h_{k}\left(X\right)=Y\right)-\max_{j\ne Y}\operatorname{av}_{k}I\left(h_{k}\left(X\right)=j\right),\tag{5}$$ where $\operatorname{av}_{k}$ denotes the average over the trees and $I(\cdot)$ is the indicator function. A positive margin means that the forest votes for the correct class.
6. Linear Discriminant Analysis (LDA) is used in many classification problems and is the Bayes optimal classifier when the classes are Gaussian with a shared covariance matrix. For two classes, LDA finds a one-dimensional subspace in which the classes are well separated. The discriminant function for class $k$ is given by Equation (7): $$d_{k}\left(x\right)=2\mu_{k}^{T}\Sigma^{-1}x-\mu_{k}^{T}\Sigma^{-1}\mu_{k}+2\log\pi_{k},\tag{7}$$ where $\mu_{k}$ is the mean of class $k$, $\Sigma$ is the shared covariance matrix, and $\pi_{k}$ is the prior probability of class $k$; a sample is assigned to the class with the largest $d_{k}(x)$.
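The distance and decision formulas for KNN, SVM, and NB can be checked numerically with the short sketch below. It is a plain-Python illustration, not the study's code; for the RBF case it assumes the standard kernel-SVM dual form $H(x)=\sum_i \alpha_i y_i k(x,x_i)+b$ with dual coefficients $\alpha_i$, and every weight, likelihood, and helper name is hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance ED between a target point x and a training point y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def linear_svm_score(w, b, x):
    """Equation (2): H(x) = w^T x + b. The predicted label is +1 if H(x) >= 0, else -1."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def rbf_kernel(x, v, gamma):
    """RBF kernel k(x, v) = exp(-gamma * ||x - v||^2)."""
    return math.exp(-gamma * euclidean_distance(x, v) ** 2)


def rbf_svm_score(alphas, labels, support_vectors, b, x, gamma):
    """Kernel SVM decision value H(x) = sum_i alpha_i * y_i * k(x, x_i) + b."""
    return sum(a * t * rbf_kernel(x, v, gamma)
               for a, t, v in zip(alphas, labels, support_vectors)) + b


def naive_bayes_posteriors(priors, feature_likelihoods):
    """Bayes' rule P(x|t) = P(t|x) P(x) / P(t) for every class x.

    Under the naive independence assumption, P(t|x) is the product of the
    per-feature likelihoods; P(t) is the sum of P(t|x) P(x) over all classes.
    """
    joint = {c: priors[c] * math.prod(feature_likelihoods[c]) for c in priors}
    evidence = sum(joint.values())
    return {c: p / evidence for c, p in joint.items()}


print(euclidean_distance((0, 0), (3, 4)))              # 5.0
print(linear_svm_score([1.0, -2.0], 0.5, [3.0, 1.0]))  # 1.5
posteriors = naive_bayes_posteriors(
    {"positive": 0.5, "negative": 0.5},
    {"positive": [0.8, 0.6], "negative": [0.2, 0.3]},  # hypothetical P(t_j | x) values
)
print(max(posteriors, key=posteriors.get))             # positive
```

Note that $P(t)$ is the same for every class, so NB could skip the division when only the predicted class is needed; it is kept here so the outputs are proper probabilities.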

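The random forest margin of Equation (5) and the LDA discriminant of Equation (7) can likewise be evaluated by hand. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the standard LDA form with a single covariance matrix $\Sigma$ shared by all classes and the decision rule "pick the largest $d_k(x)$"; the votes, means, priors, and function names are hypothetical.

```python
import math


def rf_margin(tree_votes, true_label):
    """Equation (5) for one sample (X, Y), given the tree votes h_1(X), ..., h_K(X).

    mg = (share of trees voting for Y) - (largest share of trees voting for any class j != Y).
    """
    n_trees = len(tree_votes)
    correct = sum(v == true_label for v in tree_votes) / n_trees
    wrong = [sum(v == j for v in tree_votes) / n_trees
             for j in set(tree_votes) if j != true_label]
    # Classes that received no votes have a share of 0, hence default=0.0.
    return correct - max(wrong, default=0.0)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lda_discriminant(x, mean, cov_inv, prior):
    """d_k(x) = 2 mu_k^T S^-1 x - mu_k^T S^-1 mu_k + 2 log(pi_k), with S the shared covariance."""
    # S^-1 is symmetric, so mu_k^T S^-1 x equals (S^-1 mu_k) . x
    s_inv_mu = [_dot(row, mean) for row in cov_inv]
    return 2 * _dot(s_inv_mu, x) - _dot(s_inv_mu, mean) + 2 * math.log(prior)


def lda_predict(x, means, cov_inv, priors):
    """Assign x to the class k with the largest discriminant value d_k(x)."""
    return max(means, key=lambda k: lda_discriminant(x, means[k], cov_inv, priors[k]))


# Hypothetical toy values for illustration only.
votes = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1]   # 8 of 10 trees vote for class 1
margin = rf_margin(votes, true_label=1)   # 0.8 - 0.2, i.e. about 0.6

means = {0: [0.0, 0.0], 1: [2.0, 0.0]}   # class means mu_k
identity = [[1.0, 0.0], [0.0, 1.0]]       # S^-1 for a unit shared covariance
priors = {0: 0.5, 1: 0.5}
print(margin, lda_predict([0.5, 0.0], means, identity, priors),
      lda_predict([1.5, 0.0], means, identity, priors))
```

With equal priors and an identity covariance, the LDA rule reduces to choosing the nearest class mean, which is why the point $(0.5, 0)$ goes to class 0 and $(1.5, 0)$ to class 1.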
#### 3.7. Performance Metrics

#### 3.8. Software and Hardware

## 4. Results

#### 4.1. Convolutional Neural Network (First Stage of Ensemble Learning)

#### 4.2. Ablation Study of Machine Learning Algorithms (Second Stage of Ensemble Learning)

#### 4.3. Computational Complexity

#### 4.4. Statistical Analysis

#### 4.5. Comparison with Previous Studies

## 5. Discussion and Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Bennett, J.; Rokas, O.; Chen, L. Healthcare in the smart home: A study of past, present and future. Sustainability **2017**, 9, 840.
2. Sanei, S.; Smaragdis, P.; Ho, A.T.S.; Nandi, A.K.; Larsen, J. Guest editorial: Machine learning for signal processing. J. Signal Process. Syst. **2015**, 79, 113–116.
3. Ding, X.; Clifton, D.; Ji, N.; Lovell, N.H.; Bonato, P.; Chen, W.; Yu, X.; Xue, Z.; Xiang, T.; Zhang, Y.; et al. Wearable sensing and telehealth technology with potential applications in the coronavirus pandemic. IEEE Rev. Biomed. Eng. **2021**, 14, 48–70.
4. Ray, P.P.; Dash, D.; Kumar, N. Sensors for internet of medical things: State-of-the-art, security and privacy issues, challenges and future directions. Comput. Commun. **2020**, 160, 111–131.
5. Girdhar, A.; Kapur, H.; Kumar, V.; Kaur, M.; Singh, D.; Damasevicius, R. Effect of COVID-19 outbreak on urban health and environment. Air Qual. Atmos. Health **2021**, 14, 389–397.
6. Corman, V.M.; Landt, O.; Kaiser, M.; Molenkamp, R.; Meijer, A.; Chu, D.K.; Bleicker, T.; Brünink, S.; Schneider, J.; Schmidt, M.L.; et al. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance **2020**, 25, 2000045.
7. Kalane, P.; Patil, S.; Patil, B.P.; Sharma, D.P. Automatic Detection of COVID-19 Disease using U-Net Architecture Based Fully Convolutional Network. Biomed. Signal Process. Control **2021**, 67, 102518.
8. Li, D.; Wang, D.; Dong, J.; Wang, N.; Huang, H.; Xu, H.; Xia, C. False-Negative Results of Real-Time Reverse-Transcriptase Polymerase Chain Reaction for Severe Acute Respiratory Syndrome Coronavirus 2: Role of Deep-Learning-Based CT Diagnosis and Insights from Two Cases. Korean J. Radiol. **2020**, 21, 505–508.
9. Alyasseri, Z.A.A.; Al-Betar, M.A.; Doush, I.A.; Awadallah, M.A.; Abasi, A.K.; Makhadmeh, S.N.; Alomari, O.A.; Abdulkareem, K.H.; Adam, A.; Damasevicius, R.; et al. Review on COVID-19 diagnosis models based on machine learning and deep learning approaches. Expert Syst. **2021**, 39, e12759.
10. Kumar, V.; Singh, D.; Kaur, M.; Damaševičius, R. Overview of current state of research on the application of artificial intelligence techniques for COVID-19. PeerJ Comput. Sci. **2021**, 7, e564.
11. Alimadadi, A.; Aryal, S.; Manandhar, I.; Munroe, P.B.; Joe, B.; Cheng, X. Artificial intelligence and machine learning to fight COVID-19. Physiol. Genom. **2020**, 52, 200–202.
12. Shi, F.; Wang, J.; Shi, J.; Wu, Z.; Wang, Q.; Tang, Z.; Shen, D. Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19. IEEE Rev. Biomed. Eng. **2020**, 14, 4–15.
13. Al-Qaness, M.A.A.; Ewees, A.A.; Fan, H.; Aziz, M.A.E. Optimization method for forecasting confirmed cases of COVID-19 in China. Appl. Sci. **2020**, 9, 674.
14. Wieczorek, M.; Silka, J.; Polap, D.; Wozniak, M.; Damaševicius, R. Real-time neural network based predictor for cov19 virus spread. PLoS ONE **2020**, 15, e0243189.
15. Zhou, T.; Lu, H.; Yang, Z.; Qiu, S.; Huo, B.; Dong, Y. The ensemble deep learning model for novel COVID-19 on CT images. Appl. Soft Comput. **2021**, 98, 106885.
16. Akram, T.; Attique, M.; Gul, S.; Shahzad, A.; Altaf, M.; Naqvi, S.S.R.; Damasevicius, R.; Maskeliūnas, R. A novel framework for rapid diagnosis of COVID-19 on computed tomography scans. Pattern Anal. Appl. **2021**, 24, 951–964.
17. Khan, M.A.; Alhaisoni, M.; Tariq, U.; Hussain, N.; Majid, A.; Damaševičius, R.; Maskeliūnas, R. Covid-19 case recognition from chest ct images by deep learning, entropy-controlled firefly optimization, and parallel feature fusion. Sensors **2021**, 21, 7286.
18. Rehman, N.; Zia, M.S.; Meraj, T.; Rauf, H.T.; Damaševičius, R.; El-Sherbeeny, A.M.; El-Meligy, M.A. A self-activated cnn approach for multi-class chest-related covid-19 detection. Appl. Sci. **2021**, 11, 9023.
19. Roy, S.; Menapace, W.; Oei, S.; Luijten, B.; Fini, E.; Saltori, C.; Demi, L. Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound. IEEE Trans. Med. Imaging **2020**, 39, 2676–2687.
20. Udhaya Sankar, S.M.; Ganesan, R.; Katiravan, J.; Ramakrishnan, M.; Ruhin Kouser, R. Mobile application based speech and voice analysis for COVID-19 detection using computational audit techniques. Int. J. Pervasive Comput. Commun. **2020**, 6.
21. Imran, A.; Posokhova, I.; Qureshi, H.N.; Masood, U.; Riaz, M.S.; Ali, K.; John, C.N.; Iftikhar Hussain, M.D.; Nabeel, M. AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app. Inform. Med. Unlocked **2020**, 20, 100378.
22. Kim, J.; Kim, H.M.; Lee, E.J.; Jo, H.J.; Yoon, Y.; Lee, N.; Yoo, C.K. Detection and isolation of SARS-CoV-2 in serum, urine, and stool specimens of COVID-19 patients from the republic of Korea. Osong Public Health Res. Perspect. **2020**, 11, 112–117.
23. Lamb, L.E.; Dhar, N.; Timar, R.; Wills, M.; Dhar, S.; Chancellor, M.B. COVID-19 inflammation results in urine cytokine elevation and causes COVID-19 associated cystitis (CAC). Med. Hypotheses **2020**, 145, 110375.
24. Kermali, M.; Khalsa, R.K.; Pillai, K.; Ismail, Z.; Harky, A. The role of biomarkers in diagnosis of COVID-19—A systematic review. Life Sci. **2020**, 254, 117788.
25. Soltan, A.A.S.; Kouchaki, S.; Zhu, T.; Kiyasseh, D.; Taylor, T.; Hussain, Z.B.; Peto, T.; Brent, A.J.; Eyre, D.W.; Clifton, D.A. Rapid triage for COVID-19 using routine clinical data for patients attending hospital: Development and prospective validation of an artificial intelligence screening test. Lancet Digit. Health **2021**, 3, e87.
26. Youssef, A.; Kouchaki, S.; Shamout, F.; Armstrong, J.; El-Bouri, R.; Taylor, T.; Birrenkott, D.; Vasey, B.; Soltan, A.; Zhu, T.; et al. Development and validation of early warning score systems for COVID-19 patients. Health Technol. Lett. **2021**, 8, 105–117.
27. Brinati, D.; Campagner, A.; Ferrari, D.; Locatelli, M.; Banfi, G.; Cabitza, F. Detection of COVID-19 infection from routine blood exams with machine learning: A feasibility study. J. Med. Syst. **2020**, 44, 135.
28. Cabitza, F.; Campagner, A.; Ferrari, D.; Di Resta, C.; Ceriotti, D.; Sabetta, E.; Carobene, A. Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests. Clin. Chem. Lab. Med. **2021**, 59, 421–431.
29. Yao, H.; Zhang, N.; Zhang, R.; Duan, M.; Xie, T.; Pan, J.; Wang, G. Severity detection for the coronavirus disease 2019 (COVID-19) patients using a machine learning model based on the blood and urine tests. Front. Cell Dev. Biol. **2020**, 8, 683.
30. Gunčar, G.; Kukar, M.; Notar, M.; Brvar, M.; Černelč, P.; Notar, M.; Notar, M. An application of machine learning to haematological diagnosis. Sci. Rep. **2018**, 8, 411.
31. Wu, G.; Zhou, S.; Wang, Y.; Li, X. Machine learning: A predication model of outcome of sars-cov-2 pneumonia. Nat. Res. **2020**.
32. Banerjee, A.; Ray, S.; Vorselaars, B.; Kitson, J.; Mamalakis, M.; Weeks, S.; Mackenzie, L.S. Use of machine learning and artificial intelligence to predict SARS-CoV-2 infection from full blood counts in a population. Int. Immunopharmacol. **2020**, 86, 106705.
33. Zheng, Y.; Zhu, Y.; Ji, M.; Wang, R.; Liu, X.; Zhang, M.; Liu, J.; Zhang, X.; Qin, C.H.; Fang, L.; et al. A Learning-Based Model to Evaluate Hospitalization Priority in COVID-19 Pandemics. Patterns **2020**, 1, 100092.
34. Bao, F.S.; He, Y.; Liu, J.; Chen, Y.; Li, Q.; Zhang, C.R.; Chen, S. Triaging moderate COVID-19 and other viral pneumonias from routine blood tests. arXiv **2020**, arXiv:2005.06546.
35. de Moraes Batista, A.F.; Miraglia, J.L.; Donato, T.H.R.; Chiavegatto Filho, A.D.P. COVID-19 diagnosis prediction in emergency care patients: A machine learning approach. medRxiv **2020**.
36. Feng, C.; Huang, Z.; Wang, L.; Chen, X.; Zhai, Y.; Zhu, F.; Chen, H.; Wang, Y.; Su, X.; Huang, S.; et al. A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected COVID-19 pneumonia in fever clinics. medRxiv **2020**.
37. Joshi, R.P.; Pejaver, V.; Hammarlund, N.E.; Sung, H.; Lee, S.K.; Furmanchuk, A.O.; Lee, H.Y.; Scott, G.; Gombar, S.; Shah, N.; et al. A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results. J. Clin. Virol. **2020**, 129, 104502.
38. De Freitas Barbosa, V.A.; Gomes, J.C.; de Santana, M.A.; de Almeida Albuquerque, J.E.; de Souza, R.G.; de Souza, R.E.; dos Santos, W.P. Heg.IA: An intelligent system to support diagnosis of COVID-19 based on blood tests. medRxiv **2020**.
39. Kang, J.; Chen, T.; Luo, H.; Luo, Y.; Du, G.; Jiming-Yang, M. Machine learning predictive model for severe COVID-19. Infect. Genet. Evol. **2021**, 90, 104737.
40. Kukar, M.; Gunčar, G.; Vovko, T.; Podnar, S.; Černelč, P.; Brvar, M.; Zalaznik, M.; Notar, M.; Moškon, S.; Notar, M. COVID-19 diagnosis by routine blood tests using machine learning. Sci. Rep. **2021**, 11, 10738.
41. Soares, F. A novel specific artificial intelligence-based method to identify COVID-19 cases using simple blood exams. MedRxiv **2020**.
42. AlJame, M.; Ahmad, I.; Imtiaz, A.; Mohammed, A. Ensemble learning model for diagnosing COVID-19 from routine blood tests. Inform. Med. Unlocked **2020**, 21, 100449.
43. Wu, J.; Shen, J.; Xu, M.; Shao, M. A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count. Comput. Methods Programs Biomed. **2021**, 211, 106444.
44. AlJame, M.; Imtiaz, A.; Ahmad, I.; Mohammed, A. Deep forest model for diagnosing COVID-19 from routine blood tests. Sci. Rep. **2021**, 11, 16682.
45. Babaei Rikan, S.; Sorayaie Azar, A.; Ghafari, A.; Bagherzadeh Mohasefi, J.; Pirnejad, H. COVID-19 diagnosis from routine blood tests using artificial intelligence techniques. Biomed. Signal Process. Control **2022**, 72, 103263.
46. Buturovic, L.; Zheng, H.; Tang, B.; Lai, K.; Kuan, W.S.; Gillett, M.; Santram, R.; Shojaei, M.; Almansa, R.; Nieto, J.; et al. A 6-mRNA host response classifier in whole blood predicts outcomes in COVID-19 and other acute viral infections. Sci. Rep. **2022**, 12, 889.
47. Du, R.; Tsougenis, E.D.; Ho, J.W.K.; Chan, J.K.Y.; Chiu, K.W.H.; Fang, B.X.H.; Ng, M.Y.; Leung, S.-T.; Lo, C.S.Y.; Wong, H.-Y.F.; et al. Machine learning application for the prediction of SARS-CoV-2 infection using blood tests and chest radiograph. Sci. Rep. **2021**, 11, 14250.
48. Hu, J.; Han, Z.; Heidari, A.A.; Shou, Y.; Ye, H.; Wang, L.; Huang, X.; Chen, H.; Chen, Y.; Wu, P. Detection of COVID-19 severity using blood gas analysis parameters and Harris hawks optimized extreme learning machine. Comput. Biol. Med. **2021**, 142, 105166.
49. Rahman, T.; Khandakar, A.; Abir, F.F.; Faisal, M.A.A.; Hossain, M.S.; Podder, K.K.; Abbas, T.O.; Alam, M.F.; Kashem, S.B.; Islam, M.T.; et al. QCovSML: A reliable COVID-19 detection system using CBC biomarkers by a stacking machine learning model. Comput. Biol. Med. **2022**, 143, 105284.
50. Qu, J.; Sumali, B.; Lee, H.; Terai, H.; Ishii, M.; Fukunaga, K.; Mitsukura, Y.; Nishimura, T. Finding of the factors affecting the severity of COVID-19 based on mathematical models. Sci. Rep. **2021**, 11, 24224.
51. Langer, T.; Favarato, M.; Giudici, R.; Bassi, G.; Garberi, R.; Villa, F.; Gay, H.; Zeduri, A.; Bragagnolo, S.; Molteni, A.; et al. Development of machine learning models to predict RT-PCR results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in patients with influenza-like symptoms using only basic clinical data. Scand. J. Trauma Resusc. Emerg. Med. **2020**, 28, 113.
52. Bayat, V.; Phelps, S.; Ryono, R.; Lee, C.; Parekh, H.; Mewton, J.; Sedghi, F.; Etminani, P.; Holodniy, M. A Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Prediction Model from Standard Laboratory Tests. Clin. Infect. Dis. **2021**, 73, e2901–e2907.
53. Alves, M.A.; de Castro, G.Z.; Soares Oliveira, B.A.; Ferreira, L.A.; Ramírez, J.A.; Silva, R.; Guimarães, F.G. Explaining Machine Learning based Diagnosis of COVID-19 from Routine Blood Tests with Decision Trees and Criteria Graphs. J. Comput. Biol. Med. **2021**, 132, 104335.
54. Wu, J.; Zhang, P.; Zhang, L.; Meng, W.; Li, J.; Tong, C.; Li, Y.; Cai, J.; Yang, Z.; Zhu, J.; et al. Rapid and accurate identification of COVID-19 infection through machine learning based on clinical available blood test results. MedRxiv **2020**.
55. Alakus, T.B.; Turkoglu, I. Comparison of deep learning approaches to predict COVID-19 infection. Chaos Solitons Fractals **2020**, 140, 110120.
56. Nan, S.N.; Ya, Y.; Ling, T.L.; Nv, G.H.; Ying, P.H.; Bin, J. A prediction model based on machine learning for diagnosing the early COVID-19 patients. MedRxiv **2020**.
57. Göreke, G.; Sarı, V.; Kockanat, S. A novel classifier architecture based on deep neural network for COVID-19 detection using laboratory findings. Appl. Soft Comput. **2021**, 106, 107329.
58. Yang, H.S.; Hou, Y.; Vasovic, L.V.; Steel, P.A.; Chadburn, A.; Racine-Brzostek, S.E.; Velu, P.; Cushing, M.M.; Loda, M.; Kaushal, R.; et al. Routine laboratory blood tests predict sars-cov-2 infection using machine learning. Clin. Chem. **2020**, 66, 1396–1404.
59. Kotsiantis, S.B.; Kanellopoulos, D.; Pintelas, P.E. Data preprocessing for supervised leaning. Int. J. Comput. Sci. **2006**, 1, 111–117.
60. Beretta, L.; Santaniello, A. Nearest neighbor imputation algorithms: A critical evaluation. BMC Med. Inform. Decis. Mak. **2016**, 16, 74.
61. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. **2002**, 16, 321–357.
62. ur Rehman, M.H.; Liew, C.S.; Abbas, A.; Jayaraman, P.P.; Wah, T.Y.; Khan, S.U. Big Data Reduction Methods: A Survey. Data Sci. Eng. **2016**, 1, 265–284.
63. Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. **2020**, 14, 241–258.
64. Ladicky, L.; Torr, P.H. Locally linear support vector machines. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, WA, USA, 28 June–2 July 2011; pp. 985–992.
65. Maji, P.; Mullins, R. On the Reduction of Computational Complexity of Deep Convolutional Neural Networks. Entropy **2018**, 20, 305.
66. Shaban, W.M.; Rabie, A.H.; Saleh, A.I.; Abo-Elsoud, M.A. Detecting COVID-19 patients based on fuzzy inference engine and Deep Neural Network. Appl. Soft Comput. **2021**, 99, 106906.
67. Aktar, S.; Ahamad, M.M.; Rashed-Al-Mahfuz, M.; Azad, A.; Uddin, S.; Kamal, A.; Alyami, S.A.; Lin, P.; Islam, S.M.S.; Quinn, J.M.; et al. Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development. JMIR Med. Inform. **2021**, 9, e25884.
68. Chadaga, K.; Prabhu, S.; Vivekananda Bhat, K.; Umakanth, S.; Sampathila, N. Medical diagnosis of COVID-19 using blood tests and machine learning. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; Volume 2161, p. 012017.

**Figure 5.** Critical difference diagram of the final-stage classifiers (meta-learners) based on their performance.

| Ref. | Methods | Feature Selection Methods | Metrics (Value) | Data Samples (COVID-19 Samples) |
|---|---|---|---|---|
| [42] | Ensemble learning (ERLX): extra trees, random forest (RF), logistic regression (LR), and extreme gradient boosting (XGBoost) | Manual | Accuracy: 99.88%, AUC: 99.38%, Sensitivity: 98.72%, Specificity: 99.99% | 5644 (559) |
| [47] | Categorical gradient boosting (CatBoost), support vector machine (SVM), and LR | Manual | AUC: 89.9–95.8%, Specificity: 91.5–98.3%, Sensitivity: 55.5–77.8% | 5148 (447) |
| [53] | Ensemble learning with RF, LR, XGBoost, SVM, and MLP | Decision Tree Explainer (DTX) | Accuracy: 0.88 ± 0.02 | 608 (84) |
| [39] | Artificial neural network (ANN) predictive model | Pearson and Kendall correlation coefficients | AUC: 0.953 (0.889–0.982) | 151 |
| [35] | ANN, RF, gradient boosting trees, LR, and SVM | NA | AUC: 0.85, Sensitivity: 0.68, Specificity: 0.85, Brier score: 0.16 | 235 (102) |
| [54] | RF classifier | Manual | Accuracy: 96.95%, Sensitivity: 95.12%, Specificity: 96.97% | 253 (105) |
| [55] | ANN, convolutional neural network (CNN), long short-term memory (LSTM), recurrent neural network (RNN), CNN-LSTM, and CNN-RNN | CNN and LSTM | AUC: 0.90, Accuracy: 0.9230, F1-score: 0.93, Precision: 0.9235, Recall: 0.9368 | 600 (80) |
| [56] | SVM, LR, decision tree (DT), RF, and deep neural network (DNN) | Logistic regression (LR) | Accuracy: 91%, Sensitivity: 87%, AUC: 97.1%, Specificity: 95% | 921 (361) |
| [57] | ANN, CNN, RNN | SMOTE | Accuracy: 94.95%, F1-score: 94.98%, Precision: 94.98%, Recall: 94.98%, AUC: 100% | 600 (80) |
| [31] | LR | Maximum relevance minimum redundancy (mRMR) algorithm | Sensitivity: 98%, Specificity: 91% | 110 (51) |
| [58] | LR, DT, RF, and gradient boosted decision tree | NA | Sensitivity: 75.8%, Specificity: 80.2%, AUC: 85.3% | 3346 (1394) |

| S/N | Features | Data Types | Number of Missing Values | Mean |
|---|---|---|---|---|
| 1 | Gender | Nominal | 0 | - |
| 2 | Age | Numeric | 0 | 61.3 |
| 3 | WBC ¹ | Numeric | 2 | 8.6 |
| 4 | Platelets | Numeric | 2 | 226.5 |
| 5 | CRP ² | Numeric | 6 | 90.9 |
| 6 | AST ³ | Numeric | 2 | 54.2 |
| 7 | ALT ⁴ | Numeric | 13 | 44.9 |
| 8 | GGT ⁵ | Numeric | 143 | 82.5 |
| 9 | ALP ⁶ | Numeric | 148 | 89.9 |
| 10 | LDH ⁷ | Numeric | 85 | 380.5 |
| 11 | Neutrophils | Numeric | 70 | 6.2 |
| 12 | Lymphocytes | Numeric | 70 | 1.2 |
| 13 | Monocytes | Numeric | 70 | 0.6 |
| 14 | Eosinophils | Numeric | 70 | 0.05 |
| 15 | Basophils | Numeric | 71 | 0 |
| 16 | Swab | Nominal | 0 | - |

¹ WBC = leukocytes (white blood cells); ² CRP = C-reactive protein; ³ AST = aspartate transaminase; ⁴ ALT = alanine transaminase; ⁵ GGT = γ-glutamyl transferase; ⁶ ALP = alkaline phosphatase; ⁷ LDH = lactate dehydrogenase.

| Model | Parameter Values |
|---|---|
| KNN | n_neighbors = 3, weights = 'uniform', algorithm = 'auto', leaf_size = 30, p = 2, metric = 'minkowski' |
| SVM (linear) | C = 0.025, kernel = 'linear' |
| SVM (RBF) | C = 1, gamma = 2, kernel = 'rbf' |
| Decision Tree | criterion = 'gini', max_depth = 5, max_features = None, max_leaf_nodes = None, min_samples_leaf = 1, min_samples_split = 2, random_state = None, splitter = 'best', min_weight_fraction_leaf = 0.0 |
| Naïve Bayes (Gaussian) | priors = None, var_smoothing = $10^{-9}$ |
| Neural Network (MLP Classifier) | activation = 'relu', alpha = 1, batch_size = 1024, hidden_layer_sizes = 100, learning_rate_init = 0.001, max_iter = 1000, max_iter = 200, power_t = 0.5, random_state = None, shuffle = True, solver = 'adam', tol = 0.0001 |
| Linear Discriminant Analysis | n_components = None, priors = None, shrinkage = None, solver = 'svd' |
| Quadratic Discriminant Analysis | tol = 0.0001, store_covariance = False, reg_param = 0.0, priors = None |
| Passive | C = 1.0, n_iter_no_change = 5, max_iter = 1000, random_state = None |
| Ridge | fit_intercept = True, alpha = 1.0, normalize = False, max_iter = None, random_state = None, solver = 'auto' |
| SGDC | loss = 'hinge', penalty = 'l2', alpha = 0.0001, fit_intercept = True, max_iter = 1000 |
| Logistic Regression | C = 1.0, cv = None, dual = False, fit_intercept = True, max_iter = 100, penalty = 'l2', random_state = None, solver = 'lbfgs', tol = 0.0001 |
| *Ensemble learners* | |
| Random Forest | max_features = 1, n_estimators = 10, max_depth = 5, criterion = 'gini', random_state = None, verbose = 0 |
| AdaBoost | algorithm = 'SAMME.R', learning_rate = 1, n_estimators = 50, random_state = None |
| Extra Trees | criterion = 'gini', max_depth = None, max_features = 12, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100 |

| Metrics | Definition |
|---|---|
| Accuracy (Acc) | $Acc=\dfrac{TP+TN}{TP+TN+FP+FN}$ |
| False Negative Rate (FNR) | $FNR=\dfrac{FN}{TP+FN}$ |
| False Positive Rate (FPR) | $FPR=\dfrac{FP}{TN+FP}$ |
| Matthews Correlation Coefficient (MCC) | $MCC=\dfrac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TN+FN\right)}}$ |
| Cohen Kappa | $K=\dfrac{P_{o}-P_{e}}{1-P_{e}}$, where $P_{o}$ is the observed agreement and $P_{e}$ is the agreement expected by chance |

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
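The metric definitions above translate directly into code. The following is a minimal plain-Python sketch, not the study's evaluation script; the function name and the confusion-matrix counts in the usage line are hypothetical. It computes $P_e$ from the row and column marginals of the confusion matrix, and it sets MCC to 0 when a marginal is empty (the denominator vanishes), a common convention.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Acc, FNR, FPR, MCC, and Cohen's kappa from a binary confusion matrix."""
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    fnr = fn / (tp + fn)
    fpr = fp / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is empty; report 0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: P_o = observed agreement (equal to accuracy),
    # P_e = chance agreement from predicted-class and actual-class marginals.
    p_o = acc
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {"Acc": acc, "FNR": fnr, "FPR": fpr, "MCC": mcc, "Kappa": kappa}


# Hypothetical counts for illustration only.
print(confusion_metrics(tp=45, tn=40, fp=10, fn=5))
```

A classifier that labels every sample positive has FNR = 0 and FPR = 1 but MCC = Kappa = 0, which is why MCC and Kappa are more informative than accuracy or FNR alone on imbalanced data.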

| Parameters | Description |
|---|---|
| Activation function | Input layer: ReLU; hidden layer: ReLU; output layer: Softmax |
| Loss | sparse_categorical_crossentropy |
| Optimizer | adam |
| Epochs | 10 |
| Epoch 2 | 50 |
| Batch size | 1024 |
| Dropout ratio (input) | 0.5 |
| Dropout ratio (output) | 0.3 |

**Table 6.** Results of the ablation study: performance of the proposed model using different final-stage ML classifiers. Best values are shown in bold.

| ML Model | Accuracy (%) | FPR (%) | FNR (%) | AUC (%) | MCC (%) | Kappa (%) |
|---|---|---|---|---|---|---|
| Nearest Neighbors | 78.9 | 39.86 | 11.02 | 74.56 | 51.48 | 50.8 |
| Linear SVM | 64.66 | 100 | 0 | 50 | 0 | 0 |
| RBF SVM | 71.44 | 79.06 | 1.72 | 59.62 | 26.34 | 21.66 |
| Decision Tree | 94.64 | 9.36 | 3.2 | 93.72 | 88.24 | 87.94 |
| Random Forest | 90.74 | 22.38 | 2.2 | 87.72 | 79.54 | 78.64 |
| Neural Net | 65.02 | 99.04 | 0 | 50.48 | 3.48 | 1.18 |
| AdaBoost | 99.28 | 2.24 | 0 | 98.88 | 98.36 | 98.32 |
| ExtraTrees | 99.28 | 0 | 1.04 | 99.48 | 98.4 | 98.4 |
| Naive Bayes | 72.14 | 54.14 | 13.78 | 66.06 | 35.48 | 34.26 |
| LDA | 70.34 | 67.62 | 8.86 | 61.8 | 30.08 | 26.44 |
| QDA | 91.44 | 18.4 | 3.3 | 89.14 | 81.26 | 80.46 |
| Logistic | 65.02 | 99.04 | 0 | 50.48 | 3.48 | 1.18 |
| Passive | 59.64 | 60 | 29.48 | 55.26 | 11.48 | 9.74 |
| Ridge | 67.18 | 92.2 | 0.52 | 53.62 | 17.24 | 8.82 |
| SGDC | 58.96 | 52.38 | 34.88 | 56.36 | 13.12 | 15.1 |
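One way to read the AUC column of Table 6: for a classifier that outputs hard labels, the ROC curve passes through a single operating point $(FPR, TPR)$ with $TPR = 1 - FNR$, and the area under the polyline $(0,0)\to(FPR,TPR)\to(1,1)$ is $1-(FPR+FNR)/2$, i.e. the balanced accuracy. The sketch below checks this relation against the table values. The agreement within 0.05 percentage points (the small gaps are plausibly rounding or fold averaging) is our observation, not a statement by the authors about how AUC was computed; the variable and function names are hypothetical.

```python
# (model, FPR %, FNR %, reported AUC %) transcribed from Table 6.
TABLE_6 = [
    ("Nearest Neighbors", 39.86, 11.02, 74.56),
    ("Linear SVM", 100, 0, 50),
    ("RBF SVM", 79.06, 1.72, 59.62),
    ("Decision Tree", 9.36, 3.2, 93.72),
    ("Random Forest", 22.38, 2.2, 87.72),
    ("Neural Net", 99.04, 0, 50.48),
    ("AdaBoost", 2.24, 0, 98.88),
    ("ExtraTrees", 0, 1.04, 99.48),
    ("Naive Bayes", 54.14, 13.78, 66.06),
    ("LDA", 67.62, 8.86, 61.8),
    ("QDA", 18.4, 3.3, 89.14),
    ("Logistic", 99.04, 0, 50.48),
    ("Passive", 60, 29.48, 55.26),
    ("Ridge", 92.2, 0.52, 53.62),
    ("SGDC", 52.38, 34.88, 56.36),
]


def single_point_auc(fpr, fnr):
    """Area under the ROC polyline (0,0) -> (FPR, TPR) -> (1,1), in percent.

    With TPR = 100 - FNR this reduces to 100 - (FPR + FNR) / 2 (balanced accuracy).
    """
    return 100 - (fpr + fnr) / 2


for name, fpr, fnr, reported in TABLE_6:
    print(f"{name:17s} reported {reported:6.2f}  single-point {single_point_auc(fpr, fnr):6.2f}")
```

Read this way, an AUC of 50% (Linear SVM) corresponds to a model that labels every sample as positive, consistent with its FPR of 100% and FNR of 0%.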

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Abayomi-Alli, O.O.; Damaševičius, R.; Maskeliūnas, R.; Misra, S.
An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples. *Sensors* **2022**, *22*, 2224.
https://doi.org/10.3390/s22062224

**AMA Style**

Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, Misra S.
An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples. *Sensors*. 2022; 22(6):2224.
https://doi.org/10.3390/s22062224

**Chicago/Turabian Style**

Abayomi-Alli, Olusola O., Robertas Damaševičius, Rytis Maskeliūnas, and Sanjay Misra.
2022. "An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples" *Sensors* 22, no. 6: 2224.
https://doi.org/10.3390/s22062224