Identification of Transcriptome Biomarkers for Severe COVID-19 with Machine Learning Methods
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Boruta Feature Selection
2.3. Minimum Redundancy Maximum Relevance
2.4. Incremental Feature Selection
2.5. SMOTE
2.6. Classification Algorithms
2.7. Performance Measurement
2.8. Biological Function Analysis
3. Results
3.1. Results of the Feature Selection Using the Boruta and mRMR Methods
3.2. Identification of the Optimal Features to Distinguish the Disease State and Severity by the IFS Method
3.3. Results of the Biological Function Analysis for the Top 104 Features
4. Discussion
4.1. Analysis of the Top Prediction Features
4.2. Functional Analysis of the Features (Genes)
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Category | Sample Size |
---|---|
COVID-19 non-ICU | 50 |
COVID-19 ICU | 50 |
non-COVID-19 non-ICU | 10 |
non-COVID-19 ICU | 16 |
Classification Algorithm | Number of Features | ACC | MCC | Macro F1 | Weighted F1 |
---|---|---|---|---|---|
Decision tree | 13 | 0.6984 | 0.5515 | 0.7047 | 0.6996 |
Decision tree | 104 a | 0.8095 | 0.7180 | 0.7841 | 0.8111 |
Decision tree | 145 | 0.7857 | 0.6823 | 0.7888 | 0.7874 |
Decision tree | 253 | 0.7619 | 0.6430 | 0.7893 | 0.7610 |
k-nearest neighbor | 13 | 0.6746 | 0.5426 | 0.6821 | 0.6780 |
k-nearest neighbor | 104 | 0.7222 | 0.5942 | 0.7535 | 0.7180 |
k-nearest neighbor | 145 a | 0.7540 | 0.6436 | 0.7834 | 0.7496 |
k-nearest neighbor | 253 | 0.7302 | 0.6099 | 0.7487 | 0.7264 |
Random forest | 13 | 0.7619 | 0.6520 | 0.7716 | 0.7608 |
Random forest | 104 | 0.8175 | 0.7320 | 0.8093 | 0.8176 |
Random forest | 145 | 0.7937 | 0.6976 | 0.7899 | 0.7935 |
Random forest | 253 a | 0.8413 | 0.7656 | 0.8302 | 0.8421 |
Support vector machine | 13 a | 0.8413 | 0.7669 | 0.8397 | 0.8420 |
Support vector machine | 104 | 0.7778 | 0.6688 | 0.7724 | 0.7778 |
Support vector machine | 145 | 0.7778 | 0.6707 | 0.7910 | 0.7767 |
Support vector machine | 253 | 0.8095 | 0.7199 | 0.7862 | 0.8113 |
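
As an illustration of the four measures reported in the table above (ACC, MCC, macro F1, and weighted F1), the following is a minimal sketch of how they can be computed for a multi-class prediction. It is not the authors' code: the function name `multiclass_metrics`, the toy labels, and the choice of the standard multi-class generalization of MCC are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def multiclass_metrics(y_true, y_pred):
    """Return ACC, multi-class MCC, macro F1 and weighted F1 for two label lists.

    Hypothetical helper for illustration only; not taken from the article.
    """
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    true_count = Counter(y_true)  # t_k: samples whose true class is k
    pred_count = Counter(y_pred)  # p_k: samples predicted as class k
    hit_count = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    correct = sum(hit_count.values())

    # Accuracy: fraction of samples predicted correctly.
    acc = correct / n

    # Multi-class MCC (assumed form): (c*n - sum t_k*p_k) /
    # sqrt((n^2 - sum p_k^2) * (n^2 - sum t_k^2)), with c = correct predictions.
    cov_xy = correct * n - sum(true_count[k] * pred_count[k] for k in labels)
    cov_xx = n * n - sum(pred_count[k] ** 2 for k in labels)
    cov_yy = n * n - sum(true_count[k] ** 2 for k in labels)
    denom = sqrt(cov_xx * cov_yy)
    mcc = cov_xy / denom if denom else 0.0

    # Per-class F1 = 2*TP / (t_k + p_k), since t_k = TP + FN and p_k = TP + FP.
    f1 = {}
    for k in labels:
        support_plus_pred = true_count[k] + pred_count[k]
        f1[k] = 2 * hit_count[k] / support_plus_pred if support_plus_pred else 0.0

    macro_f1 = sum(f1.values()) / len(labels)  # unweighted mean over classes
    weighted_f1 = sum(f1[k] * true_count[k] for k in labels) / n  # weighted by class size
    return {"ACC": acc, "MCC": mcc, "Macro F1": macro_f1, "Weighted F1": weighted_f1}


# Toy labels (hypothetical, not the study's samples).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
print(multiclass_metrics(y_true, y_pred))
```

Macro F1 gives each class equal weight, whereas weighted F1 scales each class by its sample count, so the two can diverge when class sizes are unequal, as they are in the sample-size table above.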
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, X.; Zhou, X.; Ding, S.; Chen, L.; Feng, K.; Li, H.; Huang, T.; Cai, Y.-D. Identification of Transcriptome Biomarkers for Severe COVID-19 with Machine Learning Methods. Biomolecules 2022, 12, 1735. https://doi.org/10.3390/biom12121735