Article
Peer-Review Record

Predicting COVID-19 Hospital Stays with Kolmogorov–Gabor Polynomials: Charting the Future of Care

Information 2023, 14(11), 590; https://doi.org/10.3390/info14110590
by Hamidreza Marateb 1, Mina Norouzirad 2, Kouhyar Tavakolian 3, Faezeh Aminorroaya 4, Mohammadreza Mohebbian 5, Miguel Ángel Mañanas 1,6, Sergio Romero Lafuente 1,6, Ramin Sami 7 and Marjan Mansourian 1,4,*
Reviewer 1:
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 28 August 2023 / Revised: 9 October 2023 / Accepted: 23 October 2023 / Published: 31 October 2023
(This article belongs to the Special Issue Artificial Intelligence and Big Data Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study trained a Kolmogorov-Gabor polynomial using Regularized Least Squares and validated it on a dataset of 1600 COVID-19 patients. Overall, this paper presents a promising approach to predicting the length of stay for COVID-19 patients using machine learning models. The results indicate the potential for improved resource management and patient care in future pandemics. However, there are still some concerns about this manuscript.

(1) The authors' background section lacks the necessary contextualization of relevant research in the field.

(2) There are different variants of the SARS-CoV-2 virus. The original strain that the authors analysed differs significantly from the currently circulating strain. Consequently, the conclusions drawn from the research may have limited practical relevance to current clinical scenarios.

(3)With only three folds, there is a risk of high variance in the evaluation results, which can lead to unstable or biased performance estimates. Additionally, the choice of how to divide the data into three folds can impact the results, as certain patterns in the data may not be adequately represented in each fold. This can limit the generalizability of the model's performance to unseen data. To mitigate these limitations, higher-fold cross-validation techniques or alternative validation approaches should be considered.

(4) It is crucial to note that the results of different variants of the SARS-CoV-2 virus cannot be directly compared. Therefore, it is necessary to confirm whether the results presented in Table 4 exclusively pertain to the initial strain of the SARS-CoV-2 coronavirus.

(5) Further validation and application of the model on different datasets are necessary to confirm its generalizability and effectiveness.
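
The concern in comment (3) about the stability of a three-fold estimate can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the study's patients), the model is a plain ridge regression rather than the authors' Kolmogorov-Gabor model, and the helper names `kfold_indices`, `ridge_fit` and `cv_mae` are hypothetical. It repeats k-fold cross-validation over several random splits and reports how much the error estimate moves from split to split for different k.

```python
import numpy as np

def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 and split into k nearly equal test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution with an unpenalised intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def cv_mae(X, y, k, seed, lam=1.0):
    """Mean absolute error averaged over the k held-out folds."""
    errors = []
    for test in kfold_indices(len(y), k, seed):
        train = np.setdiff1d(np.arange(len(y)), test)
        w = ridge_fit(X[train], y[train], lam)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ w
        errors.append(np.mean(np.abs(pred - y[test])))
    return float(np.mean(errors))

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 7 + X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=2.0, size=300)

for k in (3, 5, 10):
    scores = [cv_mae(X, y, k, seed) for seed in range(20)]
    print(f"k={k:2d}  mean MAE={np.mean(scores):.3f}  "
          f"spread across splits={np.std(scores):.3f}")
```

Reporting the spread across repeated splits alongside the mean is one way to show readers how sensitive the estimate is to the particular partition.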

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The Abstract of this manuscript states that the authors used ‘machine learning models’ to predict the length of hospital stay of COVID-19 patients, but the methods refer only to applying cross-validation, which does not constitute an ML model. There was no discussion of the training/test sets, nor were key metrics such as a confusion matrix included. There was no discussion of the imbalance in the authors’ binary outcome, which can affect model performance; there are ML methods to handle this. It is unclear from the manuscript which model treated LOS as continuous and which treated it as binary. The same applies to the performance metrics, as they differ according to the model used. These models need to be clearly articulated separately in both the methods and the results.

INTRODUCTION

LINE 62-64: Please provide a number of references for your statements “Numerous studies have delved into predicting hospital resource needs for COVID 19 patients. A considerable number of these investigations have leveraged machine learning (ML) for this purpose.”

LINE 78: Multivariate analysis relates to analysis of multiple outcomes at the one time (e.g. MANOVA). Your models deal with only one outcome and multiple predictors so should be termed “multivariable”. Please amend throughout your manuscript.

MATERIALS AND METHODS:

Please include the units of measurement for your laboratory tests and justify the cut points for many of your measurements in Table 1 (e.g. temp, heart rate, BP, laboratory tests etc). It is also unclear why you treated some variables as binary and others as continuous (e.g. heart rate was binary but respiratory rate was continuous).

Please outline in more detail how you established from the source data whether a patient had a certain comorbidity. For example, did you use ICD-10 coding? If so, this should be stated in the methods section and the specific ICD codes per condition should be included as supplementary material.

LINE 120: Please clarify what you meant by “Symptoms were questioned”.

Please clarify why you discuss PCEP as these do not appear to have been used in your analysis.

Please ensure you are consistent with how you reference length of stay as sometimes you use LOS, other times LoS. Please correct.

Clinical justification is required for the naming of your quartile groups of LOS (≤ 7 days as normal, and > 7 days as prolonged LOS).

Equations: Please make sure you have explained all parameters in your equations (e.g. Equation 1: j; Equation 2: T; Equation 4: adj. etc)

It is very unclear how you did ‘one hot coding’ for your nominal and ordinal variables as Table 1 outlines binary variables that have been specifically grouped according to a cut point and there do not appear to be any nominal or ordinal variables that were one hot coded.

Please add a sentence providing a medical example of the use of Ridge regression elsewhere to justify its use.

Please outline why you chose three-fold cross validation when, generally, a minimum of five-fold cross validation is used. This appears to be quite a low cross-validation rate.

Your binary outcome LOS would have used a generalized linear model but your model definition appears to be a binary result.

Please include the probability cut-point used for your binary model diagnostics (e.g. Sensitivity, specificity etc)

RESULTS:

Table 1: It is unclear whether you are reporting mean (SD) or median (IQR) for your continuous measurements (e.g. CCI, respiratory rate etc). Please ensure you are consistent in the number of decimal places reported in the table, as some figures are given to 4 decimal places (e.g. Total CCI ‘2.1334’; Total respiratory rate ‘5.668’). Please include LOS as a continuous variable at the top of Table 1 so the reader is provided with distributions for the total sample and within categories.

Table 2: It seems strange that the difference between your LOS for ‘Cough’ was not significant (i.e. p=0.218) when you report a difference between the groups of 85% vs 31%. Can you please explain this further?

Figure 2: Please provide proportion of patients rather than number of patients as this would be a more meaningful graph.

LINE 238: Please check your sentence commencing with ‘… analysis on laboratory…’

Table 3: Please add in your methods the definitions of MAE1 and MAE2 to your mean/median AE.

Please provide the model results related to your parameters. Also, please provide ROC curves.

DISCUSSION:

Limitations: Please include the limited sample size as a limitation.

Comments on the Quality of English Language

Please check some minor English edits.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

In this paper, Machine Learning models were developed to predict the duration of hospital stay for COVID-19 patients based on clinical and demographic variables collected at the time of hospital admission. The authors propose an approach based on a Kolmogorov-Gabor polynomial trained with Regularized Least Squares and validated on a dataset of 1600 COVID-19 patients of Khorshid Hospital in the central province of Iran. The key factors that played a significant role in predicting the length of stay (LoS) included inflammatory markers, HCO3 (bicarbonate levels), and fever.
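
For readers unfamiliar with the method summarised above, the following is a minimal sketch of a Kolmogorov-Gabor polynomial fitted by regularized least squares. It is not the authors' implementation: the polynomial degree, the penalty value `lam`, the synthetic data and the helper names `kg_features` and `fit_rls` are all assumptions made for illustration. A Kolmogorov-Gabor polynomial of degree 2 contains a constant, every input, and every pairwise product of inputs (squares included); the coefficients are then obtained from the ridge-penalised normal equations.

```python
import itertools
import numpy as np

def kg_features(X, degree=2):
    """Design matrix with columns 1, x_i, x_i*x_j, ... up to `degree`."""
    n, p = X.shape
    columns = [np.ones(n)]
    for d in range(1, degree + 1):
        # Each index tuple (i, j, ...) with i <= j <= ... is one monomial.
        for idx in itertools.combinations_with_replacement(range(p), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

def fit_rls(Phi, y, lam):
    """Regularized least squares: solve (Phi'Phi + lam*I) w = Phi'y."""
    penalty = lam * np.eye(Phi.shape[1])
    penalty[0, 0] = 0.0  # leave the constant term unpenalised
    return np.linalg.solve(Phi.T @ Phi + penalty, Phi.T @ y)

# Synthetic example (an assumption, not the study's data): the target is an
# exact second-order polynomial of two of the three inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 0] * X[:, 1]

Phi = kg_features(X, degree=2)
w = fit_rls(Phi, y, lam=1e-8)
print("number of polynomial terms:", Phi.shape[1])
print("largest absolute residual:", float(np.max(np.abs(Phi @ w - y))))
```

The number of terms grows quickly with the number of inputs and the degree, which is why the regularization penalty matters in practice.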

The article covers an interesting topic, and it is well-structured and fluently written. I would like to offer the following recommendations to the authors:

  1. Introduce some numerical statistics on beds.
  2. In the statement, "Numerous studies have focused on predicting hospital resource needs for COVID-19 patients," it would be beneficial to include citations of relevant studies.
  3. Include a table summarizing the most significant literature works, highlighting the pros and cons of various proposals and datasets used.
  4. It would be beneficial to present anonymized data from a single patient in its entirety. This would provide a comprehensive overview of the available features.
  5. I recommend emphasizing the limitations of the present study more clearly and outlining the possible future research directions that the authors intend to pursue.
  6. Make Table 4 more readable and format it according to the journal's style.
  7. It would be interesting if the authors could consider a development using metadata linking their results with evidence from image investigations. Some recent research products can make diagnoses using chest X-rays (see "A multi-modal bone suppression, lung segmentation, and classification approach for accurate COVID-19 detection using chest radiographs"). Combining these approaches could enable a more comprehensive follow-up.

I hope these suggestions are helpful to the authors in further improving their paper.

Comments on the Quality of English Language

Good

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The paper entitled "Predicting COVID-19 Hospital Stays with Kolmogorov-Gabor Polynomials: Charting the Future of Care" addresses an important and timely topic. In total, the manuscript is 20 pages long, including the references section and other information provided at the end of the manuscript. As my first impression of the paper is positive, after careful reading I would like to ask the authors to consider the following issues when revising the current version of the paper.

[Abstract]

1/ Not all abbreviations have been explained and developed before.

2/ Why did you choose the Kolmogorov-Gabor polynomial for the prediction task?

3/ What are the limitations of your study?

4/ What are the future research avenues?

[Introduction]

No research gap has been presented and discussed. Alternatively, more emphasis could be placed on the motives and needs for developing a predictive model and the potential beneficiaries of the results obtained. On the other hand, the authors could discuss the limitations of the hospital's capacity and how COVID-19 is destroying its ability to provide timely health services. Moreover, the fatigue and stress of hospital personnel are also very important factors which should be underlined. Below, as a guideline, I provide a few interesting papers which should be included in the qualitative analysis.

· Sagherian, K., Steege, L. M., Cobb, S. J., & Cho, H. (2023). Insomnia, fatigue and psychosocial well‐being during COVID‐19 pandemic: A cross‐sectional survey of hospital nursing staff in the United States. Journal of clinical nursing, 32(15-16), 5382-5395.

· Sasangohar, F., Jones, S. L., Masud, F. N., Vahidy, F. S., & Kash, B. A. (2020). Provider burnout and fatigue during the COVID-19 pandemic: lessons learned from a high-volume intensive care unit. Anesthesia and analgesia.

· Sikaras, C., Ilias, I., Tselebis, A., Pachi, A., Zyga, S., Tsironi, M., ... & Panagiotou, A. (2022). Nursing staff fatigue and burnout during the COVID-19 pandemic in Greece. AIMS public health, 9(1), 94.

[Materials and Methods]

This section is well written and organised. However, in subsection 2.4, the authors should explain why they chose the Kolmogorov-Gabor polynomials for the prediction task. In fact, the authors should present a comprehensive analysis of the state-of-the-art methods and then briefly justify the choice based on their prior knowledge and experience. This is also a limitation of the study: as only one method was used, no comparison with other methods (e.g. LSTM) was made. This is acceptable, but the authors should clearly discuss this issue. Along the same lines, this limitation suggests a research avenue concerning the comparison of existing methods using different datasets. It should also be noted that the authors made a comparison with other related works, indicating their main characteristics, as well as their strengths and weaknesses.

[Results]

This section is well written and organized. However, some figures are in color and some are in grey scale. The authors should have taken one approach and tried to be consistent.  

[Discussion]

The authors should consider adding more content to subsection 4.5 Limitations and Future Activities (honestly speaking, the naming of this subsection is quite unusual; in my papers I term this subsection as Limitations and Future Research). Before that, the authors should consider providing explicit contributions of their study.

[Conclusions]

This section is definitely too short. As a reader, I expect to be given a brief summary of your whole study. On the other hand, if you wish to be cited, please keep in mind that some readers tend to skip the whole content, except the abstract and conclusions.

[My recommendation]

 

I think you have done a great job. Your paper makes interesting reading. However, the current version of your manuscript needs some improvements and corrections. In short, your paper needs a major revision.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

This study focuses on the development of Machine Learning models to predict the length of stay (LoS) of COVID-19 patients in hospitals based on clinical and demographic variables collected upon admission. The Kolmogorov-Gabor polynomial, trained using Regularized Least Squares, is employed to model LoS prediction.

The authors validated the model's performance using a dataset comprising 1600 COVID-19 patients admitted to a hospital in Iran. Cross-validation results reveal that the most influential factors in LoS prediction include inflammatory markers, HCO3 levels, and fever. The adjusted R2 value is 0.65 [95% CI: 0.58-0.71], and the Concordance Correlation Coefficient is 0.89 [0.88-0.90], demonstrating the model's robust predictive capability. The authors found that the estimation bias is statistically insignificant (P-value=0.09; paired-sample t-test).
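
The two agreement measures quoted above can be computed in a few lines. The sketch below uses the standard definitions of the adjusted R2 and of Lin's Concordance Correlation Coefficient; it does not reproduce the authors' pipeline or their confidence intervals, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def concordance_cc(observed, predicted):
    """Lin's CCC: 2*cov / (var_obs + var_pred + (mean_obs - mean_pred)^2)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mo, mp = observed.mean(), predicted.mean()
    cov = np.mean((observed - mo) * (predicted - mp))
    # np.var uses the 1/n (population) form, matching the covariance above.
    return 2.0 * cov / (observed.var() + predicted.var() + (mo - mp) ** 2)

def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2 penalises R2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Toy numbers (hypothetical): predictions that are perfectly correlated with
# the observations but shifted by one day still lose concordance.
observed_los = [1.0, 2.0, 3.0]
shifted_pred = [2.0, 3.0, 4.0]
print("CCC, identical:", concordance_cc(observed_los, observed_los))
print("CCC, shifted by one day:", concordance_cc(observed_los, shifted_pred))
print("adjusted R2:", adjusted_r2(0.5, n_samples=11, n_predictors=2))
```

Unlike the Pearson correlation, the CCC drops when predictions are systematically offset from the observations, which is why it is often reported together with a test of estimation bias.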

Furthermore, the study extends its analysis to distinguish between "normal" LoS (≤ 7 days) and "prolonged" LoS (>7 days) groups, achieving a high level of balanced diagnostic accuracy and fair-to-good agreement rates. This research contribution holds the potential to empower hospitals and healthcare providers to optimize resource management, thereby enhancing their efficiency in responding to future pandemics and ultimately contributing to the preservation of lives.

 

While the subject of the paper is very topical and belongs to a hot research area, I recommend a few small changes before publication:

- The comparison of the distribution of LoS and the normal distribution given in Figure 1 is useless since LoS can have only positive values.

- To assess the feature importance, a decision tree could be a better choice.

- The method chosen by the authors needs an explanation. Why Kolmogorov-Gabor polynomials and not another method? Why is this method better in this case?

 

 

 

Comments on the Quality of English Language

The quality of English is good enough.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have made significant improvements to the issues present in the original text.

Author Response

We want to thank the reviewer for their insightful comments and positive remarks. The quality of the revised manuscript significantly improved, thanks to the reviewer’s valuable comments.

Reviewer 2 Report

Comments and Suggestions for Authors

I commend the authors on the extensive revision they have performed to dramatically improve their manuscript. There are still a number of revisions required before the manuscript is ready for publication. The English still requires review and editing. Further comments are below:

Lines 116-120: Please clarify the description of this study, as it is not clear what the authors were predicting or what the comparator was. For example, “Working in a tertiary care hospital in China, Hong et al. (2020) [16] adopted a more focused approach with a set of 37 variables, emphasizing lymphocyte count, heart rate, and procalcitonin levels, among others”

Lines 122-123: Please clarify what Ebinger et al were classifying and include the model(s) they used.

Lines 128-133: It is not clear what Usher et al were predicting.

Line 187: Please amend this paragraph to past tense.

Lines 236-237: The following sentence does not make sense: “Such medical conditions were adopted by International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM)….”. Perhaps the authors meant “These medical conditions were classified using the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes”.

You have not responded correctly to my request to provide the cut-off probability used to classify your predictions to obtain your performance metrics. This cut-off is used to classify each case into the predicted binary outcome and determines your confusion matrix, which can change dramatically depending on the value chosen.

Table 1: Please ensure you have explained your statistics, as some of these figures represent n (%) while others appear to represent mean (SD). These need to be clearly articulated to the reader. Please provide a note at the bottom of the table stating which tests were used to generate the p-values (e.g. t-test, chi-square).

Figure 1: Please check your figures as the x-axis label for the relative frequency has been cut off.

Figure 2: Please add a y-axis label and values above the bars.

Whilst it is potentially very beneficial to have sections 4.3 and 4.4 in the Discussion there is a mix of methods, results and discussion in these sections. These sections require a clearer introduction in the Discussion to orient these for the reader.

Comments on the Quality of English Language

This manuscript has improved extensively but requires further checking of the English.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Dear Authors.

Thank you for addressing all the issues raised.

Well done. 

Author Response

We want to thank the reviewer for their insightful comments and positive remarks. The quality of the revised manuscript significantly improved, thanks to the reviewer’s valuable comments.
