Article
Peer-Review Record

A More Accurate Estimation of Semiparametric Logistic Regression

Mathematics 2021, 9(19), 2376; https://doi.org/10.3390/math9192376
by Xia Zheng †, Yaohua Rong *,†, Ling Liu and Weihu Cheng
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 25 August 2021 / Revised: 20 September 2021 / Accepted: 22 September 2021 / Published: 24 September 2021
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Round 1

Reviewer 1 Report

Please refer to the appended file.

Comments for author File: Comments.pdf

Author Response

Response to Reviewer 1 Comments

We would like to thank the Editor, the Associate Editor and the two referees for your constructive comments/suggestions. In light of the comments, we have made a thorough revision of the paper that we believe has led to significant improvement of the quality and the presentation of the paper. We respond below to your comments/suggestions point by point.

1. The goal is NOT specified explicitly enough. Authors should explain why the document was prepared, who should profit from the new knowledge acquired in their investigations. Unfortunately, the Authors fail to specify WHICH TYPE OF BENEFITS and WHAT TYPE OF STAKEHOLDERS; the description of their inquiry results may be of interest or support the managerial and policy decisions. Authors do not use either of the words: goal; task, or purpose.

Response. Thank you for this suggestion. Following it, we have added details to the abstract and introduction sections to state our research aim as clearly as we can; please see lines 3-4 and 78-81 of the revised manuscript.

2. Besides, there is some misunderstanding in the authors’ formulation. Authors claim we are interested in how (...) variables affect disease outcomes. Variables are just representations of real-life phenomena; as such, variables cannot affect anything. Such formulations are imprecise (jargon).

Response. Thank you very much for pointing out this problem. We have modified the content, emphasizing we focus on the relationship between predictors and disease outcomes; please see lines 95-96 of the revised manuscript.

3. Abstract and introduction should be modified. The research problem and research goals were NOT identified in the work. The selection of the theoretical basis of the research was NOT appropriately described and justified. Therefore, the methods of literature selection are not precisely explained.

Response. Thank you very much for your kind suggestion. Following it, we have added details to the abstract and introduction sections to state our research purpose as clearly as we can; please see lines 3-4 and 78-81 of the revised manuscript.

4. There is a lack of data description. The definition of the data point is missing. The demographic and socioeconomic characteristics of the data source are missing. The authors fail to describe the respondents. Authors should elaborate description of the data collection process and data set composition.

Response. Thanks for your remark. In this paper we consider a breast cancer data set (GSE70947) obtained from the CuMiDa database (https://sbcb.inf.ufrgs.br/cumida). It contains expression data on 35981 genes for 289 observations, comprising 143 patients with breast cancer and 146 healthy individuals.

5. Authors should try and distinguish which statements are the Authors’ opinion, the literature knowledge, and the analysis outcome. The used literature references are (almost exclusively) referred to in such a manner that it is not clear why the publication is cited. Usually, there are no details, whether individual, mentioned authors support Authors theses and findings. Authors should specify which references support their position and why, which are in opposition to their conclusions and why. There is no generalisation effort in the literature review. Authors should reformulate the text of the literature review. Instead of being purely reporting, descriptive, the text’s style needs to be analytical, with generalising indications.

Response. Thank you very much for your suggestion. We organize the literature review around the strengths and weaknesses of different types of models. First, we introduce the logistic regression model, from which the main body of our model is derived. However, logistic regression has a linear structure, which may be insufficient to capture complicated relationships between responses and predictors. To relax the linearity assumption, generalized additive models, which are nonparametric methods, have been proposed. But the additive structure may still not be flexible enough for some real data. As a recently popular nonparametric method, the kernel machine method makes only a qualitative assumption on the nonparametric function rather than fixing a predetermined form or basis, and hence greatly simplifies the specification of a nonparametric function, especially for multidimensional data. However, some early kernel machine methods cannot perform variable selection, so their prediction accuracy decreases when the model contains redundant predictors. Recently, some variable selection procedures based on kernel machine methods have been proposed, but they are based on fully nonparametric models. Moreover, the assumption of a general kernel function is not feasible enough. Thus, we propose a flexible variable selection procedure for the semiparametric logistic model based on a new class of garrotized kernels. Our literature review therefore focuses on the differences and connections between different types of models.

6. The authors use statistical and econometric techniques designed for metric scale. However intelligent – statistical and econometric software cannot distinguish whether the data is measured on a strong, metric scale (ratio or interval) or comes from the weaker, nonmetric measurement scale (ordinal or nominal). The authors should consult an expert, whether their information is suitable for the selected quantitative analytical tools. Most of the techniques are designed for metric data, and it seems that the measurement results are nonmetric in the analysed dataset.

Response. This is a good point. In this paper, we focus on continuous variables. Categorical variables are interesting and challenging. For redundant categorical variables, we should not consider them in our model. And for the other categorical variables, we are wondering if adding a dummy white noise to the categorical variables or modeling separately for different scales may help us to find the answer. We will try to follow this research direction in the future.

7. The authors should discuss the measurement scales used for the predictors’ xi and zi measurements. Especially for clinical and demographic variables and gene variables, the values tend to be measured on weak measurement scales, ordinal or nominal (categorical). Usually, they are not measured on metric scales, i.e. ratio or interval.

Response. We only consider continuous clinical and demographic variables in this paper. Some gene variables may be measured on weak measurement scales, but the gene variables we consider in this paper are measured on metric scales, so the predictors in this paper are all continuous. We will address categorical predictors in future research.

8. Authors should provide detailed information on the standardisation procedure used. The choice of standardisation procedure is crucial for the properties preservation of standardised data, e.g. distribution characteristics.

Response. Thank you very much. In this paper, we focus on continuous predictors and thus we use a zero-mean normalization method to standardize the predictors.
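The response names zero-mean normalization without giving a formula. The sketch below shows one common reading, z-score standardization of a single continuous predictor: subtract the mean and divide by the standard deviation. Scaling to unit variance and the n - 1 denominator are assumptions, not details stated in the response, and the function name and sample values are hypothetical.

```python
import math

def standardize(values):
    """Return values shifted to mean 0 and scaled to unit sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Sample variance with the n - 1 denominator (an assumption; n is also common).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(variance)
    if sd == 0.0:
        # A constant predictor carries no information; map it to zeros.
        return [0.0] * n
    return [(v - mean) / sd for v in values]

expression = [2.0, 4.0, 6.0, 8.0]  # hypothetical expression values for one gene
z = standardize(expression)
```

This transformation is affine, so it changes location and scale but preserves the shape of each predictor's distribution, which is the property the reviewer asks about.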

9. Usually, the patients’ description variables are sex, marital status, residence category (big city, small town, village), etc. Authors should explain precisely how they standardise the data and how they feed the formula with the data.

Response. Since the predictors in this paper are all continuous, we do not consider a standardization procedure for categorical variables. We will study categorical predictors in future research.

10. Authors should check whether their algorithm is neutral towards the optimisation starting points. Authors should check whether their algorithm is neutral towards the order in which dimensions are removed. Most results of the dimension reduction techniques are not robust against the order in which variables (dimensions) are removed. Authors should touch upon the dataset completeness (missing data issue).

Response. This is a good point. First, our experimental results show that the algorithm is neutral towards the optimization starting points: when different initial values are set, the simulation results do not change significantly. Second, we apply regularization methods to perform variable selection in this paper. By shrinking some regression coefficients to exactly zero, regularization methods select the important variables simultaneously. For example, suppose we have five predictors xi. The regularization method may estimate the coefficients of two predictors to be 0, so these two predictors are removed from the model and the remaining three are selected, all at the same time. Our variable selection process is therefore not sequential but simultaneous. Finally, the real data we analyze in Section 4 is complete and has no missing values. If the real data were not complete, we would fill in the missing values with the multivariate imputation method.
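The five-predictor example in this response can be illustrated with the soft-thresholding operator, which is the closed-form LASSO solution in the simplified case of an orthonormal design. This is only a sketch of the shrinkage idea, not the PLGKM algorithm; the coefficient values and the penalty `lam` are hypothetical.

```python
import math

def soft_threshold(coefficients, lam):
    """Apply the LASSO soft-thresholding operator to every coefficient at once."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in coefficients]

def selected(coefficients):
    """Indices of predictors whose shrunken coefficient is nonzero."""
    return [j for j, b in enumerate(coefficients) if b != 0.0]

unpenalized = [2.0, -0.3, 0.1, -1.5, 0.8]  # hypothetical estimates for five predictors
shrunk = soft_threshold(unpenalized, lam=0.5)
kept = selected(shrunk)  # predictors 0, 3 and 4 survive; 1 and 2 are dropped together
```

Because each coefficient is thresholded in the same pass, reordering the predictors only reorders the output; no predictor's fate depends on the order in which others are examined.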

11. The authors should discuss comparisons criteria. The improvement of the cross-entropy loss measure may be paid with the substantially increased complication of the computation (time) or is considered unimportant (or essential enough) by the end-user of the results. Authors should try and prove that the CEL value reduction is deemed worth investing in additional work.

Response. Thanks for your remark. This is an excellent point. Compared with the competing methods, our method obtains a lower CEL without significant additional computational cost; it only takes a little longer to run. Therefore, our method always yields higher prediction accuracy than the competing methods without costing much more time. CEL is also an important evaluation criterion: it is a widely used prediction measure in classification problems and is very convenient to use.
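For readers unfamiliar with the criterion, the following sketch computes a cross-entropy loss for binary outcomes in its standard averaged form. Whether the manuscript averages over observations or sums is not stated in this record, so the averaging, the clipping constant `eps`, and the sample labels and probabilities are all assumptions.

```python
import math

def cross_entropy_loss(labels, probabilities, eps=1e-12):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clip probabilities away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

labels = [1, 0, 1, 0]  # hypothetical disease outcomes
sharp = cross_entropy_loss(labels, [0.9, 0.1, 0.8, 0.2])
vague = cross_entropy_loss(labels, [0.5, 0.5, 0.5, 0.5])
```

An uninformative prediction of 0.5 for every observation gives a loss of log 2, so any useful classifier should score below that baseline; lower values indicate better-calibrated, more accurate predicted probabilities.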

12. The sentence should be extended; e.g. In this paper, we have proposed a PLGKM method for semiparametric logistic model based on the LASSO method and an innovative class of garrotized kernel functions, suitable for the data measured on a strong, metric scale.

Response. Thank you very much for pointing out this deficiency. We have modified the sentence; please see line 340 of the revised manuscript.

13. but also allow predictors interactions
      Is this a unique feature of the Authors proposal?

Response. Yes, it is one of the advantages of our PLGKM method. To our knowledge, few methods can achieve variable selection while allowing predictor interactions.

14. Authors did not formulate Conclusions that are anchored in the research results. The text of the part contains NO conclusions. The text should be added with elements that are anchored in research findings. The text should contain analytical statements describing the MERIT TOPIC. The further research recommendation should (might) be included.

Response. Thank you very much for your remark. This comment is very important and helpful. Indeed, the last section is a conclusion, but we had forgotten to change the section name (Discussion) used in our draft. We have now renamed this section Conclusions. Following this suggestion, we have also added directions for future research to the revised manuscript; please see the last paragraph of the Conclusions section.

Author Response File: Author Response.pdf

Reviewer 2 Report

The Authors propose a novel procedure of estimation of semiparametric logistic regression. The manuscript is generally well-written and has good scientific potential, however, it needs some minor revisions that will improve its quality. I have two main remarks. First, in the section, where the Authors describe the methodology and present the formulas (pages 3-6), they should describe used symbols, for example q, Q, "zu" in equation (2), or c in the Polynomial Kernel. I know that these notation can be explained in the source literature, but the manuscript should be standalone as much, as possible, therefore all used symbols should be explained in the text. Second, when the Authors present the Discussion, it looks more like Conclusions. Discussion should consist in comparing the obtained results and applied methodology with similar researches conducted in the past, thus indicating its contribution to the literature. Please add the Discussion (can be merged with the Results section) and rename the Discussion with Conclusions. I also have two minor remarks: First - specify the aim of the research in the Abstract and the Introduction sections (it would also be a good idea to add the research hypothesis in the Introduction section). Second - add the directions of future research in the Conclusions section.

Author Response

Response to Reviewer 2 Comments

We would like to thank the Editor, the Associate Editor and the two referees for your constructive comments/suggestions. In light of the comments, we have made a thorough revision of the paper that we believe has led to significant improvement of the quality and the presentation of the paper. We respond below to your comments/suggestions point by point.

1. In the section, where the Authors describe the methodology and present the formulas (pages 3-6), they should describe used symbols, for example q, Q, "zu" in equation (2), or c in the Polynomial Kernel. I know that these notation can be explained in the source literature, but the manuscript should be standalone as much, as possible, therefore all used symbols should be explained in the text.

Response. Thank you very much for pointing out this problem. Clear symbol descriptions are important and make the text easy to follow. We have added descriptions of these symbols; please see lines 104-106 and 117 of the revised manuscript. We have also checked all the other symbols on pages 3-6 to make sure that they are clear enough.

2. When the Authors present the Discussion, it looks more like Conclusions. Discussion should consist in comparing the obtained results and applied methodology with similar researches conducted in the past, thus indicating its contribution to the literature. Please add the Discussion (can be merged with the Results section) and rename the Discussion with Conclusions.

Response. Thank you very much for your remark. This comment is very important and helpful. Indeed, this section is a conclusion, but we had forgotten to change the section name (Discussion) used in our draft. We have now renamed this section Conclusions.

3. Specify the aim of the research in the Abstract and the Introduction sections.

Response. Thank you for this suggestion. Following it, we have added details to the abstract and introduction sections to state our research aim as clearly as we can; please see lines 3-4 and 78-81 of the revised manuscript.

4. Add the directions of future research in the Conclusions section.

Response. Thank you very much for your remark. We have added directions for future research to the revised manuscript; please see the last paragraph of the Conclusions section.

Author Response File: Author Response.pdf
