Article
Peer-Review Record

Combining Probability and Nonprobability Samples by Using Multivariate Mass Imputation Approaches with Application to Biomedical Research

Stats 2023, 6(2), 617-625; https://doi.org/10.3390/stats6020039
by Sixia Chen 1,*, Alexandra May Woodruff 1, Janis Campbell 1, Sara Vesely 1, Zheng Xu 2 and Cuyler Snider 3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 10 April 2023 / Revised: 29 April 2023 / Accepted: 5 May 2023 / Published: 8 May 2023
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)

Round 1

Reviewer 1 Report

  1. Could you please add some more details about the imputation methods: latent joint multivariate normal model mass imputation (e.g. Generalized Efficient Regression-Based Imputation with Latent Processes (GERBIL)) and fully conditional specification (FCS)?
  2. What are the main advantages of the fully conditional specification imputation method?

Author Response

  1. Could you please add some more details about the imputation methods: latent joint multivariate normal model mass imputation (e.g. Generalized Efficient Regression-Based Imputation with Latent Processes (GERBIL)) and fully conditional specification (FCS)?

Reply: Thanks for the comments. We have added a more detailed description of both methods in a new section named ‘Multivariate mass imputation approaches’ in the revised paper.

  2. What are the main advantages of the fully conditional specification imputation method?

Reply: We have added the following sentences in the new section named ‘Multivariate mass imputation approaches’ to describe it: “Practically, the FCS has advantage in terms of modeling flexibility since it is relatively easier to model the conditional distribution for unit study item instead of the joint distribution of study variable vector.”
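The flexibility point in this reply can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce GERBIL or any imputation package: it assumes a hypothetical setup with one covariate `x` and two study variables `y1` and `y2`, fits one linear conditional model per study variable on the fully observed nonprobability sample, and cycles through those conditionals to impute both variables for the probability sample. The function names, the simulated data, and the use of fixed (not posterior-drawn) regression parameters are all simplifying assumptions, and survey weights are ignored.

```python
import random

def ols(rows, targets):
    """Least-squares fit of targets on rows (intercept added); returns (coefs, residual_sd)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y - sum(bi * xi for bi, xi in zip(b, x)) for x, y in zip(X, targets)]
    sd = (sum(e * e for e in resid) / (len(X) - p)) ** 0.5
    return b, sd

def predict(b, row):
    """Linear prediction: intercept plus slopes times predictors."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))

def fcs_mass_impute(sample_a, x_b, n_iter=10, seed=0):
    """sample_a: (x, y1, y2) tuples from the nonprobability sample (all observed).
    x_b: covariate values from the probability sample, where y1 and y2 are unobserved.
    Returns one imputed (y1, y2) pair per unit in x_b."""
    rng = random.Random(seed)
    # One conditional model per study variable, each fit on sample A only.
    b1, s1 = ols([(x, y2) for x, _, y2 in sample_a], [y1 for _, y1, _ in sample_a])
    b2, s2 = ols([(x, y1) for x, y1, _ in sample_a], [y2 for _, _, y2 in sample_a])
    # Start from the sample-A means, then cycle through the conditionals.
    n = len(sample_a)
    y1_b = [sum(r[1] for r in sample_a) / n] * len(x_b)
    y2_b = [sum(r[2] for r in sample_a) / n] * len(x_b)
    for _ in range(n_iter):
        y1_b = [predict(b1, (x, y2)) + rng.gauss(0, s1) for x, y2 in zip(x_b, y2_b)]
        y2_b = [predict(b2, (x, y1)) + rng.gauss(0, s2) for x, y1 in zip(x_b, y1_b)]
    return list(zip(y1_b, y2_b))

# Simulated, purely illustrative data.
rng = random.Random(1)
sample_a = []
for _ in range(200):
    x = rng.gauss(0, 1)
    y1 = 1.0 + 2.0 * x + rng.gauss(0, 1)
    y2 = 0.5 * y1 - x + rng.gauss(0, 1)
    sample_a.append((x, y1, y2))
x_b = [rng.gauss(0.5, 1) for _ in range(100)]
imputed = fcs_mass_impute(sample_a, x_b)
print(sum(y1 for y1, _ in imputed) / len(imputed))  # unweighted mean of imputed y1
```

Each study variable gets its own regression here, so a binary item could be given a logistic model without touching the other conditional. A joint-model approach would instead need a single distribution covering every study variable at once.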

Reviewer 2 Report

The paper is dealing with an interesting topic in the field of survey data analysis. The proposed methods are useful and the paper is fairly well written. To further highlight the contribution of the paper, for the real data application study, it would help to add more comments to clarify what is the added value of the new method compared to the existing methods.

Author Response

The paper is dealing with an interesting topic in the field of survey data analysis. The proposed methods are useful and the paper is fairly well written. To further highlight the contribution of the paper, for the real data application study, it would help to add more comments to clarify what is the added value of the new method compared to the existing methods.

Reply: Thanks for the comments. We have added the following sentences in the results section to clarify the added value of the new method: “In practice, researchers only considered Naïve methods by using unweighted TBRFSS data file only. In this application, we were the first to show the advantages of multivariate mass imputation methods. In addition, we were the first to compare different multivariate mass imputation methods and provide empirical evidence for other researchers.”

Reviewer 3 Report


Comments for author File: Comments.docx


Author Response

  1. It would be helpful to use examples to show what nonprobability samples are, especially for readers not in the field.

Reply: Thanks for the comments. We have already provided the following examples of nonprobability samples in the introduction: “Nonprobability samples have been used frequently in practice. To name a few, Pew Research Centre (http://www.pewresearch.org) provides a 2015 dataset consisting of nine nonprobability samples with a total of 9,301 individuals and a wide range of measurements over 56 variables related to economics, social economics, and health behaviors. The 2019 Tribal Behavioral Risk Factor Surveillance System (TBRFSS) conducted by Oklahoma Tribal Epidemiology Center used a mix of convenience sampling by attending tribal events in person, over email, and through website availability [8]. TBRFSS collects health related information for Native American population who lived in Kansas, Oklahoma, and Texas. Another example is that [9] estimated the national Criminal Justice Attitudes from five online non-probability samples drawn either from Amazon Mechanical Turk or an opt-in panel.” In addition, to make this clearer, we added one final sentence at the end, as follows: “In summary, nonprobability samples include all types of samples where random selection process is lacking.”

  2. On page 3, line 119, why consider using 10 imputations, not other numbers?

Reply: According to previous literature such as Rubin (2004), it is sufficient to only use 5 to 10 imputed values for multiple imputation in practice. We also added the following sentence in the Monte Carlo Simulation Study section of the paper: “[25] suggested that it is sufficient to use 5 to 10 imputed values for multiple imputation in practice.”
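One common way to see why 5 to 10 imputations are often considered enough is Rubin's approximate relative efficiency, 1 / (1 + γ/m) on the variance scale, where m is the number of imputations and γ is the fraction of missing information. The sketch below only evaluates that formula; the γ values are illustrative assumptions, not quantities reported in the paper, and the function name is hypothetical.

```python
def relative_efficiency(m: int, gamma: float) -> float:
    """Approximate efficiency of m imputations relative to infinitely many (variance scale)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return 1.0 / (1.0 + gamma / m)

# Illustrative fractions of missing information (assumed, not taken from the paper).
for gamma in (0.3, 0.5, 0.9):
    row = ", ".join(f"m={m}: {relative_efficiency(m, gamma):.3f}" for m in (3, 5, 10, 20))
    print(f"gamma={gamma}: {row}")
```

Under this approximation the gain from going beyond 10 imputations is small even when γ is large, which is consistent with the rule of thumb cited in the reply.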

  3. Are there limitations or drawbacks for the proposed mass imputation method?

Reply: We added the following sentences in the conclusion section: “Drawbacks of multivariate mass imputation methods include the correct model assumptions of imputation models and the treatment for high dimensional and big data. Existing mass imputation methods can only handle low dimensional small/medium data files.”

Reference:

Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys (Vol. 81). John Wiley & Sons.

Round 2

Reviewer 1 Report

I don't have further comments or suggestions. The article can be published. 

Reviewer 3 Report

No
