Article
Peer-Review Record

Nonparametric Generation of Synthetic Data Using Copulas

Electronics 2023, 12(7), 1601; https://doi.org/10.3390/electronics12071601
by Juan P. Restrepo, Juan Carlos Rivera, Henry Laniado *, Pablo Osorio and Omar A. Becerra
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 January 2023 / Revised: 15 March 2023 / Accepted: 16 March 2023 / Published: 29 March 2023
(This article belongs to the Special Issue Recent Advances in Synthetic Data Generation)

Round 1

Reviewer 1 Report

A method to generate synthetic data using copulas is proposed. 

The introductory section should focus more on the critical points of the synthetic data generation methods in the literature and on how the authors intend to address them in their research. For example, the authors should state more concisely what performance advantages the use of copulas offers over methods that use only local information, and why the proposed method has high usability.

In Section 2 it is necessary to review the use of all indexes and all variables. For example, for the partition X[1]i = a0 < a1 < · · · < at = X[n]i of the closed interval [X[1]i, X[n]i], nothing is said about the choice of the number t of nodes. Furthermore, the index t remains unused in subsequent calculations.

Why is the text in lines 118-121 and 125-137 in italics? If there is no specific reason, you need to change the font to normal.

In equation 5 the variable I must be defined; furthermore, j is an index and not a variable. Again, you need to check the accuracy of the indexes and variables. The generic random variable in line 141 is written as Xi, i = 1,...,p, whereas in equation 5 it is the index j that varies from 1 to p.

An analysis of the computational complexity of Algorithm 1 depending on N, n and p is needed.

Authors should discuss more thoroughly the experimental results, especially those shown in the scatter plots in Fig. 5.

In the conclusions, the authors must add a brief discussion on the benefits and on any critical points and limitations that emerged from the proposed method.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Synthetic data generation methods are a crucial part of the research on machine learning algorithms. The authors have proposed a novel non-parametric method for generating synthetic data. Their approach uses copulas - multivariate cumulative distribution functions for which the marginal probability distribution of each variable is uniform on the interval [0, 1]. The paper is well-structured. It starts from the review of related research works, based on which the research problem is formulated. Then the authors propose a novel method, which is well described and formally defined. The proposed method was experimentally verified. The multivariate homogeneity test demonstrated that the marginal and joint distributions of the real data were maintained by the data generated with the use of the proposed method. The paper is interesting, and the proposed method will be particularly useful in the research on machine learning methods. The English language should be improved when it comes to grammar and style.
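The copula definition quoted in this report (uniform marginals on [0, 1]) can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows the standard rank transform to pseudo-observations, and the function name `pseudo_observations` and the toy data are assumptions made here for illustration.

```python
import numpy as np

def pseudo_observations(x):
    """Map each column of x into (0, 1) via ranks: u_ij = rank(x_ij) / (n + 1).

    Hypothetical helper for illustration; not taken from the reviewed paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort yields 0-based ranks within each column
    # (ties, if any, are broken by position; continuous data has none).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1)

# Toy data (assumed): two dependent variables with non-uniform marginals.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))
x = np.column_stack([np.exp(z[:, 0]), z[:, 0] + 0.5 * z[:, 1]])

# Each column of u is approximately uniform on (0, 1), while the
# dependence between the columns is carried over through the ranks.
u = pseudo_observations(x)
```

After the transform, each column of `u` takes the values 1/(n+1), ..., n/(n+1) exactly once, which is the empirical counterpart of a uniform marginal; the joint behaviour of the columns is what a copula describes.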

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

1.     The manuscript is concerned with nonparametric generation of synthetic data using copulas, which is interesting. It is relevant and within the scope of the journal.

2.     However, the manuscript, in its present form, contains several weaknesses. Adequate revisions to the following points should be undertaken in order to justify recommendation for publication.

3.     Full names should be shown for all abbreviations in their first occurrence in texts. For example, DD in p.3, etc.

4.     For readers to quickly catch the contribution in this work, it would be better to highlight major difficulties and challenges, and your original achievements to overcome them, in a clearer way in abstract and introduction.

5.     p.1 - a nonparametric method using copulas is adopted for generating synthetic data. What are other feasible alternatives? What are the advantages of adopting this approach over others in this case? How will this affect the results? The authors should provide more details on this.

6.     p.5 - two examples are adopted in the experiments. What are the other feasible alternatives? What are the advantages of adopting these examples over others in this case? How will this affect the results? More details should be furnished.

7.     p.9 - several experiments are adopted to illustrate the goodness of the method. What are the other feasible alternatives? What are the advantages of adopting these experiments over others in this case? How will this affect the results? More details should be furnished.

8.     p.9 - Liu et al. [25] is adopted for the homogeneity test. What are other feasible alternatives? What are the advantages of adopting this approach over others in this case? How will this affect the results? The authors should provide more details on this.

9.     p.10 - five pairs of scatter plots of specific patterns are adopted for demonstration. What are the other feasible alternatives? What are the advantages of adopting these patterns over others in this case? How will this affect the results? More details should be furnished.

10.  Some key model parameters are not mentioned. The rationale on the choice of the set of parameters should be explained with more details. Have the authors experimented with other sets of values? What are the sensitivities of these parameters on the results?

11.  The discussion section in the present form is relatively weak and should be strengthened with more details and justifications.

12.  Some assumptions are stated in various sections. More justifications should be provided on these assumptions. Evaluation on how they will affect the results should be made.

13.  Moreover, the manuscript could be substantially improved by relying on and citing more recent literature about real-life applications of soft computing techniques in different fields, such as the following. Discussion of result comparisons and/or incorporation of those concepts in your work is encouraged:

          Banan, A., et al., “Deep learning-based appearance features extraction for automated carp species identification,” Aquacultural Engineering 89: 102053 2020.

          Fan, Y.J., et al., “Spatiotemporal modeling for nonlinear distributed thermal processes based on KL decomposition, MLP and LSTM network,” IEEE Access 8: 25111-25121 2020.

          Afan, H., et al., “Modeling the fluctuations of groundwater level by employing ensemble deep learning techniques,” Engineering Applications of Computational Fluid Mechanics 15 (1): 1420-1439 2021.

14.  Some inconsistencies and minor errors that need attention are:

          Replace “…Recall that if one wants to generate…” with “…It is recalled that if one wants to generate…” in line 108 of p.5

          Replace “…According to Sklar’s theorem, see Sklar [27] for more details, any…” with “…According to Sklar’s theorem [27], any …” in line 149 of p.7

          and more…

15.  In the conclusion section, the limitations of this study and suggested improvements of this work should be highlighted.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Authors took into account all my suggestions, significantly improving the quality of their manuscript. I don't have further issues with this paper.

Author Response

Your insightful comments and constructive criticism have undoubtedly enhanced the quality and impact of our research.

We are truly grateful for your time and effort in carefully reviewing and analyzing our work.

Reviewer 3 Report

The most significant comments in the last review (including novelty, major difficulties and challenges, their original achievements to overcome them, etc.) have not been demonstrated satisfactorily.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

The most significant comments in the last review (including novelty, major difficulties and challenges, their original achievements to overcome them, etc.) have not been demonstrated satisfactorily.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
