Article
Peer-Review Record

SimuBP: A Simulator of Population Dynamics and Mutations Based on Branching Processes

by Xiaowei Wu
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 28 December 2022 / Revised: 14 January 2023 / Accepted: 16 January 2023 / Published: 18 January 2023

Round 1

Reviewer 1 Report

Having read the manuscript "SimuBP: A Simulator of Population Dynamics and Mutations based on Branching Processes", I have two critical comments. The first relates to the presentation of the model(s); the other concerns Algorithms 2-3.

The presentation of the model/models is somewhat confusing. Algorithm 1 has inputs mu1 and mu2. There's no mu2 in the Figure 1 schematic. Algorithm 2 has inputs mu, a1, a2. Is mu the same as mu1 in Figure 1? The proliferation rates a1 and a2 are not in the schematic. Same for Algorithm 3. Figure 1 distinguishes between pre- and post-division mutations. How is the dichotomy reflected in Algorithms 1-3?

Algorithms 2 and 3 contain unjustified assumptions. For example, why should the total number of offspring be geometrically distributed in the non-neutral mutation case? Or, why should the number of mutations be a sum of Bernoulli trials (Algorithm 2, Step 3)? Why is this number approximated by its mean in Step 2 of Algorithm 3? If these are approximations, as I believe them to be, then one should say under what conditions they are valid.

Author Response

1. The presentation of the model/models is somewhat confusing. Algorithm 1 has inputs mu1 and mu2. There's no mu2 in the Figure 1 schematic. Algorithm 2 has inputs mu, a1, a2. Is mu the same as mu1 in Figure 1? The proliferation rates a1 and a2 are not in the schematic. Same for Algorithm 3. Figure 1 distinguishes between pre- and post-division mutations. How is the dichotomy reflected in Algorithms 1-3?

Response: As described in Section 2.1, lines 103-123, the manuscript considers two main models: the General Two-type Branching Process (GTBP) and the Simplified Two-type Branching Process (STBP). It is important to note which model each algorithm and the Figure 1 schematic is based on. Here are some clarifications to your questions:
(1) As stated in lines 138-139, Figure 1 illustrates the difference between cell mutations in Kendall’s two-type branching process (KTBP) and STBP. Both models assume only forward mutations, so only mu1 appears in Figure 1. Also, the figure concerns cell mutations only, so the proliferation rates of the wild-type and mutant cells are not included in the schematic.
(2) As stated in lines 143-144, Algorithm 1 is based on GTBP. GTBP allows both forward and backward mutations (denoted by mu1 and mu2, respectively).
(3) As stated in line 197, both Algorithms 2 and 3 are based on STBP. Since STBP does not consider backward mutations, for convenience, the mutation probability of the wild-type cell is denoted by mu (see lines 170-172). So mu in STBP is the same as mu1 in GTBP.
(4) As for how the dichotomy is reflected in Algorithms 1-3: Step 3(a) of Algorithm 1 states, "Based on the offspring distribution(s), generate the numbers of offspring for the wild-type and mutant cells in current generation", so Algorithm 1 generates the actual number of offspring for each cell from a multinomial distribution defined by the offspring distribution. Step 2 of Algorithm 2 states, "Count the number of elements in {Si} by ni. Because of the binary-fission property, the population size at tp, initiated by the ith cell is (ni + 1)", so Algorithm 2 automatically adopts the binary-fission setting for cell divisions. Step 1 of Algorithm 3 states, "Generate the population size zt at tp by summing up z0 random numbers, each drawn from geo(e^{−a1*tp})", so Algorithm 3 also assumes the binary-fission MBP model. Lastly, after cell proliferation is simulated, Algorithm 1 applies mutation (both forward and backward) directly to each cell by generating a Bernoulli random number (see Step 2 in Algorithm 1), whereas Algorithms 2 and 3 apply mutations independently to the entire population of wild-type cells (see Step 3 in Algorithm 2 and Step 2 in Algorithm 3) by generating Bernoulli random numbers. Note that the generation of mutant cells or mutation events in all three algorithms can be either pre- or post-division, which is why a footnote on "two times" was added on page 6.
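
As a rough illustration of the quoted Step 1 of Algorithm 3, the Python sketch below sums z0 geometric draws with success probability e^{−a1*tp}. It is not the SimuBP R code: the function names, the choice of support {1, 2, ...} for the geometric distribution, and the parameter values are assumptions made for this example only.

```python
import math
import random


def draw_geometric(p, rng):
    """Draw from a geometric distribution on {1, 2, ...} with success probability p."""
    if p >= 1.0:
        return 1  # no elapsed time: the clone is still the single founder cell
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    # Inverse-transform sampling: P(K > k) = (1 - p)^k
    return int(math.log(u) / math.log1p(-p)) + 1


def population_size_at_plating(z0, a1, tp, rng):
    """Sum z0 independent geometric draws, one per founder cell."""
    p = math.exp(-a1 * tp)
    return sum(draw_geometric(p, rng) for _ in range(z0))


rng = random.Random(1)  # fixed seed so the run is repeatable
zt = population_size_at_plating(z0=5, a1=1.0, tp=3.0, rng=rng)
print("simulated population size at plating:", zt)
```

Under these assumptions each founder cell contributes at least one cell, and the mean clone size is e^{a1*tp}, so the simulated zt fluctuates around z0*e^{a1*tp} instead of being fixed at that value.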

2. Algorithms 2 and 3 contain unjustified assumptions. For example, why should the total number of offspring be geometrically distributed in the non-neutral mutation case? Or, why should the number of mutations be a sum of Bernoulli trials (Algorithm 2, Step 3)? Why is this number approximated by its mean in Step 2 of Algorithm 3? If these are approximations, as I believe them to be, then one should say under what conditions they are valid.

Response: As clarified in the previous response, Algorithms 2 and 3 are based on the STBP model, which requires the assumption of binary-fission MBP (see lines 119-122). The geometrically distributed population size is a consequence of binary-fission MBP; see Eq. (1) on page 5. A detailed derivation of this result can be found in Appendix A.1. Regarding the number of mutations in Algorithms 2 and 3, consider the question: "Among all zt-z0 or zt (because z0 is small) cell division events, how many of them contain mutations?" The outcome for each wild-type cell is a Bernoulli trial, so the total number of mutations is a sum of Bernoulli trials. As for the last question, the number of mutations in Algorithm 3 is a simulated random number, not an approximation by the mean of a random variable. To see this, we need to understand the difference between Algorithm 3 and Salvador (see page 240 in reference [9], Zheng 2002). As described in lines 205-209, Algorithm 3 "generates the number of wild-type cells zt at the time of plating from geometric growth rather than treating zt as input, and replaces the Poisson distributed number of mutations based on deterministic growth by the actual number of mutations based on stochastic growth". Salvador uses zt as input and assumes deterministic growth, so it generates the number of mutations from a Poisson distribution whose mean is the expected number of mutations mu*zt (note that here zt is fixed, representing the mean wild-type population size). Algorithm 3, however, uses tp as input and assumes stochastic growth, so it first generates zt from a geometric distribution (note that here zt is already random, representing the actual wild-type population size) and then generates the number of mutations directly from mu*zt.
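
The contrast drawn in this response can be sketched in Python as follows. This is a simplified illustration, not the code of Algorithm 3 or of Salvador: it counts mutation events only, does not simulate the growth of mutant clones, and all function names and parameter values are assumptions.

```python
import math
import random


def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_geometric(p, rng):
    """Geometric draw on {1, 2, ...} by inverse-transform sampling."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log1p(-p)) + 1


def mutations_fixed_size(zt, mu, rng):
    """Deterministic-growth style: zt is an input, count ~ Poisson(mu * zt)."""
    return draw_poisson(mu * zt, rng)


def mutations_random_size(z0, a1, tp, mu, rng):
    """Stochastic-growth style: draw zt first, then one Bernoulli(mu) trial per cell."""
    p = math.exp(-a1 * tp)
    zt = sum(draw_geometric(p, rng) for _ in range(z0))
    mutations = sum(rng.random() < mu for _ in range(zt))
    return zt, mutations


rng = random.Random(2)
print("fixed zt   :", mutations_fixed_size(zt=100, mu=0.05, rng=rng))
print("random zt  :", mutations_random_size(z0=5, a1=1.0, tp=3.0, mu=0.05, rng=rng))
```

In the first function the population size never varies between runs; in the second, both zt and the mutation count change from run to run, which is the distinction the response describes.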

Reviewer 2 Report

This paper presents a new tool for fluctuation analysis (FA), called SimuBP, designed to simulate cell proliferation and mutagenesis in microbial populations through branching processes. This is a very interesting and novel application of in silico methods for FA.

The introduction is a well-written and appropriate documentation of the history of FA from both in vivo and in silico applications as well as the computational and statistical limits of FA in these approaches. The intro is an appropriate length and ends with clear objective statements for the purpose of the paper. 

Please describe why SimuBP is a "flexible" simulator? 

The methods section of the paper is well organized. The schematic plot is helpful to understand the 2 types of processes described.

Please explain i.i.d. (independent and identically distributed) so that the reader doesn't have to look up this variable to understand the rest of the branching process being described.

I think an additional table/figure in this section with the components of the 4 input arguments of the SimuBP code in R and how they relate back to the component definitions earlier in this section would be very useful for someone trying to make practical use/understanding of this code. 

Additional citations are needed for paragraph 1 of page 3 describing the branching components since the described processes are established in statistical literature and therefore should be cited.

Please relate the components of algorithm 1 in the table back to the portions of the SimuBP code in R to make it easier to understand for the reader. There is a lot of statistical terminology and components to understand and keep track of so any additional visualizations and comparisons to the actual R program would be beneficial.

There is no mention of the subdivisions of the algorithms at the beginning of the methods (i.e., S3a and S3b), so please update the components of the simulations at the beginning and state what the components are for.

The figures in the results (Figure 2) are not stand-alone and need additional footnotes and text to understand the plots and what is being described in the algorithms/simulation. Please label the X and Y axes.

Same comment for Figures 3-7 and Table 1. Please list the components of the equations/algorithm so they are stand-alone and the reader doesn't have to refer to the text to understand the visuals.

For Figure 7, I don't remember a heatmap, or a few of the other figures in the results, being described in the methods. Please update.

Are the results presented in the Appendix mentioned in the results section and methods? Please also make the figures in the appendix stand-alone so that the reader doesn't have to look up what 'ECDF' and 'LD' are. I see the abbreviations in the appendix, which is fine for the text, but figures need to be stand-alone.

The discussion and conclusions could be extended further although most of the components are present. What do you see as the true applicability of SimuBP given the exhaustive computational challenges? 

Same comment for algorithms 2 and 3.

Author Response

1. Please describe why SimuBP is a "flexible" simulator? 

Response: Briefly, SimuBP is flexible because it is based on the General Two-type Branching Process (GTBP) model. As described in lines 103-112, GTBP has only two fundamental rules, which characterize the necessary assumptions for cell proliferation and mutagenesis, respectively. Such a model can be considered the basic "platform" on which other, more specific FA processes can be defined, including the traditional Luria-Delbrück model and others allowing various settings for cell proliferation rates or mutation rates. The flexibility of SimuBP is described in multiple places in the manuscript, for example, in lines 65-69, 217-226, and 353-360.
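
To make the two kinds of rule concrete, here is a minimal Python sketch of one generation of a two-type branching process with forward mutation probability mu1 and backward mutation probability mu2. It is an illustration only, not the SimuBP implementation: the function name, the use of a single offspring distribution for both cell types, and the choice to apply mutation to each offspring are all assumptions.

```python
import random


def next_generation(n_wild, n_mutant, offspring_probs, mu1, mu2, rng):
    """Advance a two-type population by one generation.

    offspring_probs[k] is the assumed probability that a cell leaves k offspring.
    """
    counts = list(range(len(offspring_probs)))
    new_wild = new_mutant = 0
    for parent_is_mutant, n_cells in ((False, n_wild), (True, n_mutant)):
        for _ in range(n_cells):
            # Proliferation rule: draw this cell's number of offspring.
            k = rng.choices(counts, weights=offspring_probs)[0]
            for _ in range(k):
                # Mutation rule: one Bernoulli trial per offspring,
                # forward (mu1) for wild-type parents, backward (mu2) for mutants.
                flips = rng.random() < (mu2 if parent_is_mutant else mu1)
                if parent_is_mutant != flips:
                    new_mutant += 1
                else:
                    new_wild += 1
    return new_wild, new_mutant


rng = random.Random(3)
# Binary fission (always two offspring) with forward mutations only.
print(next_generation(100, 0, [0.0, 0.0, 1.0], mu1=0.01, mu2=0.0, rng=rng))
```

Changing `offspring_probs` or setting `mu2` above zero switches between more specific settings without touching the loop, which is the sense of "platform" used in the response.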

The methods section of the paper is well organized. The schematic plot is helpful to understand the 2 types of processes described.

2. Please explain i.i.d (independent and identically distributed) so that the reader doesn't have to look up this variable to understand the rest of the branching process being described. 

Response: Thank you for the comment. I have added the full term in lines 83-84 where i.i.d. first appears, and also to the abbreviation list.

3. I think an additional table/figure in this section with the components of the 4 input arguments of the SimuBP code in R and how they relate back to the component definitions earlier in this section would be very useful for someone trying to make practical use/understanding of this code. 

Response: Thank you for the suggestion. I have added a new Figure 2 to illustrate the input arguments of the SimuBP algorithm.

4. Additional citations are needed for paragraph 1 of page 3 describing the branching components since the described processes are established in statistical literature and therefore should be cited.

Response: Thank you for the comment. I have included additional citations for the branching process terms appearing in Section 2.1.

5. Please relate the components of algorithm 1 in the table back to the portions of the SimuBP code in R to make it easier to understand for the reader. There is a lot of statistical terminology and components to understand and keep track of so any additional visualizations and comparisons to the actual R program would be beneficial.

Response: I agree that additional visualizations to the actual R program would be beneficial. The newly added Figure 2 can be used to relate the components of Algorithm 1 to the portions of the SimuBP code in R.

6. There is no mention of the subdivisions of the algorithms at the beginning of the methods (i.e., S3a and S3b), so please update the components of the simulations at the beginning and state what the components are for.

Response: Thank you. I have modified lines 227-228 to mention the subdivisions S3a and S3b of Simulation S3.

7. The figures in the results (Figure 2) are not stand-alone and need additional footnotes and text to understand the plots and what is being described in the algorithms/simulation. Please label the X and Y axes.

Same comment for Figures 3-7 and Table 1. Please list the components of the equations/algorithm so they are stand-alone and the reader doesn't have to refer to the text to understand the visuals.

Response: Thank you. I have modified and double-checked Figures 2-7 and Table 1 (legends, symbols, axis labels) to make sure they are now stand-alone.

8. For Figure 7, I don't remember a heatmap, or a few of the other figures in the results, being described in the methods. Please update.

Response: Thank you. I have updated the text in Section 2.3 to describe the heatmap in Figure 7; see lines 239-241.

9. Are the results presented in the Appendix mentioned in the results section and methods? Please also make the figures in the appendix stand-alone so that the reader doesn't have to look up what 'ECDF' and 'LD' are. I see the abbreviations in the appendix, which is fine for the text, but figures need to be stand-alone.

Response: Yes, the results presented in the Appendix are mentioned in line 179 (for Appendix A.1), line 271 (for Appendix A.2), and line 289 (for Appendix A.3). Following your suggestion, I have made Figure A1 and Figure A2 in the Appendix stand-alone.

10. The discussion and conclusions could be extended further although most of the components are present. What do you see as the true applicability of SimuBP given the exhaustive computational challenges? 

Same comment for algorithms 2 and 3.

Response: As described in lines 353-362, the true applicability of SimuBP lies in its ability to take realistic experimental conditions into account, whereas Algorithms 2 and 3 are both based strictly on the settings of the Simplified Two-type Branching Process (STBP) for traditional Luria-Delbrück experiments and thus may not be applicable to more general, real fluctuation experiment data.

Reviewer 3 Report

The manuscript provides a new simulation technique that can be used to simulate different population models. Different algorithms are provided, tested and compared.

The manuscript is well written and presented in a good format. I advise to accept it as it is.

I have only one recommendation, for a minor language check, e.g.

page 1 : (1) It provides ground truth =>  It provides a ground truth

 

Regards

Author Response

Thank you. I have made this correction.
