2.1. Discrete-Time, Continuous-Time Branching Processes, and Two-Type Branching Processes
Branching processes are commonly used to model an evolving population of microbial cells [2]. In general, cell reproduction can be described by the following Markov chain. Denote the population size of the $n$th cell generation (synchronized or not) by ${X}_{n},n\ge 0$ with ${X}_{0}>0$, and the number of offspring of the $i$th cell of the $(n-1)$th generation by ${Z}_{i}$. The Markov chain $\{{X}_{n},n\ge 0\}$ then satisfies the following branching (or cell proliferation) rule:
$${X}_{n}=\sum _{i=1}^{{X}_{n-1}}{Z}_{i},$$
where
${Z}_{i},i=1,2,\cdots $ are nonnegative, integer-valued, independent and identically distributed (i.i.d.) random variables following some discrete distribution (called the offspring distribution). Depending on whether the cell lifespan is fixed or random, branching processes can be categorized into discrete-time and continuous-time variants. For the discrete-time branching process (known as the Galton–Watson process or GWP [3]), the cell lifespan is assumed to be a constant, hence the cell birth events in each generation are synchronized. This assumption is relaxed in the continuous-time branching process by allowing the cell lifetime to vary as a continuous random variable. The branching process with cell lifespan following an arbitrary continuous distribution is called the age-dependent branching process or the Bellman–Harris process (BHP) [4,5]. In particular, when the cell lifetimes are i.i.d. exponential, the resulting process is called the Markov branching process (MBP) [6]; otherwise, the corresponding continuous-time branching process is non-Markovian. Because the branching rule directly mirrors cell division, branching processes serve as appropriate mathematical models for cell population dynamics.
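The branching rule for the discrete-time case can be illustrated with a minimal Python sketch (the paper's own software is written in R; this is only an illustration). The function name `simulate_gwp` and the argument `offspring_probs` are hypothetical names introduced here, where `offspring_probs[k]` is assumed to be the probability that a cell leaves $k$ offspring.

```python
import random


def simulate_gwp(x0, offspring_probs, n_generations, rng=None):
    """Return the generation sizes [X_0, ..., X_n] of a Galton-Watson process.

    offspring_probs[k] is the probability of k offspring per cell
    (a hypothetical argument name chosen for this sketch).
    """
    rng = rng or random.Random()
    counts = list(range(len(offspring_probs)))
    sizes = [x0]
    for _ in range(n_generations):
        parents = sizes[-1]
        # Branching rule: X_n is the sum of X_{n-1} i.i.d. offspring counts Z_i.
        offspring = rng.choices(counts, weights=offspring_probs, k=parents)
        sizes.append(sum(offspring))
    return sizes


# Binary fission (every cell leaves exactly two offspring) doubles each generation.
print(simulate_gwp(1, [0.0, 0.0, 1.0], 3))
```

With synchronized generations the generation index plays the role of time; the continuous-time variants replace it with random cell lifetimes.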
Since the Luria–Delbrück experiment is about random mutagenesis in cell populations, distinguishable cells with different probabilistic behavior should be allowed in the branching process model. For a typical fluctuation analysis involving two types of cells, namely the wild-type (non-mutant) and mutant cells, the population sizes of these two types of cells changing over time can be modeled by a two-type branching process [7]. The specific interest is in the distribution of mutant cell numbers at a given time, based on which the cell mutation probability (or mutation rate per cell division) can be inferred. Without loss of generality, let us consider a “General Two-Type Branching Process”, hereinafter abbreviated as GTBP, which satisfies the following two fundamental rules: (i) Each cell lives a certain time (fixed or random) and then splits into a random number of offspring, independent of other cells. In particular, we may allow the wild-type and mutant cells to have different lifetime distributions and different offspring distributions. The parameters of these two distributions determine the growth rates of the two types of cells; (ii) Upon cell division, each cell mutates with a certain constant probability, independent of the division times. In a general setting, we allow backward mutations and assume that wild-type and mutant cells have mutation probabilities ${\mu}_{1}$ and ${\mu}_{2}$, respectively, where $0\le {\mu}_{1},{\mu}_{2}\le 1$. Note that this GTBP should not be confused with the general branching process (also called the Crump–Mode–Jagers, or CMJ, process), which allows multiple birth events from each cell according to a point process [8,9]. We assume that the cell population starts from ${y}_{0}$ wild-type and ${x}_{0}$ mutant cells at $t=0$, and denote the time of plating (i.e., the time for cell counting) by ${t}_{p}$ and correspondingly the numbers of wild-type and mutant cells at ${t}_{p}$ by ${y}_{t}$ and ${x}_{t}$, respectively.
In contrast to the GTBP, we also define a “Simplified Two-Type Branching Process” (STBP), which is a special two-type MBP initiated by wild-type cell(s), with i.i.d. exponential lifetimes for wild-type and mutant cells (i.e., non-differential growth) and binary fission (i.e., a Yule process), and without cell deaths or backward mutations. This model will be used throughout our simulation studies as described in Section 2.3. We note that the STBP is similar to Kendall’s two-type branching process (KTBP) [10], which is often known as the stochastic Luria–Delbrück model, with slight differences. The KTBP allows cell deaths and assumes that, upon division, each wild-type cell will either die, give birth to two wild-type offspring, or turn into one wild-type and one mutant cell, at certain rates; each mutant cell, on the other hand, will either die or divide into two mutant offspring at certain rates [11]. However, in the STBP formulation, we assume that all cells grow by binary fission with i.i.d. exponentially distributed lifetimes, mutant cells always divide into two mutant offspring, and wild-type cells produce mutant offspring according to either pre- or post-division mutation. That is, for pre-division mutation, each wild-type cell mutates with probability ${\mu}_{1}$ (say) right before its division, whereas for post-division mutation, each wild-type cell first divides into two wild-type offspring, and these two offspring then mutate independently with probability ${\mu}_{1}$ right after the division. In other words, from the wild-type cell perspective, the probability generating function (PGF) of the offspring distribution is $f\left(s\right)={\mu}_{1}+(1-{\mu}_{1}){s}^{2}$ for pre-division mutation, and $f\left(s\right)={\mu}_{1}^{2}+2{\mu}_{1}(1-{\mu}_{1})s+{(1-{\mu}_{1})}^{2}{s}^{2}$ for post-division mutation. A schematic plot is shown in Figure 1 to illustrate cell mutations in the KTBP and STBP models.
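The two PGFs can be compared numerically with a small Python sketch. The helper names below are hypothetical and introduced only for illustration; the coefficients are read directly from the two PGFs, interpreted as the distribution of the number of wild-type offspring of one wild-type cell. The sketch suggests that both mutation conventions give the same mean number of wild-type offspring, $2(1-{\mu}_{1})$, while the distributions themselves differ.

```python
def wildtype_offspring_pmf(mu1, mode):
    """Return [p0, p1, p2]: wild-type offspring counts of one wild-type cell."""
    if mode == "pre":
        # The cell mutates before dividing with probability mu1 (no wild-type
        # offspring); otherwise it leaves two wild-type offspring.
        return [mu1, 0.0, 1.0 - mu1]
    if mode == "post":
        # Two wild-type offspring are produced, then each mutates independently.
        return [mu1 ** 2, 2.0 * mu1 * (1.0 - mu1), (1.0 - mu1) ** 2]
    raise ValueError("mode must be 'pre' or 'post'")


def pgf(pmf, s):
    """Evaluate f(s) = sum_k p_k s^k."""
    return sum(p * s ** k for k, p in enumerate(pmf))


def mean_offspring(pmf):
    """Mean offspring number, i.e. f'(1)."""
    return sum(k * p for k, p in enumerate(pmf))


for mode in ("pre", "post"):
    pmf = wildtype_offspring_pmf(0.01, mode)
    print(mode, pmf, mean_offspring(pmf))
```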
2.2. Algorithm for Simulating Population Dynamics and Mutations Based on a GTBP
In the present study, we consider the GTBP defined above. Clearly, such a model is flexible enough to cover various branching processes, e.g., the GWP, the MBP, and the BHP, with mutations taken into account. Algorithm 1 shows the simulation procedure of SimuBP based on such a GTBP. As described in the algorithm, four input arguments are passed to the R function SimuBP, namely “bran”, “mupr”, “n0”, and “tp”. The first one, “bran” (structured as an R list object), determines the branching rule of cell proliferation. In this list object, the “bran$span” component takes a character string, e.g., “fixed”, “exp”, “unif”, or “gam”, to specify the cell lifetime distribution (allowed to be different for wild-type and mutant cells). The “bran$para” component is a vector or matrix which provides the lifetime distribution parameters in a pair; for example, if “bran$span= ‘exp’ ”, then “bran$para= ‘c(1, 2)’ ” means that the exponential rate parameter is 1 for wild-type cells and 2 for mutant cells. The third and last component, “bran$offd”, is a vector $({p}_{0},{p}_{1},\cdots )$ specifying the offspring distribution, so “bran$offd= ‘c(0,0,2)’ ” means binary fission (if necessary, the wild-type and mutant cells can have different offspring distributions by changing “bran$offd” to a matrix with two rows). The second input of the SimuBP function, “mupr=c(${\mu}_{1},{\mu}_{2}$)”, is a vector specifying the forward and backward mutation probabilities. The third input vector, “n0=c(${y}_{0},{x}_{0}$)”, specifies the initial numbers of wild-type and mutant cells. The last input, “tp”, is a scalar for the time of plating. In principle, either the time of plating or the population size at the time of plating could be used as input; however, given the stochastic growth assumption of the GTBP, the former is more appropriate for this simulator. For better illustration, these input arguments are shown in a schematic plot in Figure 2. The output of SimuBP is simply a vector $({z}_{t},{x}_{t})$, where ${x}_{t}$ is the number of mutant cells at ${t}_{p}$ and ${z}_{t}={x}_{t}+{y}_{t}$ is the total number of viable cells at ${t}_{p}$.
Algorithm 1 The SimuBP algorithm for simulating cell population with mutations based on a GTBP
Input: branching rule parameters including cell lifetime and offspring distributions (wild-type and mutant cells can have different parameters), mutation probability $({\mu}_{1},{\mu}_{2})$ for forward and backward mutations, initial cell number $({y}_{0},{x}_{0})$, time of plating ${t}_{p}$
Output: total number of viable cells ${z}_{t}$ and number of mutant cells ${x}_{t}$ at ${t}_{p}$
Step 1. Initialize the number of wild-type and mutant cells at ${t}_{p}$ by setting ${y}_{t}=0,{x}_{t}=0$.
Step 2. Generate two vectors ${\overrightarrow{T}}_{1},{\overrightarrow{T}}_{2}$ from the specified lifetime distribution. ${\overrightarrow{T}}_{1}$ and ${\overrightarrow{T}}_{2}$ are of length ${y}_{0}$ and ${x}_{0}$, denoting the lifetime of wild-type and mutant cell(s) in the first (or current) generation. Generate two binary vectors ${\overrightarrow{\delta}}_{1},{\overrightarrow{\delta}}_{2}$ from $\mathrm{Bern}\left({\mu}_{1}\right)$ and $\mathrm{Bern}\left({\mu}_{2}\right)$. ${\overrightarrow{\delta}}_{1}$ and ${\overrightarrow{\delta}}_{2}$ are of length ${y}_{0}$ and ${x}_{0}$, indicating whether mutation occurs for the wild-type and mutant cell(s) in the first (or current) generation. Based on ${\overrightarrow{T}}_{1},{\overrightarrow{T}}_{2}$ and ${\overrightarrow{\delta}}_{1},{\overrightarrow{\delta}}_{2}$, calculate the accumulated lifetimes ${\overrightarrow{T}}_{w},{\overrightarrow{T}}_{m}$ for wild-type and mutant cells.
Step 3. Count wild-type cells with ${T}_{w}<{t}_{p}$ and denote this number by ${n}_{w}$; these wild-type cells will continue to divide. Count wild-type cells with ${T}_{w}\ge {t}_{p}$ and denote this number by ${y}_{w}$; update ${y}_{t}={y}_{t}+{y}_{w}$. Similarly, count mutant cells with ${T}_{m}<{t}_{p}$ and denote this number by ${n}_{m}$; count mutant cells with ${T}_{m}\ge {t}_{p}$ and denote this number by ${x}_{w}$; update ${x}_{t}={x}_{t}+{x}_{w}$. Let ${z}_{t}={y}_{t}+{x}_{t}$.
while ${n}_{w}+{n}_{m}>0$ do
(a) Based on the offspring distribution(s), generate the numbers of offspring for the wild-type and mutant cells in the current generation.
(b) Repeat Steps 2∼3 by updating ${y}_{0}$ and ${x}_{0}$ with the numbers of offspring in (a). As cell division/mutation continues along generations, ${n}_{w}$ and ${n}_{m}$ decrease and the sum eventually reaches 0 to quit the loop.
end while
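The generation-wise loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative, cell-by-cell sketch under stated assumptions, not the SimuBP R implementation: the function name `simulate_gtbp` and its argument layout are hypothetical, each cell draws its lifetime from the distribution of its type at birth, and the mutation indicator is applied to the cell at the moment its lifetime is drawn (one reading of Step 2).

```python
import random


def simulate_gtbp(lifetime, offspring_probs, mupr, n0, tp, rng=None):
    """Generation-wise sketch of a general two-type branching process.

    lifetime        : pair of functions rng -> lifetime, for (wild-type, mutant)
    offspring_probs : pair of lists [p0, p1, ...], for (wild-type, mutant)
    mupr            : (mu1, mu2), forward and backward mutation probabilities
    n0              : (y0, x0), initial wild-type and mutant cell numbers
    tp              : time of plating
    Returns (z_t, x_t): total viable cells and mutant cells at tp.
    All argument names are assumptions made for this sketch.
    """
    rng = rng or random.Random()
    # Birth times of the current generation, by type (0 = wild-type, 1 = mutant).
    births = [[0.0] * n0[0], [0.0] * n0[1]]
    alive_at_tp = [0, 0]  # Step 1: y_t and x_t start at zero.
    while births[0] or births[1]:
        next_births = [[], []]
        for cell_type in (0, 1):
            for born in births[cell_type]:
                # Step 2: accumulated lifetime and mutation indicator of the cell.
                end = born + lifetime[cell_type](rng)
                mutated = rng.random() < mupr[cell_type]
                final_type = 1 - cell_type if mutated else cell_type
                if end >= tp:
                    # Step 3: the cell is still alive at plating and is counted.
                    alive_at_tp[final_type] += 1
                else:
                    # (a) The cell divides before plating: draw its offspring number.
                    probs = offspring_probs[final_type]
                    k = rng.choices(range(len(probs)), weights=probs)[0]
                    next_births[final_type].extend([end] * k)
        births = next_births  # (b) Repeat with the offspring as current generation.
    y_t, x_t = alive_at_tp
    return y_t + x_t, x_t


# STBP-like setting: exponential lifetimes with rate 1, binary fission,
# forward mutation probability 1e-3, one initial wild-type cell, tp = 5.
expo = lambda rng: rng.expovariate(1.0)
fission = [0.0, 0.0, 1.0]
print(simulate_gtbp((expo, expo), (fission, fission), (1e-3, 0.0), (1, 0), 5.0))
```

Because every cell is tracked individually, the run time and memory of this sketch grow with the population size at plating, so it is only practical for moderate values of the plating time.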

2.3. Simulation Studies for Validation, Comparison, and Demonstration
We perform simulation studies based on an STBP to evaluate the performance of SimuBP, including three components S1∼S3 with the following specific aims:
S1: To check goodness-of-fit (GoF) of the STBP model to the data generated by SimuBP.
S2: To compare the data generated by SimuBP with those by two alternative simulators.
S3: To demonstrate mutation rate estimation based on the data generated by SimuBP.
Simulation S1 focuses on validating the data simulated by SimuBP based on the STBP model. Suppose that, in the STBP, the exponential rate of the cell lifetime distribution is $a$, and the mutation probability of the wild-type cell is $\mu $. Two different cases, S1a and S1b, are considered depending on the initial number of wild-type cells: ${y}_{0}=1$ and ${y}_{0}>1$. Denote the random variable of the total number of viable cells at the time of plating ${t}_{p}$ by ${Z}_{t}$, and the random variable of the number of wild-type cells at ${t}_{p}$ by ${Y}_{t}$. For S1a: ${y}_{0}=1$, the distributions of ${Z}_{t}$ and ${Y}_{t}$ can be obtained explicitly by using the property of the binary-fission MBP [12] (for convenience, a brief derivation is provided in Appendix A.1):
and
When $\mu $ is small, as in typical fluctuation experiments, Formula (2) can be approximated by
Consequently, for S1b: ${y}_{0}>1$,
and
We then use SimuBP with properly specified input arguments to generate data $({z}_{t},{y}_{t})$ based on the STBP, and check the GoF of these data to the above theoretical distributions. Note that, because forward simulation is generally inefficient, SimuBP does not simply superpose (by looping over) the ${y}_{t}$ and ${x}_{t}$ counts initiated by single cells; rather, it generates ${y}_{t}$ and ${x}_{t}$ samples directly from a non-unit ${y}_{0}$ (and ${x}_{0}$ as well in a generalized setting).
It is worth noting that this STBP differs from the traditional Luria–Delbrück or Lea–Coulson model because it assumes stochastic growth for both wild-type and mutant cells. To illustrate this point, we perform an additional simulation study S1c, where the ${z}_{t}$ and ${y}_{t}$ counts are generated by SimuBP according to the STBP used above. The distribution of the number of mutants ${x}_{t}={z}_{t}-{y}_{t}$ is then calculated and compared with a corresponding Luria–Delbrück (LD) distribution to check the GoF.
Simulation S2 is conducted to compare SimuBP with two other simulation algorithms. Both Algorithms 2 and 3 simulate counts of ${z}_{t}$ and ${x}_{t}$ based on the STBP model. Algorithm 2 comprises four steps. First, obtain the occurrence time of each cell division event prior to plating. This is done by using the distribution of the inter-arrival times of the binary-fission MBP (see Proposition 1 in [13]). Second, count the population size resulting from each initial cell and sum up across the ${z}_{0}$ initial cells to obtain ${z}_{t}$. Third, determine which of the cell division events correspond to mutation events, and calculate for each mutation event its excess time until plating. Last, generate the number of mutant cells resulting from each mutation and sum up across the mutation events to obtain ${x}_{t}$. We denote the simulation study comparing Algorithms 1 and 2 by S2a.
Algorithm 2 Alternative simulator based on an STBP
Input: exponential rates ${a}_{1},{a}_{2}$ for wild-type and mutant cell lifetimes, initial number of cells (wild-type) ${z}_{0}$, mutation probability $\mu $, time of plating ${t}_{p}$
Output: total number of viable cells ${z}_{t}$ and number of mutant cells ${x}_{t}$ at ${t}_{p}$
Step 1. For each initial wild-type cell, calculate the occurrence times of the successive division events along its genealogy until ${t}_{p}$ by generating and summing up the exponentially distributed inter-arrival times with rate $j{a}_{1},j=1,2,\cdots $. Denote the occurrence times starting from the $i$th initial cell by $\left\{{S}_{i}\right\}$.
Step 2. Denote the number of elements in $\left\{{S}_{i}\right\}$ by ${n}_{i}$. Because of the binary-fission property, the population size at ${t}_{p}$ initiated by the $i$th cell is $({n}_{i}+1)$, hence ${z}_{t}={\sum}_{i=1}^{{z}_{0}}({n}_{i}+1)$.
Step 3. Determine whether each cell division incurs mutation by generating $({z}_{t}-{z}_{0})$ random numbers from $\mathrm{Bern}\left(\mu \right)$. Denote the number of mutations by $m$, which is two times the sum of the $({z}_{t}-{z}_{0})$ Bernoulli random numbers (this factor may vary depending on the assumption of pre- or post-division mutations). Denote the occurrence time of the $i$th mutation event by ${t}_{i},1\le i\le m$, so its excess time until plating is ${t}_{p}-{t}_{i}$.
Step 4. For each mutation, generate its resulting mutant cell count at ${t}_{p}$ from the shifted geometric distribution $\mathrm{geo}\left({e}^{-{a}_{2}({t}_{p}-{t}_{i})}\right)+1$, and finally sum up across all $m$ mutant cell counts to get the total number of mutant cells ${x}_{t}$ at ${t}_{p}$.
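Steps 1 and 2 of Algorithm 2 can be sketched in Python as below. This is a partial illustration only (the mutation bookkeeping of Steps 3 and 4 is not covered), and the function names `division_times` and `total_cells` are hypothetical. It relies on the inter-arrival property stated in Step 1: while $j$ cells are alive in a clone, the waiting time to the next division is exponential with rate $j{a}_{1}$.

```python
import random


def division_times(a1, tp, rng):
    """Division times before tp in the clone started by one wild-type cell."""
    times = []
    t = 0.0
    j = 1  # current number of cells in the clone
    while True:
        # With j cells alive, the next division occurs after an Exp(j * a1) time.
        t += rng.expovariate(j * a1)
        if t >= tp:
            return times
        times.append(t)
        j += 1


def total_cells(a1, z0, tp, rng=None):
    """z_t: each clone contributes (number of divisions before tp) + 1 cells."""
    rng = rng or random.Random()
    return sum(len(division_times(a1, tp, rng)) + 1 for _ in range(z0))


rng = random.Random(2023)
print(total_cells(a1=1.0, z0=10, tp=3.0, rng=rng))
```

Under these assumptions the expected size of a single clone at ${t}_{p}$ is ${e}^{{a}_{1}{t}_{p}}$, which gives a quick sanity check on simulated averages.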

In the second part of Simulation S2, denoted by S2b, we compare SimuBP with another simulator (Algorithm 3) adapted from the software SALVADOR [14]. Algorithm 3 differs from SALVADOR mainly in that it generates the number of wild-type cells ${z}_{t}$ at the time of plating from geometric growth rather than treating ${z}_{t}$ as input, and replaces the Poisson-distributed number of mutations based on deterministic growth by the actual number of mutations based on stochastic growth. These adaptations make it easier to compare Algorithm 3 with Algorithms 1 or 2. It can be seen that Algorithms 2 and 3 are closely related and both rely on the exponential lifetime assumption, so that once the number of mutations and the time from each mutation to plating are determined, the number of mutant cells ${x}_{t}$ at the time of plating can be obtained by generating geometrically distributed (with shift) random numbers.
Algorithm 3 Alternative simulator adapted from SALVADOR [14]
Input: exponential rates ${a}_{1},{a}_{2}$ for wild-type and mutant cell lifetimes, initial number of cells (wild-type) ${z}_{0}$, mutation probability $\mu $, time of plating ${t}_{p}$
Output: total number of viable cells ${z}_{t}$ and number of mutant cells ${x}_{t}$ at ${t}_{p}$
Step 1. Generate the population size ${z}_{t}$ at ${t}_{p}$ by summing up ${z}_{0}$ random numbers, each drawn from $\mathrm{geo}\left({e}^{-{a}_{1}{t}_{p}}\right)$.
Step 2. Calculate the total number of mutations by $m=\mu \cdot {z}_{t}$, and generate the occurrence time ${t}_{i}$ of the $i$th mutation event, $1\le i\le m$, from a truncated, flipped exponential distribution with range $[0,{t}_{p}]$, i.e., from the CDF $F\left(t\right)=\frac{{e}^{{a}_{1}t}-1}{{e}^{{a}_{1}{t}_{p}}-1},0\le t\le {t}_{p}$. This is done by simulating ${t}_{i}$ as $[\log (u({e}^{{a}_{1}{t}_{p}}-1)+1)]/{a}_{1}$, where the random number $u\sim \mathrm{unif}(0,1)$.
Step 3. For each mutation, generate its resulting mutant cell count at ${t}_{p}$ from the shifted geometric distribution $\mathrm{geo}\left({e}^{-{a}_{2}({t}_{p}-{t}_{i})}\right)+1$, and finally sum up across all $m$ mutant cell counts to get the total number of mutant cells ${x}_{t}$ at ${t}_{p}$.
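The two sampling steps that Algorithms 2 and 3 share can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that $\mathrm{geo}(p)$ counts the failures before the first success, so that $\mathrm{geo}(p)+1$ takes values $1,2,\cdots $ with mean $1/p$. The first function inverts the CDF $F(t)$ of Step 2; the second draws the shifted geometric clone size of Step 3 by inversion.

```python
import math
import random


def mutation_time(a1, tp, rng):
    """Inverse-CDF draw from F(t) = (exp(a1*t) - 1) / (exp(a1*tp) - 1) on [0, tp]."""
    u = rng.random()
    return math.log(u * (math.exp(a1 * tp) - 1.0) + 1.0) / a1


def mutant_clone_size(a2, elapsed, rng):
    """Draw geo(p) + 1 with p = exp(-a2 * elapsed), by inversion.

    Assumes geo(p) counts failures before the first success.
    """
    p = math.exp(-a2 * elapsed)
    if p >= 1.0:
        return 1  # no time to grow: the clone is the single mutant cell
    u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
    return int(math.log(u) / math.log1p(-p)) + 1


rng = random.Random(7)
tp = 5.0
t_i = mutation_time(1.0, tp, rng)
print(t_i, mutant_clone_size(1.0, tp - t_i, rng))
```

Because the density of $F$ increases with $t$, most sampled mutation times fall close to plating, which matches the intuition that most mutations arise late, when the population is largest.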

It should be emphasized that SimuBP is flexible enough to generate more general fluctuation experimental data than most other simulators, including Algorithms 2 and 3, for instance by allowing
(1) the cell lifetime to follow an arbitrary continuous distribution, or be a constant,
(2) the offspring distribution to be any discrete distribution, not just binary fission,
(3) cell deaths and backward mutations,
(4) the initial cell population to contain both wild-type and mutant cells.
Moreover, SimuBP can be further extended to simulate other complex mutation processes governed by a non-constant (e.g., piecewise constant or even time-varying) mutation rate, as seen in the second example of the following demonstrations.
Lastly, we demonstrate the application of SimuBP through Simulation S3, which estimates mutation rates in a two-type MBP via two examples, S3a and S3b. In Simulation S3a, we first generate data with SimuBP based on the STBP model and then perform point estimation for the mutation probability by using the MOM/MLE estimator proposed in [12]. Example S3b considers the case of two-stage mutations, that is, during cell proliferation, mutations occur at a constant rate in stage 1 and, upon entering stage 2, switch to another constant rate. Such data may be observed in fluctuation experiments comprising abrupt changes in external conditions. A typical example can be found in the protocol of a mutagenesis experiment on E. coli under subinhibitory antibiotic stress [15], which introduces a cell recovery step prior to plating. The mutation rate in this two-stage process is a piecewise constant function, which can be easily incorporated by SimuBP, but not by any other simulator. We then estimate the three unknown parameters of the piecewise constant mutation rate function by using an estimator proposed in [16] based on approximate Bayesian computation. The estimation results for the three parameters are shown by a heatmap of the joint posterior samples obtained by Markov chain Monte Carlo (MCMC).