Two Multi-Sigmoidal Diffusion Models for the Study of the Evolution of the COVID-19 Pandemic

Barrera, Antonio; Román-Román, Patricia; Serrano-Pérez, Juan José; Torres-Ruiz, Francisco

doi:10.3390/math9192409


by Antonio Barrera^1,2,†, Patricia Román-Román^2,3,†, Juan José Serrano-Pérez^3,† and Francisco Torres-Ruiz^2,3,*,†

¹ Departamento de Análisis Matemático, Estadística e Investigación Operativa y Matemática Aplicada, Facultad de Ciencias, Universidad de Málaga, Bulevar Louis Pasteur, 31, 29010 Málaga, Spain

² Instituto de Matemáticas de la Universidad de Granada (IMAG), Calle Ventanilla, 11, 18001 Granada, Spain

³ Departamento de Estadística e Investigación Operativa, Facultad de Ciencias, Universidad de Granada, Avenida Fuente Nueva s/n, 18071 Granada, Spain

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2021, 9(19), 2409; https://doi.org/10.3390/math9192409

Submission received: 30 July 2021 / Revised: 20 September 2021 / Accepted: 22 September 2021 / Published: 28 September 2021

(This article belongs to the Special Issue Methodological and Applied Contributions on Stochastic Modelling and Forecasting)


Abstract:

A proposal is made to employ stochastic models, based on diffusion processes, to represent the evolution of the SARS-CoV-2 virus pandemic. Specifically, two diffusion processes are proposed whose mean functions obey multi-sigmoidal Gompertz and Weibull-type patterns. Both are constructed by introducing polynomial functions in the ordinary differential equations that originate the classical Gompertz and Weibull curves. The estimation of the parameters is approached by maximum likelihood. Various associated problems are analyzed, such as the determination of initial solutions for the necessary numerical methods in practical cases, as well as Bayesian methods to determine the degree of the polynomial. Additionally, strategies are suggested to determine the best model to fit specific data. A practical case is developed from data originating from several Spanish regions during the first two waves of the COVID-19 pandemic. The determination of the inflection time instants, which correspond to the peaks of infection and deaths, is given special attention. To deal with this particular issue, point estimation as well as first-passage times have been considered.

Keywords:

COVID-19; diffusion processes; multi-sigmoidal curves; inference in diffusion processes; first-passage times

1. Introduction

There can be no doubt that the outbreak of the global pandemic caused by the SARS-CoV-2 virus has produced a strong shock worldwide, whose full repercussions are still far from known. Obviously, the main concern is to contain the spread as soon as possible, thus putting a stop to the continuing toll of deaths left in its wake. The real-time data provided by Johns Hopkins University [1] put the number of infected people, as of 22 July 2021, at around 192 million, and the number of deaths at more than 4.1 million. Fortunately, vaccination campaigns are beginning to bear fruit, although there is still a long way to go, especially in developing countries. In addition, the pandemic is causing severe economic losses. It is still too early to evaluate their full impact, although some studies are already looking into the matter (see, for example, [2]).

Since the beginning of the pandemic, multiple efforts have been made to model its evolution and understand its behavior. Knowledge of applicable models can help, on the one hand, to make correct decisions to mitigate the spread of the virus and, on the other, to design actions against future health crises of the same nature. For this reason, the scientific community is making strenuous efforts to apply existing techniques and develop new ones capable of modeling and predicting the evolution of the epidemic.

In this regard, compartmental epidemiological models have been a main focus of attention. Among them, SIR models stand out; they assume that the population can be divided into three disjoint compartments (susceptible, infected, and recovered). The number and type of compartments can be modified to better reflect the specific dynamics of the disease, giving rise, for example, to SEIR models (susceptible, exposed, infected, and recovered). In both cases, the models describe how individuals progress from one compartment to the next (see [3] for more details about compartmental models).

Although such models have been widely applied, the specific features of the present pandemic have led researchers to explore how certain modifications can be introduced into them. For example, Gounane et al. [4] developed a non-linear SIR model that includes the effect of the social distancing imposed by governments around the world. Khan et al. [5] introduced the so-called SQUIDER model, in which new compartments are incorporated: specifically, infected and undetected cases (U), undetected recovered cases (E), quarantined cases affected by social distancing (Q), and deaths from contagion (D). Furthermore, Ianni et al. [6] introduced time dependence in the parameters of the classical SIR model, thus affecting the behavior of the basic reproductive number $R_{0}$ over time. This is intended to take into account containment measures such as lockdowns, social distancing, and limitations on commercial activities. The resulting models are called SIR-T and SIRD-T. While all the efforts mentioned above address compartmental models from a deterministic point of view, some research has looked into the development of stochastic models, including fractional versions of them (see, for instance, [7]).

Apart from these epidemiological models, several authors have considered other approaches. Among others, we can cite the work of Maleki et al. [8], who considered ARMA processes based on two-piece scale mixture normal distributions, and that of Ünlu and Namb [9], who used machine learning techniques and addressed various epidemiological models from a Bayesian perspective. Special mention must be made of functional data techniques, which have been applied in several lines of research. For example, Ref. [10] used principal component analysis to discover patterns of pandemic progression as well as to identify the countries that best represent these archetypes. Similarly, Acal et al. [11] proposed a principal components multiple function-on-function regression model for the imputation of missing data in the COVID-19 hospitalization and intensive care curves of several Spanish regions.

The evolution dynamics of the current pandemic, as in previous ones, point to the importance of considering models based on growth curves, mainly those of the sigmoidal type. Some of them, such as the Richards curve, have already been used in previous situations of this nature (Hsi et al. [12], Wang et al. [13]). Concerning COVID-19 specifically, mention must be made of Català et al. [14], where the authors employed the Gompertz curve, and of Li et al. [15], a comparative study performed using the Richards, logistic, Von Bertalanffy, and Gompertz curves, among others. However, such curves present a fairly rigid behavior, so it is necessary to resort to more flexible models. In this sense, Tovissodé et al. [16] have used the generic growth model introduced by Turner et al. [17]. A fundamental aspect that must be taken into consideration is that practically all the studies in this line of research focus on modeling the first wave of the pandemic. This is because these curves exhibit sigmoidal behavior with only one inflection point, which unfortunately prevents a more in-depth analysis of the evolution of the pandemic as a whole. Therefore, if more complex behaviors are to be modeled, it becomes necessary to consider multi-sigmoidal curves. Likewise, a large part of the studies have been carried out from a deterministic point of view, so they do not take into account random fluctuations or influences external to the system. The fact that growth curves originate, for the most part, as solutions of ordinary differential equations naturally leads to the consideration of stochastic differential equations whose solutions are, under certain conditions, diffusion processes. Some works combining both aspects, that is, the multi-sigmoidal and the stochastic character of the model, have recently been published. Román-Román et al. [18] considered a diffusion process associated with a plurisigmoidal Gompertz curve, while Di Crescenzo et al. [19] proposed two growth and death processes and a diffusion process whose mean functions are logistic curves that present several inflection points.

The starting motivations of the present work can be summarized as follows:

The need for the broadest possible knowledge of the evolution dynamics of the pandemic, focusing on determining points of contagion peaks, inflections in its evolution, duration, and the succession of waves.
The need to consider models that reproduce the observed multi-sigmoidal behavior of the pandemic.
The need to consider stochastic models that allow the aspects mentioned above to be studied from a probabilistic perspective.

With these ideas in mind, the present paper proposes employing diffusion processes whose mean functions are multi-sigmoidal curves in order to model the evolution of the pandemic through various waves, considering both the number of infected individuals and the number of deaths. Specifically, the Gompertz process introduced in [18] is considered, as well as a version of the generalized Weibull-type process proposed by Barrera et al. [20]. In both cases, the processes are obtained by including polynomial-type functions in the ordinary differential equations that give rise to the Gompertz and Weibull curves, respectively. Both processes can be viewed from a common perspective, namely that of the inhomogeneous lognormal diffusion process. Stochastic diffusion processes provide advantages over other models commonly used in the study of epidemiological phenomena, such as SIR models based on differential equations. The probabilistic nature of the model allows the researcher to evaluate the variability of the model estimates, for example through confidence intervals for the estimates and the predictions made. In addition, diffusion processes such as the ones considered in this paper can be used to study several time variables of notable interest. Specifically, in the present case, the determination of the time instants at which the infection and death peaks are reached can be approached by calculating first-passage times, which provide a valuable probabilistic tool for their study and analysis.

Under these general considerations, this paper is structured as follows: Section 2 introduces the multi-sigmoidal Gompertz and Weibull curves, while Section 3 introduces the processes associated with each of these curves, based on the inhomogeneous lognormal diffusion process. These two sections cover both models in a unified way, which simplifies the expressions derived later on. Section 4 is dedicated to the estimation of the parameters of the models under consideration, which is carried out by maximum likelihood. This section includes some strategies for selecting initial solutions for the numerical methods required by the estimation, and describes a strategy for selecting the optimal polynomial in each case. Section 5 uses both processes to model pandemic-related data from several Spanish regions during the first two waves of infection, and proposes a strategy for selecting the model that best fits the observed data. For the selected model, a study of the inflection times, which determine the infection and death peaks in each wave, is carried out. This analysis is based on the point estimation of said instants and on the study of the first-passage times of the process through the values at which inflections occur. Finally, some conclusions are drawn.

2. Gompertz and Weibull Multi-Sigmoidal Curves

In this section, we introduce two multi-sigmoidal versions of the Gompertz and Weibull curves. Both are obtained by including polynomial functions in the ordinary differential equations whose solutions are said curves. More precisely, let

$$Q_{β}(t) = \sum_{ℓ=1}^{p} β_{ℓ} t^{ℓ}$$

be a polynomial of degree $p > 0$, where $β = (β_{1}, \dots, β_{p})^{T}$ is a real parametric vector such that $β_{p} > 0$. For $t \geq 0$, we define the multi-sigmoidal Gompertz function as

$$f_{\bar{θ}_{g}}(t) = k_{g} \exp\left(-α_{g} \exp\left(-Q_{β_{g}}(t)\right)\right), \quad \bar{θ}_{g} = (α_{g}, β_{g}^{T})^{T}, \tag{1}$$

and the multi-sigmoidal Weibull function as

$$f_{\bar{θ}_{w}}(t) = k_{w} - α_{w} \exp\left(-Q_{β_{w}}(t)\right), \quad \bar{θ}_{w} = (α_{w}, β_{w}^{T})^{T}, \tag{2}$$

where $α_{g}, α_{w} > 0$ and the parameters $k_{g}, k_{w} > 0$ are the asymptotic values of the curves, i.e., the carrying capacity of each model. Here, the subscripts g and w denote the expressions related to the Gompertz and Weibull models, respectively. Each of these functions can be viewed as the solution of the Malthusian-type linear differential equation

$$\frac{d}{dt} f_{θ}(t) = h_{θ}(t) f_{θ}(t), \quad θ = \bar{θ}_{g}, \bar{θ}_{w}, \tag{3}$$

where the key function $h_{θ}$ is defined as follows:

$$h_{θ}(t) = \begin{cases} α_{g} P_{β_{g}}(t) e^{-Q_{β_{g}}(t)} & \text{if } θ = \bar{θ}_{g}, \\ α_{w} P_{β_{w}}(t) e^{-Q_{β_{w}}(t)} \left(k_{w} - α_{w} \exp(-Q_{β_{w}}(t))\right)^{-1} & \text{if } θ = \bar{θ}_{w}. \end{cases}$$

Here, $P_{β_{g}}(t)$ is the polynomial satisfying $\frac{d}{dt} Q_{β_{g}}(t) = P_{β_{g}}(t)$ for all t (analogously for the Weibull case with $β_{w}$). From the definition of $h_{θ}$, it follows that the growth periods of the curves are related to the roots of $P_{β_{g}}$ and $P_{β_{w}}$. Indeed, $h_{θ}$ is proportional to the corresponding polynomial, so the derivative of the curve vanishes at every root and growth momentarily stops there. This is also directly related to the inflection points of the curves, which are discussed at the end of this section.
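The relation between Equations (1), (2) and (3) can be checked numerically. The following Python sketch is illustrative only: the helper names (`Q`, `P`, `gompertz_bar`, `weibull_bar`, `h_bar`) and the values $β = (3, -6, 4)^{T}$, $k = 10$, $α = 0.5$ are hypothetical choices, not values taken from the paper. It evaluates both curves in their original parametrization and compares a central finite difference of $f$ with $h_{θ} f$.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l (no constant term)."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def P(beta, t):
    """P_beta(t) = dQ_beta/dt."""
    return sum((l + 1) * b * t ** l for l, b in enumerate(beta))

def gompertz_bar(t, k, alpha, beta):
    """Multi-sigmoidal Gompertz curve, Equation (1)."""
    return k * math.exp(-alpha * math.exp(-Q(beta, t)))

def weibull_bar(t, k, alpha, beta):
    """Multi-sigmoidal Weibull curve, Equation (2)."""
    return k - alpha * math.exp(-Q(beta, t))

def h_bar(t, k, alpha, beta, model):
    """Key function h_theta in the original (k, alpha, beta) parametrization."""
    e = math.exp(-Q(beta, t))
    if model == "gompertz":
        return alpha * P(beta, t) * e
    return alpha * P(beta, t) * e / (k - alpha * e)

# Illustrative check of Equation (3): f'(t) versus h_theta(t) f(t).
beta = [3.0, -6.0, 4.0]   # hypothetical coefficients with beta_p > 0
k, alpha, dt = 10.0, 0.5, 1e-6
for model, f in (("gompertz", gompertz_bar), ("weibull", weibull_bar)):
    for t in (0.2, 0.5, 0.8):
        slope = (f(t + dt, k, alpha, beta) - f(t - dt, k, alpha, beta)) / (2 * dt)
        print(model, t, slope, h_bar(t, k, alpha, beta, model) * f(t, k, alpha, beta))
```

With these coefficients $P_{β}(t) = 3(1 - 2t)^{2}$ has a double root at $t = 0.5$, where both printed quantities are numerically zero, illustrating the growth pause described above.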

Other differential equations can be derived from (3) by taking into account the original expressions (1) and (2) for the Gompertz and Weibull multi-sigmoidal models, respectively. Indeed, the equations

$$\frac{d}{dt} f_{θ}(t) = \begin{cases} \left(\log k_{g} - \log f_{\bar{θ}_{g}}(t)\right) f_{\bar{θ}_{g}}(t) P_{β_{g}}(t) & \text{if } θ = \bar{θ}_{g}, \\ \left(k_{w} - f_{\bar{θ}_{w}}(t)\right) P_{β_{w}}(t) & \text{if } θ = \bar{θ}_{w}, \end{cases}$$

lead, given the initial condition $f_{θ}(t_{0}) = f_{0} > 0$ for all $θ$, to the solutions

$$f_{θ}(t) = \begin{cases} \exp\left(\log k_{g} \left(1 - e^{-(Q_{β_{g}}(t) - Q_{β_{g}}(t_{0}))}\right) + \log f_{0}\, e^{-(Q_{β_{g}}(t) - Q_{β_{g}}(t_{0}))}\right) & \text{if } θ = \bar{θ}_{g}, \\ k_{w} - (k_{w} - f_{0}) \exp\left(-(Q_{β_{w}}(t) - Q_{β_{w}}(t_{0}))\right) & \text{if } θ = \bar{θ}_{w}. \end{cases}$$
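To see that, in this formulation, the limit no longer depends on the initial value, the sketch below integrates the two carrying-capacity equations with a classical fourth-order Runge-Kutta scheme and compares the result with the closed-form solutions $\exp(\log k_{g}(1 - e^{-(Q(t) - Q(t_{0}))}) + \log f_{0}\, e^{-(Q(t) - Q(t_{0}))})$ and $k_{w} - (k_{w} - f_{0}) e^{-(Q(t) - Q(t_{0}))}$. The helper names (`rhs`, `closed_form`, `rk4`) and the coefficients $β = (2, -3, 2)^{T}$ (for which $P_{β}$ is positive everywhere) are hypothetical; this is a numerical sanity check under these assumptions, not part of the authors' methodology.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def P(beta, t):
    """P_beta(t) = dQ_beta/dt."""
    return sum((l + 1) * b * t ** l for l, b in enumerate(beta))

def rhs(f, t, k, beta, model):
    """Right-hand sides written in terms of the carrying capacity k."""
    if model == "gompertz":
        return (math.log(k) - math.log(f)) * f * P(beta, t)
    return (k - f) * P(beta, t)

def closed_form(t, t0, f0, k, beta, model):
    """Closed-form solutions with f(t0) = f0; both tend to k as t grows."""
    decay = math.exp(-(Q(beta, t) - Q(beta, t0)))
    if model == "gompertz":
        return math.exp(math.log(k) * (1 - decay) + math.log(f0) * decay)
    return k - (k - f0) * decay

def rk4(t0, t1, f0, k, beta, model, steps=2000):
    """Classical fourth-order Runge-Kutta integration of rhs from t0 to t1."""
    dt, t, f = (t1 - t0) / steps, t0, f0
    for _ in range(steps):
        s1 = rhs(f, t, k, beta, model)
        s2 = rhs(f + dt * s1 / 2, t + dt / 2, k, beta, model)
        s3 = rhs(f + dt * s2 / 2, t + dt / 2, k, beta, model)
        s4 = rhs(f + dt * s3, t + dt, k, beta, model)
        f += dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        t += dt
    return f

beta = [2.0, -3.0, 2.0]   # hypothetical coefficients
for model in ("gompertz", "weibull"):
    print(model, rk4(0.0, 1.0, 2.0, 10.0, beta, model),
          closed_form(1.0, 0.0, 2.0, 10.0, beta, model))
    # Far in time, every initial value approaches the same k = 10.
    print([round(closed_form(5.0, 0.0, f0, 10.0, beta, model), 6) for f0 in (0.5, 2.0, 8.0)])
```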

By taking limits as t goes to infinity (recall that $Q_{β}(t) \to \infty$ because $β_{p} > 0$), these functions tend to values (the carrying capacities) that do not depend on the initial values. This is the main difference with respect to the solutions of Equation (3). As a matter of fact, by considering $f_{θ}(t_{0}) = f_{0}$ for $t \geq t_{0} \geq 0$, Equation (3) leads to the curves

$$f_{θ}(t) = \begin{cases} f_{0} \exp\left(-α\left(e^{-Q_{β_{g}}(t)} - e^{-Q_{β_{g}}(t_{0})}\right)\right) & \text{if } θ = θ_{g}, \\ f_{0}\, \dfrac{η - \exp(-Q_{β_{w}}(t))}{η - \exp(-Q_{β_{w}}(t_{0}))} & \text{if } θ = θ_{w}, \end{cases} \tag{4}$$

where $θ_{g} = (α, β_{g}^{T})^{T}$ and $θ_{w} = (η, β_{w}^{T})^{T}$, with $α = α_{g}$ and $η = k_{w}/α_{w}$. Note that, after this reformulation, the Weibull model has one parameter fewer. From now on, all definitions depending on $θ$ will be considered for the values $θ_{g}$ and $θ_{w}$ instead of $\bar{θ}_{g}$ and $\bar{θ}_{w}$.

Regarding the above equations, note that the expression of $h_{θ}$ simplifies after the reformulation, becoming

$$h_{θ}(t) = \begin{cases} α P_{β_{g}}(t) e^{-Q_{β_{g}}(t)} & \text{if } θ = θ_{g}, \\ P_{β_{w}}(t) e^{-Q_{β_{w}}(t)} \left(η - \exp(-Q_{β_{w}}(t))\right)^{-1} & \text{if } θ = θ_{w}. \end{cases} \tag{5}$$

Furthermore, note that the limit value of these models depends on the initial values $t_{0}$ and $f_{0}$, that is,

$$\lim_{t \to \infty} f_{θ}(t) = \begin{cases} f_{0} \exp\left(α e^{-Q_{β_{g}}(t_{0})}\right) & \text{if } θ = θ_{g}, \\ f_{0}\, \dfrac{η}{η - \exp(-Q_{β_{w}}(t_{0}))} & \text{if } θ = θ_{w}. \end{cases}$$

The use of the latter models is justified in the case of phenomena with multiple sample paths of observations, each starting at a different point and having a different limit value. Due to this more general character, in what follows we will consider the expressions of $f_{θ}$ given in (4).
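Equation (4) and its limits can be coded directly. In this sketch, `f_theta` and `limit_value` are hypothetical names, and `param` stands for $α$ in the Gompertz case and for $η$ in the Weibull case; the coefficients are illustrative. With the values used for Figure 1 ($f_{0} = 5$, $α = \log 2$, $η = 2$) and $t_{0} = 0$, both limits equal $10$, since $5 e^{\log 2} = 10$ and $5 \cdot 2/(2 - 1) = 10$, while a different $f_{0}$ changes the limit.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def f_theta(t, t0, f0, param, beta, model):
    """Curves of Equation (4); param is alpha (Gompertz) or eta (Weibull)."""
    r, r0 = math.exp(-Q(beta, t)), math.exp(-Q(beta, t0))
    if model == "gompertz":
        return f0 * math.exp(-param * (r - r0))
    return f0 * (param - r) / (param - r0)

def limit_value(t0, f0, param, beta, model):
    """Limit of f_theta(t) as t grows (valid because beta_p > 0)."""
    r0 = math.exp(-Q(beta, t0))
    if model == "gompertz":
        return f0 * math.exp(param * r0)
    return f0 * param / (param - r0)

beta = [2.0, -3.0, 2.0]   # hypothetical coefficients
print(limit_value(0.0, 5.0, math.log(2), beta, "gompertz"))   # Figure 1 values
print(limit_value(0.0, 5.0, 2.0, beta, "weibull"))
print(limit_value(0.0, 8.0, 2.0, beta, "weibull"))             # larger f0, larger limit
```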

As mentioned above, inflection points are of great importance in studying the behavior of this type of curve. Since the second derivative of the curve vanishes at such points, their equations can be obtained by differentiating (3) and taking into account the expressions of $h_{θ}$. Therefore, in the case of the multi-sigmoidal Gompertz function, the condition $\frac{d^{2}}{dt^{2}} f_{θ_{g}}(t) = 0$ leads to the equation

$$\frac{d}{dt} P_{β_{g}}(t) = P_{β_{g}}^{2}(t) \left(1 - α \exp(-Q_{β_{g}}(t))\right),$$

whereas, in the case of the multi-sigmoidal Weibull function, it follows that

$$\frac{d}{dt} P_{β_{w}}(t) = P_{β_{w}}^{2}(t).$$

The solutions of these equations give the inflection points of the curves. Many growth phenomena present at least one inflection point, representing the moment at which the growth rate reaches a local extremum. Multi-sigmoidal models may have multiple inflection points and are therefore useful for modeling this type of growth phenomena.
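For a positive curve, $\frac{d^{2}}{dt^{2}} f_{θ}$ equals a positive factor times $P_{β}' - P_{β}^{2}(1 - α e^{-Q_{β}})$ (Gompertz) or $P_{β}' - P_{β}^{2}$ (Weibull), so inflection points can be located as sign changes of these expressions. The sketch below scans a grid and refines each sign change by bisection; it is one simple way to solve the equations numerically, not the procedure used by the authors, and the function names, grid size and coefficients are assumptions. Apart from exact zeros that happen to fall on the grid, only sign changes are detected.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def P(beta, t):
    """P_beta(t) = dQ_beta/dt."""
    return sum((l + 1) * b * t ** l for l, b in enumerate(beta))

def dP(beta, t):
    """dP_beta/dt, the second derivative of Q_beta."""
    return sum(l * (l + 1) * b * t ** (l - 1) for l, b in enumerate(beta) if l > 0)

def condition(t, beta, model, alpha=None):
    """Difference between both sides of the inflection equation."""
    if model == "gompertz":
        return dP(beta, t) - P(beta, t) ** 2 * (1 - alpha * math.exp(-Q(beta, t)))
    return dP(beta, t) - P(beta, t) ** 2

def inflection_points(beta, model, lo, hi, alpha=None, n=3000, tol=1e-12):
    """Grid scan for sign changes of the condition, refined by bisection."""
    g = lambda t: condition(t, beta, model, alpha)
    roots, step = [], (hi - lo) / n
    for i in range(n):
        a, b = lo + i * step, lo + (i + 1) * step
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)              # exact zero on the grid
        elif ga * gb < 0:
            while b - a > tol:           # bisection keeps the sign change inside [a, b]
                m = (a + b) / 2
                gm = g(m)
                if ga * gm <= 0:
                    b = m
                else:
                    a, ga = m, gm
            roots.append((a + b) / 2)
    return roots

beta = [3.0, -6.0, 4.0]   # hypothetical coefficients
print(inflection_points(beta, "weibull", 0.0, 1.5))
print(inflection_points(beta, "gompertz", 0.0, 1.5, alpha=math.log(2)))
```

For these illustrative coefficients the Weibull condition reduces to $-3u(4 + 3u^{3}) = 0$ with $u = 1 - 2t$, so on $[0, 1.5]$ the sketch should report $t = 0.5$ and $t = (1 + (4/3)^{1/3})/2 \approx 1.0503$.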

To illustrate the behavior related to inflection points, Figure 1 shows multi-sigmoidal Gompertz and Weibull curves for different values of the polynomial coefficients when $p = 3$, $f_{0} = 5$, $α = \log 2$ and $η = 2$, as well as their first and second derivatives, all according to the values in Table 1.

3. Gompertz and Weibull Multi-Sigmoidal Diffusion Processes

The aforementioned deterministic models are useful to describe growth phenomena in their most basic terms. However, in order to build a more precise model able to take into account other factors and uncertainties, their stochastic counterparts are recommended. Random fluctuations appear in every dynamical behavior, whether as a result of the nature of the phenomenon under study or as the uncertainty derived from observation and measurement instruments. They must therefore be included in any model attempting to describe the growth phenomena in any significant depth.

A traditional way to build stochastic models is to include a random term (usually white noise) in an ordinary differential equation. Nevertheless, such an approach may lead to intractable models due to the potentially high complexity of the deterministic base model.

For this reason, it might be useful to consider other approaches, such as adding a random perturbation to a parametric function. In this case, following (3), it is possible to introduce a white noise $ζ(t)$ with spectral density $σ^{2}$, where $σ > 0$ will be the diffusion coefficient of the stochastic model, in order to randomly perturb the function $h_{θ}(t)$ (from (5)) and obtain the function $h_{θ}(t) + ζ(t)$. Please note that, for the sake of clarity and following the unified notation of the previous section, from now on we will omit the subscript in the expression for $θ$.

After replacing $h_{θ}(t)$ with $h_{θ}(t) + ζ(t)$ in (3), this expression becomes the stochastic differential equation

$$dX(t) = h_{θ}(t) X(t)\, dt + σ X(t)\, dW(t), \tag{6}$$

where $W(t)$ is the standard Wiener process, independent of the initial value $X(t_{0})$, for $t \geq t_{0}$. Here, $t_{0} \geq 0$ is the initial time and $t \in [t_{0}, T]$, where $[t_{0}, T]$ is the parametric space of the process.

Taking into account that $h_{θ}$ is continuous on $[t_{0}, T]$, Equation (6) satisfies the conditions for the existence and uniqueness of a solution. This solution is a diffusion process, known as the inhomogeneous lognormal diffusion process, taking values in $\mathbb{R}^{+}$ and characterized by the infinitesimal moments $A_{1}(x, t) = h_{θ}(t) x$ and $A_{2}(x) = σ^{2} x^{2}$. In addition, a closed-form expression for the solution can be provided. In fact, by considering the initial condition $X(t_{0}) = X_{0}$, where $X_{0}$ is a random variable whose distribution must be either degenerate or lognormal (as further explained later in this section), we have

$$X(t) = X_{0} \exp\left(H_{\tilde{θ}}(t_{0}, t) + σ\left(W(t) - W(t_{0})\right)\right), \quad \tilde{θ} = (θ^{T}, σ^{2})^{T},$$

where the function $H_{\tilde{θ}}$ is defined, for $t_{0} \leq s < t \leq T$, as

$$H_{\tilde{θ}}(s, t) = \int_{s}^{t} h_{θ}(u)\, du - \frac{σ^{2}}{2}(t - s).$$

Note that the parameter $\tilde{θ}$ takes the form $\tilde{θ}_{g}$ or $\tilde{θ}_{w}$ when $θ = θ_{g}$ or $θ = θ_{w}$, respectively, where $\tilde{θ}_{g} = (θ_{g}^{T}, σ^{2})^{T}$ and $\tilde{θ}_{w} = (θ_{w}^{T}, σ^{2})^{T}$.

The function $H_{\tilde{θ}}$ depends on the integral of the function $h_{θ}$ defined in the previous section. This integral can be easily computed for each multi-sigmoidal model, leading to

$$H_{θ}(s, t) = \int_{s}^{t} h_{θ}(u)\, du = \begin{cases} -α e^{-Q_{β(θ)}(t)} + α e^{-Q_{β(θ)}(s)} & \text{if } θ = θ_{g}, \\ \log\left(η - e^{-Q_{β(θ)}(t)}\right) - \log\left(η - e^{-Q_{β(θ)}(s)}\right) & \text{if } θ = θ_{w}, \end{cases}$$

where the parameter $β$ depends on $θ$ in the sense that $β(θ_{g}) = β_{g}$ and $β(θ_{w}) = β_{w}$.

These results can be expressed more compactly as

$$H_{θ}(s, t) = ψ_{θ}\left(e^{-Q_{β(θ)}(t)}\right) - ψ_{θ}\left(e^{-Q_{β(θ)}(s)}\right),$$

where the function $ψ_{θ}$ is defined as

$$ψ_{θ}(r) = \begin{cases} -α r & \text{if } θ = θ_{g}, \\ \log(η - r) & \text{if } θ = θ_{w}. \end{cases} \tag{7}$$
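The compact form (7) can be checked against direct numerical integration of (5). The sketch below, with hypothetical helpers `psi`, `H_theta`, `h_theta` and `simpson` and illustrative parameter values, compares $ψ_{θ}(e^{-Q_{β}(t)}) - ψ_{θ}(e^{-Q_{β}(s)})$ with a composite Simpson approximation of $\int_{s}^{t} h_{θ}(u)\,du$. For the Weibull case it assumes $η > e^{-Q_{β}(u)}$ on the interval, so that the logarithm is defined.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def P(beta, t):
    """P_beta(t) = dQ_beta/dt."""
    return sum((l + 1) * b * t ** l for l, b in enumerate(beta))

def psi(r, param, model):
    """psi_theta of Equation (7); param is alpha (Gompertz) or eta (Weibull)."""
    return -param * r if model == "gompertz" else math.log(param - r)

def H_theta(s, t, param, beta, model):
    """Compact form psi(e^{-Q(t)}) - psi(e^{-Q(s)})."""
    return (psi(math.exp(-Q(beta, t)), param, model)
            - psi(math.exp(-Q(beta, s)), param, model))

def h_theta(t, param, beta, model):
    """Reparametrized key function of Equation (5)."""
    e = math.exp(-Q(beta, t))
    if model == "gompertz":
        return param * P(beta, t) * e
    return P(beta, t) * e / (param - e)

def simpson(func, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3

beta = [3.0, -6.0, 4.0]   # hypothetical coefficients
for model, param in (("gompertz", math.log(2)), ("weibull", 2.0)):
    print(model, H_theta(0.1, 0.9, param, beta, model),
          simpson(lambda u: h_theta(u, param, beta, model), 0.1, 0.9))
```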

The inhomogeneous lognormal process has been the subject of many studies, both theoretical and applied. In particular, Román and Torres [21] analyzed its versatility for modeling phenomena governed by growth curves such as those considered here (for example, the Richards [22] and Hubbert [23] curves, among others). Gutiérrez et al. [24] carried out a detailed analysis of this process from the perspective of SDEs as well as from that of Kolmogorov's partial differential equations, including the study of the distribution of the process and of characteristics of interest such as the moment and percentile functions.

As regards the distribution of the process, if $X_{0}$ is either degenerate or lognormal, then the finite-dimensional distributions of the process are lognormal. Indeed, for any $n > 0$ and $t_{1} < \dots < t_{n}$, the random vector $(X(t_{1}), \dots, X(t_{n}))^{T}$ has an n-dimensional lognormal distribution $Λ_{n}[ε, Σ]$, where $ε$ is the vector with elements $ε_{i} = μ_{0} + H_{\tilde{θ}}(t_{0}, t_{i})$ and $Σ$ is the matrix with elements $σ_{ij} = σ_{0}^{2} + σ^{2}\left(\min(t_{i}, t_{j}) - t_{0}\right)$, for $i, j = 1, \dots, n$. Here, $μ_{0}$ and $σ_{0}^{2}$ are the parameters of the initial distribution $Λ_{1}[μ_{0}, σ_{0}^{2}]$. Note that if $X_{0} > 0$ is degenerate at a point $x_{0}$, i.e., $X_{0} = x_{0}$ a.s., then $μ_{0} = \log x_{0}$ and $σ_{0}^{2} = 0$ in the previous expressions.
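The parameters of $Λ_{n}[ε, Σ]$ follow mechanically from $H_{\tilde{θ}}$. The sketch below builds them for a degenerate initial value $x_{0} = 5$; since the location parameter of a lognormal law lives on the log scale, this corresponds to $μ_{0} = \log x_{0}$ and $σ_{0}^{2} = 0$. As a consistency check, the marginal means $\exp(ε_{i} + σ_{ii}/2)$ should agree with the mean function of the process, $x_{0} f_{θ}(t_{i})/f_{θ}(t_{0})$. Function names and parameter values are illustrative assumptions.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def H_tilde(s, t, param, beta, sigma2, model):
    """H_{tilde theta}(s, t) = integral of h_theta over [s, t] - sigma^2 (t - s) / 2."""
    rs, rt = math.exp(-Q(beta, s)), math.exp(-Q(beta, t))
    if model == "gompertz":
        integral = -param * rt + param * rs
    else:
        integral = math.log(param - rt) - math.log(param - rs)
    return integral - sigma2 * (t - s) / 2

def lognormal_parameters(times, t0, mu0, sigma0_2, param, beta, sigma2, model):
    """Vector epsilon and matrix Sigma of Lambda_n for (X(t_1), ..., X(t_n))."""
    eps = [mu0 + H_tilde(t0, t, param, beta, sigma2, model) for t in times]
    cov = [[sigma0_2 + sigma2 * (min(ti, tj) - t0) for tj in times] for ti in times]
    return eps, cov

# Degenerate start X0 = 5: mu0 = log 5 and sigma0^2 = 0 (illustrative values).
beta, times = [3.0, -6.0, 4.0], [0.25, 0.5, 1.0]
eps, cov = lognormal_parameters(times, 0.0, math.log(5.0), 0.0, math.log(2), beta, 0.04, "gompertz")
for i, t in enumerate(times):
    print(t, math.exp(eps[i] + cov[i][i] / 2))   # marginal mean E(X(t_i))
```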

The transition probability distribution is particularly useful for inferential purposes. From the two-dimensional distributions of $(X(s), X(t))^{T}$, $s < t$, the transitions of the process can be obtained, which are also lognormal. Concretely,

$$X(t) \mid X(s) = y \sim Λ_{1}\left[\log y + H_{\tilde{θ}}(s, t),\ σ^{2}(t - s)\right].$$
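The transition density is the building block of the likelihood in Section 4. A direct transcription of $Λ_{1}[\log y + H_{\tilde{θ}}(s, t), σ^{2}(t - s)]$ as a log-density could look as follows; the name `log_transition_density`, its argument order and the example values are hypothetical.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def H_tilde(s, t, param, beta, sigma2, model):
    """H_{tilde theta}(s, t) = integral of h_theta over [s, t] - sigma^2 (t - s) / 2."""
    rs, rt = math.exp(-Q(beta, s)), math.exp(-Q(beta, t))
    if model == "gompertz":
        integral = -param * rt + param * rs
    else:
        integral = math.log(param - rt) - math.log(param - rs)
    return integral - sigma2 * (t - s) / 2

def log_transition_density(x, y, s, t, param, beta, sigma2, model):
    """Log of the Lambda_1[log y + H(s, t), sigma^2 (t - s)] density evaluated at x."""
    m = math.log(y) + H_tilde(s, t, param, beta, sigma2, model)
    v = sigma2 * (t - s)
    return -math.log(x) - 0.5 * math.log(2 * math.pi * v) - (math.log(x) - m) ** 2 / (2 * v)

# Illustrative Weibull transition from X(0.2) = 5 to X(0.6) = 6.
print(log_transition_density(6.0, 5.0, 0.2, 0.6, 2.0, [3.0, -6.0, 4.0], 0.05, "weibull"))
```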

Once the distribution of the process has been established, different characteristics associated with it can be calculated, including the mean and conditional mean functions, whose expressions are

$$E(X(t)) = E(X_{0}) \frac{\exp\left(ψ_{θ}(e^{-Q_{β(θ)}(t)})\right)}{\exp\left(ψ_{θ}(e^{-Q_{β(θ)}(t_{0})})\right)}$$

and

$$E(X(t) \mid X(t_{0}) = x_{0}) = x_{0} \frac{\exp\left(ψ_{θ}(e^{-Q_{β(θ)}(t)})\right)}{\exp\left(ψ_{θ}(e^{-Q_{β(θ)}(t_{0})})\right)}.$$

In addition, taking into account (4) and (7), it can be verified that

$$\frac{\exp\left(ψ_{θ}(e^{-Q_{β(θ)}(t)})\right)}{\exp\left(ψ_{θ}(e^{-Q_{β(θ)}(t_{0})})\right)} = \frac{f_{θ}(t)}{f_{θ}(t_{0})},$$

so both mean functions are of the type considered in the previous section. This supports the use of the processes introduced here to model phenomena with multi-sigmoidal behavior and motivates the study of their inference, which is the subject of the next section.
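The identity above can also be illustrated by simulation. Since the transitions are lognormal, paths can be simulated exactly on a time grid by multiplying independent lognormal increments, and the Monte Carlo average of $X(t)$ should then approach $x_{0} f_{θ}(t)/f_{θ}(t_{0})$. The function names, seed, sample size, grid and parameter values below are arbitrary choices for illustration.

```python
import math
import random

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def psi(r, param, model):
    """psi_theta of Equation (7); param is alpha (Gompertz) or eta (Weibull)."""
    return -param * r if model == "gompertz" else math.log(param - r)

def mean_ratio(t, t0, param, beta, model):
    """exp(psi(e^{-Q(t)})) / exp(psi(e^{-Q(t0)})), i.e. exp(H_theta(t0, t))."""
    return math.exp(psi(math.exp(-Q(beta, t)), param, model)
                    - psi(math.exp(-Q(beta, t0)), param, model))

def simulate_endpoint(x0, times, param, beta, sigma2, model, rng):
    """Exact simulation of X at times[-1] through successive lognormal transitions."""
    x = x0
    for s, t in zip(times[:-1], times[1:]):
        drift = math.log(mean_ratio(t, s, param, beta, model)) - sigma2 * (t - s) / 2
        x *= math.exp(drift + math.sqrt(sigma2 * (t - s)) * rng.gauss(0.0, 1.0))
    return x

# Illustrative run: Monte Carlo mean versus x0 * f(t) / f(t0) at t = 1.
rng = random.Random(2021)
beta = [3.0, -6.0, 4.0]
grid = [i / 10 for i in range(11)]
draws = [simulate_endpoint(5.0, grid, math.log(2), beta, 0.05, "gompertz", rng)
         for _ in range(5000)]
print(sum(draws) / len(draws), 5.0 * mean_ratio(1.0, 0.0, math.log(2), beta, "gompertz"))
```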

4. Inference

In this section, the maximum likelihood estimation of the parameters of the two multi-sigmoidal processes is discussed, following the general methodology suggested by Román-Román et al. [25].

Let us now consider a discrete sampling of d paths observed at times $t_{ij}$, for $i = 1, \dots, d$ and $j = 1, \dots, n_{i}$. For the sake of simplicity, we will consider $t_{i1} = t_{0}$ for all i, i.e., all the sample paths start at the same time instant. Furthermore, let $X_{i}^{T} = (X(t_{i1}), \dots, X(t_{in_{i}}))$ be the vector corresponding to the i-th sample path ($i = 1, \dots, d$). Then, $X = (X_{1}^{T} \mid \dots \mid X_{d}^{T})^{T}$.

Let us assume a lognormal initial distribution, i.e., $X_{0} \sim Λ_{1}[μ_{1}, σ_{1}^{2}]$. Then, for a fixed realization $x$ of $X$, the log-likelihood function is

$$\begin{aligned} \log L_{x}(ζ, \tilde{θ}) = {} & -\frac{(n + d)\log(2π)}{2} - \frac{d \log σ_{1}^{2}}{2} - \frac{n \log σ^{2}}{2} - \sum_{i=1}^{d} \sum_{j=1}^{n_{i}} \log x_{ij} - \frac{1}{2} \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} \log Δ_{i,j-1,j} \\ & - \frac{1}{2σ_{1}^{2}} \sum_{i=1}^{d} \left(\log x_{i1} - μ_{1}\right)^{2} - \frac{1}{2σ^{2}}\left(Z_{1} + Φ_{\tilde{θ}} - 2Γ_{\tilde{θ}}\right), \end{aligned} \tag{8}$$

where $n = \sum_{i=1}^{d}(n_{i} - 1)$, $ζ = (μ_{1}, σ_{1}^{2})^{T}$ is the vector containing the parameters of the initial distribution, and

$$\begin{aligned} Z_{1} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1} \left(\log \frac{x_{ij}}{x_{i,j-1}}\right)^{2}, \\ Φ_{\tilde{θ}} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1} H_{\tilde{θ}}^{2}(t_{i,j-1}, t_{ij}), \\ Γ_{\tilde{θ}} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1} \log \frac{x_{ij}}{x_{i,j-1}}\, H_{\tilde{θ}}(t_{i,j-1}, t_{ij}), \end{aligned}$$

where $Δ_{i,j-1,j} := t_{ij} - t_{i,j-1}$ (note that, in the case of equidistant times, this term is constant).
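The log-likelihood translates directly into code. The sketch below (hypothetical names; each path is given as a list of `(t_ij, x_ij)` pairs, and the two example paths are invented) accumulates $Z_{1}$, $Φ_{\tilde{θ}}$ and $Γ_{\tilde{θ}}$ along the observed transitions. It includes the parameter-free terms $\sum_{i}\sum_{j} \log x_{ij}$ and $\frac{1}{2}\sum_{i}\sum_{j}\log Δ_{i,j-1,j}$, which come from the lognormal densities and do not affect the maximizer, so its value equals the sum of the initial and transition log-densities.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def H_tilde(s, t, param, beta, sigma2, model):
    """H_{tilde theta}(s, t) = integral of h_theta over [s, t] - sigma^2 (t - s) / 2."""
    rs, rt = math.exp(-Q(beta, s)), math.exp(-Q(beta, t))
    if model == "gompertz":
        integral = -param * rt + param * rs
    else:
        integral = math.log(param - rt) - math.log(param - rs)
    return integral - sigma2 * (t - s) / 2

def log_likelihood(paths, mu1, s1_2, param, beta, sigma2, model):
    """Log-likelihood (8) for paths given as lists of (t_ij, x_ij) pairs."""
    d = len(paths)
    n = sum(len(path) - 1 for path in paths)
    z1 = phi = gamma = sum_log_x = sum_log_delta = 0.0
    for path in paths:
        sum_log_x += sum(math.log(x) for _, x in path)
        for (s, xs), (t, xt) in zip(path[:-1], path[1:]):
            delta = t - s
            y = math.log(xt / xs)                     # log x_ij / x_i,j-1
            hh = H_tilde(s, t, param, beta, sigma2, model)
            z1 += y * y / delta                       # Z_1
            phi += hh * hh / delta                    # Phi
            gamma += y * hh / delta                   # Gamma
            sum_log_delta += math.log(delta)
    init = sum((math.log(path[0][1]) - mu1) ** 2 for path in paths)
    return (-(n + d) * math.log(2 * math.pi) / 2 - d * math.log(s1_2) / 2
            - n * math.log(sigma2) / 2 - sum_log_x - sum_log_delta / 2
            - init / (2 * s1_2) - (z1 + phi - 2 * gamma) / (2 * sigma2))

# Two short hypothetical paths observed at common times.
paths = [[(0.0, 5.0), (0.3, 6.1), (0.7, 8.0), (1.0, 9.2)],
         [(0.0, 4.6), (0.3, 5.9), (0.7, 7.4), (1.0, 8.8)]]
print(log_likelihood(paths, 1.55, 0.02, math.log(2), [3.0, -6.0, 4.0], 0.05, "gompertz"))
```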

Given the functional independence of the parametric vectors $ζ$ and $\tilde{θ}$, the estimate of the former is

$$\hat{μ}_{1} = \frac{1}{d} \sum_{i=1}^{d} \log x_{i1}, \quad \hat{σ}_{1}^{2} = \frac{1}{d} \sum_{i=1}^{d} \left(\log x_{i1} - \hat{μ}_{1}\right)^{2},$$

while the estimate of $\tilde{θ}$ is obtained from the solution of the system of equations (see [25] for details)

$$\begin{aligned} Ξ_{θ} + \frac{σ^{2}}{2} Ω_{θ} &= 0, \\ σ^{2}\left(n + σ^{2} Z_{2}/4\right) - X_{1}^{θ} + 2X_{2}^{θ} - Z_{1} &= 0, \end{aligned} \tag{9}$$

where

$$\begin{aligned} Ξ_{θ} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1} \left(\log \frac{x_{ij}}{x_{i,j-1}} - H_{θ}(t_{i,j-1}, t_{ij})\right) \frac{\partial}{\partial θ} H_{θ}(t_{i,j-1}, t_{ij}), \\ Ω_{θ} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} \frac{\partial}{\partial θ} H_{θ}(t_{i,j-1}, t_{ij}), \\ X_{1}^{θ} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1} H_{θ}^{2}(t_{i,j-1}, t_{ij}), \\ X_{2}^{θ} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1} \log \frac{x_{ij}}{x_{i,j-1}}\, H_{θ}(t_{i,j-1}, t_{ij}), \\ Z_{2} &= \sum_{i=1}^{d} \sum_{j=2}^{n_{i}} Δ_{i,j-1,j}. \end{aligned}$$
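For fixed $θ$, the second equation in (9) is a quadratic in $σ^{2}$, namely $\frac{Z_{2}}{4}σ^{4} + nσ^{2} - C_{θ} = 0$ with $C_{θ} = Z_{1} + X_{1}^{θ} - 2X_{2}^{θ} = \sum_{i=1}^{d}\sum_{j=2}^{n_{i}} Δ_{i,j-1,j}^{-1}\left(\log\frac{x_{ij}}{x_{i,j-1}} - H_{θ}(t_{i,j-1}, t_{ij})\right)^{2} \geq 0$, so it has exactly one non-negative root, $σ^{2} = 2C_{θ}/(n + \sqrt{n^{2} + Z_{2}C_{θ}})$. The sketch below computes that root, which could be used to eliminate $σ^{2}$ from the system; this profiling step is an illustrative choice, not necessarily how the authors solve (9), and the function names and data are hypothetical.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def H_theta(s, t, param, beta, model):
    """Integral of h_theta over [s, t] (without the sigma^2 correction)."""
    rs, rt = math.exp(-Q(beta, s)), math.exp(-Q(beta, t))
    if model == "gompertz":
        return -param * rt + param * rs
    return math.log(param - rt) - math.log(param - rs)

def transitions(paths, param, beta, model):
    """(Delta, log x_ij / x_i,j-1, H_theta) for every observed transition."""
    out = []
    for path in paths:
        for (s, xs), (t, xt) in zip(path[:-1], path[1:]):
            out.append((t - s, math.log(xt / xs), H_theta(s, t, param, beta, model)))
    return out

def sigma2_given_theta(paths, param, beta, model):
    """Non-negative root of the second equation in (9) for fixed theta."""
    tr = transitions(paths, param, beta, model)
    n = len(tr)
    z1 = sum(y * y / dl for dl, y, _ in tr)
    x1 = sum(hh * hh / dl for dl, _, hh in tr)
    x2 = sum(y * hh / dl for dl, y, hh in tr)
    z2 = sum(dl for dl, _, _ in tr)
    c = z1 + x1 - 2 * x2                   # = sum (y - H)^2 / Delta >= 0
    # Root of (z2/4) s^2 + n s - c = 0, written in a cancellation-free form.
    return 2 * c / (n + math.sqrt(n * n + z2 * c))

paths = [[(0.0, 5.0), (0.3, 6.1), (0.7, 8.0), (1.0, 9.2)],
         [(0.0, 4.6), (0.3, 5.9), (0.7, 7.4), (1.0, 8.8)]]
print(sigma2_given_theta(paths, math.log(2), [3.0, -6.0, 4.0], "gompertz"))
```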

The maximum likelihood equations involve the vector derivative of $H_{θ}$, which, according to the compact form presented previously, is given by

$$\frac{\partial}{\partial θ} H_{θ}(t_{i,j-1}, t_{ij}) = \begin{pmatrix} M_{θ}^{0}(t_{ij}) - M_{θ}^{0}(t_{i,j-1}) \\ M_{θ}^{1}(t_{ij}) - M_{θ}^{1}(t_{i,j-1}) \\ \vdots \\ M_{θ}^{p}(t_{ij}) - M_{θ}^{p}(t_{i,j-1}) \end{pmatrix},$$

where, for $k = 0, \dots, p$,

$$M_{θ}^{k}(t) = 𝟙_{\{θ = θ_{w}\}} \left(t^{k} e^{-Q_{β(θ)}(t)}\right)^{1 - δ_{0k}} \left.\frac{\partial ψ_{θ}(r)}{\partial η}\right|_{r = e^{-Q_{β(θ)}(t)}} + 𝟙_{\{θ = θ_{g}\}} \left(-α t^{k}\right)^{1 - δ_{0k}} \left.\frac{\partial ψ_{θ}(r)}{\partial α}\right|_{r = e^{-Q_{β(θ)}(t)}},$$

where $𝟙_{\{\cdot\}}$ and $δ_{\cdot\cdot}$ denote the indicator function and the Kronecker delta, respectively. Explicitly, $M_{θ}^{0}(t) = -e^{-Q_{β(θ)}(t)}$ and $M_{θ}^{k}(t) = α t^{k} e^{-Q_{β(θ)}(t)}$ ($k \geq 1$) in the Gompertz case, and $M_{θ}^{0}(t) = \left(η - e^{-Q_{β(θ)}(t)}\right)^{-1}$ and $M_{θ}^{k}(t) = t^{k} e^{-Q_{β(θ)}(t)} \left(η - e^{-Q_{β(θ)}(t)}\right)^{-1}$ ($k \geq 1$) in the Weibull case. Note that, given the degree p of the polynomial, the curve has $p + 1$ parameters, so that $\frac{\partial}{\partial θ} H_{θ} \in \mathbb{R}^{p+1}$.
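The explicit form of $M_{θ}^{k}$ is easy to verify with finite differences. The sketch below (hypothetical names; `param` is $α$ or $η$, and index 0 of the gradient refers to it) implements $M_{θ}^{0}(t) = -e^{-Q_{β}(t)}$ and $M_{θ}^{k}(t) = α t^{k} e^{-Q_{β}(t)}$ for the Gompertz case, and $M_{θ}^{0}(t) = (η - e^{-Q_{β}(t)})^{-1}$ and $M_{θ}^{k}(t) = t^{k} e^{-Q_{β}(t)}(η - e^{-Q_{β}(t)})^{-1}$ for the Weibull case, obtained by differentiating $H_{θ}$ directly with respect to each parameter.

```python
import math

def Q(beta, t):
    """Q_beta(t) = sum_{l=1}^{p} beta_l t^l."""
    return sum(b * t ** (l + 1) for l, b in enumerate(beta))

def H_theta(s, t, param, beta, model):
    """Integral of h_theta over [s, t]."""
    rs, rt = math.exp(-Q(beta, s)), math.exp(-Q(beta, t))
    if model == "gompertz":
        return -param * rt + param * rs
    return math.log(param - rt) - math.log(param - rs)

def M(k, t, param, beta, model):
    """M^k_theta(t); k = 0 refers to alpha (Gompertz) or eta (Weibull), k >= 1 to beta_k."""
    r = math.exp(-Q(beta, t))
    if model == "gompertz":
        return -r if k == 0 else param * t ** k * r
    return 1 / (param - r) if k == 0 else t ** k * r / (param - r)

def grad_H(s, t, param, beta, model):
    """Vector derivative of H_theta(s, t) with respect to (param, beta_1, ..., beta_p)."""
    return [M(k, t, param, beta, model) - M(k, s, param, beta, model)
            for k in range(len(beta) + 1)]

print(grad_H(0.2, 0.8, math.log(2), [3.0, -6.0, 4.0], "gompertz"))
```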

The system of Equations (9) cannot be solved explicitly, so numerical methods, which require an initial solution, must be used. However, a general study of the system to check the convergence conditions of the chosen numerical method is not possible, since the system depends on the sample data, which may lead to unforeseeable behavior. An alternative is to apply optimization procedures to the log-likelihood (8), for which it is usually necessary to bound the parametric space. These two questions are addressed in Section 4.2.

4.1. About $[t_{0}, T]$

Before applying the inference procedure, it is necessary to discuss a modification involving the time interval on which the processes are considered. Although such processes have been defined on a generic interval $[t_{0}, T]$, two main reasons motivate the standardization of this interval to $[0, 1]$. The first is the analytical complexity of the multi-sigmoidal lognormal model. When $t_{0} = 0$, many exponential functions become equal to 1 (since $Q_{β}(0) = 0$), making computations more tractable than in the general case $t_{0} \neq 0$. The second reason is the numerical precision of computational operations. When $T = 1$, the fluctuations of the objective function (derived from the log-likelihood function) allow for a good performance of the numerical algorithms.

Such a standardization does not, however, modify the nature of the process. Indeed, the two multi-sigmoidal lognormal diffusion processes under consideration remain of the same type when the parametric space is transformed from $[t_{0}, T]$ to $[0, 1]$. In particular, the infinitesimal moments that fully characterize the process are equal up to a term coming from the standardization procedure. A more formal explanation is given in Appendix A.
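A minimal sketch of the change of time scale, assuming the affine map $u = (t - t_{0})/(T - t_{0})$. Since the variance of a log-increment is $σ^{2}Δt$ on the original scale and $σ_{u}^{2}Δu$ on $[0, 1]$, with $Δu = Δt/(T - t_{0})$, a value $σ_{u}^{2}$ estimated on $[0, 1]$ corresponds to $σ_{u}^{2}/(T - t_{0})$ in the original time unit. This only illustrates the kind of term introduced by the standardization; the formal treatment is the one in Appendix A, and the function names and example times are hypothetical.

```python
def standardize_times(times):
    """Affine map t -> (t - t0) / (T - t0) of the observation grid onto [0, 1]."""
    t0, T = times[0], times[-1]
    return [(t - t0) / (T - t0) for t in times], T - t0

def sigma2_original_scale(sigma2_std, span):
    """Convert sigma^2 estimated on [0, 1] back to the original time unit.

    Var(log increment) = sigma^2 * dt on the original scale and
    sigma_std^2 * du on [0, 1], with du = dt / span.
    """
    return sigma2_std / span

u, span = standardize_times([10.0, 20.0, 35.0, 110.0])   # e.g. days since a reference date
print(u, span, sigma2_original_scale(0.5, span))
```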

4.2. Initial Solutions

As mentioned earlier, the use of numerical methods to solve the system of Equation (9), or of optimization methods in the case of maximizing (8), requires having initial solutions in the first case or limiting the parametric space in the second.

In order to obtain initial solutions, we propose a simple linear regression. Taking $t_{0} = 0$, in agreement with the remark made in the previous subsection, it follows from (4), for the Gompertz case, that

$$\log \log \frac{k_{g}}{f_{θ_{g}}(t)} = \log α - Q_{β_{g}}(t), \tag{10}$$

where $k_{g} = f_{0} e^{α}$, whereas for the Weibull case it holds that

$$\log\left(η - \frac{f_{θ_{w}}(t)}{f_{0}}(η - 1)\right) = -Q_{β_{w}}(t),$$

which, taking into account the relation $η = \left(1 - \frac{f_{0}}{k_{w}}\right)^{-1}$, becomes

$$\log \frac{k_{w} - f_{θ_{w}}(t)}{k_{w} - f_{0}} = -Q_{β_{w}}(t). \tag{11}$$

Both Equations (10) and (11) can be viewed as linear regression models over $(t, z^{g})$ and $(t, z^{w})$, respectively, where $t$ is the time vector, $z^{g}$ is the vector of elements $z_{i}^{g} = log log (\tilde{k}_{g} / m_{i}^{g})$ for the Gompertz case, and $z^{w}$ is the vector of elements $z_{i}^{w} = log ((\tilde{k}_{w} - m_{i}^{w}) / (\tilde{k}_{w} - m_{1}^{w}))$ for the Weibull case ($i = 1, \dots, n$). Here, $\tilde{k}_{g}$ and $\tilde{k}_{w}$ are the final observed values, and $m_{i}^{g}$ and $m_{i}^{w}$ the $i$-th observed values, of the sample means to be modeled by the Gompertz and Weibull processes, respectively. With all of this, initial values of the polynomial parameters $β_{g}$ and $β_{w}$ follow from linear regression. The initial values for $α$ and $η$ are obtained from the relationships between these parameters and the limit values $k_{g}$ and $k_{w}$.
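A minimal sketch of these regressions, assuming $t_{0} = 0$ and approximating $f_{0}$ by the first sample mean $m_{1}$ (as in the definition of $z_{i}^{w}$). When $\tilde{k}$ is taken as the final sample mean, $z_{n}$ equals $log(0)$, so the sketch drops that observation; the optional `k` argument and the function names are hypothetical conveniences. In the Gompertz fit the intercept estimates $log α$ and the slope coefficients estimate $-β_{j}$; the Weibull fit has no intercept.

```python
import numpy as np


def _split(t, m, k):
    """Use the final sample mean as k~ (dropping that point, where z is
    undefined because log(k~/m_n) = 0) unless a limit value k is supplied."""
    t, m = np.asarray(t, dtype=float), np.asarray(m, dtype=float)
    if k is None:
        return m[-1], t[:-1], m[:-1], m[0]
    return float(k), t, m, m[0]


def gompertz_initial_values(t, m, q, k=None):
    """Initial alpha and beta_g from regression (10); assumes t[0] = 0 and f0 = m_1."""
    k, t_fit, m_fit, f0 = _split(t, m, k)
    z = np.log(np.log(k / m_fit))                  # z_i^g = log log(k~_g / m_i^g)
    X = np.vander(t_fit, q + 1, increasing=True)   # columns 1, t, ..., t^q
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    alpha = np.log(k / f0)                         # from k_g = f0 * e^alpha
    beta = -coef[1:]                               # slopes estimate -beta_j
    return alpha, beta                             # exp(coef[0]) is a second estimate of alpha


def weibull_initial_values(t, m, q, k=None):
    """Initial eta and beta_w from regression (11), fitted without intercept."""
    k, t_fit, m_fit, f0 = _split(t, m, k)
    z = np.log((k - m_fit) / (k - f0))             # z_i^w = log((k~_w - m_i)/(k~_w - m_1))
    X = np.vander(t_fit, q + 1, increasing=True)[:, 1:]  # columns t, ..., t^q
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    eta = 1.0 / (1.0 - f0 / k)                     # eta = (1 - f0/k_w)^(-1)
    return eta, -coef
```

When the mean has exactly the Gompertz (or Weibull) form behind (10) and (11) and the true limit value is passed as `k`, the regression recovers the parameters up to rounding; with real data, using the final sample mean as `k` only provides a starting point for the numerical methods.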

A similar procedure can be applied to obtain an initial value of $σ^{2}$. The regression equation comes from the lognormal distribution of the process. Indeed, it is known that, given a random sample from a lognormal distribution $Λ_{1} [μ, δ]$, twice the logarithm of the quotient between the arithmetic mean and the geometric mean provides an estimate of $δ$ (the mean of this distribution is $e^{μ + δ / 2}$, whereas its geometric mean is $e^{μ}$). By applying this to the distribution of $X (t)$, we obtain, for each $t_{i}$, an estimate of $σ_{0}^{2} + σ^{2} t_{i}$; that is, $σ_{i}^{2} = 2 log (m_{i} / g_{i})$, where $m_{i} \in \{m_{i}^{g}, m_{i}^{w}\}$ and $g_{i} \in \{g_{i}^{g}, g_{i}^{w}\}$ are, respectively, the sample mean and the sample geometric mean at time $t_{i}$ for the Gompertz and Weibull cases. The initial value of $σ^{2}$ is calculated by performing a simple linear regression of the $σ_{i}^{2}$ values against $t_{i}$. Note that if $X_{0}$ has a degenerate distribution, then $σ_{0}^{2} = 0$. Otherwise, $σ_{0}^{2}$ is previously estimated from the values of the sample paths at $t_{0}$.
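A minimal sketch of this step, assuming that all $d$ sample paths are observed at the same instants $t_{1} = t_{0} < \dots < t_{n}$ (as in the application below). It reads the text as fixing $σ_{0}^{2}$ beforehand, so only the slope of $σ_{i}^{2} - σ_{0}^{2}$ against $t_{i} - t_{0}$ is fitted by least squares through the origin; that reading and the function name are assumptions of the sketch.

```python
import numpy as np


def initial_sigma2(t, paths, sigma0_sq=0.0):
    """Initial value of sigma^2 from d sample paths observed at common instants t.

    t: increasing instants with t[0] = t0; paths: array of shape (d, n), positive values.
    sigma0_sq: variance of log X(t0), 0 when X0 is degenerate (assumed known).
    """
    t = np.asarray(t, dtype=float)
    paths = np.asarray(paths, dtype=float)
    m = paths.mean(axis=0)                     # sample (arithmetic) means m_i
    g = np.exp(np.log(paths).mean(axis=0))     # sample geometric means g_i
    s2 = 2.0 * np.log(m / g)                   # sigma_i^2 = 2 log(m_i / g_i)
    x = t - t[0]                               # t_i - t0
    y = s2 - sigma0_sq                         # should equal sigma^2 (t_i - t0)
    return float(np.sum(x * y) / np.sum(x * x))  # least-squares slope through the origin
```

If $σ_{0}^{2}$ is instead left free, an ordinary fit with intercept (for example `np.polyfit(t, s2, 1)`) returns the slope and an estimate of $σ_{0}^{2}$ together.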

Regarding the maximization of the likelihood function, not all specialized software requires bounding the parametric space. However, for packages that do, we suggest the following procedure:

Since high values of $σ^{2}$ lead to sample paths with great variability around the mean of the process, we consider $0 < σ^{2} < 1$, so that the multi-sigmoidal behavior remains noticeable.
For the Gompertz case, we propose using the confidence intervals provided by the linear regression previously performed to find the initial solutions. It is advisable to use a high confidence level, e.g., 0.999.
For the Weibull case, we consider the confidence intervals for the parameters $β_{j}$, whereas for $η$, since $η = (1 - \frac{f_{0}}{k_{w}})^{- 1}$, we suggest taking the interval $(a, b)$, where

$a := \min_{1 \leq i \leq d} (1 - \frac{x_{i, 1}}{x_{i, n_{i}}})^{- 1} \quad \text{and} \quad b := \max_{1 \leq i \leq d} (1 - \frac{x_{i, 1}}{x_{i, n_{i}}})^{- 1} .$
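The interval for $η$ depends only on the first and last observation of each sample path, so it can be computed directly. A minimal sketch (the function and constant names are hypothetical); the Gompertz bounds would come from the regression confidence intervals and are not reproduced here:

```python
import numpy as np


def eta_bounds(paths):
    """Interval (a, b) for eta from the first and last values of each sample path.

    paths: sequence of 1-D arrays (lengths n_i may differ); each path is assumed
    to end above its initial value, so that (1 - x_{i,1}/x_{i,n_i})^(-1) > 1.
    """
    values = []
    for x in paths:
        x = np.asarray(x, dtype=float)
        if x[-1] <= x[0]:
            raise ValueError("each path must end above its initial value")
        values.append(1.0 / (1.0 - x[0] / x[-1]))   # (1 - x_{i,1}/x_{i,n_i})^(-1)
    return min(values), max(values)


SIGMA2_BOUNDS = (0.0, 1.0)   # open interval 0 < sigma^2 < 1 suggested above
```

With paths normalized to start at 1, as in the application below, each term reduces to $x_{i, n_{i}} / (x_{i, n_{i}} - 1)$.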

4.3. Degree of Polynomial

The choice of the degree of the polynomial included in the infinitesimal mean of the multi-sigmoidal Gompertz and Weibull models must be based on the data. We therefore propose fixing a high maximum degree, q, in advance and selecting:

For the multi-sigmoidal Gompertz model, the optimal polynomial regression model in $M_{g}$ for the data set $(t, z^{g})$, where $M_{g}$ is the class of polynomial regression models of degree less than or equal to q;
For the multi-sigmoidal Weibull model, the optimal polynomial regression model in $M_{w}$ for the data set $(t, z^{w})$, where $M_{w}$ is the class of polynomial regression models of degree less than or equal to q without the intercept term.

In order to do this, we will employ the usual Bayesian procedure for variable selection in normal regression models using intrinsic priors for model parameters and the uniform prior for models (see Moreno et al. [26,27,28] for details). The optimal model will be the one having the highest posterior probability.

A slight adaptation of the Bayesian procedure for variable selection is required in our case:

For the multi-sigmoidal Gompertz model, we denote by $M_{j}$ the polynomial regression model of degree j, $j = 0, \dots, q$.
Given the data set $(t, z^{g})$, which comes from a model in $M_{g}$, the posterior probability of model $M_{j}$ is given by

$P (M_{j} | t, z^{g}) = \frac{B_{j 0} (t, z^{g}) π (M_{j})}{\sum_{i = 0}^{q} B_{i 0} (t, z^{g}) π (M_{i})}$

where $π (M_{i}) = 1 / (q + 1)$ and

$B_{i 0} (t, z^{g}) = \frac{2}{π} (i + 2)^{i / 2} \int_{0}^{π / 2} \frac{\sin^{i} φ \, (n + (i + 2) \sin^{2} φ)^{(n - i - 1) / 2}}{(n \mathcal{B}_{i 0} + (i + 2) \sin^{2} φ)^{(n - 1) / 2}} d φ$

is the Bayes factor for comparing models $M_{i}$ and $M_{0}$ under the intrinsic priors, which depends on $\mathcal{B}_{i 0}$, the ratio of the residual sums of squares of models $M_{i}$ and $M_{0}$.
For the multi-sigmoidal Weibull model, we denote by $M_{j}$ the polynomial regression model of degree j without the intercept term, $j = 1, \dots, q$.
Given the data set $(t, z^{w})$, which comes from a model in $M_{w}$, the posterior probability of model $M_{j}$ becomes

$P (M_{j} | t, z^{w}) = \frac{B_{j 1} (t, z^{w}) π (M_{j})}{\sum_{i = 1}^{q} B_{i 1} (t, z^{w}) π (M_{i})}$

where $π (M_{i}) = 1 / q$ and

$B_{i 1} (t, z^{w}) = \frac{2}{π} (i + 1)^{(i - 1) / 2} \int_{0}^{π / 2} \frac{\sin^{i - 1} φ \, (n + (i + 1) \sin^{2} φ)^{(n - i) / 2}}{(n \mathcal{B}_{i 1} + (i + 1) \sin^{2} φ)^{(n - 1) / 2}} d φ$

is the Bayes factor for comparing models $M_{i}$ and $M_{1}$ under the intrinsic priors, which depends on $\mathcal{B}_{i 1}$, the ratio of the residual sums of squares of models $M_{i}$ and $M_{1}$.
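A minimal sketch of the degree-selection step. Both Bayes factors above are the same function of p, the number of coefficients of the larger model (p = i + 1 for the Gompertz degree i against $M_{0}$, p = i for the Weibull degree i against $M_{1}$), which is how the sketch implements them; the trapezoidal rule on a rescaled log-integrand is a numerical choice of this sketch, not of the cited works, and the function names are hypothetical. The ratio $\mathcal{B}$ must be positive.

```python
import numpy as np


def log_bayes_factor(rss_ratio, p, n, n_grid=4001):
    """log Bayes factor (intrinsic priors) of a p-coefficient model vs the 1-coefficient base."""
    phi = np.linspace(0.0, np.pi / 2.0, n_grid)
    s2 = np.sin(phi) ** 2
    log_f = ((n - p) / 2.0) * np.log(n + (p + 1) * s2) \
        - ((n - 1) / 2.0) * np.log(n * rss_ratio + (p + 1) * s2)
    if p > 1:
        with np.errstate(divide="ignore"):
            log_f = log_f + (p - 1) * np.log(np.sin(phi))  # sin^(p-1); -inf at phi = 0
    top = np.max(log_f)
    f = np.exp(log_f - top)                                 # rescaled to avoid overflow
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(phi))  # trapezoidal rule
    return np.log(2.0 / np.pi) + ((p - 1) / 2.0) * np.log(p + 1) + top + np.log(integral)


def degree_posteriors(t, z, q, intercept=True):
    """Posterior probabilities of degrees 0..q (intercept) or 1..q (no intercept)."""
    t, z = np.asarray(t, dtype=float), np.asarray(z, dtype=float)
    n = len(z)
    degrees = range(0, q + 1) if intercept else range(1, q + 1)

    def rss(deg):
        X = np.vander(t, deg + 1, increasing=True)
        if not intercept:
            X = X[:, 1:]                        # drop the constant column
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        r = z - X @ coef
        return float(r @ r)

    base = rss(degrees[0])                      # M_0 (Gompertz) or M_1 (Weibull)
    logs = np.array([log_bayes_factor(rss(d) / base, d + 1 if intercept else d, n)
                     for d in degrees])
    w = np.exp(logs - logs.max())               # uniform prior pi(M_i) cancels out
    return dict(zip(degrees, w / w.sum()))
```

The selected degree is then the key with the largest posterior probability, `max(post, key=post.get)`.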

5. Application to the Description of the Evolution of the COVID-19 Pandemic

In this section, the stochastic models introduced earlier will be used to describe the evolution of COVID-19 in Spain during the first two waves of the pandemic. For the purposes of this analysis, which comprises two successive periods of infection, multi-sigmoidal models become particularly necessary.

5.1. About the Data

Since the beginning of the pandemic, the National Epidemiology Center (CNE), which depends on the Carlos III Health Institute, has coordinated information related to the evolution of the disease in Spain. The CNE works at the service of public health, contributing to the control of diseases and risks in collaboration with the autonomous communities, the Ministry of Health, Consumer Affairs, and Social Welfare, and the rest of the national administrations with health-related responsibilities. Through the National Epidemiological Surveillance Network, this center collects the data obtained from the epidemiological survey that each Spanish autonomous community completes upon the identification of a COVID-19 case.

As is well known, keeping count of the number of positive cases posed many problems during the early stages of the epidemic because there was insufficient capacity for detection, mainly due to the shortage of diagnostic tests and the administrative chaos resulting from the collapse of the healthcare system. For this reason, the studies initially carried out were mostly based on data concerning hospitalizations and deaths, and such data were used to estimate the number of infected. This made it difficult to carry out a detailed analysis of the real evolution of the disease, and also to determine the occurrence of certain key moments such as the peak of infection. As time went by, the protocols to determine infected cases were improved, so that the data reported for the second and successive waves can be considered more reliable than those reported in the first.

The data considered are based on the number of infected and deceased individuals reported every 4 days by 15 Spanish regions between 8 March and 21 December 2020, a period that covers the first two waves of the pandemic as officially reported. Data have been extracted from https://cnecovid.isciii.es/covid19/#documentacion-y-datos (last accessed on 24 July 2021). Each data series was normalized by dividing it by the value observed at the first time instant, so that the processed data represent, at each t, how many times the initially observed number of confirmed cases (or deaths) has been multiplied.

Figure 2 shows the data. Note that the average value of the data (the solid black line) exhibits a multi-sigmoidal-type growth.

5.2. Fitting the Number of Infected Individuals

This section presents the application of the introduced models to fit the data on the number of infected individuals. In this application, we have addressed the following issues in sequence:

The choice of the degree of the polynomials included in the infinitesimal mean of the multi-sigmoidal Gompertz and Weibull models.
The maximum likelihood estimation of the parameters of the models.
The determination of the best model.
The study of the inflection time instants.

Results are presented next for each of these issues, according to the methodology described in previous sections. Likewise, following the comments made in Section 4.1, the time instants have been rescaled to the $[0, 1]$ interval.
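The calendar dates quoted later in this section can be recovered from the rescaled instants with a small helper. A minimal sketch, assuming $t_{0}$ = 8 March 2020 and $T$ = 21 December 2020 (so that $T - t_{0} = 288$ days) and truncation to whole days; these conventions are inferred rather than stated in the text, but they reproduce, for example, 0.091806 → 3 April and 0.801248 → 24 October.

```python
import math
from datetime import date, timedelta

T0 = date(2020, 3, 8)          # first observation, t_0 (assumed)
T_END = date(2020, 12, 21)     # last observation, T (assumed)
SPAN_DAYS = (T_END - T0).days  # T - t_0 in days


def to_unit_interval(d: date) -> float:
    """Rescale a calendar date to t' = (t - t_0) / (T - t_0)."""
    return (d - T0).days / SPAN_DAYS


def to_date(t_prime: float) -> date:
    """Map a rescaled instant in [0, 1] back to a date, truncating to whole days."""
    return T0 + timedelta(days=math.floor(t_prime * SPAN_DAYS))
```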

5.2.1. The Choice of the Degree of Polynomials

This subsection follows the methodology presented in Section 4.3. For the Gompertz model, and after choosing a maximum degree $q = 8$, Table 2 summarizes the posterior probabilities of the polynomial regression model of degree j, $j = 0, \dots, 8$, given the data set $(t, z^{g})$ coming from a model in $M_{g}$. The highest posterior probability is 0.8167759, corresponding to the polynomial regression model of degree 5.

In the same way, for $q = 8$ in the multi-sigmoidal Weibull model, Table 3 summarizes the posterior probabilities of the polynomial regression model of degree j without intercept term, $j = 1, \dots, 8$, given the data set $(t, z^{w})$ coming from a model in $M_{w}$. Again, the optimal polynomial regression model is that of degree 5, which has a posterior probability of 0.9349964.

5.2.2. The Maximum Likelihood Estimation of the Parameters in Each Model

We now address the maximum likelihood estimation of the multi-sigmoidal Gompertz and Weibull models. In agreement with the conclusions drawn earlier, a polynomial of degree 5 is considered in the infinitesimal mean of each process. Although an attempt was made in both cases to solve the non-linear system of likelihood equations, arithmetic overflows in the computations prevented the likelihood equations for the multi-sigmoidal Gompertz process from being established. For this reason, in this case we chose to maximize the log-likelihood function directly.

The estimation process has provided the following results, where $β_{i}^{g}$ and $β_{i}^{w}$ are the elements of the vectors $β_{g}$ and $β_{w}$, respectively:

By using the spectral projected gradient method, implemented in the spg R function of the BB R-package [29], the values of the parameters that maximize the logarithm of the likelihood for the multi-sigmoidal Gompertz model are: $\hat{α} = 8.0267883$ , ${\hat{β}}_{1}^{g} = 15.550577$ , ${\hat{β}}_{2}^{g} = - 71.903013$ , ${\hat{β}}_{3}^{g} = 147.217478$ , ${\hat{β}}_{4}^{g} = - 135.119470$ , ${\hat{β}}_{5}^{g} = 49.005728$ , and ${\hat{σ}}^{2} = 0.480871$ .
Applying the Newton method, implemented in the nleqslv R function of the nleqslv R-package [30], in order to solve the non-linear system of likelihood equations for the multi-sigmoidal Weibull model, the following estimates of the model parameters have been obtained: $\hat{η} = 1.0003498$ , ${\hat{β}}_{1}^{w} = 1.1278246$ , ${\hat{β}}_{2}^{w} = - 6.1992975$ , ${\hat{β}}_{3}^{w} = 19.1664358$ , ${\hat{β}}_{4}^{w} = - 30.4880009$ , ${\hat{β}}_{5}^{w} = 20.0609318$ , and ${\hat{σ}}^{2} = 0.5562937$ .

5.2.3. Determination of the Best Model

Once the models have been estimated from the observed data, the question arises of determining which of the two models is the most appropriate to describe the global behavior of the phenomenon under study. Figure 3 displays, for each model, the sample and estimated mean functions and the 95% confidence band for the mean function together with the sample paths.

In view of the plots in Figure 3, it is not easy to decide on one model or another, although we must point out that, while the multi-sigmoidal Weibull model does not replicate the data trend very well at the beginning, the multi-sigmoidal Gompertz model does. However, the confidence band plots also indicate that both models have difficulties in fitting the data at the beginning.

A global measure of how well the sample mean is fitted by the mean of the estimated model is the absolute relative error, given by

RAE = \frac{1}{n} \sum_{i = 1}^{n} \frac{| m_{i} - E (\hat{X} (t_{i})) |}{m_{i}} .

This measure takes the values 0.06537265 and 0.2657263 for the multi-sigmoidal Gompertz and Weibull models, respectively. In this sense, the first model appears to be more suitable.
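A direct transcription of the RAE formula, assuming the fitted means $E (\hat{X} (t_{i}))$ have already been computed from the estimated model and are passed in as an array (the function name is hypothetical):

```python
import numpy as np


def relative_absolute_error(m, m_hat):
    """RAE = (1/n) * sum_i |m_i - E(X_hat(t_i))| / m_i."""
    m = np.asarray(m, dtype=float)          # sample means m_i
    m_hat = np.asarray(m_hat, dtype=float)  # fitted means E(X_hat(t_i))
    return float(np.mean(np.abs(m - m_hat) / m))
```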

Another relevant question is which of the two models provides a better estimate of the sample distribution at each instant of time. This can be done by determining the resistor-average distance [31] between the sample and estimated distributions at each time instant.

The resistor-average distance is a symmetrized Kullback–Leibler distance defined as the harmonic sum (half the harmonic mean) of the component Kullback–Leibler distances, that is,

D_{RA} (f_{s} \| f_{e}) = \frac{D_{KL} (f_{s} \| f_{e}) D_{KL} (f_{e} \| f_{s})}{D_{KL} (f_{s} \| f_{e}) + D_{KL} (f_{e} \| f_{s})}

where $D_{KL} (f_{s} \| f_{e})$ denotes the Kullback–Leibler distance between the sample distribution ($f_{s}$) and that of the estimated model ($f_{e}$).

For the models under consideration, the component Kullback–Leibler distances at time instant $t_{i}$ are given by

D_{KL} (f_{s} \| f_{e}) = \frac{1}{2} \left[ log \left( \frac{\hat{σ}^{2} (t_{i} - t_{0})}{\hat{σ}_{i}^{2}} \right) + \frac{\hat{σ}_{i}^{2}}{\hat{σ}^{2} (t_{i} - t_{0})} + \frac{(log g_{i} - log \widehat{E (X_{0})} - H_{\tilde{θ}} (t_{0}, t_{i}))^{2}}{\hat{σ}^{2} (t_{i} - t_{0})} - 1 \right]

and

D_{KL} (f_{e} \| f_{s}) = \frac{1}{2} \left[ log \left( \frac{\hat{σ}_{i}^{2}}{\hat{σ}^{2} (t_{i} - t_{0})} \right) + \frac{\hat{σ}^{2} (t_{i} - t_{0})}{\hat{σ}_{i}^{2}} + \frac{(log g_{i} - log \widehat{E (X_{0})} - H_{\tilde{θ}} (t_{0}, t_{i}))^{2}}{\hat{σ}_{i}^{2}} - 1 \right]

where $\hat{σ}_{i}^{2} = 2 log (m_{i} / g_{i})$, with $m_{i}$ and $g_{i}$ the sample mean and the sample geometric mean at $t_{i}$, respectively.
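Both component distances are Kullback–Leibler divergences between the Gaussian laws of $log X (t_{i})$, so they can be coded once and reused. A minimal sketch: for each $t_{i} \neq t_{0}$ it takes the sample mean $m_{i}$, the sample geometric mean $g_{i}$, and the fitted mean and variance of $log \hat{X} (t_{i})$, that is, $log \widehat{E (X_{0})} + H_{\tilde{θ}} (t_{0}, t_{i})$ and $\hat{σ}^{2} (t_{i} - t_{0})$. Computing those two fitted quantities requires the estimated model and is left to the caller; all names are hypothetical.

```python
import numpy as np


def kl_normal(mu_p, var_p, mu_q, var_q):
    """D_KL(N(mu_p, var_p) || N(mu_q, var_q)), elementwise."""
    return 0.5 * (np.log(var_q / var_p) + var_p / var_q + (mu_p - mu_q) ** 2 / var_q - 1.0)


def resistor_average(d_pq, d_qp):
    """Resistor-average distance D_pq * D_qp / (D_pq + D_qp); 0 where both are 0."""
    d_pq = np.atleast_1d(np.asarray(d_pq, dtype=float))
    d_qp = np.atleast_1d(np.asarray(d_qp, dtype=float))
    total = d_pq + d_qp
    ra = np.zeros_like(total)
    pos = total > 0
    ra[pos] = d_pq[pos] * d_qp[pos] / total[pos]
    return ra


def ra_distances(m, g, mu_e, var_e):
    """RA distance at each t_i between sample and estimated laws of log X(t_i)."""
    m, g = np.asarray(m, dtype=float), np.asarray(g, dtype=float)
    mu_s = np.log(g)                            # log g_i: sample mean of log X(t_i)
    var_s = 2.0 * np.log(m / g)                 # sigma_i^2 = 2 log(m_i / g_i)
    d_se = kl_normal(mu_s, var_s, mu_e, var_e)  # D_KL(f_s || f_e)
    d_es = kl_normal(mu_e, var_e, mu_s, var_s)  # D_KL(f_e || f_s)
    return resistor_average(d_se, d_es)
```

Global summaries such as the mean or median of the distances are then `np.mean(ra)` and `np.median(ra)`.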

The values of the resistor-average distance reveal the time periods in which the estimated distribution moves away from or approaches the sample distribution, and how close to or far from it the estimated distribution is. Furthermore, measures such as the mean or median of the resistor-average distance values allow us to assess the goodness of fit globally and to select the best model.

For the estimated models, Figure 4 shows the resistor-average distance between sample and estimated distributions as a function of time. An initial period of time is observed in which the Weibull model estimates the sample distributions much worse than the Gompertz model, and only at a later time does the Weibull model estimate the sample distributions somewhat better than the Gompertz model.

Table 4 collects some measures that summarize the values of the resistor-average distances. All such measures point to the multi-sigmoidal Gompertz model as the best choice.

5.2.4. The Study of Inflection Time Instants

A matter of great interest when describing the behavior of a set of growth-related data is to establish when the growth pattern changes, which is revealed by the inflection time instants. In the specific case that we are considering, these time instants mark the moments at which the infection peaks are reached in each wave.

Once we have determined and estimated an appropriate model $\hat{X} (t)$ to explain the evolution of the COVID-19 data, inflection times can be estimated using the inflection points of the estimated mean function of the selected model, $\hat{m} (t) = E (\hat{X} (t))$.

By setting $\frac{d^{2}}{d t^{2}} \hat{m} (t) = 0$, we have been able to estimate three inflection time instants. Table 5 contains the estimated values of the inflection times, $\hat{t}_{I, j}$, together with the values $S_{j} = \hat{m} (\hat{t}_{I, j})$ for $j = 1, 2, 3$. We can conclude that changes in the average growth pattern occur on 2 April, 30 May, and 6 November.
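Since the fitted mean function is not restated in this section, the sketch below takes it as an arbitrary vectorized callable and locates the zeros of $\hat{m}''$ numerically: a central-difference estimate of the second derivative is scanned on a grid for sign changes, which are then refined by bisection. This numerical approach, the function names, and the test curve (a hypothetical sum of two logistic rises, not a fitted model) are assumptions of the sketch.

```python
import numpy as np


def inflection_times(m, t_min=0.0, t_max=1.0, n_grid=2001, h=1e-4):
    """Approximate the zeros of m''(t) on [t_min, t_max].

    m: vectorized callable for the mean function (e.g. the fitted m_hat).
    Sign changes of a central-difference estimate of m'' on a grid are
    refined by bisection.
    """
    def d2(t):
        return (m(t + h) - 2.0 * m(t) + m(t - h)) / h ** 2

    grid = np.linspace(t_min, t_max, n_grid)
    vals = d2(grid)
    roots = []
    for a, b, fa, fb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
        if fa == 0.0:
            roots.append(float(a))
        elif fa * fb < 0.0:
            for _ in range(60):               # bisection keeps a sign change in [a, b]
                mid = 0.5 * (a + b)
                fm = d2(mid)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(float(0.5 * (a + b)))
    return roots


def two_wave_curve(t):
    """Hypothetical test curve: two logistic rises centred at 0.25 and 0.75."""
    return 1.0 / (1.0 + np.exp(-40.0 * (t - 0.25))) + 2.0 / (1.0 + np.exp(-40.0 * (t - 0.75)))
```

For the test curve the function returns three roots, close to 0.25, 0.49, and 0.75. Only the roots where $\hat{m}''$ changes from positive to negative correspond to maxima of the growth rate; the middle one marks the minimum of the growth rate between the two rises, not a peak.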

Figure 5a depicts the second derivative of the estimated mean function. The vertical lines are located on the estimated inflection time instants. Figure 5b locates the estimated inflection time instants (vertical lines) on the graph of the estimated mean function. The horizontal lines are placed on the values of the estimated mean function at those time instants. It should be noted that the second inflection moment obtained does not seem to correspond to a peak of infection proper. Indeed, Figure 2 shows how between instants 0.2 and 0.4 (corresponding to the dates between 4 May and 30 June, approximately) the evolution of the pandemic slowed down considerably, in such a way that a plateau is observed on the graph. In that period of time, and despite fitting the observed average quite well, the estimated average shows a very slight decrease that motivates the appearance of this new inflection, without it being linked to a new wave of infection. Regarding the other two dates, they can be related to actions promoted by the Spanish government as well as the governments of the different autonomous communities. Indeed, on 14 March a state of alarm and nationwide lockdown and quarantine were imposed to control the spread of the virus, which meant that around 14 days later the peak of the first wave was reached. Similarly, around the third week of October an increase in the number of infected was observed. The reason for this must be found in the holidays that occurred in Spain on the occasion of the celebration of the National Day (12 October) and the mobility that this motivated among the population. For this reason, government actions took place, such as restricting attendance at university teaching, the closure of activities related to nightlife, and the limitation of opening hours for bars and restaurants. Again, around 14 days later, the peak of infection in this second wave was observed.

Furthermore, for each of the inflection points we can consider the first-passage-time variable, defined as the time required for the data to reach, for the first time, the value of the mean at the inflection point. By studying the distribution of this random variable we may determine, for example, the probability that the change in the growth pattern occurs in a certain period of time, as well as the average or most frequent time instant at which the growth pattern changes.

The first-passage-time variable associated with each inflection time instant can be approximated by the time required for the estimated model to reach, for the first time, the value $S_{j}$ of its mean at the estimated inflection time, that is, by the variable

T_{S_{j}} = \inf \{t \geq t_{0} : \hat{X} (t) > S_{j}\} .

The probability density function of the first-passage-time variable can be obtained as the solution of a Volterra integral equation of the second kind (see Gutiérrez et al. [32,33]). However, apart from some particular processes and boundaries, closed-form solutions of this integral equation are not available, and the process considered in this application is no exception. Numerical procedures are therefore needed; the most common ones are based on numerical quadrature, such as the composite trapezoidal rule (see [34,35]).
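As a rough cross-check only, and using a different technique (Monte Carlo simulation rather than the numerical solution of the Volterra equation), one can simulate paths of the fitted process and record the first grid time at which each path exceeds $S_{j}$. The sketch assumes, consistently with the Kullback–Leibler expressions above and a degenerate initial value $x_{0}$, that $log \hat{X} (t) = log x_{0} + H (t_{0}, t) + σ W (t - t_{0})$; the callable `H` is a placeholder for the fitted $H_{\tilde{θ}} (t_{0}, \cdot)$, which is not restated in this section, and all names are hypothetical. Discrete monitoring slightly overestimates passage times.

```python
import numpy as np


def simulate_fpt(H, sigma2, x0, barrier, t_grid, n_paths=5000, seed=0):
    """Monte Carlo first-passage times of a lognormal diffusion through `barrier`.

    Assumes log X(t) = log x0 + H(t) + sqrt(sigma2) * W(t - t0), where H(t)
    plays the role of H(t0, t) (so H(t0) = 0) and t_grid[0] = t0.
    Returns passage times; NaN marks paths that never exceed the barrier on
    the grid (their probability mass lies beyond t_grid[-1]).
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(sigma2)
    log_b = np.log(barrier)
    w = np.zeros(n_paths)                     # Brownian motion W(t - t0) per path
    fpt = np.full(n_paths, np.nan)
    alive = np.ones(n_paths, dtype=bool)      # paths that have not crossed yet
    for k in range(1, len(t_grid)):
        dt = t_grid[k] - t_grid[k - 1]
        w += np.sqrt(dt) * rng.standard_normal(n_paths)
        log_x = np.log(x0) + H(t_grid[k]) + sigma * w
        crossed = alive & (log_x > log_b)
        fpt[crossed] = t_grid[k]              # record the first grid time above S_j
        alive &= ~crossed
    return fpt
```

The share of non-NaN entries estimates $P (T_{S_{j}} \leq T)$. When it is well below 1, summaries such as `np.nanmean(fpt)` describe only the paths that cross within the window, which is consistent with reporting only percentiles in that case.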

Considering the comment made earlier about the second inflection time obtained, we now focus on the other two, for which we approximate the densities of the first-passage-time variables using the fptdApprox R-package [36,37,38]. Figure 6 contains the density functions of the random variables $T_{S_{j}}$, $j = 1, 3$ (note that we have kept the same notation used in the point estimation of the inflection time instants).

The approximation procedure of the density functions provided the following information about changes in the growth pattern of the COVID-19 data:

The first change occurs, with a probability of 0.999, in [0.055306, 0.185625], that is, between 23 March and 30 April.
The second relevant change (recall the remark above) occurs in the time interval [0.251004, 1], that is, between 19 May and 21 December, with a probability of 0.656171. It should be noted that in this case part of the range of the first-passage-time variable exceeds the time limits of the data (see Figure 6b), which means that the probability mass is not fully contained in this interval. This is due to the fact that some regions reached the established level of infection after 21 December.

Table 6 summarizes some of the main numerical characteristics of the variables $T_{S_{j}}$, $j = 1, 3$, namely the mean, variance, and mode, as well as some of the most relevant percentiles of the distributions. Note that in the case of the variable $T_{S_{3}}$, neither the mean nor the variance has been calculated, since the observation interval does not contain the complete probability mass; only the percentiles falling within this interval are shown. From the values taken by these characteristics we can draw some conclusions, namely:

The first change in the growth pattern of the COVID-19 data happened, on average, at time instant 0.091806, that is, on 3 April, and most frequently at time instant 0.086358, that is, on 1 April. Half of the regions (50%) reached the peak of infection before 2 April, while by 4 April around 75% of them had already passed it.
Regarding the second relevant change, its most frequent time instant is 0.801248, that is, 24 October, while 50% of regions peaked before 17 November.

5.3. Fitting the Number of Deaths

We next present the results of fitting the number of deaths, following the same methodology as for the number of infected. The number of deaths has proven to be of great interest to epidemiologists. Indeed, as mentioned above, collecting data on infected individuals during the first waves of the epidemic, especially the first, presented numerous difficulties, so the data collected most likely do not reflect the actual number of infected. This has given rise to an extensive literature proposing procedures to approximate the number of infected from the number of deaths.

Since the methodology and procedures used have been described in detail in the previous application, we summarize below the main results obtained. Note that in this case it has not been necessary to rescale the time interval associated with the process.

Regarding the degree of the polynomial, the selection method chooses degree 7 for both the Gompertz and Weibull models as can be deduced from Table 7.

Once the degree of the polynomial has been selected, the estimation of the parameters of both models provides the following results:

for the Gompertz model, $\hat{α} = 2.869936$ , ${\hat{β}}_{1}^{g} = 6.346708 \times 10^{- 2}, {\hat{β}}_{2}^{g} = - 1.361939 \times 10^{- 3}$ , ${\hat{β}}_{3}^{g} = 1.641620 \times 10^{- 5}, {\hat{β}}_{4}^{g} = - 1.227131 \times 10^{- 7}, {\hat{β}}_{5}^{g} = 5.733615 \times 10^{- 10}, {\hat{β}}_{6}^{g} = - 1.527834 \times 10^{- 12}, {\hat{β}}_{7}^{g} = 1.774433 \times 10^{- 15}$ and ${\hat{σ}}^{2} = 7.484960 \times 10^{- 4}$ ,
for the Weibull model, $\hat{η} = 1.059529, {\hat{β}}_{1}^{w} = 1.862481 \times 10^{- 2}, {\hat{β}}_{2}^{w} = - 2.421367 \times 10^{- 4}, {\hat{β}}_{3}^{w} = 1.866785 \times 10^{- 6}, {\hat{β}}_{4}^{w} = - 1.656611 \times 10^{- 8}, {\hat{β}}_{5}^{w} = 1.368341 \times 10^{- 10}, {\hat{β}}_{6}^{w} = - 5.895702 \times 10^{- 13}, {\hat{β}}_{7}^{w} = 9.569781 \times 10^{- 16}$ and ${\hat{σ}}^{2} = 7.48496 \times 10^{- 4}$ ,

from which the mean functions are estimated as well as the 95% confidence bands, whose graphs can be seen in Figure 7.

Regarding the selection of the optimal model, the results provided by the resistor-average distance (see Figure 8 and Table 8) indicate that both models fit the variable of interest quite well, although the multi-sigmoidal Gompertz model offers a better fit in the initial phase, which tips the summary measures computed from the resistor-average distance in favor of that model.

Regarding the determination of the death peaks, the point estimation indicates that they were reached on 31 March and 22 November, respectively. In addition, the probability density functions of the first-passage time of the process through the barriers defined by the values of the estimated mean at those time instants ($S_{1} = 2.476951$ and $S_{2} = 14.392886$, respectively) have been approximated. Figure 9 shows the graphs of these density functions.

Finally, by applying the proposed methodology, we can deduce the following:

The first-passage-time variable through $S_{1}$ is quite symmetric; in fact, its mean coincides with both the mode and the median. This leads to establishing 31 March as the date on which the peak of deaths was most likely reached. Furthermore, the variable is highly concentrated: by 1 April, 75% of the regions had already reached the peak, and 97.5% had done so by 2 April.
Regarding the second peak, its most frequent date is 20 November, while 50% of regions peaked before 27 November. Furthermore, by the end of the observed period, 64.4% of the regions had reached the peak.

6. Conclusions

In real phenomena governed by growth curves there are situations in which the maximum level of growth is reached after successive stages, in each of which there is a slowdown followed by an exponential explosion. A clear example of this is provided by the evolution of pandemics, as is the case of the current one caused by the SARS-CoV-2 virus, where the appearance of successive waves of infection has been (and continues to be) observed. This motivates the use of sigmoidal curves with more than one inflection point.

In this work, two stochastic diffusion processes are presented, their main characteristic being that their means are multi-sigmoidal growth curves derived from the classical Gompertz and Weibull curves. In both cases, these curves have been generated by introducing polynomial functions into their classic expressions.

Starting from a global formulation for both processes, this paper studied the problem of estimation using the method of maximum likelihood. Parameter estimates have been obtained by solving the system of likelihood equations as well as by direct maximization of the likelihood function. This entails analyzing how to obtain initial solutions in the first case and the delimitation of the parametric space in the second. Determining the degree of the polynomial before approaching the estimation of the parameters is a fundamental question in real-world applications. To do this, a Bayesian approach to the model selection problem has been considered based on the methodology derived from intrinsic prior distributions.

The stochastic models introduced here have been applied to data concerning the evolution of the COVID-19 epidemic in Spain during the first two officially recognized waves. For this, sample trajectories have been considered corresponding to the data of infected and deceased individuals reported by 15 Spanish regions between 8 March and 21 December 2020. The method of selecting the degree of the polynomial leads to the choice of degree 5 as optimal in the case of the number of infected, while for that of the deceased the selected degree is 7. After estimating the models, we proceeded to select the optimal one for the description of the evolution of the pandemic. Although the two diffusion processes considered are good models to describe the phenomenon under study, one of them was selected as the best choice. For this, two criteria were used: the absolute relative error between the observed and the estimated mean, and the resistor-average distance, which measures the discrepancy between the sample and the estimated distributions. Both criteria favor the choice of the model based on the multi-sigmoidal Gompertz curve for both the number of infected and the number of deaths.

The study of the inflection times of the estimated mean function provides an estimate of the moments when the peaks of the epidemic were reached. Our conclusions were drawn considering the total numbers of infected individuals and deaths at the national level. By including in our analysis data from several regions, we can account for the variability derived from their specific characteristics. Regarding the point estimation of the inflection instants, the results indicate that, for the number of infected, the peak of the first wave was reached around 2 April, while the second peak occurred around 6 November. In the case of the number of deaths, such values are around 31 March and 22 November, respectively.

Comparing with the information provided by other studies, the already refined statistics from the National Epidemiology Center (CNE) of Spain indicate that the nationwide first-wave infection peak occurred on 20 March, 6 days after the national lockdown was implemented, and that COVID-19 cases steadily decreased thereafter (see [39] for details). Other studies place the peak on 26 March (Guirao, [40]), while Mora et al. [41] establish an interval from 29 March to 3 April. From the first week of July onwards, Spain experienced an increasing trend in the cumulative incidence. The data provided by the CNE place the peak of the second wave around 27 October, two weeks after the National Day holidays (12 October), which led to a large increase in the mobility of the population. Regarding the number of deaths, the peak of the first wave is placed on 30 March, whereas that of the second wave is placed on 17 November.

Another perspective for the analysis of the peaks is provided by studying the time required for the data to reach, for the first time, the mean values at an inflection point. Our results indicate that 50% of regions reached the first peak of infection before 2 April, while by 4 April around 75% of them had already passed it. The second peak was most frequently reached on 24 October, with 50% of regions reaching peak values before 17 November. In the case of the number of deaths, 50% of the regions reached the first peak before 31 March, while by 2 April 97.5% of the regions had reached it. On the other hand, 20 November is the most frequent date on which the regions reached the second peak, with 26 November being the date by which 50% of the regions had already reached it.

With the present work as a starting point, several lines of research can be suggested. Among them, the models may be modified by introducing exogenous variables that represent the therapeutic and non-therapeutic actions promoted by the authorities, and thereby explore their impact on the evolution of the pandemic. Other studies may look into how the number of deaths and/or hospitalizations can be introduced as exogenous variables with the aim of improving the estimates of the real number of infected. Such research initiatives may be undertaken by introducing time-dependent variables in the infinitesimal moments of the diffusion processes being considered. Other lines of research can be derived from incorporating this type of multi-sigmoidal curve into SIR-type models, as well as from considering diffusion processes with delays.

Finally, it is important to highlight that the models presented in this work have provided a good fit for both the number of infected and the number of deaths. This is of special interest given the vast literature that has been generated around the estimation of the number of infected from the number of deaths, owing to the low quality of the data on infected individuals, especially during the first wave.

Author Contributions

Conceptualization, A.B., P.R.-R., J.J.S.-P. and F.T.-R.; methodology, A.B., P.R.-R., J.J.S.-P. and F.T.-R.; software, J.J.S.-P.; validation, A.B., P.R.-R., J.J.S.-P. and F.T.-R.; formal analysis, A.B., P.R.-R., J.J.S.-P. and F.T.-R.; investigation, A.B., P.R.-R., J.J.S.-P. and F.T.-R.; writing—original draft preparation, A.B., J.J.S.-P. and F.T.-R.; writing—review and editing, A.B., J.J.S.-P. and F.T.-R.; supervision, P.R.-R.; funding acquisition, F.T.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Ministerio de Economía, Industria y Competitividad, Spain, under Grant MTM2017-85568-P and by the FEDER, Consejería de Economía y Conocimiento de la Junta de Andalucía, Spain under Grant A-FQM-456-UGR18.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at: https://cnecovid.isciii.es/covid19/#documentacion-y-datos (last accessed on 20 September 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Standardization of [t₀, T]

For a fixed $u \in (0, T - t_{0})$, let us consider the diffusion process $\{Y(t'); t' \in [0, \frac{T - t_{0}}{u}]\}$ such that $Y(t') = X(t)$ a.s., where $t' \in [0, \frac{T - t_{0}}{u}]$ and $X(t)$ is the multi-sigmoidal diffusion process defined for $t \in [t_{0}, T]$. Note that $t' = \frac{t - t_{0}}{u}$ and therefore $t = t' u + t_{0}$.

Let $A_{m}^{Y}(x, t')$ be the infinitesimal moment of order $m > 0$ of $Y(t')$ evaluated at a state $x$. This moment can be expressed in terms of the corresponding moment of the process $X(t)$:

$$\begin{aligned}
A_{m}^{Y}(x, t') &= \lim_{h \to 0} \frac{1}{h} E\left[(Y(t' + h) - Y(t'))^{m} \mid Y(t') = x\right] \\
&= \lim_{h \to 0} \frac{1}{h} E\left[\left(Y\left(\tfrac{t - t_{0}}{u} + h\right) - Y\left(\tfrac{t - t_{0}}{u}\right)\right)^{m} \mid Y\left(\tfrac{t - t_{0}}{u}\right) = x\right] \\
&= \lim_{h \to 0} \frac{1}{h} E\left[(X(t + h u) - X(t))^{m} \mid X(t) = x\right] \\
&= u \lim_{k \to 0} \frac{1}{k} E\left[(X(t + k) - X(t))^{m} \mid X(t) = x\right] = u A_{m}^{X}(x, t),
\end{aligned}$$

where $k = h u$. Now, using the relation between $t$ and $t'$, it follows that

$$A_{m}^{Y}(x, t') = u A_{m}^{X}(x, t' u + t_{0}).$$

Finally, if $u = T - t_{0}$, then $t' \in [0, 1]$ and

$$A_{m}^{Y}(x, t') = (T - t_{0}) A_{m}^{X}(x, t'(T - t_{0}) + t_{0}).$$
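Since the standardization only multiplies each infinitesimal moment by $T - t_{0}$ and remaps its time argument, it can be written as a higher-order function. The Python sketch below assumes each moment is available as a callable of the state $x$ and the time; the name `standardize_moment`, the drift rate $h(t) = 0.5 - 0.1t$, the value $\sigma = 0.2$ and the interval $[t_{0}, T] = [10, 30]$ are hypothetical choices made only to illustrate the mechanics.

```python
from typing import Callable

# An infinitesimal moment maps (state x, time) to a real number.
Moment = Callable[[float, float], float]


def standardize_moment(a_x: Moment, t0: float, T: float) -> Moment:
    """Return A_m^Y(x, t') = (T - t0) * A_m^X(x, t'(T - t0) + t0) for t' in [0, 1]."""
    u = T - t0

    def a_y(x: float, t_prime: float) -> float:
        # Map t' back to the original time t = t' u + t0 and scale by u.
        return u * a_x(x, t_prime * u + t0)

    return a_y


# Hypothetical lognormal-type process: A_1^X(x, t) = h(t) x, A_2^X(x, t) = sigma^2 x^2.
SIGMA = 0.2


def a1_x(x: float, t: float) -> float:
    return (0.5 - 0.1 * t) * x


def a2_x(x: float, t: float) -> float:
    return SIGMA ** 2 * x ** 2


t0, T = 10.0, 30.0
a1_y = standardize_moment(a1_x, t0, T)
a2_y = standardize_moment(a2_x, t0, T)

# t' = 0.5 corresponds to t = 20, so A_1^Y(3, 0.5) = 20 * (0.5 - 2.0) * 3.
print(a1_y(3.0, 0.5))  # -90.0
print(a2_y(3.0, 0.5))  # approximately 7.2, i.e. (T - t0) * sigma^2 * x^2
```

In this example the second moment of $Y$ does not depend on $t'$, so the factor $T - t_{0}$ can be absorbed into a rescaled diffusion parameter.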

Note that the computed moments may depend on initial values such as $t_{0}$. To avoid this, parametric and functional modifications can be performed in some cases. In particular, multi-sigmoidal processes with lognormal distributions are suited to this purpose.

Indeed, after the transformation with $u = T - t_{0}$, a differentiable function $g_{\theta'}(t') = f_{\theta}(t'(T - t_{0}) + t_{0})$ can be defined for $t' \in [0, 1]$, together with a new parameter vector $\theta'$. Then, by the Malthusian expression (3), it follows that

$$\begin{aligned}
\frac{d}{d t'} g_{\theta'}(t') &= \left.\frac{d f_{\theta}(t)}{d t}\right|_{t = t'(T - t_{0}) + t_{0}} (T - t_{0}) = f_{\theta}(t'(T - t_{0}) + t_{0})\, h_{\theta}(t'(T - t_{0}) + t_{0})\, (T - t_{0}) \\
&= g_{\theta'}(t')\, h_{\theta}(t'(T - t_{0}) + t_{0})\, (T - t_{0}) = g_{\theta'}(t')\, h'_{\theta'}(t')\, (T - t_{0}) = g_{\theta'}(t')\, h''_{\theta'}(t'),
\end{aligned}$$

where $h'_{\theta'}$ is a suitable modification (involving both the parameters and the function itself) such that

$$h'_{\theta'}(t') = h_{\theta}(t'(T - t_{0}) + t_{0}),$$

and $h''_{\theta'}$ is a further transformation that absorbs the factor $T - t_{0}$, that is,

$$h''_{\theta'}(t') = h_{\theta}(t'(T - t_{0}) + t_{0})(T - t_{0}).$$

Here the primes label new functions; they do not denote derivatives.

The reason for distinguishing between $h'$ and $h''$ is that, by the previous relation between infinitesimal moments, the process $Y$ is characterized by

$$A_{1}^{Y}(x, t') = h'_{\theta'}(t')(T - t_{0})\, x, \qquad A_{2}^{Y}(x) = (T - t_{0})\, \sigma^{2} x^{2}.$$

To remove the factor $T - t_{0}$, the second infinitesimal moment can easily be transformed by introducing a new parameter $\sigma'^{2} = (T - t_{0})\sigma^{2}$. Analogously, the first infinitesimal moment can be written as $h''_{\theta'}(t')\, x$. Nevertheless, it is not always guaranteed that $h''_{\theta'}$ keeps the functional form of $h_{\theta}$ under a new parameter vector; this depends mainly on the functional form of $h'$ (and hence of $h$).

As mentioned above, in the multi-sigmoidal case such a transformation exists and can be obtained without modifying the nature of the process (namely, its base curve).

In both the multi-sigmoidal Gompertz and Weibull cases, the polynomial must be evaluated on the transformed time scale. The resulting function is again a polynomial, now with an independent term. Indeed, taking $t \in [t_{0}, T]$ and $t' \in [0, 1]$ related by $t = t'(T - t_{0}) + t_{0}$, and a polynomial $Q_{\beta}(t) = \sum_{k=1}^{p} \beta_{k} t^{k}$ of degree $p$, it follows that

$$\begin{aligned}
Q_{\beta}(t) &= Q_{\beta}(t'(T - t_{0}) + t_{0}) = \sum_{k=1}^{p} \beta_{k} \left(t'(T - t_{0}) + t_{0}\right)^{k} \\
&= \sum_{k=1}^{p} \beta_{k} \sum_{j=0}^{k} \binom{k}{j} t'^{j} (T - t_{0})^{j} t_{0}^{k-j} = \gamma_{0} + \sum_{m=1}^{p} \gamma_{m} t'^{m},
\end{aligned}$$

where

$$\gamma_{0} = \sum_{j=1}^{p} \beta_{j} t_{0}^{j} = Q_{\beta}(t_{0}), \qquad \gamma_{m} = (T - t_{0})^{m} \sum_{j=m}^{p} \beta_{j} \binom{j}{m} t_{0}^{j-m}, \quad m = 1, \dots, p.$$

Let us denote $Q'_{\gamma}(t') := \sum_{m=1}^{p} \gamma_{m} t'^{m}$. In summary, the following relation holds:

$$Q_{\beta}(t) = \gamma_{0} + Q'_{\gamma}(t').$$

Finally, the relation between the derivative polynomials follows easily from the chain rule:

$$P'_{\gamma}(t') = \frac{d}{d t'} Q'_{\gamma}(t') = \frac{d}{d t'} Q_{\beta}(t'(T - t_{0}) + t_{0}) = P_{\beta}(t'(T - t_{0}) + t_{0})(T - t_{0}) = P_{\beta}(t)(T - t_{0}),$$

where $P_{\beta}$ is the derivative of $Q_{\beta}$.
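The map $\beta \mapsto (\gamma_{0}, \gamma_{1}, \dots, \gamma_{p})$ is easy to get wrong by one index, so a small numerical check can help. The Python sketch below implements the formulas for $\gamma_{0}$ and $\gamma_{m}$ given above and compares both $Q_{\beta}(t)$ with $\gamma_{0} + Q'_{\gamma}(t')$ and $P'_{\gamma}(t')$ with $P_{\beta}(t)(T - t_{0})$ at a single point; the function names and the values of $\beta$, $t_{0}$ and $T$ are hypothetical.

```python
from math import comb
from typing import List, Sequence, Tuple


def standardize_polynomial(beta: Sequence[float], t0: float, T: float) -> Tuple[float, List[float]]:
    """Map beta = (beta_1, ..., beta_p) of Q_beta(t) = sum_{k=1}^p beta_k t^k
    to (gamma_0, [gamma_1, ..., gamma_p]) with Q_beta(t) = gamma_0 + Q'_gamma(t')."""
    p = len(beta)
    # gamma_0 = sum_j beta_j t0^j = Q_beta(t0); beta[j - 1] stores beta_j.
    gamma0 = sum(beta[j - 1] * t0 ** j for j in range(1, p + 1))
    gammas = []
    for m in range(1, p + 1):
        # gamma_m = (T - t0)^m * sum_{j=m}^{p} beta_j * C(j, m) * t0^(j - m)
        inner = sum(beta[j - 1] * comb(j, m) * t0 ** (j - m) for j in range(m, p + 1))
        gammas.append((T - t0) ** m * inner)
    return gamma0, gammas


def poly(coefs: Sequence[float], x: float) -> float:
    """Evaluate sum_{k=1}^{p} c_k x^k, a polynomial without independent term."""
    return sum(c * x ** k for k, c in enumerate(coefs, start=1))


def dpoly(coefs: Sequence[float], x: float) -> float:
    """Derivative of poly with respect to x: sum_{k=1}^{p} k c_k x^(k-1)."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coefs, start=1))


# Hypothetical example values (not taken from the fitted models).
beta = [0.02, -0.004, 0.0003]
t0, T = 5.0, 25.0
gamma0, gamma = standardize_polynomial(beta, t0, T)

t_prime = 0.3
t = t_prime * (T - t0) + t0  # t = t'(T - t0) + t0
print(poly(beta, t), gamma0 + poly(gamma, t_prime))      # Q_beta(t) vs gamma_0 + Q'_gamma(t')
print(dpoly(gamma, t_prime), dpoly(beta, t) * (T - t0))  # P'_gamma(t') vs P_beta(t) (T - t0)
```

Each pair of printed values should coincide up to floating-point rounding.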

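By applying these results to the Gompertz curve, it follows that

$$f_{\theta_{g}}(t) = f_{\theta_{g}}(t'(T - t_{0}) + t_{0}) = f_{0} \exp\left(-\alpha'\left(e^{-Q'_{\gamma}(t')} - 1\right)\right) = v_{\theta'_{g}}(t'),$$

where $\alpha' = \alpha e^{-\gamma_{0}}$ and $\theta'_{g} = (\alpha', \gamma^{T})^{T}$. Then, from the function $h_{\theta_{g}}(t)$,

$$h'_{\theta'_{g}}(t') = \alpha' \frac{P'_{\gamma}(t')}{T - t_{0}} e^{-Q'_{\gamma}(t')},$$

and thus

$$\frac{d}{d t'} v_{\theta'_{g}}(t') = v_{\theta'_{g}}(t')\, h_{\theta'_{g}}(t'),$$

where $h_{\theta'_{g}}(t') = h''_{\theta'_{g}}(t') = \alpha' P'_{\gamma}(t') e^{-Q'_{\gamma}(t')}$; that is, the rate function keeps the Gompertz form with the new parameter vector $\theta'_{g}$.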
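As a quick numerical sanity check of the last identity, the sketch below evaluates $v_{\theta'_{g}}$ and $h_{\theta'_{g}}$ on the standardized scale exactly as written above and compares a central finite-difference derivative of $v$ with $v\,h$. The parameter values ($f_{0}$, $\alpha'$, $\gamma$) and the function names are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math
from typing import Sequence


def q_prime(gamma: Sequence[float], s: float) -> float:
    """Q'_gamma(t') = sum_{m=1}^{p} gamma_m t'^m (no independent term)."""
    return sum(g * s ** m for m, g in enumerate(gamma, start=1))


def p_prime(gamma: Sequence[float], s: float) -> float:
    """P'_gamma(t') = d/dt' Q'_gamma(t')."""
    return sum(m * g * s ** (m - 1) for m, g in enumerate(gamma, start=1))


def gompertz_v(f0: float, alpha_p: float, gamma: Sequence[float], s: float) -> float:
    """v(t') = f0 * exp(-alpha' * (exp(-Q'_gamma(t')) - 1))."""
    return f0 * math.exp(-alpha_p * (math.exp(-q_prime(gamma, s)) - 1.0))


def gompertz_rate(alpha_p: float, gamma: Sequence[float], s: float) -> float:
    """h_{theta'_g}(t') = alpha' * P'_gamma(t') * exp(-Q'_gamma(t'))."""
    return alpha_p * p_prime(gamma, s) * math.exp(-q_prime(gamma, s))


# Hypothetical parameter values on the standardized scale t' in [0, 1].
f0, alpha_p, gamma = 10.0, 2.0, [1.5, -0.8, 0.6]

s, eps = 0.4, 1e-6
# Central finite difference of v at t' = s.
dv_numeric = (gompertz_v(f0, alpha_p, gamma, s + eps)
              - gompertz_v(f0, alpha_p, gamma, s - eps)) / (2 * eps)
# Right-hand side of the Malthusian relation dv/dt' = v h.
dv_model = gompertz_v(f0, alpha_p, gamma, s) * gompertz_rate(alpha_p, gamma, s)
print(dv_numeric, dv_model)  # the two values should agree closely
```

Because $Q'_{\gamma}(0) = 0$, the sketch also reproduces $v_{\theta'_{g}}(0) = f_{0}$.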

Analogously, for the Weibull model we have

$$f_{\theta_{w}}(t) = f_{\theta_{w}}(t'(T - t_{0}) + t_{0}) = f_{0} \frac{\eta' - e^{-Q'_{\gamma}(t')}}{\eta' - 1} = v_{\theta'_{w}}(t'),$$

where $\eta' = \eta e^{\gamma_{0}}$ and $\theta'_{w} = (\eta', \gamma^{T})^{T}$. Then, from the function $h_{\theta_{w}}(t)$,

$$h'_{\theta'_{w}}(t') = \frac{P'_{\gamma}(t')}{T - t_{0}} \frac{e^{-Q'_{\gamma}(t')}}{\eta' - e^{-Q'_{\gamma}(t')}},$$

and thus

$$\frac{d}{d t'} v_{\theta'_{w}}(t') = v_{\theta'_{w}}(t')\, h_{\theta'_{w}}(t'),$$

where $h_{\theta'_{w}}(t') = h''_{\theta'_{w}}(t') = P'_{\gamma}(t') \frac{e^{-Q'_{\gamma}(t')}}{\eta' - e^{-Q'_{\gamma}(t')}}$.
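Substituting $\eta' = \eta e^{\gamma_{0}}$ and $Q'_{\gamma}(t') = Q_{\beta}(t) - \gamma_{0}$, with $\gamma_{0} = Q_{\beta}(t_{0})$, into the expression for $v_{\theta'_{w}}$ gives, on the original scale, $f_{\theta_{w}}(t) = f_{0} (\eta - e^{-Q_{\beta}(t)})/(\eta - e^{-Q_{\beta}(t_{0})})$. A minimal Python sketch can check numerically that both parametrizations describe the same curve; the values of $f_{0}$, $\eta$, $\beta$, $t_{0}$ and $T$, as well as the function names, are hypothetical.

```python
import math
from typing import Sequence


def q_beta(beta: Sequence[float], t: float) -> float:
    """Q_beta(t) = sum_{k=1}^{p} beta_k t^k (no independent term)."""
    return sum(b * t ** k for k, b in enumerate(beta, start=1))


def weibull_original(f0: float, eta: float, beta: Sequence[float], t0: float, t: float) -> float:
    """Original scale: f0 (eta - e^{-Q_beta(t)}) / (eta - e^{-Q_beta(t0)}), t in [t0, T]."""
    return f0 * (eta - math.exp(-q_beta(beta, t))) / (eta - math.exp(-q_beta(beta, t0)))


def weibull_standardized(f0: float, eta_p: float, q_prime: float) -> float:
    """Standardized scale: v(t') = f0 (eta' - e^{-Q'_gamma(t')}) / (eta' - 1)."""
    return f0 * (eta_p - math.exp(-q_prime)) / (eta_p - 1.0)


# Hypothetical parameter values, chosen only for illustration.
f0, eta = 5.0, 3.0
beta = [0.05, -0.002, 0.00004]
t0, T = 2.0, 40.0

gamma0 = q_beta(beta, t0)       # gamma_0 = Q_beta(t0)
eta_p = eta * math.exp(gamma0)  # eta' = eta * e^{gamma_0}

for s in (0.0, 0.25, 0.5, 1.0):
    t = s * (T - t0) + t0            # original time corresponding to t' = s
    q_p = q_beta(beta, t) - gamma0   # Q'_gamma(t') = Q_beta(t) - gamma_0
    print(s, weibull_original(f0, eta, beta, t0, t), weibull_standardized(f0, eta_p, q_p))
```

Computing $Q'_{\gamma}(t')$ as $Q_{\beta}(t) - \gamma_{0}$ avoids recomputing the coefficients $\gamma_{m}$; in a fitting routine one would instead work directly with $\gamma$ on $[0, 1]$.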

References

Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020, 20, 533–534.
Verschuur, J.; Koks, E.E.; Hall, J.W. Global economic impacts of COVID-19 lockdown measures stand out in high-frequency shipping data. PLoS ONE 2021, 16, e0248818.
Brauer, F.; Castillo-Chávez, C.; Feng, Z. Mathematical Models in Epidemiology; Springer: New York, NY, USA, 2019.
Gounane, S.; Barkouch, Y.; Atlas, A.; Bendahmane, M.; Karami, F.; Meskine, D. An adaptive social distancing SIR model for COVID-19 disease spreading and forecasting. Epidemiol. Meth. 2021, 10, 20200044.
Khan, Z.; Van Bussel, F.; Hussain, F. A predictive model for Covid-19 spread—With application to eight US states and how to end the pandemic. Epidemiol. Infect. 2020, 148, E249.
Ianni, A.; Rossi, N. Describing the COVID-19 outbreak during the lockdown: Fitting modified SIR models to data. Eur. Phys. J. Plus 2020, 135, 885.
Alkahtani, B.S.T.; Koca, I. Fractional stochastic SIR model. Results Phys. 2021, 24, 104124.
Maleki, M.; Mahmoudi, M.R.; Heydari, M.H.; Pho, K.H. Modeling and forecasting the spread and death rate of coronavirus (COVID-19) in the world using time series models. Chaos Solitons Fractals 2020, 140, 110151.
Ünlü, R.; Namlı, E. Machine Learning and Classical Forecasting Methods Based Decision Support Systems for COVID-19. Comput. Mater. Contin. 2020, 64, 1383–1399.
Lawson, A.B.; Kim, J. Space-time covid-19 Bayesian SIR modeling in South Carolina. PLoS ONE 2021, 16, e0242777.
Acal, C.; Escabias, M.; Aguilera, A.M.; Valderrama, M.J. COVID-19 Data Imputation by Multiple Function-on-Function Principal Component Regression. Mathematics 2021, 9, 1237.
Hsieh, Y.H.; Fisman, D.N.; Wu, J. On epidemic modeling in real time: An application to the 2009 Novel A (H1N1) influenza outbreak in Canada. BMC Res. Notes 2010, 3, 283.
Wang, S.-H.; Wu, J.; Yang, Y. Richards model revisited: Validation by and application to infection dynamics. J. Theor. Biol. 2012, 313, 12–19.
Català, M.; Alonso, S.; Alvarez-Lacalle, E.; López, D.; Cardona, P.-J.; Prats, C. Empirical model for short-time prediction of COVID-19 spreading. PLoS Comput. Biol. 2020, 16, e1008431.
Li, Q.; Bedi, T.; Lehmann, C.U.; Xiao, G.; Xie, Y. Evaluating short-term forecasting of COVID-19 cases among different epidemiological models under a Bayesian framework. GigaScience 2021, 10, 1–11.
Tovissodé, C.F.; Lokonon, B.E.; Glèlè Kakaï, R. On the use of growth models to understand epidemic outbreaks with application to COVID-19 data. PLoS ONE 2020, 15, e0240578.
Turner, M.E.; Bradley, E.L.; Kirk, K.A.; Pruitt, K.M. A theory of growth. Math. Biosci. 1976, 29, 367–373.
Román-Román, P.; Serrano-Pérez, J.J.; Torres-Ruiz, F. A Note on Estimation of Multi-Sigmoidal Gompertz Functions with Random Noise. Mathematics 2019, 7, 541.
Di Crescenzo, A.; Paraggio, P.; Román-Román, P.; Torres-Ruiz, F. Applications of the multi-sigmoidal deterministic and stochastic logistic models for plant dynamics. Appl. Math. Model. 2021, 92, 884–904.
Barrera, A.; Román-Román, P.; Torres-Ruiz, F. Hyperbolastic type-III diffusion process: Obtaining from the generalized Weibull diffusion process. Math. Biosci. Eng. 2020, 17, 814–833.
Román-Román, P.; Torres-Ruiz, F. The nonhomogeneous lognormal diffusion process as a general process to model particular types of growth patterns. In Lecture Notes of Seminario Interdisciplinare di Matematica; Università degli Studi della Basilicata: Potenza, Italy, 2015; Volume XII, pp. 201–219.
Román-Román, P.; Torres-Ruiz, F. A stochastic model related to the Richards-type growth curve. Estimation by means of Simulated Annealing and Variable Neighborhood Search. Appl. Math. Comput. 2015, 266, 579–598.
Da Luz Sant’Ana, I.; Román-Román, P.; Torres-Ruiz, F. Modeling oil production and its peak by means of a stochastic diffusion process based on the Hubbert curve. Energy 2017, 133, 455–470.
Gutiérrez, R.; Rico, N.; Román, P.; Torres, F. Approximate and generalized confidence bands for some parametric functions of the lognormal diffusion process with exogenous factors. Sci. Math. Jpn. 2006, 64, 313–329.
Román-Román, P.; Román-Román, S.; Serrano-Pérez, J.J.; Torres-Ruiz, F. Some Notes about inference for the lognormal diffusion process with exogenous factors. Mathematics 2018, 6, 85.
Moreno, E.; Girón, F.J.; Martínez, M.L.; Torres, F. Objective Testing Procedures in Linear Models: Calibration of the p-values. Scand. J. Statist. 2006, 33, 765–784.
Moreno, E.; Girón, F.J.; Casella, G. Posterior Model Consistency in Variable Selection as the Model Dimension Grows. Stat. Sci. 2015, 30, 228–241.
Moreno, E.; Vázquez-Polo, F.; Negrín, M. Bayesian Cost-Effectiveness Analysis of Medical Treatments, 1st ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019; ISBN 9781138731738.
Varadhan, R.; Gilbert, P. BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function. J. Stat. Softw. 2009, 32.
Hasselman, B. nleqslv: Solve Systems of Nonlinear Equations. R Package Version 3.3.2. 2018. Available online: https://CRAN.R-project.org/package=nleqslv (accessed on 24 July 2021).
Johnson, D.H.; Sinanovic, S. Symmetrizing the Kullback-Leibler Distance. 2001. Available online: https://scholarship.rice.edu/handle/1911/19969 (accessed on 24 July 2021).
Gutiérrez, R.; Román, P.; Torres, F. A note on the Volterra integral equation for the first-passage-time probability density. J. Appl. Probab. 1995, 32, 635–648.
Gutiérrez, R.; Ricciardi, L.; Román, P.; Torres, F. First-passage-time densities for time-non-homogeneous diffusion processes. J. Appl. Probab. 1997, 34, 623–631.
Buonocore, A.; Nobile, A.; Ricciardi, L. A new integral equation for the evaluation of first-passage-time probability densities. Adv. Appl. Probab. 1987, 19, 784–800.
Román, P.; Serrano, J.J.; Torres, F. First-passage-time location function: Application to determine first-passage-time densities in diffusion processes. Comput. Stat. Data Anal. 2008, 52, 4132–4146.
Román-Román, P.; Serrano-Pérez, J.J.; Torres-Ruiz, F. An R package for an efficient approximation of first-passage-time densities for diffusion processes based on the FPTL function. Appl. Math. Comput. 2012, 218, 8408–8428.
Román-Román, P.; Serrano-Pérez, J.J.; Torres-Ruiz, F. More general problems on first-passage-times for diffusion processes: A new version of the R Package fptdApprox. Appl. Math. Comput. 2014, 244, 432–446.
Román-Román, P.; Serrano-Pérez, J.J.; Torres-Ruiz, F. fptdApprox: Approximation of First-Passage-Time Densities for Diffusion Processes. R Package Version 2.2. 2020. Available online: https://cran.r-project.org/package=fptdApprox (accessed on 24 July 2021).
Working Group for the Surveillance and Control of COVID-19 in Spain. The first wave of the COVID-19 pandemic in Spain: Characterisation of cases and risk factors for severe outcomes, as at 27 April 2020. Eurosurveillance 2020, 25, 2001431.
Guirao, A. The Covid-19 outbreak in Spain. A simple dynamics model, some lessons, and a theoretical framework for control response. Infect. Dis. Model. 2020, 5, 652–669.
Mora, J.C.; Pérez, S.; Dvorzhak, A. Application of a Semi-Empirical Dynamic Model to Forecast the Propagation of the COVID-19 Epidemics in Spain. Forecasting 2020, 2, 452–469.

Figure 1. Examples of multi-sigmoidal curves and their inflection points. Next to each curve, its first and second derivatives are shown. The first two columns correspond to the multi-sigmoidal Gompertz model and the remaining columns to the multi-sigmoidal Weibull model. From top to bottom, the rows show the cases with no inflection, one inflection and two inflections, respectively (inflection points are marked with a red vertical dashed line in the plots of the derivatives).

Figure 2. Number of infected individuals (left) and deaths (right) corresponding to the first two waves of the pandemic in Spain. The solid black lines represent the sample means.

Figure 3. Study of infected: Sample and estimated mean functions (left) and 95% confidence band for the mean function together with the sample paths (right) for (a) the multi-sigmoidal Gompertz model and (b) the multi-sigmoidal Weibull model.

Figure 4. Study of infected: Resistor-average distance between sample and estimated distributions as a function of time for the two estimated models.

Figure 5. Study of infected: Estimated inflection time instants from (a) the second derivative of the estimated mean function and (b) the estimated mean function for the multi-sigmoidal Gompertz model. The vertical lines are located on the estimated values.

Figure 6. Study of infected: Probability density functions of $T_{S_{1}}$ (a) and $T_{S_{3}}$ (b), corresponding to the first-passage times of the estimated multi-sigmoidal diffusion process through the constant boundaries $S_{1}$ and $S_{3}$ shown in Table 5.

Figure 7. Study of deceased: Sample and estimated mean functions (left) and 95% confidence band for the mean function together with the sample paths (right) for (a) the multi-sigmoidal Gompertz model and (b) the multi-sigmoidal Weibull model.

Figure 8. Study of deceased: Resistor-average distance between sample and estimated distributions as a function of time for the two estimated models.

Figure 9. Study of deceased: Probability density functions corresponding to the first-passage times of the estimated multi-sigmoidal Gompertz diffusion process through the constant boundaries $S_{1}$ (a) and $S_{2}$ (b).

Table 1. Polynomial coefficients for examples in Figure 1.

Number of Inflections	$β_{g}$	$β_{w}$
0	$(0.4, - 0.005, 0.0002)$	$(0.3, - 0.003, 0.0001)$
1	$(0.01, 0.003, 0.001)$	$(0.001, 0.0005, 0.001)$
2	$(0.025, - 0.005, 0.0015)$	$(0.03, - 0.003, 0.0005)$

Table 2. Study of infected: Posterior probabilities of the polynomial regression model of degree less than or equal to 8 for the multi-sigmoidal Gompertz model.

Degree	Posterior Probability
0	$8.150422 \times 10^{- 80}$
1	$1.392585 \times 10^{- 60}$
2	$1.652217 \times 10^{- 46}$
3	$1.092432 \times 10^{- 7}$
4	$7.498777 \times 10^{- 9}$
5	$8.167759 \times 10^{- 1}$
6	$9.750507 \times 10^{- 2}$
7	$7.297665 \times 10^{- 2}$
8	$1.274224 \times 10^{- 2}$

Table 3. Study of infected: Posterior probabilities of the polynomial regression model of degree less than or equal to 8 for the multi-sigmoidal Weibull model.

Degree	Posterior Probability
1	$2.690540 \times 10^{- 67}$
2	$3.882898 \times 10^{- 50}$
3	$1.321680 \times 10^{- 21}$
4	$1.125084 \times 10^{- 2}$
5	$9.349964 \times 10^{- 1}$
6	$3.856249 \times 10^{- 2}$
7	$1.366770 \times 10^{- 3}$
8	$1.382346 \times 10^{- 2}$

Table 4. Study of infected: Measures of interest from the values of resistor-average distances for each model.

	Minimum	Mean	Median	Maximum	Variance
Gompertz model	0.0000031	0.1406574	0.0288676	1.1691629	0.0554912
Weibull model	0.0000914	0.4240050	0.0349878	13.7878014	3.2147742

Table 5. Study of infected: Estimated inflection time instants.

${\hat{t}}_{I, j}$	$S_{j}$
0.08893562	120.8690
0.28901782	271.8643
0.84546491	1925.691

Table 6. Study of infected: Numerical characteristics of the inflection time random variables for the COVID-19 data.

	$T_{S_{1}}$	$T_{S_{3}}$
Mean	0.091806
Variance	0.000305
Modes	0.086358	0.801248
0.1%	0.065957	0.551516
2.5%	0.072643	0.645670
25%	0.082497	0.778273
Median	0.089167	0.883515
75%	0.097491
97.5%	0.122761
99.9%	0.214438

Table 7. Study of deceased: Posterior probabilities of the polynomial regression model of degree less than or equal to 8.

	Posterior Probability
Degree	Gompertz Model	Weibull Model
0	$7.389174 \times 10^{- 106}$
1	$1.476089 \times 10^{- 92}$	$2.685957 \times 10^{- 98}$
2	$1.831829 \times 10^{- 87}$	$1.682021 \times 10^{- 94}$
3	$2.687253 \times 10^{- 59}$	$2.830093 \times 10^{- 70}$
4	$1.356825 \times 10^{- 56}$	$3.805252 \times 10^{- 60}$
5	$4.724887 \times 10^{- 16}$	$3.222198 \times 10^{- 36}$
6	$1.365893 \times 10^{- 10}$	$2.136724 \times 10^{- 6}$
7	$9.999999 \times 10^{- 1}$	$8.852732 \times 10^{- 1}$
8	$1.592107 \times 10^{- 34}$	$1.147246 \times 10^{- 1}$

Table 8. Study of deceased: Measures of interest from the values of resistor-average distances for each model.

	Minimum	Mean	Median	Maximum	Variance
Gompertz model	0.000424	0.041400	0.011593	0.274669	0.004803
Weibull model	0.000121	0.060799	0.011898	1.223415	0.027666

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
