Article

Prior Sensitivity Analysis in a Semi-Parametric Integer-Valued Time Series Model

Helton Graziadei, Antonio Lijoi, Hedibert F. Lopes, Paulo C. Marques F. and Igor Prünster

1 Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508-090, Brazil
2 Department of Decision Sciences and BIDSA, Bocconi University, via Röntgen 1, 20136 Milano, Italy
3 Insper Institute of Education and Research, Rua Quatá 300, São Paulo 04546-042, Brazil
* Author to whom correspondence should be addressed.
Entropy 2020, 22(1), 69; https://doi.org/10.3390/e22010069
Submission received: 26 November 2019 / Revised: 23 December 2019 / Accepted: 3 January 2020 / Published: 6 January 2020
(This article belongs to the Special Issue Data Science: Measuring Uncertainties)

Abstract

We examine issues of prior sensitivity in a semi-parametric hierarchical extension of the INAR(p) model with innovation rates clustered according to a Pitman–Yor process placed at the top of the model hierarchy. Our main finding is a graphical criterion that guides the specification of the hyperparameters of the Pitman–Yor process base measure. We show how the discount and concentration parameters interact with the chosen base measure to yield a gain in terms of the robustness of the inferential results. The forecasting performance of the model is exemplified in the analysis of a time series of worldwide earthquake events, for which the new model outperforms the original INAR(p) model.

1. Introduction

Integer-valued time series are relevant to many fields of knowledge, ranging from finance and econometrics to ecology and meteorology. A large number of models for this kind of data have been proposed since the introduction of the INAR(1) model in the pioneering works of McKenzie [1] and Al-Osh and Alzaid [2] (see also the book by Weiß [3]). A higher-order INAR(p) model was considered in the work of Du and Li [4].
In this paper, we generalize the Bayesian version of the INAR(p) model studied by Neal and Kypraios [5]. In our model, the innovation rates are allowed to vary through time, with the distribution of the innovation rates being modeled hierarchically by means of a Pitman–Yor process [6]. In this way, we account for potential heterogeneity in the innovation rates as the process evolves through time, and this feature is automatically incorporated in the Bayesian forecasting capabilities of the model.
The semi-parametric form of the model demands a robustness analysis of our inferential conclusions as we vary the hyperparameters of the Pitman–Yor process. We investigate this prior sensitivity issue carefully and find ways to control the hyperparameters in order to achieve robust results.
This paper is organized as follows. In Section 2, we construct a generalized INAR(p) model with variable innovation rates. The likelihood function of the generalized model is derived and a data augmentation scheme is developed, which gives a specification of the model in terms of conditional distributions. This data-augmented representation of the model enables the derivation in Section 4 of full conditional distributions in simple analytical form, which are essential for the stochastic simulations in Section 5. Section 3 reviews the main properties of the Pitman–Yor process, including its clustering properties, which are used to define the PY-INAR(p) model in Section 4. In building the PY-INAR(p) model, we propose a form for the prior distribution of the vector of thinning parameters which improves on the choice made for the Bayesian INAR(p) model studied in [5]. In Section 5, we investigate the robustness of the inference with respect to changes in the Pitman–Yor process hyperparameters. Using the full conditional distributions of the innovation rates derived in Section 4, we inspect the behavior of the model as we concentrate or spread the mass of the Pitman–Yor base measure. This leads us to a graphical criterion that identifies an elbow in the posterior expectation of the number of clusters as we vary the hyperparameters of the base measure. Once we have control over the base measure, we study its interaction with the concentration and discount hyperparameters, showing how to make choices that yield robust results. In the course of this development, we use geometrical tools to inspect the clustering of the innovation rates produced by the model. Section 6 puts the graphical criterion to work on simulated data. In Section 7, using a time series of worldwide earthquake events, we conclude the paper by comparing the forecasting performance of the PY-INAR(p) model against the original INAR(p) model, with favorable results.

2. A Generalization of the INAR(p) Model

We begin by generalizing the original INAR(p) model of Du and Li [4] as follows.
Let $\{Y_t\}_{t \geq 1}$ be an integer-valued time series, and, for some integer $p \geq 1$, let the innovations $\{Z_t\}_{t \geq p+1}$, given positive parameters $\{\lambda_t\}_{t \geq p+1}$, be a sequence of conditionally independent $\mathrm{Poisson}(\lambda_t)$ random variables. For a given vector of parameters $\alpha = (\alpha_1, \dots, \alpha_p) \in [0,1]^p$, let $\mathcal{F}_i = \{B_{ij}^{(t)} : j \geq 1, t \geq 2\}$ be a family of conditionally independent and identically distributed $\mathrm{Bernoulli}(\alpha_i)$ random variables. For $i \neq k$, suppose that $\mathcal{F}_i$ and $\mathcal{F}_k$ are conditionally independent, given $\alpha$. Furthermore, assume that the innovations $\{Z_t\}_{t \geq p+1}$ and the families $\mathcal{F}_1, \dots, \mathcal{F}_p$ are conditionally independent, given $\alpha$ and $\lambda$. The generalized INAR(p) model is defined by the functional relation
$$Y_t = \alpha_1 \circ Y_{t-1} + \dots + \alpha_p \circ Y_{t-p} + Z_t,$$
for $t \geq p+1$, in which $\circ$ denotes the binomial thinning operator, defined by $\alpha_i \circ Y_{t-i} = \sum_{j=1}^{Y_{t-i}} B_{ij}^{(t)}$, if $Y_{t-i} > 0$, and $\alpha_i \circ Y_{t-i} = 0$, if $Y_{t-i} = 0$. In the homogeneous case, when all the $\lambda_t$'s are assumed to be equal, we recover the original INAR(p) model.
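To make the definition concrete, the following sketch (ours, not from the paper; all names and parameter values are illustrative choices) simulates one path of the generalized INAR(p) process, implementing binomial thinning exactly as defined above.

```python
# A minimal simulation sketch of the generalized INAR(p) process defined
# above; binomial thinning is a Binomial(Y_{t-i}, alpha_i) survival count.
import numpy as np

def simulate_inar(alpha, lam, y_init, rng):
    """alpha: thinning parameters (alpha_1, ..., alpha_p);
    lam: innovation rates (lambda_{p+1}, ..., lambda_T);
    y_init: the first p counts, held fixed as in the model."""
    p = len(alpha)
    y = list(y_init)
    for lam_t in lam:
        # alpha_i o Y_{t-i}: number of survivors from epoch t - i
        survivors = sum(rng.binomial(y[-i], alpha[i - 1]) for i in range(1, p + 1))
        y.append(survivors + rng.poisson(lam_t))   # plus Poisson(lambda_t) innovations
    return np.array(y)

# Example with p = 1 and innovation rates alternating between two regimes.
rng = np.random.default_rng(0)
lam = np.where(rng.random(999) < 0.5, 1.0, 8.0)
path = simulate_inar(alpha=[0.15], lam=lam, y_init=[3], rng=rng)
```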
When $p = 1$, this model can be interpreted as specifying a birth-and-death process in which, at epoch $t$, the number of cases $Y_t$ is equal to the new cases $Z_t$ plus the cases that survived from the previous epoch; the role of the binomial thinning operator is to remove a random number of the $Y_{t-1}$ cases present at the previous epoch $t-1$ (see [7] for an interpretation of the order-$p$ case as a birth-and-death process with immigration).
Let $y = (y_1, \dots, y_T)$ denote the values of an observed time series. For simplicity, we assume that $Y_1 = y_1, \dots, Y_p = y_p$ with probability one. The joint distribution of $Y_1, \dots, Y_T$, given parameters $\alpha$ and $\lambda = (\lambda_{p+1}, \dots, \lambda_T)$, can be factored as
$$\Pr\{Y_1 = y_1, \dots, Y_T = y_T \mid \alpha, \lambda\} = \prod_{t=p+1}^{T} \Pr\{Y_t = y_t \mid Y_{t-1} = y_{t-1}, \dots, Y_{t-p} = y_{t-p}, \alpha, \lambda_t\}.$$
Since, with probability one, $\alpha_i \circ Y_{t-i} \leq Y_{t-i}$ and $Z_t \geq 0$, the likelihood function of the generalized INAR(p) model is given by
$$L_y(\alpha, \lambda) = \prod_{t=p+1}^{T} \sum_{m_{1,t}=0}^{\min\{y_t,\, y_{t-1}\}} \cdots \sum_{m_{p,t}=0}^{\min\{y_t - \sum_{j=1}^{p-1} m_{j,t},\, y_{t-p}\}} \prod_{i=1}^{p} \binom{y_{t-i}}{m_{i,t}} \alpha_i^{m_{i,t}} (1 - \alpha_i)^{y_{t-i} - m_{i,t}} \times \frac{e^{-\lambda_t} \lambda_t^{y_t - \sum_{j=1}^{p} m_{j,t}}}{\left(y_t - \sum_{j=1}^{p} m_{j,t}\right)!}.$$
For some epoch $t$ and $i = 1, \dots, p$, suppose that we could observe the values of the latent maturations $M_{i,t}$. Postulate that $M_{i,t} \mid Y_{t-i} = y_{t-i}, \alpha_i \sim \mathrm{Binomial}(y_{t-i}, \alpha_i)$, so that the conditional probability function of $M_{i,t}$ is given by
$$p(m_{i,t} \mid y_{t-i}, \alpha_i) = \Pr\{M_{i,t} = m_{i,t} \mid Y_{t-i} = y_{t-i}, \alpha_i\} = \binom{y_{t-i}}{m_{i,t}} \alpha_i^{m_{i,t}} (1 - \alpha_i)^{y_{t-i} - m_{i,t}} \, I_{\{0, \dots, y_{t-i}\}}(m_{i,t}).$$
Furthermore, suppose that
$$p(y_t \mid m_{1,t}, \dots, m_{p,t}, \lambda_t) = \Pr\{Y_t = y_t \mid M_{1,t} = m_{1,t}, \dots, M_{p,t} = m_{p,t}, \lambda_t\} = \frac{e^{-\lambda_t} \lambda_t^{y_t - \sum_{j=1}^{p} m_{j,t}}}{\left(y_t - \sum_{j=1}^{p} m_{j,t}\right)!} \, I_{\{\sum_{j=1}^{p} m_{j,t},\, \sum_{j=1}^{p} m_{j,t}+1,\, \dots\}}(y_t).$$
Using the law of total probability and the product rule, we have that
$$p(y_t \mid y_{t-1}, \dots, y_{t-p}, \alpha, \lambda_t) = \sum_{m_{1,t}=0}^{y_{t-1}} \cdots \sum_{m_{p,t}=0}^{y_{t-p}} p(y_t, m_{1,t}, \dots, m_{p,t} \mid y_{t-1}, \dots, y_{t-p}, \alpha, \lambda_t) = \sum_{m_{1,t}=0}^{y_{t-1}} \cdots \sum_{m_{p,t}=0}^{y_{t-p}} p(y_t \mid m_{1,t}, \dots, m_{p,t}, \lambda_t) \times \prod_{i=1}^{p} p(m_{i,t} \mid y_{t-i}, \alpha_i).$$
Since
$$I_{\{\sum_{j=1}^{p} m_{j,t},\, \sum_{j=1}^{p} m_{j,t}+1,\, \dots\}}(y_t) = I_{\{0, \dots, y_t\}}\left(\textstyle\sum_{j=1}^{p} m_{j,t}\right) = I_{\{0, \dots, y_t\}}(m_{1,t}) \times \cdots \times I_{\{0, \dots, y_t - \sum_{j=1}^{p-1} m_{j,t}\}}(m_{p,t})$$
and
$$I_{\{\sum_{j=1}^{p} m_{j,t},\, \sum_{j=1}^{p} m_{j,t}+1,\, \dots\}}(y_t) \times I_{\{0, \dots, y_{t-i}\}}(m_{i,t}) = I_{\{0, 1, \dots, \min\{y_t - \sum_{j \neq i} m_{j,t},\, y_{t-i}\}\}}(m_{i,t}),$$
we recover the original likelihood of the generalized INAR(p) model, showing that the introduction of the latent maturations $M_{i,t}$ with the specified distributions is a valid data augmentation scheme (see [8,9] for a general discussion of data augmentation techniques).
In the next section, we review the needed definitions and properties of the Pitman–Yor process.

3. Pitman–Yor Process

Let the random probability measure $\mathcal{G} \sim \mathrm{DP}(\tau, G_0)$ be a Dirichlet process [10,11,12] with concentration parameter $\tau$ and base measure $G_0$. If the random variables $X_1, \dots, X_n$, given $\mathcal{G} = G$, are conditionally independent and identically distributed as $G$, then it follows that
$$\Pr\{X_{n+1} \in B \mid X_1 = x_1, \dots, X_n = x_n\} = \frac{\tau}{\tau + n} \, G_0(B) + \frac{1}{\tau + n} \sum_{i=1}^{n} I_B(x_i),$$
for every Borel set $B$. If we imagine the sequential generation of the $X_i$'s, for $i = 1, \dots, n$, the former expression shows that a value is generated anew from $G_0$ with probability proportional to $\tau$, or we repeat one of the previously generated values with probability proportional to its multiplicity. Therefore, almost surely, realizations of a Dirichlet process are discrete probability measures, possibly with denumerably infinite support, depending on the nature of $G_0$. Also, this data-generating process, known as the Pólya–Blackwell–MacQueen urn, implies that the $X_i$'s are "softly clustered", in the sense that in one realization of the process the elements of a subset of the $X_i$'s may have exactly the same value.
The Pitman–Yor process [6] is a generalization of the Dirichlet process which results in a model with added flexibility. Essentially, the Pitman–Yor process modifies the probability associated with the Pólya–Blackwell–MacQueen urn by introducing a new parameter, so that the posterior predictive probability becomes
$$\Pr\{X_{n+1} \in B \mid X_1 = x_1, \dots, X_n = x_n\} = \frac{\tau + k\sigma}{\tau + n} \, G_0(B) + \frac{1}{\tau + n} \sum_{i=1}^{n} \left(1 - \frac{\sigma}{n_i}\right) I_B(x_i),$$
in which $0 \leq \sigma < 1$ is the discount parameter, $\tau > -\sigma$, $k$ is the number of distinct elements in $\{X_1, \dots, X_n\}$, and $n_i$ is the number of elements in $\{X_1, \dots, X_n\}$ which are equal to $X_i$, for $i = 1, \dots, n$. It is well known that $E[\mathcal{G}(B)] = G_0(B)$ and
$$\mathrm{Var}[\mathcal{G}(B)] = \frac{1 - \sigma}{\tau + 1} \, G_0(B) \, (1 - G_0(B)),$$
for every Borel set $B$. Hence, $\mathcal{G}$ is centered on the base probability measure $G_0$, while $\tau$ and $\sigma$ control the concentration of $\mathcal{G}$ around $G_0$. We use the notation $\mathcal{G} \sim \mathrm{PY}(\tau, \sigma, G_0)$. When $\sigma = 0$, we recover the Dirichlet process as a special case. The PY process is also defined for $\sigma < 0$ and $\tau = |\sigma| m$, for some positive integer $m$. For our purposes, it is enough to consider the case of non-negative $\sigma$.
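The predictive rule above can be read as a sequential sampling scheme, a generalized Pólya urn. The sketch below is our own illustration of it, with sigma = 0 recovering the Dirichlet process urn; the function and argument names are assumptions made for the example.

```python
# Sequential sampling from the Pitman-Yor predictive rule (generalized Polya urn).
import numpy as np

def py_urn_sample(n, tau, sigma, base_draw, rng):
    """Sample X_1, ..., X_n; returns the draws and the number of distinct atoms."""
    xs, atoms, counts = [], [], []
    for _ in range(n):
        k = len(atoms)
        # weight (tau + k*sigma) for a fresh G_0 draw; (n_j - sigma) for atom j
        w = np.array([tau + k * sigma] + [c - sigma for c in counts])
        j = rng.choice(k + 1, p=w / w.sum())
        if j == 0:
            atoms.append(base_draw(rng))
            counts.append(1)
        else:
            counts[j - 1] += 1
        xs.append(atoms[-1] if j == 0 else atoms[j - 1])
    return np.array(xs), len(atoms)

rng = np.random.default_rng(1)
draws, k = py_urn_sample(500, tau=1.0, sigma=0.5,
                         base_draw=lambda r: r.gamma(2.0, 1.0), rng=rng)
print(f"{k} distinct values among 500 draws")
```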
Pitman [6] derived the distribution of the number of clusters $K$ (the number of distinct $X_i$'s), conditionally on both the concentration parameter $\tau$ and the discount parameter $\sigma$, as
$$\Pr\{K = k \mid \tau, \sigma\} = \frac{\prod_{i=1}^{k-1} (\tau + i\sigma)}{\sigma^k \, (\tau + 1)_{n-1}} \times C(n, k; \sigma),$$
in which $(x)_n = \Gamma(x + n) / \Gamma(x)$ is the rising factorial and $C(n, k; \sigma)$ is the generalized factorial coefficient [13].
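For moderate $n$, this distribution can be evaluated numerically. The following sketch is ours, for $0 < \sigma < 1$; it relies on the standard triangular recursion $C(m, k; \sigma) = (m - 1 - \sigma k)\, C(m-1, k; \sigma) + \sigma\, C(m-1, k-1; \sigma)$, computed in log space for numerical stability.

```python
# Prior pmf of the number of clusters K under a PY(tau, sigma) model, 0 < sigma < 1.
import numpy as np
from scipy.special import gammaln

def log_C(n, sigma):
    """log C(m, k; sigma) for all 0 <= k <= m <= n; -inf where C = 0."""
    lc = np.full((n + 1, n + 1), -np.inf)
    lc[0, 0] = 0.0
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            fac = m - 1 - sigma * k
            a = np.log(fac) + lc[m - 1, k] if fac > 0 else -np.inf
            b = np.log(sigma) + lc[m - 1, k - 1]
            lc[m, k] = np.logaddexp(a, b)
    return lc

def prior_K_pmf(n, tau, sigma):
    lc = log_C(n, sigma)
    ks = np.arange(1, n + 1)
    # log prod_{i=1}^{k-1} (tau + i sigma), and log of sigma^k (tau+1)_{n-1}
    log_num = np.array([np.log(tau + sigma * np.arange(1, k)).sum() for k in ks])
    log_den = ks * np.log(sigma) + gammaln(tau + n) - gammaln(tau + 1.0)
    logp = log_num - log_den + lc[n, 1:]
    return np.exp(logp - np.logaddexp.reduce(logp))   # normalize residual error

pmf = prior_K_pmf(n=50, tau=1.0, sigma=0.5)           # Pr{K = k}, k = 1, ..., 50
```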
In the next section, we use a Pitman–Yor process to model the distribution of the innovation rates in the generalized INAR(p) model.

4. PY-INAR(p) Model

The PY-INAR(p) model is a hierarchical extension of the generalized INAR(p) model defined in Section 2. Given a random measure $\mathcal{G} \sim \mathrm{PY}(\tau, \sigma, G_0)$, in which $G_0$ is a $\mathrm{Gamma}(a_0, b_0)$ distribution, let the innovation rates $\lambda_{p+1}, \dots, \lambda_T$ be conditionally independent and identically distributed with distribution $\Pr\{\lambda_t \in B \mid \mathcal{G} = G\} = G(B)$.
To complete the PY-INAR(p) model, we need to specify the form of the prior distribution for the vector of thinning parameters $\alpha = (\alpha_1, \dots, \alpha_p)$. By comparison with standard results from the theory of the AR(p) model [14], Du and Li [4] found that in the INAR(p) model the constraint $\sum_{i=1}^{p} \alpha_i < 1$ must be fulfilled to guarantee the non-explosiveness of the process. In their Bayesian analysis of the INAR(p) model, Neal and Kypraios [5] considered independent beta distributions for the $\alpha_i$'s. Unfortunately, this choice is problematic. For example, in the particular case when the $\alpha_i$'s have independent uniform distributions, it is possible to show that $\Pr\{\sum_{i=1}^{p} \alpha_i < 1\} = 1/p!$, implying that we would be concentrating most of the prior mass on the explosive region even for moderate values of the model order $p$. We circumvent this problem using a prior distribution for $\alpha$ that places all of its mass on the nonexplosive region and still allows us to derive the full conditional distributions of the $\alpha_i$'s in simple closed form. Specifically, we take the prior distribution of $\alpha$ to be a Dirichlet distribution with hyperparameters $(a_1, \dots, a_p; a_{p+1})$ and corresponding density
$$\pi(\alpha) = \frac{\Gamma\left(\sum_{i=1}^{p+1} a_i\right)}{\prod_{i=1}^{p+1} \Gamma(a_i)} \prod_{i=1}^{p+1} \alpha_i^{a_i - 1},$$
in which $a_i > 0$, for $i = 1, \dots, p+1$, and $\alpha_{p+1} = 1 - \sum_{i=1}^{p} \alpha_i$.
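The claim about independent uniform priors is easy to verify by simulation; the snippet below (purely illustrative, ours) also checks that a Dirichlet draw, after dropping its last coordinate, lands in the nonexplosive region by construction.

```python
# Monte Carlo check of Pr{sum(alpha) < 1} = 1/p! under independent Uniform(0,1) priors.
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
for p in (2, 3, 5):
    u = rng.random((200_000, p))                     # independent uniform alphas
    mc = (u.sum(axis=1) < 1.0).mean()
    print(f"p={p}: Monte Carlo {mc:.4f} vs 1/p! = {1 / factorial(p):.4f}")

# A Dirichlet(a_1, ..., a_p; a_{p+1}) draw, after dropping its last coordinate,
# always satisfies sum_i alpha_i < 1:
alpha = rng.dirichlet(np.ones(4))[:-1]               # p = 3, all a_i = 1
assert alpha.sum() < 1.0
```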
Let $m = \{m_{i,t} : i = 1, \dots, p,\ t = p+1, \dots, T\}$ denote the set of all maturations, and let $\mu_{\mathcal{G}}$ be the distribution of $\mathcal{G}$. Our strategy to derive the full conditional distributions of the model parameters and latent variables is to consider the marginal distribution
$$p(y, m, \alpha, \lambda) = \int p(y, m, \alpha, \lambda \mid G) \, d\mu_{\mathcal{G}}(G) = \prod_{t=p+1}^{T} \left[ p(y_t \mid m_{1,t}, \dots, m_{p,t}, \lambda_t) \prod_{i=1}^{p} p(m_{i,t} \mid y_{t-i}, \alpha_i) \right] \times \pi(\alpha) \times \int \prod_{t=p+1}^{T} p(\lambda_t \mid G) \, d\mu_{\mathcal{G}}(G).$$
From this expression, using the results in Section 3, the derivation of the full conditional distributions is straightforward. In the following expressions, the symbol $\propto$ denotes proportionality up to a suitable normalization factor, and the label "all others" designates the observed counts $y$ and all the other latent variables and model parameters, with the exception of the one under consideration.
Let $\lambda_{\setminus t}$ denote the set $\{\lambda_{p+1}, \dots, \lambda_T\}$ with the element $\lambda_t$ removed, and write $m_t = \sum_{i=1}^{p} m_{i,t}$ for the total maturation at epoch $t$. Then, for $t = p+1, \dots, T$, we have
$$\lambda_t \mid \text{all others} \sim w_t \times \mathrm{Gamma}(y_t - m_t + a_0,\ b_0 + 1) + \sum_{r \neq t} \left(1 - \frac{\sigma}{n_r}\right) \lambda_r^{y_t - m_t} e^{-\lambda_r} \, \delta_{\{\lambda_r\}},$$
in which the weight
$$w_t = (\tau + k_{\setminus t} \sigma) \times \frac{b_0^{a_0} \, \Gamma(y_t - m_t + a_0)}{\Gamma(a_0) \, (b_0 + 1)^{y_t - m_t + a_0}},$$
$n_r$ is the number of elements in $\lambda_{\setminus t}$ which are equal to $\lambda_r$, and $k_{\setminus t}$ is the number of distinct elements in $\lambda_{\setminus t}$. In this mixture, we suppressed the normalization constant that makes all weights add up to one.
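For concreteness, here is a sketch (ours; names are assumptions) of a single draw of $\lambda_t$ from this mixture. Grouping the point masses by distinct atom, an atom with multiplicity $n_r$ receives total weight $(n_r - \sigma)\,\lambda_r^{y_t - m_t} e^{-\lambda_r}$.

```python
# One draw of lambda_t from its mixture full conditional, in log space.
import numpy as np
from collections import Counter
from scipy.special import gammaln

def draw_lambda_t(d, lam_rest, tau, sigma, a0, b0, rng):
    """d = y_t - m_t; lam_rest = all rates except lambda_t."""
    counts = Counter(lam_rest)
    k = len(counts)                                  # number of distinct atoms
    # log w_t for opening a new cluster
    logw = [np.log(tau + k * sigma) + a0 * np.log(b0)
            + gammaln(d + a0) - gammaln(a0) - (d + a0) * np.log(b0 + 1.0)]
    atoms = list(counts)
    for lam_r in atoms:
        n_r = counts[lam_r]
        logw.append(np.log(n_r - sigma) + d * np.log(lam_r) - lam_r)
    logw = np.array(logw)
    w = np.exp(logw - logw.max())                    # normalize for stability
    j = rng.choice(len(w), p=w / w.sum())
    if j == 0:
        return rng.gamma(d + a0, 1.0 / (b0 + 1.0))   # shape, scale = 1/rate
    return atoms[j - 1]                              # join an existing cluster
```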
Making the choice $a_{p+1} = 1$, we have
$$\alpha_i \mid \text{all others} \sim \mathrm{TBeta}\left(a_i + \sum_{t=p+1}^{T} m_{i,t},\ 1 + \sum_{t=p+1}^{T} (y_{t-i} - m_{i,t}),\ 1 - \sum_{j \neq i} \alpha_j\right),$$
for $i = 1, \dots, p$, in which $\mathrm{TBeta}$ denotes the right-truncated Beta distribution with support $\left(0,\ 1 - \sum_{j \neq i} \alpha_j\right)$.
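One simple way to draw from this right-truncated Beta distribution is inverse-CDF sampling restricted to the truncation interval; a minimal sketch (ours):

```python
# Beta(a, b) truncated to (0, upper), via the inverse-CDF method.
import numpy as np
from scipy.stats import beta

def draw_alpha_i(a, b, upper, rng):
    u = rng.random() * beta.cdf(upper, a, b)         # uniform on (0, F(upper))
    return float(beta.ppf(u, a, b))                  # invert the untruncated CDF

# Here a = a_i + sum_t m_{i,t}, b = 1 + sum_t (y_{t-i} - m_{i,t}),
# and upper = 1 - sum_{j != i} alpha_j.
```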
For the latent maturations, we find
$$p(m_{i,t} \mid \text{all others}) \propto \frac{1}{m_{i,t}! \left(y_t - \sum_{j=1}^{p} m_{j,t}\right)! \left(y_{t-i} - m_{i,t}\right)!} \left(\frac{\alpha_i}{\lambda_t (1 - \alpha_i)}\right)^{m_{i,t}} \times I_{\{0, 1, \dots, \min\{y_t - \sum_{j \neq i} m_{j,t},\, y_{t-i}\}\}}(m_{i,t}).$$
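Since the support of this full conditional is finite, it can be sampled exactly by enumerating the unnormalized log-probabilities; a sketch (ours):

```python
# Exact draw of the latent maturation m_{i,t} by enumeration of its finite support.
import numpy as np
from scipy.special import gammaln

def draw_m_it(y_t, y_tmi, m_others, alpha_i, lam_t, rng):
    """y_tmi = y_{t-i}; m_others = sum_{j != i} m_{j,t}."""
    m = np.arange(min(y_t - m_others, y_tmi) + 1)
    logp = (-gammaln(m + 1.0) - gammaln(y_t - m_others - m + 1.0)
            - gammaln(y_tmi - m + 1.0)
            + m * np.log(alpha_i / (lam_t * (1.0 - alpha_i))))
    p = np.exp(logp - logp.max())
    return int(rng.choice(m, p=p / p.sum()))
```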
To explore the posterior distribution of the model, we build a Gibbs sampler [15] using these full conditional distributions. Escobar and West [16] showed, in a similar context, that we can improve mixing by simultaneously resampling the values of all $\lambda_t$'s inside the same cluster at the end of each iteration of the Gibbs sampler. Letting $(\lambda_1^*, \dots, \lambda_k^*)$ be the $k$ unique values among $(\lambda_{p+1}, \dots, \lambda_T)$, define the number of occupants of cluster $j$ by $\nu_j = \sum_{t=p+1}^{T} I_{\{\lambda_j^*\}}(\lambda_t)$, for $j = 1, \dots, k$. It follows that
$$\lambda_j^* \mid \text{all others} \sim \mathrm{Gamma}\left(a_0 + \sum_{t=p+1}^{T} \left(y_t - \sum_{i=1}^{p} m_{i,t}\right) I_{\{\lambda_j^*\}}(\lambda_t),\ b_0 + \nu_j\right),$$
for $j = 1, \dots, k$. At the end of each iteration of the Gibbs sampler, we update the values of all $\lambda_t$'s inside each cluster by the corresponding $\lambda_j^*$ drawn from this distribution.
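A compact sketch (ours) of this acceleration step:

```python
# Redraw one value per cluster and assign it to all of the cluster's occupants.
import numpy as np

def resample_cluster_values(lam, resid, a0, b0, rng):
    """lam: current rates lambda_{p+1..T}; resid[t] = y_t - sum_i m_{i,t}."""
    lam = np.asarray(lam, dtype=float).copy()
    for lam_star in np.unique(lam):
        members = lam == lam_star                    # occupants of this cluster
        shape = a0 + resid[members].sum()            # a0 + residual counts in cluster
        lam[members] = rng.gamma(shape, 1.0 / (b0 + members.sum()))
    return lam
```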

5. Prior Sensitivity

As is often the case for Bayesian models with nonparametric components, a choice of the prior parameters for the PY-INAR(p) model which yields robustness of the posterior distribution is nontrivial [17].
The first aspect to be considered is the fact that the base measure $G_0$ plays a crucial role in the determination of the posterior distribution of the number of clusters $K$. This can be seen directly by inspecting the form of the full conditional distributions derived in Section 4. Recalling that $G_0$ is a gamma distribution with mean $a_0 / b_0$ and variance $a_0 / b_0^2$, from the full conditional distribution of $\lambda_t$ one may note that the probability of generating, on each iteration of the Gibbs sampler, a value for $\lambda_t$ anew from $G_0$ is proportional to
$$(\tau + k_{\setminus t} \sigma) \times \frac{b_0^{a_0} \, \Gamma(y_t - m_t + a_0)}{\Gamma(a_0) \, (b_0 + 1)^{y_t - m_t + a_0}}.$$
Therefore, supposing that all the other terms are fixed, if we concentrate the mass of $G_0$ around zero by making $b_0 \to \infty$, this probability decreases to zero. This is not problematic, because it is hardly the case that we would want to make such a drastic choice for $G_0$. The behavior in the other direction is more revealing, since taking $b_0 \to 0$, in order to spread the mass of $G_0$, also drives this probability to zero. Due to this behavior, we need to establish a criterion for choosing the hyperparameters of the base measure which avoids these extreme cases.
In our analysis, it is convenient to have a single hyperparameter regulating how the mass of $G_0$ is spread over its support. For a given $\lambda_{\max} > 0$, we find numerically the values of $a_0$ and $b_0$ which minimize the Kullback–Leibler divergence between $G_0$ and a uniform distribution on the interval $[0, \lambda_{\max}]$. This Kullback–Leibler divergence can be computed explicitly as
$$-\log \lambda_{\max} - a_0 \log b_0 + \log \Gamma(a_0) - (a_0 - 1)(\log \lambda_{\max} - 1) + \frac{b_0 \, \lambda_{\max}}{2}.$$
In this new parameterization, our goal is to make a sensible choice for $\lambda_{\max}$. It is worth emphasizing that by this procedure we are not truncating the support of $G_0$, but only using the uniform distribution on the interval $[0, \lambda_{\max}]$ as a reference for our choice of the base measure hyperparameters $a_0$ and $b_0$.
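A numerical sketch of this step (ours; the optimizer, the log-scale parameterization, and the starting point are arbitrary choices):

```python
# Find (a0, b0) minimizing the Kullback-Leibler divergence above for a given lambda_max.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def base_measure_params(lam_max):
    def kl(theta):
        a0, b0 = np.exp(theta)                       # enforce a0, b0 > 0
        return (-np.log(lam_max) - a0 * np.log(b0) + gammaln(a0)
                - (a0 - 1.0) * (np.log(lam_max) - 1.0) + b0 * lam_max / 2.0)
    res = minimize(kl, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    return tuple(np.exp(res.x))

a0, b0 = base_measure_params(lam_max=20.0)
```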
Our proposal to choose $\lambda_{\max}$ goes as follows. We fix some value $0 \leq \sigma < 1$ for the discount parameter and choose an integer $k_0$ as the prior expectation of the number of clusters $K$, which, using the results at the end of Section 3, can be computed explicitly as
$$E[K] = \begin{cases} \tau \left( \psi(\tau + T - p) - \psi(\tau) \right) & \text{if } \sigma = 0; \\[4pt] \dfrac{(\tau + \sigma)_{T-p}}{\sigma \, (\tau + 1)_{T-p-1}} - \dfrac{\tau}{\sigma} & \text{if } \sigma > 0, \end{cases}$$
in which $\psi(x)$ is the digamma function (see [6] for a derivation of this result). Next, we find the value of the concentration parameter $\tau$ by solving $E[K] = k_0$ numerically. After this, for each $\lambda_{\max}$ in a grid of values, we run the Gibbs sampler and compute the posterior expectation of the number of clusters $E[K \mid y]$. Finally, in the corresponding graph, we look for the value of $\lambda_{\max}$ located at the "elbow" of the curve, that is, the value of $\lambda_{\max}$ at which the values of $E[K \mid y]$ level off.
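The two numerical steps of the criterion can be sketched as follows (our code; the Gibbs sampler mentioned in the closing comment is assumed, not shown):

```python
# (i) Solve E[K] = k0 for tau at fixed sigma; (ii) sweep lambda_max for the elbow.
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, gammaln

def prior_mean_K(tau, sigma, n):
    """E[K] for n = T - p exchangeable draws from a PY(tau, sigma, G0)."""
    if sigma == 0.0:
        return tau * (digamma(tau + n) - digamma(tau))
    # (tau+sigma)_n / (sigma (tau+1)_{n-1}) - tau/sigma, via log-gamma
    log_ratio = (gammaln(tau + sigma + n) - gammaln(tau + sigma)
                 - gammaln(tau + n) + gammaln(tau + 1.0))
    return np.exp(log_ratio) / sigma - tau / sigma

def solve_tau(k0, sigma, n):
    lo = -sigma + 1e-8 if sigma > 0 else 1e-8        # PY requires tau > -sigma
    return brentq(lambda tau: prior_mean_K(tau, sigma, n) - k0, lo, 1e4)

n = 999                                              # T - p for the series in Section 6
tau = solve_tau(k0=10, sigma=0.5, n=n)
# For each lambda_max on a grid: set (a0, b0) as above, rerun the Gibbs sampler
# (not shown), record E[K | y], and look for the elbow in the resulting curve.
```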

6. Simulated Data

As an explicit example of the graphical criterion in action, we used the functional form of a first-order model with thinning parameter $\alpha = 0.15$ to simulate a time series of length $T = 1000$, for which the distribution of the innovations is a symmetric mixture of three Poisson distributions with parameters 1, 8, and 15. Figure 1 shows the formation of the elbows for two values of the discount parameter: $\sigma = 0.5$ and $\sigma = 0.75$.
For the simulated time series, Figure 2, Figure 3, Figure 4 and Figure 5 display the behavior of the posterior distributions obtained using the elbow method for $(k_0, \sigma) \in \{4, 10, 16, 30\} \times \{0, 0.25, 0.5, 0.75\}$. These figures make quite explicit the relation between the choice of the value of the discount parameter $\sigma$ and the achieved robustness of the posterior distribution: as we increase the value of the discount parameter $\sigma$, the posterior becomes insensitive to the choice of $k_0$. In particular, for $\sigma = 0.75$, the posterior mode is always near 3, which is the number of components used in the distribution of the innovations of the simulated time series.
Once we understand the influence of the prior parameters on the robustness of the posterior distribution, an interesting question is how to get a point estimate for the distribution of clusters, in the sense that each $\lambda_t$, for $t = p+1, \dots, T$, would be assigned to one of the available clusters.
From the Gibbs sampler, we can easily get a Monte Carlo approximation for the probabilities $d_{rt} = \Pr\{\lambda_r \neq \lambda_t \mid y\}$, for $r, t = p+1, \dots, T$. These probabilities define a dissimilarity matrix $D = (d_{rt})$ among the innovation rates. Although $D$ is not a distance matrix, we can use it as a starting point to represent the innovation rates in a two-dimensional Euclidean space using the technique of metric multidimensional scaling (see [18] for a general discussion). From this two-dimensional representation, we use hierarchical clustering techniques to build a dendrogram, which is appropriately cut in order to define three clusters, allowing us to assign a single cluster label to each innovation rate. A sketch of this pipeline is given below.
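The following sketch is ours; the function names and the choice of Ward linkage are assumptions made for the illustration.

```python
# Posterior co-clustering probabilities -> dissimilarity matrix -> 2-D MDS -> dendrogram cut.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.manifold import MDS

def cluster_point_estimate(lam_draws, n_clusters=3, seed=0):
    """lam_draws: (n_mcmc, T - p) array of posterior draws of the rates."""
    n_mcmc, n = lam_draws.shape
    same = np.zeros((n, n))
    for draw in lam_draws:                           # co-clustering frequencies
        same += draw[:, None] == draw[None, :]
    D = 1.0 - same / n_mcmc                          # d_rt = Pr{lambda_r != lambda_t | y}
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=seed).fit_transform(D)
    Z = linkage(coords, method="ward")               # dendrogram on the 2-D embedding
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```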
Table 1 displays the confusion matrix of this assignment, showing that 83% of the innovation rates were grouped correctly into the clusters which correspond to the mixture components used to simulate the time series.

7. Earthquake Data

In this section, we analyze a time series of yearly worldwide earthquake events of substantial magnitude (equal to or greater than 7 points on the Richter scale) from 1900 to 2018 (http://www.usgs.gov/natural-hazards/earthquake-hazards/earthquakes).
The forecasting performances of the INAR(p) and the PY-INAR(p) models are compared using a cross-validation procedure in which the models are trained with data ranging from the beginning of the time series up to a certain time, and predictions are made for epochs outside this training range.
Using this cross-validation procedure, we trained the INAR(p) and the PY-INAR(p) models with orders $p = 1$, $2$, and $3$, and made one-step-ahead predictions. Table 2 shows the out-of-sample mean absolute errors (MAE) for the INAR(p) and the PY-INAR(p) models. In this table, the MAEs are computed by predicting the counts for the last 36 observations of the series. For the three model orders, the PY-INAR(p) model yields a smaller MAE than the original INAR(p) model. A generic sketch of the evaluation loop is given below.
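The sketch is ours; fit_and_predict stands for a hypothetical function returning either model's one-step-ahead posterior predictive point forecast.

```python
# Rolling one-step-ahead evaluation: train on y_1..y_s, predict y_{s+1}, slide forward.
import numpy as np

def rolling_mae(y, n_test, fit_and_predict):
    errs = []
    for s in range(len(y) - n_test, len(y)):
        y_hat = fit_and_predict(y[:s])               # one-step-ahead point forecast
        errs.append(abs(y[s] - y_hat))
    return np.mean(errs)

# e.g. rolling_mae(counts, n_test=36, fit_and_predict=py_inar_forecast)
```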

Author Contributions

Theoretical development: H.G., A.L., H.F.L., P.C.M.F., I.P. Software development: H.G., P.C.M.F. All authors have read and agreed to the published version of the manuscript.

Funding

Helton Graziadei and Hedibert F. Lopes thank FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) for financial support through grant numbers 2017/10096-6 and 2017/22914-5. Antonio Lijoi and Igor Prünster are partially supported by MIUR, PRIN Project 2015SNS29B.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. McKenzie, E. Some simple models for discrete variate time series. J. Am. Water Resour. Assoc. 1985, 21, 645–650. [Google Scholar] [CrossRef]
  2. Al-Osh, M.; Alzaid, A. First-order integer-valued autoregressive (INAR(1)) process: Distributional and regression properties. Stat. Neerl. 1988, 42, 53–61. [Google Scholar] [CrossRef]
  3. Weiß, C. An Introduction to Discrete-Valued Time Series; John Wiley & Sons: Hoboken, NJ, USA, 2018. [Google Scholar]
  4. Du, J.G.; Li, Y. The integer-valued autoregressive (INAR(p)) model. J. Time Ser. Anal. 1991, 12, 129–142. [Google Scholar] [CrossRef]
  5. Neal, P.; Kypraios, T. Exact Bayesian inference via data augmentation. Stat. Comput. 2015, 25, 333–347. [Google Scholar] [CrossRef] [Green Version]
  6. Pitman, J. Combinatorial Stochastic Processes; Technical Report 621; Department of Statistics, University of California: Berkeley, CA, USA, 2002. [Google Scholar]
  7. Dion, J.; Gauthier, G.; Latour, A. Branching processes with immigration and integer-valued time series. Serdica Math. J. 1995, 21, 123–136. [Google Scholar]
  8. Van Dyk, D.; Meng, X.L. The art of data augmentation. J. Comput. Graph. Stat. 2001, 10, 1–50. [Google Scholar] [CrossRef]
  9. Tanner, M.; Wong, W. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 1987, 82, 528–540. [Google Scholar] [CrossRef]
  10. Ferguson, T. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1973, 1, 209–230. [Google Scholar] [CrossRef]
  11. Schervish, M.J. Theory of Statistics; Springer: Berlin/Heidelberg, Germany, 1995; pp. 52–60. [Google Scholar]
  12. Hjort, N.; Holmes, C.; Müller, P.; Walker, S. Bayesian Nonparametrics; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
  13. Lijoi, A.; Mena, R.H.; Prünster, I. Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 2007, 94, 769–786. [Google Scholar] [CrossRef]
  14. Hamilton, J. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994; Volume 2, pp. 43–71. [Google Scholar]
  15. Gamerman, D.; Lopes, H. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006. [Google Scholar]
  16. Escobar, M.; West, M. Computing nonparametric hierarchical models. In Practical Nonparametric and Semiparametric Bayesian Statistics; Dey, D., Müller, P., Sinha, D., Eds.; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar] [CrossRef]
  17. Canale, A.; Prünster, I. Robustifying Bayesian nonparametric mixtures for count data. Biometrics 2017, 73, 174–184. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Friedman, J.; Hastie, T.; Tibshirani, R. The Elements of Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2009; pp. 570–572. [Google Scholar]
Figure 1. Formation of the elbows for $\sigma = 0.5$ (left) and $\sigma = 0.75$ (right). The red dotted lines indicate the chosen values of $\lambda_{\max}$.
Figure 2. Posterior distributions of the number of clusters $K$ for the simulated time series with $\sigma = 0$ and $k_0 = 4, 10, 16, 30$. The red dotted lines indicate the value of $k_0$.
Figure 3. Posterior distributions of the number of clusters $K$ for the simulated time series with $\sigma = 0.25$ and $k_0 = 4, 10, 16, 30$. The red dotted lines indicate the value of $k_0$.
Figure 4. Posterior distributions of the number of clusters $K$ for the simulated time series with $\sigma = 0.5$ and $k_0 = 4, 10, 16, 30$. The red dotted lines indicate the value of $k_0$.
Figure 5. Posterior distributions of the number of clusters $K$ for the simulated time series with $\sigma = 0.75$ and $k_0 = 4, 10, 16, 30$. The red dotted lines indicate the value of $k_0$.
Table 1. Confusion matrix for the cluster assignments.

                        True
Predicted        1        2        3
    1          297        3       20
    2           11      217       42
    3            0       84      316
Table 2. Out-of-sample MAEs for the INAR(p) and the PY-INAR(p) models, with orders $p = 1$, $2$, and $3$. The last column shows the relative variations of the MAEs for the PY-INAR(p) models with respect to the corresponding MAEs for the INAR(p) models.

           INAR      PY-INAR     Δ PY-INAR
p = 1      3.861     3.583       −0.072
p = 2      3.583     3.417       −0.046
p = 3      3.972     3.305       −0.202
