Article

The Circumstance-Driven Bivariate Integer-Valued Autoregressive Model

by Huiqiao Wang 1,2,*,† and Christian H. Weiß 1

1 Department of Mathematics and Statistics, Helmut Schmidt University, Holstenhofweg 85, 22043 Hamburg, Germany
2 Department of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China
* Author to whom correspondence should be addressed.
† Current address: Biogas Institute of Ministry of Agriculture and Rural Affairs, Chengdu 610041, China.
Entropy 2024, 26(2), 168; https://doi.org/10.3390/e26020168
Submission received: 20 December 2023 / Revised: 29 January 2024 / Accepted: 13 February 2024 / Published: 15 February 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The novel circumstance-driven bivariate integer-valued autoregressive (CuBINAR) model for non-stationary count time series is proposed. The non-stationarity of the bivariate count process is defined by a joint categorical sequence, which expresses the current state of the process. Additional cross-dependence can be generated via cross-dependent innovations. The model can also be equipped with a marginal bivariate Poisson distribution to make it suitable for low-count time series. Important stochastic properties of the new model are derived. The Yule–Walker and conditional maximum likelihood methods are adopted to estimate the unknown parameters. The consistency of these estimators is established, and their finite-sample performance is investigated by a simulation study. The scope and application of the model are illustrated by a real-world data example on sales counts, where a soap product sold in different stores under a common circumstance factor is investigated.

1. Introduction

Integer-valued time series data are encountered in many fields in practice, such as epidemiology, insurance, finance, and quality control (see [1] for a comprehensive survey). There are many approaches to model such count data. One pioneering approach is to use a random thinning operator as a substitute for the multiplication in the traditional autoregressive (AR) model, leading to the integer-valued autoregressive (INAR) model (see [2,3]). The first-order INAR (INAR(1)) model is defined as follows:

$$ X_t = \alpha \circ X_{t-1} + \varepsilon_t, \qquad (1) $$

for $t \in \mathbb{N}_+$, where the binomial thinning $\alpha \circ X_{t-1}$ is defined as $\alpha \circ X_{t-1} = \sum_{i=1}^{X_{t-1}} Y_i$, with $\{Y_i\}$ being a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with parameter $\alpha \in [0,1)$. The innovations $\varepsilon_t$ are i.i.d. count random variables, i.e., having the range $\mathbb{N}_0 = \{0, 1, \ldots\}$, where the default choice is a Poisson distribution for $\varepsilon_t$ [3]. Many researchers have generalized the basic INAR(1) model to better fit real data. To handle overdispersion or zero inflation in the data, the negative-binomial, geometric, or zero-inflated Poisson distribution has been proposed to replace the Poisson distribution of the innovation term $\varepsilon_t$ (see [1] for details and references). Also, different types of thinning operators have been proposed in the literature (see [4] for a survey). Other proposals generalize the structure of the model itself. Thyregod et al. [5] first proposed the self-exciting threshold (SET) integer-valued model, which is also studied in [6]. A comprehensive introduction to SET-INAR models can be found in [7].
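To make the recursion concrete, the INAR(1) scheme in (1) can be simulated in a few lines. The sketch below is our own illustration (not code from the paper), using NumPy, the default Poisson innovations, and a start drawn from the stationary Poisson marginal with mean $\lambda/(1-\alpha)$:

```python
import numpy as np

def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate a Poisson INAR(1) path X_t = alpha o X_{t-1} + eps_t.

    The binomial thinning alpha o X_{t-1} is a Binomial(X_{t-1}, alpha)
    draw, and the innovations eps_t ~ Poisson(lam) are i.i.d.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    # start from the stationary Poisson marginal with mean lam / (1 - alpha)
    x[0] = rng.poisson(lam / (1 - alpha))
    for t in range(1, n):
        thinned = rng.binomial(x[t - 1], alpha)  # survivors of thinning
        x[t] = thinned + rng.poisson(lam)        # plus new arrivals
    return x
```

For instance, with $\alpha = 0.3$ and $\lambda = 2$, the simulated path fluctuates around the stationary mean $2/0.7 \approx 2.86$.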
The aforementioned articles focus on stationary count time series. To handle the non-stationary case, ref. [8] applied the difference method to non-stationary count data and introduced the signed binomial thinning operator to allow for negative values after differencing. Nastić et al. [9] constructed the random-environment INAR(1) model to characterize non-stationarity in integer-valued time series, where the parameters in the model are influenced by different states of the environment, the evolution of which is defined through a selection mechanism from a Markov chain. Laketa et al. [10] generalized this work to a p-th order model.
Nowadays, there is increasing interest in multivariate integer-valued time series models, where most contributions focus on the bivariate case. Such types of data are commonly encountered in real-world applications. For example, ref. [11] consider the number of daytime and nighttime accidents in a certain area, which are at distinct levels but present cross-correlation due to the same road conditions. Latour [12] first proposed a general multivariate INAR(1) model and proved the existence and relevant properties of the model. Further model properties have been studied by [13]. The bivariate INAR(1) model introduced by [11] is defined as
$$ \mathbf{X}_t = \mathbf{A} \circ \mathbf{X}_{t-1} + \mathbf{R}_t = \begin{pmatrix} \alpha_1 & 0 \\ 0 & \alpha_2 \end{pmatrix} \circ \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}, \qquad (2) $$

where "∘" is the binomial thinning operator defined as in (1), with $\alpha_1, \alpha_2 \in [0,1)$, and $\varepsilon_{i,t}$ is the innovation term of the i-th series, $i = 1, 2$. The operator $\mathbf{A} \circ$ acts like the usual matrix multiplication while preserving the properties of the binomial thinning operation. Regarding further research on bivariate INAR(1) models, we refer to the work of [14,15], while we refer to the work of [16,17,18] for research on bivariate integer-valued moving average (INMA(1)) models. In addition, ref. [19] introduced a bivariate model for integer-valued time series with a finite range of counts. Yu et al. [20] introduced the new bivariate random-coefficient integer-valued autoregressive (BRCINAR(1)) model to allow the coefficients to be random. Although the present article concentrates on thinning-based models for count time series (which allow us to specify the marginal distribution of $(\mathbf{X}_t)$), it should also be briefly mentioned that different approaches have been proposed in the literature. Regression-type models for (bivariate) count time series (where the conditional distributions of $\mathbf{X}_t \mid \mathbf{X}_{t-1}, \ldots$ are specified rather than the marginal ones) have been proposed by, e.g., [21,22,23]. In contrast, ref. [24] derive a multivariate count time series model with Poisson marginal distributions from underlying multivariate Gaussian time series.
Non-stationarity is an important feature of real-world time series data, whether one-dimensional or multi-dimensional. Changes in external factors may alter the structure or level of the data. So far, little work has explored the non-stationarity of bivariate count time series. One main problem is the distribution of the model, which becomes complicated with increasing dimension. We propose a new model to characterize the non-stationarity in bivariate integer-valued time series. Inspired by [9,25], we suppose the parameters in the model to be affected by the different states of the circumstance, to characterize the intrinsic nature of non-stationarity in the data. In contrast to [25], the novel model is able to incorporate additional cross-dependence, and it is also suitable for low-count time series, having a bivariate Poisson marginal distribution (see Remark 3 for further details). In Section 2, we propose the new first-order circumstance-driven bivariate INAR (CuBINAR(1)) model and establish its stochastic properties. Estimation methods and their asymptotic properties are discussed in Section 3. In Section 4, the performance of the estimators is evaluated by a simulation study. A real-data application is presented in Section 5. Summary and conclusions are given in Section 6.

2. Model Construction

In this section, we introduce the new non-stationary CuBINAR(1) model, where the bivariate count random variable $\mathbf{X}_t$ at time t is influenced not only by $\mathbf{X}_{t-1}$, but also by the underlying circumstance state $s_t$, as defined in (3).
Definition 1.
The CuBINAR(1) process $(\mathbf{X}_t)$ with range $\mathbb{N}_0^2$ is defined by the recursive scheme

$$ \mathbf{X}_t(s_t) = \mathbf{A} \circ \mathbf{X}_{t-1}(s_{t-1}) + \boldsymbol{\varepsilon}_t(s_t, s_{t-1}). \qquad (3) $$

The model can be rewritten in matrix form as

$$ \begin{pmatrix} X_{1,t}(s_t) \\ X_{2,t}(s_t) \end{pmatrix} = \begin{pmatrix} \alpha_1 & 0 \\ 0 & \alpha_2 \end{pmatrix} \circ \begin{pmatrix} X_{1,t-1}(s_{t-1}) \\ X_{2,t-1}(s_{t-1}) \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t}(s_t, s_{t-1}) \\ \varepsilon_{2,t}(s_t, s_{t-1}) \end{pmatrix}, $$

i.e., the i-th component of the vector $\mathbf{X}_t(s_t)$ satisfies

$$ X_{i,t}(s_t) = \alpha_i \circ X_{i,t-1}(s_{t-1}) + \varepsilon_{i,t}(s_t, s_{t-1}), \quad i = 1, 2, \ t = 2, \ldots, n. \qquad (4) $$

Here, $s_t$ represents the state at time t with possible values in $\mathcal{S} = \{1, 2, \ldots, S\}$, where $S \geq 2$ is the total number of states. $X_{i,t}(s_t)$ is the t-th observation of the i-th series depending on the state $s_t$, and $\varepsilon_{i,t}(s_t, s_{t-1})$ is the corresponding innovation term depending on the states $s_t$ and $s_{t-1}$. In addition, $\boldsymbol{\varepsilon}_t(s_t, s_{t-1})$ is independent of $\mathbf{A} \circ \mathbf{X}_{t-1}(s_{t-1})$ and of $\mathbf{X}_k(s_k)$ for $k < t$, where the definition of the binomial thinning operator "∘" is given after (1).
Remark 1.
We assume the different states of the circumstance are already realized. In the simulation part, we first need to generate the sample path of the states. To characterize the variation of the states, we adopt a Markov chain to generate it: given the initial probability vector $p_0 = (p_1, \ldots, p_S)$ and the transition matrix

$$ P = \begin{pmatrix} p_{11} & \cdots & p_{1S} \\ \vdots & \ddots & \vdots \\ p_{S1} & \cdots & p_{SS} \end{pmatrix}, $$

the sample path of the states can be obtained. In real-data analysis, we first need to know the states of the observations. In the data example discussed in Section 5, the sequence of states is defined according to a possible sales promotion.
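The state-generation mechanism described in this remark can be sketched as follows; the helper name is our own, and states are labelled 1..S as in the paper:

```python
import numpy as np

def simulate_states(n, p0, P, seed=0):
    """Sample a state path s_1, ..., s_n from a Markov chain.

    p0 is the initial distribution over the S states and P the S x S
    transition probability matrix (each row sums to one).
    """
    rng = np.random.default_rng(seed)
    p0, P = np.asarray(p0), np.asarray(P)
    S = len(p0)
    s = np.empty(n, dtype=int)
    s[0] = rng.choice(S, p=p0)               # initial state
    for t in range(1, n):
        s[t] = rng.choice(S, p=P[s[t - 1]])  # row of P for the previous state
    return s + 1  # states labelled 1..S as in the paper
```

With, e.g., the three-state transition matrix of the simulation study in Section 4, all states are visited with roughly equal frequency.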
In the subsequent Proposition 1, we introduce the important special case of a CuBINAR(1) process having a bivariate Poisson marginal distribution (thus abbreviated as Poi-CuBINAR(1)). More precisely, $(\mathbf{X}_t)$ is said to follow the Poi-CuBINAR(1) model if $(X_{1,t}, X_{2,t}) \sim \mathrm{BPoi}(\lambda_1(s_t), \lambda_2(s_t), \phi)$ for appropriately chosen parameter values (see Remark 2 below). Here, we use the same definition of the BPoi distribution as in [11,13], i.e., the parameters of $\mathbf{X} \sim \mathrm{BPoi}(\lambda_1, \lambda_2, \phi)$ are defined as the mean of $X_1$, the mean of $X_2$, and the covariance between $X_1, X_2$, respectively. So the probability generating function (PGF) of $\mathbf{X}$ is given by

$$ E\big[a_1^{X_1} \cdot a_2^{X_2}\big] = \exp\{\lambda_1(a_1 - 1) + \lambda_2(a_2 - 1) + \phi(a_1 - 1)(a_2 - 1)\}. \qquad (5) $$
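A standard way to draw from this BPoi distribution is trivariate reduction, which reproduces the stated PGF: set $X_1 = Y_1 + Z$ and $X_2 = Y_2 + Z$ with independent $Y_1 \sim \mathrm{Poi}(\lambda_1 - \phi)$, $Y_2 \sim \mathrm{Poi}(\lambda_2 - \phi)$, $Z \sim \mathrm{Poi}(\phi)$. The helper below is our own sketch, using the paper's mean/covariance parameterization:

```python
import numpy as np

def rbpois(size, lam1, lam2, phi, seed=0):
    """Draw from BPoi(lam1, lam2, phi) by trivariate reduction.

    X1 = Y1 + Z, X2 = Y2 + Z with independent Y1 ~ Poi(lam1 - phi),
    Y2 ~ Poi(lam2 - phi), Z ~ Poi(phi); the marginal means are lam1,
    lam2 and the covariance of (X1, X2) equals phi.
    """
    rng = np.random.default_rng(seed)
    z = rng.poisson(phi, size)                 # common component
    x1 = rng.poisson(lam1 - phi, size) + z
    x2 = rng.poisson(lam2 - phi, size) + z
    return x1, x2
```

Note that this construction requires $\phi \leq \min(\lambda_1, \lambda_2)$, in line with the positivity constraints discussed in Remark 2.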
It shall be shown that, in analogy to the univariate Poi-INAR(1) model [4], BPoi-observations are achieved by assuming BPoi-innovations.
Proposition 1.
Let $(\mathbf{X}_t)$ be a CuBINAR(1) process according to Definition 1. Then, $(\mathbf{X}_t)$ constitutes a Poi-CuBINAR(1) process with $(X_{1,t}, X_{2,t}) \sim \mathrm{BPoi}(\lambda_1(s_t), \lambda_2(s_t), \phi)$ if the distribution of the model's innovation term satisfies

$$ \big(\varepsilon_{1,t}(s_t, s_{t-1}), \, \varepsilon_{2,t}(s_t, s_{t-1})\big) \sim \mathrm{BPoi}\big(\lambda_1(s_t) - \lambda_1(s_{t-1})\alpha_1, \ \lambda_2(s_t) - \lambda_2(s_{t-1})\alpha_2, \ \phi^*\big), $$

where $\phi^* = \phi(1 - \alpha_1\alpha_2)$.
For the detailed proof, we refer to Appendix B. Note that for the derivation of Proposition 1, it is crucial that A is a diagonal matrix. While Definition 1 could generally be extended to a non-diagonal A , we would lose the marginal BPoi-property (see also [13] for analogous results in the stationary case). In fact, the components would then not follow univariate INAR(1) models any more.
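Putting Definition 1 and Proposition 1 together, a minimal simulation sketch of the Poi-CuBINAR(1) process (assuming an already realized state path, as in Remark 1) could look as follows; all function names are our own illustration, and the BPoi draws use trivariate reduction:

```python
import numpy as np

def rbpois(rng, lam1, lam2, phi):
    # Bivariate Poisson via trivariate reduction: X1 = Y1 + Z, X2 = Y2 + Z,
    # giving marginal means lam1, lam2 and covariance phi.
    z = rng.poisson(phi)
    return rng.poisson(lam1 - phi) + z, rng.poisson(lam2 - phi) + z

def simulate_poicubinar1(states, alpha, lam, phi, seed=0):
    """Simulate a Poi-CuBINAR(1) path given a realized state sequence.

    states : array of state labels 1..S
    alpha  : (alpha1, alpha2) thinning probabilities
    lam    : pair of dicts with lam[i-1][s] = lambda_i(s), the marginal means
    phi    : cross-covariance of the marginal BPoi distribution
    """
    rng = np.random.default_rng(seed)
    n = len(states)
    phi_star = phi * (1 - alpha[0] * alpha[1])  # innovation covariance
    x = np.empty((n, 2), dtype=int)
    s0 = states[0]
    x[0] = rbpois(rng, lam[0][s0], lam[1][s0], phi)  # start in the marginal law
    for t in range(1, n):
        s, r = states[t], states[t - 1]
        # innovation means according to Proposition 1
        mu1 = lam[0][s] - alpha[0] * lam[0][r]
        mu2 = lam[1][s] - alpha[1] * lam[1][r]
        e1, e2 = rbpois(rng, mu1, mu2, phi_star)
        # binomial thinning of the previous observation plus innovation
        x[t, 0] = rng.binomial(x[t - 1, 0], alpha[0]) + e1
        x[t, 1] = rng.binomial(x[t - 1, 1], alpha[1]) + e2
    return x
```

The parameter values must satisfy the positivity constraints of Remark 2 for the BPoi draws to be well defined.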
Remark 2.
We must ensure that the parameters of the $\mathrm{BPoi}(\lambda_1(s_t), \lambda_2(s_t), \phi)$ distribution in Proposition 1 are strictly positive, i.e., $\lambda_i(s) - \lambda_i(r)\alpha_i - \phi + \phi\alpha_1\alpha_2 > 0$ holds for $i = 1, 2$ and all $r, s \in \mathcal{S}$ at the same time, where s and r are the realizations of $s_t$ and $s_{t-1}$, respectively. Hence, there are $2 \cdot S^2$ inequalities that need to be satisfied simultaneously.
Remark 3.
As already indicated in Section 1, the novel CuBINAR(1) model is constructed in a similar way as the bivariate "random environment INAR(1) model" proposed by [25], referred to as RE-BINAR(1) hereafter. But there are also noteworthy differences between these two models. First, for the RE-BINAR(1) model of [25], cross-correlation between the two series is solely caused by the common state, while their innovation sequences are mutually independent. Our CuBINAR(1) model, by contrast, allows for additional cross-correlation caused by the cross-correlated innovation term. For example, in the case of the Poi-CuBINAR(1) model, the innovation term $(\varepsilon_{1,t}(s_t, s_{t-1}), \varepsilon_{2,t}(s_t, s_{t-1}))$ stems from a bivariate Poisson distribution, also leading to a bivariate Poisson distribution for $(X_{1,t}(s_t), X_{2,t}(s_t))$ (see Proposition 1). Then, choosing $\phi > 0$ leads to additional cross-correlation, while mutually independent innovation series are included as the special case $\phi = 0$. Altogether, the user has more flexibility to fit the model to given time series data.
Second, ref. [25] construct their model based on the negative-binomial thinning operator and geometric marginal distributions, so the model is particularly useful for overdispersed counts. Our CuBINAR(1) model, by contrast, uses binomial thinnings. As discussed by [4], binomial thinnings can also be used to generate common overdispersed marginal distributions (including the geometric one). But, in addition, the equidispersed Poisson distribution is also possible, as is often observed for low-count time series. In the special case of the Poi-CuBINAR(1) model introduced in Proposition 1, the process is equipped with a marginal bivariate Poisson distribution. Altogether, we believe that our novel CuBINAR(1) model constitutes a valuable complement to existing models for non-stationary bivariate count time series.
The following proposition provides some (conditional) moment properties of the Poi-CuBINAR(1) model, which shall be useful to obtain the Yule–Walker estimators.
Proposition 2.
Let $(\mathbf{X}_t)$ be the Poi-CuBINAR(1) process according to Proposition 1. Let us denote the means of $X_{i,t}(s_t)$, $X_{i,t-1}(s_{t-1})$, and $\varepsilon_{i,t}(s_t, s_{t-1})$ by $\lambda_i(s_t)$, $\lambda_i(s_{t-1})$, and $\mu_i(s_t, s_{t-1})$, respectively, for $i = 1, 2$. Then, the following assertions hold:
(i)
$E[X_{i,t}(s_t) \mid X_{i,t-1}(s_{t-1})] = \alpha_i \cdot X_{i,t-1}(s_{t-1}) + \mu_i(s_t, s_{t-1})$;
$E[X_{i,t}(s_t)] = \lambda_i(s_t)$;
(ii)
$\mathrm{Var}(X_{i,t}(s_t)) = \lambda_i(s_t)$;
$\mathrm{Cov}(X_{i,t}(s_t), X_{i,t-1}(s_{t-1})) = \alpha_i \cdot \lambda_i(s_{t-1})$;
(iii)
$\mathrm{Cov}(X_{1,t}(s_t), X_{2,t}(s_t)) = \phi$;
$\mathrm{Cov}(X_{1,t}(s_t), X_{2,t-k}(s_{t-k})) = \alpha_1^k \cdot \phi$.
For the proof of Proposition 2, see Appendix C.

3. Parameter Estimation

In this section, we consider the Yule–Walker (YW) method and the conditional maximum likelihood (CML) method to estimate the parameter values of the Poi-CuBINAR(1) model.

3.1. Yule–Walker Estimation

From now on, let us use the following notations, for $i = 1, 2$ and $r, s \in \mathcal{S}$:

$$ \mu_i(s) = E[X_{i,t}(s_t) \mid s_t = s], \qquad \gamma_{ii,0}(s) = \mathrm{Var}(X_{i,t}(s_t) \mid s_t = s), $$
$$ \gamma_{12,0}(s) = \mathrm{Cov}(X_{1,t}(s_t), X_{2,t}(s_t) \mid s_t = s), \qquad \gamma_i(r,s) = \mathrm{Cov}(X_{i,t}(s_t), X_{i,t-1}(s_{t-1}) \mid s_{t-1} = r, s_t = s). $$

For the Poi-CuBINAR(1) model, the $X_{i,t}$ are Poisson distributed, so $\mu_i(s)$ is equal to $\gamma_{ii,0}(s)$.
Given the realized states, the corresponding sample moments are as follows:
$$ \hat{\mu}_i(s) = \frac{1}{n_s} \sum_{t=1}^{n} X_{i,t}(s_t) \, \mathbb{1}\{s_t = s\}, $$

$$ \hat{\gamma}_{ij,0}(s) = \frac{1}{n_s} \sum_{t=1}^{n} \big(X_{i,t}(s_t) - \hat{\mu}_i(s)\big)\big(X_{j,t}(s_t) - \hat{\mu}_j(s)\big) \, \mathbb{1}\{s_t = s\}, \qquad (6) $$

$$ \hat{\gamma}_i(r,s) = \frac{1}{n_{r,s}} \sum_{t=2}^{n} \big(X_{i,t}(s_t) - \hat{\mu}_i(s)\big)\big(X_{i,t-1}(s_{t-1}) - \hat{\mu}_i(r)\big) \, \mathbb{1}\{s_{t-1} = r, s_t = s\}. \qquad (7) $$

In Equation (6), $i = j$ leads to $\hat{\gamma}_{ii,0}(s)$, which is the empirical conditional variance given the state s, and which estimates $\gamma_{ii,0}(s)$. Otherwise, it equals the empirical conditional cross-covariance and thus estimates $\gamma_{12,0}(s)$. Here, $n_s = \sum_{t=1}^n \mathbb{1}\{s_t = s\}$ is the sample size under state s, and $n_{r,s} = \sum_{t=2}^n \mathbb{1}\{s_{t-1} = r, s_t = s\}$ the one under the condition that the state at time t equals s and that at $t-1$ equals r. $\mathbb{1}_A$ denotes the indicator function, which is equal to 1 (0) if A is true (false).
Remark 4.
The parameters $\phi$ and $\alpha_i$ can be expressed by the following equations:

$$ \phi = \sum_{s=1}^{S} \frac{n_s}{n} \cdot \gamma_{12,0}(s), \qquad (8) $$

$$ \alpha_i = \sum_{r=1}^{S} \sum_{s=1}^{S} \frac{n_{r,s}}{n} \cdot \frac{\gamma_i(r,s)}{\gamma_{ii,0}(r)}, \qquad (9) $$

see Appendix D for the proof. Equations (8) and (9) can be used to define estimators of $\phi$ and $\alpha_i$, respectively.
Following Remark 4, we define the Yule–Walker estimators as follows:

$$ \hat{\alpha}_i^{\,yw} = \sum_{r=1}^{S} \sum_{s=1}^{S} \frac{n_{r,s}}{n-1} \cdot \frac{\hat{\gamma}_i(r,s)}{\hat{\gamma}_{ii,0}(r)}, \qquad (10) $$

$$ \hat{\lambda}_i(s)^{yw} = \hat{\mu}_i(s), \quad s \in \mathcal{S}, \qquad (11) $$

$$ \hat{\phi}^{\,yw} = \sum_{s=1}^{S} \frac{n_s}{n} \, \hat{\gamma}_{12,0}(s). \qquad (12) $$
In the next theorem, we prove that these Yule–Walker estimators are consistent.
Theorem 1.
The Yule–Walker estimators $\hat{\alpha}_i^{\,yw}$, $\hat{\lambda}_i(s)^{yw}$, and $\hat{\phi}^{\,yw}$, $i = 1, 2$, $s \in \mathcal{S}$, defined in Equations (10)–(12) are consistent.
The proof of Theorem 1 is provided in Appendix E.
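The estimators (10)–(12), together with the sample moments (6) and (7), can be sketched as follows. This is our own implementation sketch (not the authors' code); we read the weight in (10) as $n_{r,s}/(n-1)$, so that the weights over all state pairs sum to one:

```python
import numpy as np

def yule_walker(x, states):
    """Yule-Walker estimates for the Poi-CuBINAR(1) model.

    x: (n, 2) array of counts; states: length-n array of labels 1..S.
    Returns (alpha_hat, lam_hat, phi_hat) following Eqs. (10)-(12).
    """
    x, states = np.asarray(x), np.asarray(states)
    n = len(states)
    labels = np.unique(states)
    # conditional sample means and (co)variances per state, Eqs. (6)-(7)
    mu = {s: x[states == s].mean(axis=0) for s in labels}
    gamma0 = {s: np.cov(x[states == s].T, bias=True) for s in labels}
    lam_hat = {s: mu[s] for s in labels}                       # Eq. (11)
    phi_hat = sum((states == s).sum() / n * gamma0[s][0, 1]    # Eq. (12)
                  for s in labels)
    alpha_hat = np.zeros(2)
    for i in range(2):
        total = 0.0
        for r in labels:
            for s in labels:
                mask = (states[:-1] == r) & (states[1:] == s)
                nrs = mask.sum()
                if nrs == 0:
                    continue
                # conditional lag-1 covariance, Eq. (7)
                d0 = x[:-1][mask, i] - mu[r][i]
                d1 = x[1:][mask, i] - mu[s][i]
                gamma_i = (d0 * d1).mean()
                total += nrs / (n - 1) * gamma_i / gamma0[r][i, i]
        alpha_hat[i] = total                                    # Eq. (10)
    return alpha_hat, lam_hat, phi_hat
```

On data without serial dependence, the routine should return $\hat{\alpha}_i \approx 0$ while still recovering the state-dependent means and the cross-covariance.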

3.2. Conditional Maximum Likelihood Estimation

From the first-order Markov property of the CuBINAR(1) model according to Definition 1, the conditional log-likelihood function is expressed as
$$ \mathcal{L} = \sum_{t=2}^{n} \log P\big(\mathbf{X}_t(s_t) \mid \mathbf{X}_{t-1}(s_{t-1})\big), \qquad (13) $$

where $P(\mathbf{X}_t(s_t) \mid \mathbf{X}_{t-1}(s_{t-1}))$ is the transition probability given the realized states. It has the following expression, where, for simplicity, we omit the states $s_t, s_{t-1}$ in parentheses after $\mathbf{X}_t, \mathbf{X}_{t-1}$:

$$ P(\mathbf{X}_t \mid \mathbf{X}_{t-1}) = P(X_{1,t} = x_{1,t}, X_{2,t} = x_{2,t} \mid X_{1,t-1} = x_{1,t-1}, X_{2,t-1} = x_{2,t-1}) $$
$$ = \sum_{k=0}^{\min(x_{1,t}, x_{1,t-1})} \ \sum_{l=0}^{\min(x_{2,t}, x_{2,t-1})} \binom{x_{1,t-1}}{k} \alpha_1^k (1-\alpha_1)^{x_{1,t-1}-k} \cdot \binom{x_{2,t-1}}{l} \alpha_2^l (1-\alpha_2)^{x_{2,t-1}-l} \cdot f(x_{1,t}-k, \, x_{2,t}-l), $$

where

$$ f(u, v) = e^{-(\lambda_1^* + \lambda_2^* + \phi^*)} \sum_{i=0}^{\min(u, v)} \frac{(\lambda_1^*)^{u-i} \, (\lambda_2^*)^{v-i} \, (\phi^*)^{i}}{(u-i)! \, (v-i)! \, i!}. $$

Here, $\lambda_1^* = \lambda_1(s_t) - \lambda_1(s_{t-1})\alpha_1 - \phi^*$, $\lambda_2^* = \lambda_2(s_t) - \lambda_2(s_{t-1})\alpha_2 - \phi^*$, and $\phi^* = \phi - \phi\alpha_1\alpha_2$. The CML estimates are computed by applying a numerical optimization routine to the log-likelihood function $\mathcal{L}$. Based on the inverse of the numerical Hessian, one can then calculate approximate standard errors (see Remark B.2.1.2 in [1] for details).
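The transition probability and the BPoi pmf can be evaluated directly by the finite sums above. The sketch below is our own illustration: `bpois_pmf` takes the mean/covariance parameterization of the BPoi distribution, while `transition_prob` takes the innovation means `mu1`, `mu2` and innovation covariance `phi_star` as inputs:

```python
import math

def bpois_pmf(x1, x2, lam1, lam2, phi):
    """Joint pmf of BPoi(lam1, lam2, phi), parameterized by the marginal
    means lam1, lam2 and the covariance phi."""
    l1, l2 = lam1 - phi, lam2 - phi  # reduced Poisson parameters
    s = 0.0
    for i in range(min(x1, x2) + 1):
        s += (l1 ** (x1 - i) * l2 ** (x2 - i) * phi ** i
              / (math.factorial(x1 - i) * math.factorial(x2 - i)
                 * math.factorial(i)))
    return math.exp(-(l1 + l2 + phi)) * s

def transition_prob(xt, xtm1, alpha, mu1, mu2, phi_star):
    """One-step transition probability of the Poi-CuBINAR(1) model:
    two binomial thinnings convolved with a BPoi innovation that has
    means mu1, mu2 and covariance phi_star."""
    (x1, x2), (y1, y2) = xt, xtm1
    p = 0.0
    for k in range(min(x1, y1) + 1):
        for l in range(min(x2, y2) + 1):
            p += (math.comb(y1, k) * alpha[0] ** k
                  * (1 - alpha[0]) ** (y1 - k)
                  * math.comb(y2, l) * alpha[1] ** l
                  * (1 - alpha[1]) ** (y2 - l)
                  * bpois_pmf(x1 - k, x2 - l, mu1, mu2, phi_star))
    return p
```

Summing `transition_prob` over a sufficiently large grid of target values should return probability mass one, which is a simple sanity check for an implementation of the conditional likelihood.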

4. Simulation Study

In this section, we conduct a simulation study to evaluate the performance of the YW and CML estimators. The standard errors and biases of the estimators are calculated based on 10,000 replications, where the sample sizes are n { 300 , 900 , 1500 , 2100 } . Since we assume that the states of the circumstance are already realized, we first need to generate the circumstance states for each replication run, which is performed via the Markov chain approach described in Remark 1. In order to implement this, the initial probability vector and the transition probability matrix need to be specified.
Two different scenarios are considered. In the first scenario, we assume that the observations are driven by three different states of the circumstance, in which case we also consider two different types of transition probability matrix: one with initial probability vector $p_0^{(1)} = (0.33, 0.33, 0.34)$ and transition probability matrix

$$ \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.3 & 0.4 & 0.3 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}, $$

the other with $p_0^{(2)} = (0.6, 0.3, 0.1)$ and

$$ \begin{pmatrix} 0.6 & 0.3 & 0.1 \\ 0.1 & 0.6 & 0.3 \\ 0.3 & 0.1 & 0.6 \end{pmatrix}. $$

Furthermore, three different parameter groups are considered:
(a) $(\alpha_1, \alpha_2, \phi, \phi^*, \lambda_1(1), \lambda_1(2), \lambda_1(3), \lambda_2(1), \lambda_2(2), \lambda_2(3))$
$= (0.15, 0.2, 0.5, 0.485, 1, 2, 3, 4, 5, 6)$;
(b) $(\alpha_1, \alpha_2, \phi, \phi^*, \lambda_1(1), \lambda_1(2), \lambda_1(3), \lambda_2(1), \lambda_2(2), \lambda_2(3))$
$= (0.15, 0.5, 0.5, 0.4625, 1, 2, 3, 4, 5, 6)$;
(c) $(\alpha_1, \alpha_2, \phi, \phi^*, \lambda_1(1), \lambda_1(2), \lambda_1(3), \lambda_2(1), \lambda_2(2), \lambda_2(3))$
$= (0.4, 0.25, 1, 0.9, 3, 4, 5, 2, 3, 4)$.
In the second scenario, we suppose the circumstance to have only two states. Two different transition probability matrices are considered, namely

$$ \begin{pmatrix} 0.4 & 0.6 \\ 0.6 & 0.4 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}, $$

with corresponding initial probability vectors $(0.5, 0.5)$ and $(0.3, 0.7)$, respectively. The parameter groups are:
(d) $(\alpha_1, \alpha_2, \phi, \phi^*, \lambda_1(s_1), \lambda_1(s_2), \lambda_2(s_1), \lambda_2(s_2)) = (0.15, 0.2, 0.5, 0.4625, 1, 3, 3, 5)$;
(e) $(\alpha_1, \alpha_2, \phi, \phi^*, \lambda_1(s_1), \lambda_1(s_2), \lambda_2(s_1), \lambda_2(s_2)) = (0.25, 0.5, 1, 0.875, 2, 4, 5, 7)$.
Note that the $\alpha_1, \alpha_2$ in all parameter groups satisfy the constraints in Remark 2.
The estimation results for the YW and CML estimators are presented in Tables A1–A10 in Appendix A. It can be seen that as the sample size increases, all estimates converge to the true parameter values: the standard errors and biases decrease towards 0, confirming the consistency of the estimators. Comparing the finite-sample properties of the YW and CML approaches, it becomes clear that the additional computational effort required for CML also leads to improved performance. The CML estimates are less biased and less dispersed, and the additional gain in performance is particularly large for the dependence parameters $\alpha_1$, $\alpha_2$, and $\phi$. So if possible, the CML approach should be preferred for parameter estimation.

5. A Real-Data Example

In this section, we analyze data referring to the number of sold items of a soap product (category "wsoa" in Dominick's Data (https://www.chicagobooth.edu/research/kilts/datasets/dominicks, accessed on 10 November 2021) from the James M. Kilts Center, University of Chicago Booth School of Business), which are counted on a weekly basis. We focus on the product "Level 200 Bath 6 BA" (code number 1111132012) in the soap category, and we consider the bivariate count time series for stores 54 and 88 in the period 14 April 1994 to 4 May 1995 (weeks 240–295, n = 56). The movement files also provide information on sales promotions for the product. There are three types of promotion (labeled 'B', 'C' and 'S'), which we summarize into one category, namely "sales promotion—yes or no" (yes: state 1; no: state 2). As the number of sold items might be affected by whether the product is under promotion or not, the promotion can be seen as a potential circumstance-driving factor. It is also worth mentioning that ref. [26] analyzed count data from the soap category (product "Zest White Water"), but using a hidden Markov model instead (i.e., they did not utilize the information about sales promotions).
The data are shown in Figure 1, with the counts in state 1 plotted in gray. The PACFs indicate an AR(1)-like autocorrelation structure, and we also observe a substantial extent of cross-correlation. Moreover, it can be seen that for both sub-series, the counts under sales promotion are at a higher level than those without sales promotion. This indicates that the sales promotion helps to stimulate the number of sold items, and might thus be a relevant circumstance state. Computing the state-dependent sample means and variances (as required for YW estimation anyway, recall Section 3.1), one gets the values in Table 1. Comparing the means across the states, the visual impression from Figure 1 is confirmed: counts are larger (in the mean) in state 1 (promotion) than in state 2. It is also interesting to compare the corresponding means and variances. Keeping in mind that the sub-series are rather short, such that variations are natural, the overall impression is that means and variances are reasonably close to each other, i.e., a model with state-dependent equidispersion could be suitable for the data. Together with the aforementioned substantial extent of cross-correlation, it is thus reasonable to try the novel CuBINAR(1) model for the sales counts data.
To evaluate the performance of our new model, we fit the CuBINAR(1) model to the data, and as competitors, we consider the classical (stationary) Poi-BINAR(1) model (2) of [11] on the one hand, and the RE-BINAR(1) model of [25] (recall Remark 3) on the other hand. Model fitting is performed via the CML approach, where the numerical optimization is initialized by the YW estimates (recall Section 3 and Section 4). The estimation results are summarized in Table 2. We also computed approximate standard errors as described in Section 3.2, but since the time series is rather short, the dependence parameters α ^ 1 , α ^ 2 , ϕ ^ are not significant on a 5%-level. As the estimates λ ^ i ( s t ) in the CuBINAR(1) model refer to the marginal mean of X i , t ( s t ) , we convert the means of the BINAR(1)’s innovation terms ε i , t into the marginal mean of X i , t in order to make the results comparable. That is, the estimates λ ^ 1 and λ ^ 2 of the BINAR(1) model represent the marginal means of X i , t . We can see that the CuBINAR(1)’s estimates λ ^ i ( 1 ) , i = 1 , 2 , of both sub-series under state 1 are larger than those under state 2, which confirms that the sales promotion increases the number of sold items. The corresponding λ ^ i of the BINAR(1) model are located between λ ^ i ( 1 ) and λ ^ i ( 2 ) , whereas the RE-BINAR(1)’s estimates differ quite a lot in some cases (and also deviate from the sample means in Table 1). It is also interesting to note that the values of α ^ 1 , α ^ 2 , ϕ ^ are smaller for CuBINAR(1) than for BINAR(1), which is reasonable as part of the CuBINAR(1)’s dependence is explained by the circumstance states. Furthermore, the estimate ϕ ^ is clearly larger than zero, i.e., the ability of the CuBINAR(1) model to incorporate additional cross-dependence (recall Remark 3) turns out to be beneficial in view of the substantial extent of cross-correlation observed in Figure 1.
To assess the performance of the fitted models, we first compare the root mean square errors (RMSEs) between the observations and predicted values. More precisely, the RMSE values are the square roots of sums of the form $\sum_t \big(x_{i,t} - E[X_{i,t}(s_t) \mid x_{i,t-1}(s_{t-1})]\big)^2$, divided by the number of summands. Here, we distinguish two cases. The in-sample RMSE is computed by using the model fits of Table 2 and by summing over $t = 2, \ldots, n$. For the out-of-sample RMSEs, we omitted the last 10 observations during model fitting, and the sum was then taken over $t = n-9, \ldots, n$. Obviously, the RMSE performance of the CuBINAR(1) model is better than that of both the RE-BINAR(1) and the BINAR(1) model.
In addition to the RMSE, we also adopt scoring rules and Akaike’s information criterion (AIC) for model choice. Regarding the scoring rules, we use the logarithmic score defined as
$$ S_{ls}\big(p_{\cdot \mid x_{t-1}}, x_t\big) := -\ln p_{x_t \mid x_{t-1}}. $$

The mean score $\frac{1}{n-1} \sum_{t=2}^{n} S_{ls}(p_{\cdot \mid x_{t-1}}, x_t)$ is used to assess the overall performance of the model. Smaller score values indicate that the predictive distribution provided by the fitted model is in better agreement with the true predictive distribution, which implies a better fit of the model. Analogously, smaller values of the AIC indicate a better model. From Table 3, we recognize that both the AIC and the logarithmic score of the CuBINAR(1) model are smaller than those of the competing models. Altogether, the CuBINAR(1) model clearly outperforms both competitors. Compared to the BINAR(1) model, our newly proposed model better fits the sales count data by utilizing the dependence on the underlying circumstance. The superior performance compared to the RE-BINAR(1) model can be explained by two sample properties noted at the beginning of this section. First, the data exhibit notable cross-correlation, but only the CuBINAR(1) model has an additional cross-dependence parameter. Second, conditioned on the different states, the sales counts are close to equidispersion, which is accounted for by the CuBINAR(1)'s Poisson distributions. The RE-BINAR(1) model with its geometric distributions, by contrast, is designed for strongly overdispersed data, which does not apply to the sales counts data.
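Given the one-step-ahead probabilities that a fitted model assigns to the realized observations, the mean logarithmic score is straightforward to compute; the helper below is an illustrative sketch:

```python
import numpy as np

def mean_log_score(pred_probs):
    """Mean logarithmic score: the average of -log p(x_t | x_{t-1})
    over t = 2..n, where pred_probs holds the one-step-ahead
    probabilities assigned to the realized observations."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    return float(-np.log(pred_probs).mean())
```

A model that assigns higher probability to what actually happened receives a smaller (better) score.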
While the CuBINAR(1) model performs best among the candidate models, it remains to assess its overall model adequacy. First, we analyzed the corresponding standardized Pearson residuals, defined by

$$ \frac{x_{i,t} - E[X_{i,t}(s_t) \mid x_{i,t-1}(s_{t-1})]}{\sqrt{\mathrm{Var}(X_{i,t}(s_t) \mid x_{i,t-1}(s_{t-1}))}} \quad \text{for } i = 1, 2 \text{ and } t = 2, 3, \ldots $$
A summary of the results is provided in Figure 2. As explained in Section 2.4 of [1], the residuals of an adequate model should have a mean close to zero, a variance close to one, and they should not be autocorrelated. From Figure 2, we conclude that these criteria are satisfied to a good approximation. It is also worth noting that there are no significant cross-correlations between the residual series. We also computed the PIT histograms for the fitted CuBINAR(1) model, as these are another common approach for checking model adequacy (see Section 2.4 in [1]). But, since the sample size is rather small, the PIT histograms in Figure 3 look a bit "spiky". Nevertheless, they exhibit no systematic deviation from uniformity, such as an (inverse) U-shape. Therefore, they do not contradict the fitted CuBINAR(1) model. Altogether, our novel CuBINAR(1) model appears to adequately describe the bivariate sales counts data.
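For the Poi-CuBINAR(1) model, the conditional mean entering the standardized Pearson residuals is $\alpha_i x_{i,t-1} + \mu_i(s_t, s_{t-1})$, and the conditional variance adds the binomial-thinning part $\alpha_i(1-\alpha_i)x_{i,t-1}$ to the Poisson innovation variance $\mu_i(s_t, s_{t-1})$. A sketch for one component (our own helper, not the authors' code):

```python
import numpy as np

def pearson_residuals(x_i, states, alpha_i, lam_i):
    """Standardized Pearson residuals for one component of the
    Poi-CuBINAR(1) model.

    x_i     : length-n count series of component i
    states  : length-n array of state labels
    alpha_i : thinning probability of component i
    lam_i   : dict mapping state label -> lambda_i(state)
    """
    x_i, states = np.asarray(x_i), np.asarray(states)
    lam = np.array([lam_i[s] for s in states])
    mu = lam[1:] - alpha_i * lam[:-1]            # innovation means
    cond_mean = alpha_i * x_i[:-1] + mu
    # thinning variance alpha*(1-alpha)*x_{t-1} plus Poisson innovation variance
    cond_var = alpha_i * (1 - alpha_i) * x_i[:-1] + mu
    return (x_i[1:] - cond_mean) / np.sqrt(cond_var)
```

For an adequately specified model, the resulting residual series should have mean near zero and variance near one, as checked in Figure 2.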

6. Conclusions

In this paper, we proposed the new circumstance-driven bivariate INAR(1) model, which can be applied to bivariate count time series that have different marginal means caused by an underlying circumstance factor. Important stochastic properties of the new model were discussed. We applied and analyzed the Yule–Walker and conditional maximum likelihood methods to estimate the unknown parameter values. The consistency of the estimators was also confirmed by our simulation study, where the estimates converge quickly to the true parameter values with increasing sample size. For the presented real-data application on sales counts, our new model outperforms the ordinary BINAR(1) model. As a possible direction for future research, we suggest equipping models for multivariate count time series with a self-exciting threshold mechanism, similar to the recent work by [27]. Another important topic would be the case where the states cannot be observed (latent states, in analogy to the hidden Markov model [26]). Then, the CuBINAR(1)'s model definition and estimation approaches would need to be adapted, which should be pursued in future research.

Author Contributions

H.W. and C.H.W. both contributed to the theoretical analysis and simulation study, and they both performed the validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.chicagobooth.edu/research/kilts/datasets/dominicks, accessed on 10 November 2021.

Acknowledgments

The authors thank the three referees for their useful comments on an earlier draft of this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Tabulated Simulation Results for Section 4

Table A1. Parameter estimation under three states; see Section 4.

Transition Matrix 1:
( 0.4 0.3 0.3 )
( 0.3 0.4 0.3 )
( 0.3 0.3 0.4 )

$(\alpha_1, \alpha_2, \phi, \phi^*, \lambda_1(1), \lambda_1(2), \lambda_1(3), \lambda_2(1), \lambda_2(2), \lambda_2(3)) = (0.15, 0.2, 0.5, 0.485, 1, 2, 3, 4, 5, 6)$

Yule–Walker

n    | α1      | α2      | φ       | λ1(1)   | λ1(2)   | λ1(3)   | λ2(1)   | λ2(2)   | λ2(3)
300  | 0.1448  | 0.1913  | 0.4956  | 1.0016  | 1.9983  | 3.0000  | 3.9994  | 5.0004  | 6.0016
sd   | 0.0647  | 0.0636  | 0.1970  | 0.1070  | 0.1515  | 0.1855  | 0.2185  | 0.2475  | 0.2705
bias | −0.0052 | −0.0087 | −0.0044 | 0.0016  | −0.0017 | 0.0000  | −0.0006 | 0.0004  | 0.0016
900  | 0.1475  | 0.1971  | 0.4991  | 0.9998  | 1.9979  | 3.0005  | 3.9992  | 4.9988  | 6.0012
sd   | 0.0395  | 0.0370  | 0.1163  | 0.0613  | 0.0868  | 0.1069  | 0.1283  | 0.1425  | 0.1548
bias | −0.0025 | −0.0029 | −0.0009 | −0.0002 | −0.0021 | 0.0005  | −0.0008 | −0.0012 | 0.0012
1500 | 0.1481  | 0.1987  | 0.4989  | 1.0001  | 2.0012  | 3.0014  | 3.9993  | 5.0009  | 6.0023
sd   | 0.0308  | 0.0292  | 0.0893  | 0.0475  | 0.0680  | 0.0827  | 0.0970  | 0.1094  | 0.1192
bias | −0.0019 | −0.0013 | −0.0011 | 0.0001  | 0.0012  | 0.0014  | −0.0007 | 0.0009  | 0.0023
2100 | 0.1487  | 0.1988  | 0.4993  | 0.9998  | 1.9994  | 3.0003  | 4.0001  | 5.0010  | 6.0009
sd   | 0.0258  | 0.0247  | 0.0763  | 0.0402  | 0.0567  | 0.0702  | 0.0822  | 0.0919  | 0.1004
bias | −0.0013 | −0.0012 | −0.0007 | −0.0002 | −0.0006 | 0.0003  | 0.0001  | 0.0010  | 0.0009

Conditional Maximum Likelihood

n    | α1      | α2      | φ       | λ1(1)   | λ1(2)   | λ1(3)   | λ2(1)   | λ2(2)   | λ2(3)
300  | 0.1519  | 0.1984  | 0.4949  | 0.9997  | 1.9994  | 3.0017  | 3.9986  | 5.0008  | 6.0011
sd   | 0.0519  | 0.0548  | 0.1508  | 0.1042  | 0.1490  | 0.1834  | 0.2133  | 0.2414  | 0.2652
bias | 0.0019  | −0.0016 | 0.0099  | −0.0003 | −0.0006 | 0.0017  | −0.0014 | 0.0008  | 0.0011
900  | 0.1502  | 0.1991  | 0.4886  | 0.9994  | 1.9978  | 3.0012  | 3.9994  | 4.9984  | 6.0014
sd   | 0.0294  | 0.0306  | 0.0833  | 0.0590  | 0.0850  | 0.1045  | 0.1249  | 0.1389  | 0.1515
bias | 0.0002  | −0.0009 | 0.0036  | −0.0006 | −0.0022 | 0.0012  | −0.0006 | −0.0016 | 0.0014
1500 | 0.1499  | 0.2001  | 0.4872  | 0.9996  | 2.0016  | 3.0015  | 3.9991  | 5.0012  | 6.0023
sd   | 0.0225  | 0.0241  | 0.0640  | 0.0454  | 0.0663  | 0.0813  | 0.0942  | 0.1067  | 0.1163
bias | −0.0001 | 0.0001  | 0.0022  | −0.0004 | 0.0016  | 0.0015  | −0.0009 | 0.0012  | 0.0023
2100 | 0.1500  | 0.1998  | 0.4863  | 0.9995  | 1.9996  | 3.0006  | 4.0000  | 5.0009  | 6.0012
sd   | 0.0187  | 0.0201  | 0.0541  | 0.0388  | 0.0555  | 0.0686  | 0.0800  | 0.0896  | 0.0974
bias | 0.0000  | −0.0002 | 0.0013  | −0.0005 | −0.0004 | 0.0006  | 0.0000  | 0.0009  | 0.0012
Table A2. Parameter estimation under three states; see Section 4.

Transition Matrix 1: (0.4 0.3 0.3; 0.3 0.4 0.3; 0.3 0.3 0.4)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ1(3), λ2(1), λ2(2), λ2(3)) = (0.15, 0.5, 0.5, 0.4625, 1, 2, 3, 4, 5, 6)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.1440   0.4828   0.4949   0.9996   2.0003   3.0016   3.9977   4.9944   6.0038
  sd    0.0642   0.0925   0.2013   0.1059   0.1506   0.1848   0.2618   0.2971   0.3273
  bias  -0.0060  -0.0172  -0.0051  -0.0004  0.0003   0.0016   -0.0023  -0.0056  0.0038
900     0.1474   0.4947   0.4981   0.9993   2.0017   2.9991   3.9978   5.0001   5.9993
  sd    0.0396   0.0540   0.1209   0.0612   0.0872   0.1064   0.1516   0.1702   0.1882
  bias  -0.0026  -0.0053  -0.0019  -0.0007  0.0017   -0.0009  -0.0022  0.0001   -0.0007
1500    0.1484   0.4956   0.4978   0.9996   2.0006   3.0014   3.9993   5.0014   6.0018
  sd    0.0300   0.0413   0.0944   0.0477   0.0676   0.0833   0.1171   0.1320   0.1435
  bias  -0.0016  -0.0044  -0.0022  -0.0004  0.0006   0.0014   -0.0007  0.0014   0.0018
2100    0.1490   0.4974   0.4997   0.9998   1.9991   2.9992   4.0012   5.0011   6.0006
  sd    0.0255   0.0355   0.0797   0.0407   0.0569   0.0702   0.1009   0.1114   0.1226
  bias  -0.0010  -0.0026  -0.0003  -0.0002  -0.0009  -0.0008  0.0012   0.0011   0.0006

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.1511   0.5004   0.4704   0.9973   2.0017   3.0034   3.9971   4.9980   6.0027
  sd    0.0512   0.0387   0.1315   0.1031   0.1484   0.1820   0.2360   0.2685   0.2935
  bias  0.0011   0.0004   0.0079   -0.0027  0.0017   0.0034   -0.0030  -0.0020  0.0027
900     0.1501   0.5004   0.4656   0.9988   2.0017   2.9999   3.9987   5.0002   5.9983
  sd    0.0289   0.0212   0.0729   0.0588   0.0853   0.1041   0.1355   0.1542   0.1683
  bias  0.0001   0.0004   0.0031   -0.0012  0.0017   -0.0001  -0.0013  0.0002   -0.0017
1500    0.1502   0.4999   0.4637   1.0002   1.9998   3.0003   3.9997   4.9997   5.9996
  sd    0.0219   0.0166   0.0562   0.0462   0.0661   0.0808   0.1047   0.1182   0.1296
  bias  0.0002   -0.0001  0.0012   0.0002   -0.0002  0.0003   -0.0003  -0.0003  -0.0004
2100    0.1495   0.5000   0.4630   0.9992   1.9987   3.0010   4.0000   4.9998   6.0004
  sd    0.0192   0.0141   0.0477   0.0399   0.0576   0.0712   0.0921   0.1056   0.1133
  bias  -0.0005  0.0000   0.0005   -0.0008  -0.0013  0.0009   0.0000   -0.0002  0.0004
Table A3. Parameter estimation under three states; see Section 4.

Transition Matrix 1: (0.4 0.3 0.3; 0.3 0.4 0.3; 0.3 0.3 0.4)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ1(3), λ2(1), λ2(2), λ2(3)) = (0.4, 0.25, 1, 0.9, 3, 4, 5, 2, 3, 4)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.3846   0.2392   0.9848   3.0039   4.0036   5.0040   1.9994   2.9987   3.9987
  sd    0.0801   0.0681   0.2421   0.2121   0.2471   0.2768   0.1582   0.1954   0.2278
  bias  -0.0154  -0.0108  -0.0152  0.0039   0.0036   0.0040   -0.0006  -0.0013  -0.0013
900     0.3947   0.2465   0.9958   3.0015   4.0002   5.0016   2.0005   2.9995   4.0022
  sd    0.0469   0.0403   0.1393   0.1225   0.1414   0.1560   0.0921   0.1127   0.1296
  bias  -0.0053  -0.0035  -0.0042  0.0015   0.0002   0.0016   0.0005   -0.0005  0.0022
1500    0.3971   0.2485   0.9957   3.0002   4.0008   5.0009   1.9997   2.9999   3.9994
  sd    0.0367   0.0311   0.1087   0.0952   0.1095   0.1221   0.0711   0.0859   0.1009
  bias  -0.0029  -0.0015  -0.0043  0.0002   0.0008   0.0009   -0.0003  -0.0001  -0.0006
2100    0.3980   0.2486   0.9972   3.0013   3.9994   4.9992   2.0009   2.9996   3.9993
  sd    0.0310   0.0263   0.0925   0.0798   0.0924   0.1024   0.0604   0.0737   0.0838
  bias  -0.0020  -0.0014  -0.0028  0.0013   -0.0006  -0.0008  0.0009   -0.0004  -0.0007

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.4034   0.2502   0.9102   2.9864   4.0013   5.0111   1.9898   2.9997   4.0060
  sd    0.0445   0.0512   0.1490   0.2026   0.2416   0.2645   0.1567   0.1927   0.2260
  bias  0.0034   0.0002   0.0102   -0.0136  0.0013   0.0111   -0.0102  -0.0003  0.0060
900     0.4015   0.2498   0.9056   2.9958   4.0011   5.0059   1.9964   2.9999   4.0020
  sd    0.0247   0.0276   0.0816   0.1160   0.1372   0.1520   0.0886   0.1114   0.1295
  bias  0.0015   -0.0002  0.0056   -0.0042  0.0011   0.0059   -0.0036  -0.0001  0.0020
1500    0.4007   0.2504   0.9036   2.9991   4.0010   5.0058   2.0000   2.9995   4.0015
  sd    0.0190   0.0213   0.0641   0.0885   0.1053   0.1182   0.0676   0.0865   0.0995
  bias  0.0007   0.0004   0.0036   -0.0009  0.0010   0.0058   0.0000   -0.0005  0.0015
2100    0.4005   0.2499   0.9039   2.9980   4.0000   5.0021   1.9990   2.9998   4.0023
  sd    0.0157   0.0183   0.0540   0.0744   0.0898   0.0992   0.0572   0.0731   0.0841
  bias  0.0005   -0.0001  0.0039   -0.0020  0.0000   0.0021   -0.0010  -0.0002  0.0023
Table A4. Parameter estimation under three states; see Section 4.

Transition Matrix 2: (0.6 0.3 0.1; 0.1 0.6 0.3; 0.3 0.1 0.6)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ1(3), λ2(1), λ2(2), λ2(3)) = (0.15, 0.2, 0.5, 0.485, 1, 2, 3, 4, 5, 6)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.1419   0.1894   0.4942   1.0001   2.0012   2.9994   3.9989   5.0001   5.9982
  sd    0.0630   0.0648   0.1956   0.1107   0.1547   0.1896   0.2290   0.2547   0.2769
  bias  -0.0081  -0.0106  -0.0058  0.0001   0.0012   -0.0006  -0.0011  0.0001   -0.0018
900     0.1471   0.1969   0.4997   1.0000   2.0008   2.9998   3.9990   5.0009   5.9996
  sd    0.0380   0.0372   0.1146   0.0636   0.0886   0.1092   0.1299   0.1450   0.1605
  bias  -0.0029  -0.0031  -0.0003  0.0000   0.0008   -0.0002  -0.0010  0.0009   -0.0004
1500    0.1482   0.1982   0.4972   1.0000   2.0015   3.0009   4.0010   4.9999   6.0002
  sd    0.0294   0.0287   0.0894   0.0492   0.0689   0.0851   0.1011   0.1146   0.1254
  bias  -0.0018  -0.0018  -0.0028  0.0000   0.0015   0.0009   0.0010   -0.0001  0.0002
2100    0.1484   0.1985   0.5000   0.9997   2.0005   3.0010   4.0002   4.9992   6.0008
  sd    0.0250   0.0246   0.0761   0.0413   0.0593   0.0725   0.0852   0.0955   0.1036
  bias  -0.0016  -0.0015  0.0000   -0.0003  0.0005   0.0010   0.0002   -0.0008  0.0008

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.1504   0.1961   0.4953   0.9971   2.0027   3.0024   3.9974   5.0006   5.9996
  sd    0.0537   0.0559   0.1514   0.1081   0.1528   0.1867   0.2234   0.2499   0.2706
  bias  0.0004   -0.0039  0.0103   -0.0029  0.0027   0.0024   -0.0026  0.0006   -0.0004
900     0.1492   0.1985   0.4871   0.9992   2.0008   3.0013   3.9995   4.9984   6.0016
  sd    0.0295   0.0317   0.0833   0.0614   0.0882   0.1094   0.1296   0.1412   0.1553
  bias  -0.0008  -0.0015  0.0021   -0.0008  0.0008   0.0013   -0.0005  -0.0016  0.0016
1500    0.1502   0.1997   0.4873   0.9996   1.9993   3.0005   4.0004   5.0002   6.0016
  sd    0.0230   0.0241   0.0647   0.0474   0.0687   0.0846   0.0996   0.1111   0.1211
  bias  0.0002   -0.0003  0.0023   -0.0004  -0.0007  0.0005   0.0004   0.0002   0.0016
2100    0.1496   0.1996   0.4865   0.9997   2.0006   3.0009   4.0004   5.0002   5.9995
  sd    0.0192   0.0203   0.0540   0.0402   0.0574   0.0704   0.0834   0.0936   0.1017
  bias  -0.0004  -0.0004  0.0015   -0.0003  0.0006   0.0009   0.0004   0.0002   -0.0005
Table A5. Parameter estimation under three states; see Section 4.

Transition Matrix 2: (0.6 0.3 0.1; 0.1 0.6 0.3; 0.3 0.1 0.6)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ1(3), λ2(1), λ2(2), λ2(3)) = (0.15, 0.5, 0.5, 0.4625, 1, 2, 3, 4, 5, 6)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.1426   0.4786   0.4932   0.9983   1.9989   2.9982   3.9964   5.0004   6.0066
  sd    0.0636   0.0915   0.2039   0.1093   0.1592   0.1907   0.2781   0.3175   0.3457
  bias  -0.0074  -0.0214  -0.0068  -0.0017  -0.0011  -0.0018  -0.0036  0.0004   0.0066
900     0.1466   0.4932   0.4956   0.9995   1.9995   3.0010   3.9997   4.9992   5.9999
  sd    0.0382   0.0539   0.1198   0.0638   0.0893   0.1108   0.1602   0.1820   0.1986
  bias  -0.0034  -0.0068  -0.0044  -0.0005  -0.0005  0.0010   -0.0003  -0.0008  -0.0001
1500    0.1482   0.4954   0.4979   1.0002   2.0005   3.0012   4.0001   5.0005   5.9997
  sd    0.0296   0.0421   0.0938   0.0489   0.0693   0.0839   0.1260   0.1415   0.1535
  bias  -0.0018  -0.0046  -0.0021  0.0002   0.0005   0.0012   0.0001   0.0005   -0.0003
2100    0.1487   0.4971   0.4979   0.9999   2.0000   2.9988   4.0007   4.9986   6.0005
  sd    0.0248   0.0353   0.0788   0.0420   0.0593   0.0718   0.1059   0.1196   0.1290
  bias  -0.0013  -0.0029  -0.0021  -0.0001  0.0000   -0.0012  0.0007   -0.0014  0.0005

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.1499   0.5000   0.4711   0.9975   2.0019   3.0042   3.9947   5.0020   6.0053
  sd    0.0535   0.0396   0.1323   0.1069   0.1523   0.1898   0.2477   0.2805   0.3040
  bias  -0.0001  0.0000   0.0086   -0.0025  0.0019   0.0042   -0.0053  0.0020   0.0053
900     0.1497   0.5003   0.4642   0.9989   2.0002   2.9998   3.9987   5.0015   6.0001
  sd    0.0297   0.0219   0.0742   0.0606   0.0879   0.1092   0.1402   0.1614   0.1723
  bias  -0.0003  0.0003   0.0017   -0.0011  0.0002   -0.0002  -0.0013  0.0015   0.0001
1500    0.1498   0.4999   0.4632   0.9995   2.0004   3.0014   4.0002   5.0007   6.0029
  sd    0.0227   0.0168   0.0568   0.0474   0.0672   0.0828   0.1104   0.1233   0.1349
  bias  -0.0002  -0.0001  0.0007   -0.0005  0.0004   0.0014   0.0002   0.0007   0.0029
2100    0.1496   0.5000   0.4640   0.9998   1.9994   2.9990   4.0000   5.0005   6.0002
  sd    0.0192   0.0142   0.0483   0.0394   0.0575   0.0705   0.0939   0.1070   0.1155
  bias  -0.0004  0.0000   0.0015   -0.0002  -0.0006  -0.0010  0.0000   0.0005   0.0002
Table A6. Parameter estimation under three states; see Section 4.

Transition Matrix 2: (0.6 0.3 0.1; 0.1 0.6 0.3; 0.3 0.1 0.6)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ1(3), λ2(1), λ2(2), λ2(3)) = (0.4, 0.25, 1, 0.9, 3, 4, 5, 2, 3, 4)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.3840   0.2366   0.9830   3.0022   4.0036   5.0012   1.9986   3.0006   4.0033
  sd    0.0801   0.0675   0.2392   0.2246   0.2611   0.2950   0.1645   0.2040   0.2350
  bias  -0.0160  -0.0134  -0.0170  0.0022   0.0036   0.0012   -0.0014  0.0006   0.0033
900     0.3948   0.2460   0.9948   2.9996   4.0004   5.0020   2.0000   2.9990   4.0024
  sd    0.0466   0.0405   0.1397   0.1299   0.1524   0.1681   0.0963   0.1176   0.1351
  bias  -0.0052  -0.0040  -0.0052  -0.0004  0.0004   0.0020   0.0000   -0.0010  0.0024
1500    0.3973   0.2473   0.9973   3.0008   4.0004   5.0004   1.9991   3.0007   4.0008
  sd    0.0366   0.0308   0.1086   0.1004   0.1153   0.1290   0.0736   0.0897   0.1041
  bias  -0.0027  -0.0027  -0.0027  0.0008   0.0004   0.0004   -0.0009  0.0007   0.0008
2100    0.3973   0.2480   0.9968   3.0003   4.0012   5.0002   1.9994   3.0009   3.9995
  sd    0.0309   0.0262   0.0915   0.0846   0.0981   0.1100   0.0623   0.0766   0.0887
  bias  -0.0027  -0.0020  -0.0032  0.0003   0.0012   0.0002   -0.0006  0.0009   -0.0005

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ1(3)    λ2(1)    λ2(2)    λ2(3)
300     0.4034   0.2502   0.9102   2.9864   4.0013   5.0111   1.9898   2.9997   4.0060
  sd    0.0445   0.0512   0.1490   0.2026   0.2416   0.2645   0.1567   0.1927   0.2260
  bias  0.0034   0.0002   0.0102   -0.0136  0.0013   0.0111   -0.0102  -0.0003  0.0060
900     0.4015   0.2495   0.9044   2.9950   4.0005   5.0060   1.9969   2.9996   4.0018
  sd    0.0248   0.0278   0.0823   0.1159   0.1373   0.1520   0.0883   0.1105   0.1273
  bias  0.0015   -0.0005  0.0044   -0.0050  0.0005   0.0060   -0.0031  -0.0004  0.0018
1500    0.4010   0.2495   0.9024   2.9976   4.0011   5.0044   1.9972   3.0006   4.0021
  sd    0.0186   0.0213   0.0634   0.0894   0.1054   0.1168   0.0683   0.0867   0.1000
  bias  0.0010   -0.0005  0.0024   -0.0024  0.0011   0.0044   -0.0029  0.0006   0.0021
2100    0.4006   0.2502   0.9025   2.9976   3.9993   5.0023   1.9980   2.9992   3.9995
  sd    0.0159   0.0181   0.0544   0.0741   0.0899   0.0983   0.0566   0.0733   0.0835
  bias  0.0006   0.0002   0.0025   -0.0024  -0.0007  0.0023   -0.0020  -0.0008  -0.0005
Table A7. Parameter estimation under two states; see Section 4.

Transition Matrix 1: (0.4 0.6; 0.6 0.4)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ2(1), λ2(2)) = (0.15, 0.5, 0.5, 0.4625, 1, 3, 3, 5)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.1487   0.4889   0.4997   0.9988   2.9996   2.9959   5.0047
  sd    0.0684   0.0938   0.1877   0.0863   0.1513   0.1962   0.2518
  bias  -0.0013  -0.0111  -0.0002  -0.0012  -0.0004  -0.0041  0.0047
900     0.1483   0.4960   0.4989   1.0000   3.0014   2.9978   4.9989
  sd    0.0418   0.0545   0.1126   0.0502   0.0872   0.1139   0.1461
  bias  -0.0017  -0.0040  -0.0010  0.0000   0.0014   -0.0022  -0.0011
1500    0.1490   0.4980   0.4994   1.0001   3.0019   2.9993   5.0025
  sd    0.0327   0.0429   0.0863   0.0390   0.0669   0.0878   0.1126
  bias  -0.0010  -0.0020  -0.0005  0.0001   0.0019   -0.0007  0.0025
2100    0.1491   0.4983   0.4997   0.9998   2.9999   2.9994   5.0009
  sd    0.0276   0.0361   0.0739   0.0326   0.0576   0.0737   0.0949
  bias  -0.0009  -0.0017  -0.0002  -0.0002  -0.0001  -0.0006  0.0009

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.1503   0.5159   0.4802   0.9955   3.0026   2.9878   5.0140
  sd    0.0460   0.0368   0.0867   0.0844   0.1498   0.1850   0.2362
  bias  0.0003   0.0159   0.0177   -0.0045  0.0026   -0.0122  0.0140
900     0.1491   0.5112   0.4704   0.9987   2.9999   2.9922   5.0088
  sd    0.0246   0.0204   0.0486   0.0493   0.0850   0.1065   0.1370
  bias  -0.0009  0.0112   0.0079   -0.0013  -0.0001  -0.0078  0.0088
1500    0.1481   0.5091   0.4676   0.9977   3.0012   2.9941   5.0087
  sd    0.0185   0.0164   0.0365   0.0383   0.0647   0.0821   0.1058
  bias  -0.0019  0.0091   0.0051   -0.0023  0.0012   -0.0059  0.0087
2100    0.1485   0.5078   0.4658   0.9987   3.0004   2.9918   5.0041
  sd    0.0155   0.0139   0.0290   0.0317   0.0578   0.0703   0.0909
  bias  -0.0015  0.0078   0.0033   -0.0013  0.0004   -0.0082  0.0041
Table A8. Parameter estimation under two states; see Section 4.

Transition Matrix 1: (0.4 0.6; 0.6 0.4)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ2(1), λ2(2)) = (0.25, 0.5, 1, 0.875, 2, 4, 5, 7)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.2424   0.4872   0.9863   1.9978   4.0018   4.9978   7.0024
  sd    0.0716   0.0927   0.2995   0.1309   0.1858   0.2524   0.2967
  bias  -0.0076  -0.0128  -0.0137  -0.0022  0.0018   -0.0022  0.0024
900     0.2479   0.4960   0.9954   2.0002   3.9995   5.0018   7.0022
  sd    0.0414   0.0546   0.1732   0.0757   0.1067   0.1453   0.1713
  bias  -0.0021  -0.0040  -0.0046  0.0002   -0.0005  0.0018   0.0022
1500    0.2489   0.4970   0.9967   1.9993   4.0005   4.9991   6.9997
  sd    0.0317   0.0418   0.1332   0.0589   0.0818   0.1132   0.1347
  bias  -0.0011  -0.0030  -0.0033  -0.0007  0.0005   -0.0009  -0.0003
2100    0.2495   0.4982   0.9992   1.9999   3.9999   4.9996   6.9995
  sd    0.0271   0.0354   0.1139   0.0501   0.0696   0.0955   0.1138
  bias  -0.0005  -0.0018  -0.0008  -0.0001  -0.0001  -0.0004  -0.0005

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.2508   0.4988   0.8848   1.9974   4.0012   4.9967   7.0017
  sd    0.0447   0.0353   0.1692   0.1250   0.1760   0.2399   0.2843
  bias  0.0008   -0.0012  0.0098   -0.0026  0.0012   -0.0033  0.0017
900     0.2503   0.4997   0.8791   1.9998   4.0019   4.9991   7.0012
  sd    0.0245   0.0200   0.0937   0.0717   0.1024   0.1374   0.1640
  bias  0.0003   -0.0003  0.0041   -0.0002  0.0019   -0.0009  0.0012
1500    0.2497   0.4993   0.8772   1.9987   4.0017   4.9964   6.9972
  sd    0.0048   0.0126   0.0762   0.02969  0.0348   0.0097   0.0202
  bias  -0.0002  -0.0006  0.0023   -0.0012  0.0017   -0.0035  -0.0028
2100    0.2502   0.4995   0.8769   1.9998   4.0002   4.9991   7.0001
  sd    0.0023   0.0203   0.0105   0.0182   0.0215   0.0726   0.0499
  bias  0.0002   -0.0005  0.0019   -0.0002  0.0002   -0.0009  0.0001
Table A9. Parameter estimation under two states; see Section 4.

Transition Matrix 2: (0.8 0.2; 0.2 0.8)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ2(1), λ2(2)) = (0.15, 0.5, 0.5, 0.4625, 1, 3, 3, 5)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.1447   0.4824   0.4957   1.0008   2.9975   2.9954   4.9984
  sd    0.0639   0.0923   0.1908   0.0937   0.1610   0.2206   0.2863
  bias  -0.0053  -0.0176  -0.0042  0.0008   -0.0025  -0.0046  -0.0016
900     0.1472   0.4948   0.4984   0.9999   2.9986   2.9995   4.9985
  sd    0.0384   0.0543   0.1104   0.0529   0.0922   0.1273   0.1643
  bias  -0.0028  -0.0052  -0.0016  -0.0001  -0.0014  -0.0005  -0.0015
1500    0.1484   0.4966   0.4983   0.9996   3.0000   3.0018   5.0017
  sd    0.0298   0.0417   0.0858   0.0412   0.0717   0.0981   0.1273
  bias  -0.0016  -0.0034  -0.0016  -0.0004  0.0000   0.0018   0.0017
2100    0.1492   0.4974   0.4997   0.9996   3.0005   2.9983   5.0005
  sd    0.0253   0.0354   0.0733   0.0348   0.0610   0.0835   0.1075
  bias  -0.0008  -0.0026  -0.0002  -0.0004  0.0005   -0.0017  0.0005

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.1508   0.5156   0.4843   0.9962   3.0038   2.9873   5.0166
  sd    0.0455   0.0371   0.0839   0.0843   0.1492   0.1870   0.2393
  bias  0.0008   0.0156   0.0218   -0.0038  0.0038   -0.0127  0.0166
900     0.1492   0.5109   0.4720   0.9987   3.0032   2.9916   5.0075
  sd    0.0243   0.0201   0.0462   0.0490   0.0858   0.1077   0.1385
  bias  -0.0008  0.0109   0.0095   -0.0013  0.0032   -0.0084  0.0075
1500    0.1489   0.5091   0.4667   0.9991   3.0033   2.9939   5.0094
  sd    0.0185   0.0162   0.0333   0.0380   0.0656   0.0831   0.1061
  bias  -0.0011  0.0091   0.0042   -0.0009  0.0033   -0.0061  0.0094
2100    0.1488   0.5083   0.4651   0.9989   3.0012   2.9951   5.0066
  sd    0.0154   0.0142   0.0266   0.0319   0.0566   0.0698   0.0896
  bias  -0.0012  0.0083   0.0026   -0.0011  0.0012   -0.0049  0.0066
Table A10. Parameter estimation under two states; see Section 4.

Transition Matrix 2: (0.8 0.2; 0.2 0.8)
(α1, α2, ϕ, ϕ*, λ1(1), λ1(2), λ2(1), λ2(2)) = (0.25, 0.5, 1, 0.875, 2, 4, 5, 7)

Yule–Walker
n       α1       α2       ϕ        λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.2401   0.4841   0.9838   2.0001   4.0000   4.9963   6.9980
  sd    0.0691   0.0919   0.2969   0.1426   0.2023   0.2827   0.3425
  bias  -0.0099  -0.0159  -0.0162  0.0001   0.0000   -0.0037  -0.0020
900     0.2463   0.4940   0.9940   1.9995   4.0002   4.9996   7.0014
  sd    0.0399   0.0539   0.1718   0.0821   0.1138   0.1634   0.1942
  bias  -0.0037  -0.0060  -0.0060  -0.0005  0.0002   -0.0004  0.0014
1500    0.2476   0.4964   0.9971   1.9996   4.0006   5.0001   7.0009
  sd    0.0310   0.0413   0.1340   0.0633   0.0899   0.1278   0.1515
  bias  -0.0024  -0.0036  -0.0029  -0.0004  0.0006   0.0001   0.0009
2100    0.2486   0.4977   0.9983   1.9999   3.9999   4.9991   6.9987
  sd    0.0266   0.0353   0.1125   0.0534   0.0765   0.1069   0.1277
  bias  -0.0014  -0.0023  -0.0017  -0.0001  -0.0001  -0.0009  -0.0013

Conditional Maximum Likelihood
n       α1       α2       ϕ*       λ1(1)    λ1(2)    λ2(1)    λ2(2)
300     0.2502   0.4997   0.8864   1.9974   4.0025   4.9994   7.0021
  sd    0.0469   0.0362   0.1719   0.1271   0.1805   0.2406   0.2851
  bias  0.0002   -0.0003  0.0114   -0.0026  0.0025   -0.0006  0.0021
900     0.2503   0.4997   0.8795   2.0002   3.9996   5.0024   7.0019
  sd    0.0255   0.0203   0.0964   0.0733   0.1035   0.1389   0.1627
  bias  0.0003   -0.0003  0.0045   0.0002   -0.0004  0.0024   0.0019
1500    0.2503   0.4995   0.8762   1.9994   4.0004   4.9998   6.9992
  sd    0.0197   0.0157   0.0733   0.0566   0.0798   0.1078   0.1285
  bias  0.0003   -0.0005  0.0012   -0.0006  0.0004   -0.0002  -0.0008
2100    0.2502   0.4997   0.8764   1.9998   4.0001   4.9993   6.9999
  sd    0.0166   0.0133   0.0617   0.0485   0.0678   0.0907   0.1090
  bias  0.0002   -0.0003  0.0014   -0.0002  0.0001   -0.0007  -0.0001

Appendix B. Proof of Proposition 1

The bivariate PGF of $\{X_{1,t}(s_t), X_{2,t}(s_t)\}$ equals
$$
\begin{aligned}
E\big[a_1^{X_{1,t}(s_t)}\cdot a_2^{X_{2,t}(s_t)}\big]
&= E\big[a_1^{\alpha_1\circ X_{1,t-1}(s_{t-1})+\varepsilon_{1,t}(s_t,s_{t-1})}\cdot a_2^{\alpha_2\circ X_{2,t-1}(s_{t-1})+\varepsilon_{2,t}(s_t,s_{t-1})}\big]\\
&= E\big[(1-\alpha_1+\alpha_1 a_1)^{X_{1,t-1}(s_{t-1})}\cdot(1-\alpha_2+\alpha_2 a_2)^{X_{2,t-1}(s_{t-1})}\big]\cdot E\big[a_1^{\varepsilon_{1,t}(s_t,s_{t-1})}\cdot a_2^{\varepsilon_{2,t}(s_t,s_{t-1})}\big]\\
&= \exp\big\{\lambda_1(s_{t-1})\,\alpha_1(a_1-1)+\lambda_2(s_{t-1})\,\alpha_2(a_2-1)+\phi\,\alpha_1\alpha_2(a_1-1)(a_2-1)\big\}\\
&\quad\cdot E\big[a_1^{\varepsilon_{1,t}(s_t,s_{t-1})}\cdot a_2^{\varepsilon_{2,t}(s_t,s_{t-1})}\big].
\end{aligned}
$$
According to the expression of the bivariate Poisson PGF, we know that
$$
E\big[a_1^{X_{1,t}(s_t)}\cdot a_2^{X_{2,t}(s_t)}\big]=\exp\big\{\lambda_1(s_t)(a_1-1)+\lambda_2(s_t)(a_2-1)+\phi(a_1-1)(a_2-1)\big\}.
$$
For simplicity, we omit the state indices $(s_t,s_{t-1})$ after $\varepsilon_{i,t}$. Equating both expressions, the PGF of $\{\varepsilon_{1,t},\varepsilon_{2,t}\}$ is derived as
$$
E\big[a_1^{\varepsilon_{1,t}}\cdot a_2^{\varepsilon_{2,t}}\big]=\exp\big\{\lambda_1(s_t)(a_1-1)+\lambda_2(s_t)(a_2-1)+\phi(a_1-1)(a_2-1)-\lambda_1(s_{t-1})\,\alpha_1(a_1-1)-\lambda_2(s_{t-1})\,\alpha_2(a_2-1)-\phi\,\alpha_1\alpha_2(a_1-1)(a_2-1)\big\},
$$
which implies that the bivariate PGF of $\{\varepsilon_{1,t},\varepsilon_{2,t}\}$ is
$$
E\big[a_1^{\varepsilon_{1,t}}\cdot a_2^{\varepsilon_{2,t}}\big]=\exp\big\{\big(\lambda_1(s_t)-\alpha_1\,\lambda_1(s_{t-1})\big)(a_1-1)+\big(\lambda_2(s_t)-\alpha_2\,\lambda_2(s_{t-1})\big)(a_2-1)+\big(\phi-\phi\,\alpha_1\alpha_2\big)(a_1-1)(a_2-1)\big\}.
$$
Thus, $\{\varepsilon_{1,t},\varepsilon_{2,t}\}\sim\mathrm{BPoi}\big(\lambda_1(s_t)-\alpha_1\lambda_1(s_{t-1}),\ \lambda_2(s_t)-\alpha_2\lambda_2(s_{t-1}),\ \phi-\phi\,\alpha_1\alpha_2\big)$. Finally, with $\phi^*=\phi-\phi\,\alpha_1\alpha_2$, we write $\{\varepsilon_{1,t},\varepsilon_{2,t}\}\sim\mathrm{BPoi}\big(\lambda_1(s_t)-\alpha_1\lambda_1(s_{t-1}),\ \lambda_2(s_t)-\alpha_2\lambda_2(s_{t-1}),\ \phi^*\big)$, which completes the proof of Proposition 1.
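The PGF factorization above can be checked numerically at an arbitrary point $(a_1, a_2)$ of the unit square; the following sketch does this for illustration (the parameter values below are our own arbitrary choices, not taken from the paper).

```python
import numpy as np

def pgf_bpois(a1, a2, l1, l2, phi):
    # PGF of BPoi(l1, l2, phi): exp{l1(a1-1) + l2(a2-1) + phi(a1-1)(a2-1)}
    return np.exp(l1*(a1 - 1) + l2*(a2 - 1) + phi*(a1 - 1)*(a2 - 1))

a1, a2 = 0.7, 0.3                    # evaluation point of the PGF
al1, al2, phi = 0.15, 0.2, 0.5       # thinning and dependence parameters
l1r, l2r = 2.0, 5.0                  # lambda_i(s_{t-1})
l1s, l2s = 3.0, 4.0                  # lambda_i(s_t)
phi_star = phi * (1 - al1 * al2)     # Proposition 1

# LHS: marginal PGF at time t; RHS: thinned lag-1 PGF times innovation PGF
lhs = pgf_bpois(a1, a2, l1s, l2s, phi)
rhs = (pgf_bpois(1 - al1 + al1*a1, 1 - al2 + al2*a2, l1r, l2r, phi)
       * pgf_bpois(a1, a2, l1s - al1*l1r, l2s - al2*l2r, phi_star))
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```

The two sides agree exactly by the algebra of the proof, so the printed difference is at the level of floating-point rounding.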

Appendix C. Proof of Proposition 2

The considered (conditional) means are computed as follows:
$$
E\big[X_{i,t}(s_t)\mid X_{i,t-1}(s_{t-1})\big]=E\big[\alpha_i\circ X_{i,t-1}(s_{t-1})+\varepsilon_{i,t}(s_t,s_{t-1})\mid X_{i,t-1}(s_{t-1})\big]=\alpha_i\,X_{i,t-1}(s_{t-1})+\mu_i(s_t,s_{t-1}),
$$
and
$$
E\big[X_{i,t}(s_t)\big]=E\Big[E\big[X_{i,t}(s_t)\mid X_{i,t-1}(s_{t-1})\big]\Big]=E\big[\alpha_i\,X_{i,t-1}(s_{t-1})+\mu_i(s_t,s_{t-1})\big]=\alpha_i\,E\big[X_{i,t-1}(s_{t-1})\big]+\mu_i(s_t,s_{t-1})=\alpha_i\,\lambda_i(s_{t-1})+\mu_i(s_t,s_{t-1}).
$$
From the bivariate Poisson distribution of $\{\varepsilon_{1,t}(s_t,s_{t-1}),\varepsilon_{2,t}(s_t,s_{t-1})\}$ according to Proposition 1, we have $\mu_i(s_t,s_{t-1})=\lambda_i(s_t)-\alpha_i\,\lambda_i(s_{t-1})$. Thus,
$$
E\big[X_{i,t}(s_t)\big]=\alpha_i\,\lambda_i(s_{t-1})+\lambda_i(s_t)-\alpha_i\,\lambda_i(s_{t-1})=\lambda_i(s_t).
$$
From the Poisson equidispersion property, we obtain $\mathrm{Var}\big(X_{i,t}(s_t)\big)=\lambda_i(s_t)$. This is used to compute
$$
\mathrm{cov}\big(X_{i,t}(s_t),X_{i,t-1}(s_{t-1})\big)=\mathrm{cov}\big(\alpha_i\circ X_{i,t-1}(s_{t-1}),\,X_{i,t-1}(s_{t-1})\big)+\mathrm{cov}\big(X_{i,t-1}(s_{t-1}),\,\varepsilon_{i,t}(s_t,s_{t-1})\big)=\alpha_i\,\mathrm{Var}\big(X_{i,t-1}(s_{t-1})\big)=\alpha_i\,\lambda_i(s_{t-1}).
$$
According to Proposition 1, the marginal distribution of $\{X_{1,t}(s_t),X_{2,t}(s_t)\}$ is $\mathrm{BPoi}\big(\lambda_1(s_t),\lambda_2(s_t),\phi\big)$, so
$$
\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t}(s_t)\big)=\phi.
$$
Hence,
$$
\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t-k}(s_{t-k})\big)=\mathrm{cov}\Big(\alpha_1^{k}\circ X_{1,t-k}(s_{t-k})+\sum_{j=0}^{k-1}\alpha_1^{j}\circ\varepsilon_{1,t-j},\;X_{2,t-k}(s_{t-k})\Big)=\alpha_1^{k}\,\mathrm{cov}\big(X_{1,t-k}(s_{t-k}),X_{2,t-k}(s_{t-k})\big)+0=\alpha_1^{k}\,\phi,
$$
where the innovation terms vanish because $\varepsilon_{1,t-j}$, $j=0,\ldots,k-1$, are independent of $X_{2,t-k}(s_{t-k})$. This completes the proof of Proposition 2.
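The key building block used twice above — that a $\mathrm{BPoi}(\lambda_1,\lambda_2,\phi)$ pair has marginal means $\lambda_i$ and covariance $\phi$ — can be illustrated via the common-shock construction. This is a sketch for illustration only; the sample size and parameter values are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
l1, l2, phi, n = 3.0, 5.0, 0.5, 200_000

# common-shock construction of BPoi(l1, l2, phi):
# X1 = Z1 + Z0, X2 = Z2 + Z0, with Z0 ~ Poi(phi), Zi ~ Poi(li - phi)
z0 = rng.poisson(phi, n)
x1 = rng.poisson(l1 - phi, n) + z0
x2 = rng.poisson(l2 - phi, n) + z0

# sample means approach l1, l2; sample covariance approaches phi
print(round(x1.mean(), 2), round(x2.mean(), 2),
      round(np.cov(x1, x2)[0, 1], 2))
```

For this sample size, the printed values should be close to 3.0, 5.0, and 0.5, respectively.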

Appendix D. Proof of Remark 4

We rewrite $\phi=\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t}(s_t)\big)$ as follows:
$$
\begin{aligned}
\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t}(s_t)\big)
&=E\big[X_{1,t}(s_t)\,X_{2,t}(s_t)\big]-E\big[X_{1,t}(s_t)\big]\cdot E\big[X_{2,t}(s_t)\big]\\
&=\sum_{s=1}^{S}E\big[X_{1,t}(s_t)X_{2,t}(s_t)\big]\,\mathbb{1}\{s_t=s\}-\sum_{s=1}^{S}E\big[X_{1,t}(s_t)\big]\mathbb{1}\{s_t=s\}\cdot E\big[X_{2,t}(s_t)\big]\mathbb{1}\{s_t=s\}\\
&=\sum_{s=1}^{S}\Big\{E\big[X_{1,t}(s)X_{2,t}(s)\big]-E\big[X_{1,t}(s)\big]\cdot E\big[X_{2,t}(s)\big]\Big\}\,\mathbb{1}\{s_t=s\}\\
&=\sum_{s=1}^{S}\mathrm{cov}\big(X_{1,t}(s),X_{2,t}(s)\big)\,\mathbb{1}\{s_t=s\}
=\sum_{s=1}^{S}\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t}(s_t)\mid s_t=s\big)\,\mathbb{1}\{s_t=s\}.
\end{aligned}
$$
Summing over $t$ on both sides, we have
$$
n\,\phi=\sum_{t=1}^{n}\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t}(s_t)\big)=\sum_{t=1}^{n}\sum_{s=1}^{S}\gamma_{12,0}(s)\,\mathbb{1}\{s_t=s\}=\sum_{s=1}^{S}n_s\,\gamma_{12,0}(s).
$$
Thus, $\mathrm{cov}\big(X_{1,t}(s_t),X_{2,t}(s_t)\big)$ can be expressed as in Equation (8).
According to Proposition 2(ii), the $\alpha_i$ are expressed as follows:
$$
\alpha_i=\frac{\mathrm{cov}\big(X_{i,t}(s_t),X_{i,t-1}(s_{t-1})\big)}{\lambda_i(s_{t-1})}
=\frac{\sum_{r=1}^{S}\sum_{s=1}^{S}\mathrm{cov}\big(X_{i,t}(s_t),X_{i,t-1}(s_{t-1})\mid s_{t-1}=r,\,s_t=s\big)\,\mathbb{1}\{s_{t-1}=r,\,s_t=s\}}{\sum_{r=1}^{S}\lambda_i(s_{t-1})\,\mathbb{1}\{s_{t-1}=r\}}.
$$
Summing over $t$ on both sides, we have
$$
n\,\alpha_i=\sum_{t=1}^{n}\frac{\sum_{r=1}^{S}\sum_{s=1}^{S}\gamma_i(r,s)\,\mathbb{1}\{s_t=s,\,s_{t-1}=r\}}{\sum_{r=1}^{S}\gamma_{ii,0}(r)\,\mathbb{1}\{s_{t-1}=r\}}
=\sum_{r=1}^{S}\sum_{s=1}^{S}n_{r,s}\,\frac{\gamma_i(r,s)}{\gamma_{ii,0}(r)},
\qquad\text{so}\qquad
\alpha_i=\sum_{r=1}^{S}\sum_{s=1}^{S}\frac{n_{r,s}}{n}\,\frac{\gamma_i(r,s)}{\gamma_{ii,0}(r)}.
$$
Hence, we obtain the expression in Equation (9), which completes the proof of Remark 4.
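The state-weighted moment formulas of Equations (8) and (9) translate directly into sample versions. The sketch below is our own illustration, not the paper's code: it applies them to a deliberately simple test case with independent bivariate Poisson draws per state (i.e., $\alpha_i = 0$), where the estimators should return $\hat\alpha_1 \approx 0$ and $\hat\phi \approx \phi$.

```python
import numpy as np

rng = np.random.default_rng(7)

# test case (our choice): two states, independent draws over time
lam1, lam2, phi = np.array([1.0, 3.0]), np.array([3.0, 5.0]), 0.5
P = np.array([[0.4, 0.6], [0.6, 0.4]])
n, S = 50_000, 2

s = np.zeros(n, dtype=int)
for t in range(1, n):                          # joint state sequence
    s[t] = rng.choice(S, p=P[s[t - 1]])
z0 = rng.poisson(phi, n)                       # common shock -> cov = phi
X1 = rng.poisson(lam1[s] - phi) + z0
X2 = rng.poisson(lam2[s] - phi) + z0

mu1 = np.array([X1[s == k].mean() for k in range(S)])

# Eq. (8): phi-hat as the (n_s / n)-weighted state-wise cross-covariance
phi_hat = sum((s == k).mean() * np.cov(X1[s == k], X2[s == k])[0, 1]
              for k in range(S))

# Eq. (9): alpha1-hat as weighted lag-1 covariances over state pairs (r, k)
alpha1_hat = 0.0
prev, curr = s[:-1], s[1:]
for r in range(S):
    var_r = X1[s == r].var()                   # gamma_{11,0}(r)
    for k in range(S):
        m = (prev == r) & (curr == k)
        g = np.mean((X1[1:][m] - mu1[k]) * (X1[:-1][m] - mu1[r]))
        alpha1_hat += m.mean() * g / var_r

print(round(phi_hat, 2), round(alpha1_hat, 2))
```

On this independent-draws test case, $\hat\phi$ recovers the cross-dependence 0.5, and $\hat\alpha_1$ is close to 0, as it should be.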

Appendix E. Proof of Theorem 1

To prove the consistency of the YW estimators, we argue via asymptotically uncorrelated sequences. According to Equations (5) and (6), $\hat\mu_i(s)$ only uses the observations collected under the same state $s$, i.e., $\{X_{i,t}(s)\,\mathbb{1}\{s_t=s\}\}$. For the corresponding sub-sample from $\{X_{i,t}(s_t)\}$, the correlation coefficient is
$$
\rho_\tau=\mathrm{corr}\big(X_{i,t-\tau}(s)\,\mathbb{1}\{s_{t-\tau}=s\},\;X_{i,t}(s)\,\mathbb{1}\{s_t=s\}\big)=\begin{cases}\alpha_i^{\tau}, & \text{if } s_{t-\tau}=s,\ s_t=s,\\[2pt] 0, & \text{otherwise.}\end{cases}
$$
Thus,
$$
\sum_{\tau=0}^{\infty}\rho_\tau\le\sum_{\tau=0}^{\infty}\alpha_i^{\tau}=\frac{1}{1-\alpha_i}.
$$
In addition to $\sum_{\tau=0}^{\infty}\rho_\tau<\infty$, we also know that $\mathrm{Var}\big(X_{i,t}(s_t)\mid s_t=s\big)<\infty$ for all $t$. Thus, according to Definition 3.55 in [28], $\{X_{i,t}(s)\,\mathbb{1}\{s_t=s\}\}$ is an asymptotically uncorrelated sequence.
Regarding the expression of $\hat\gamma_i(r,s)$, we obtain
$$
\begin{aligned}
\hat\gamma_i(r,s)&=\frac{1}{n_{r,s}}\sum_{t=2}^{n}\big(X_{i,t}(s_t)-\hat\mu_i(s)\big)\big(X_{i,t-1}(s_{t-1})-\hat\mu_i(r)\big)\,\mathbb{1}\{s_{t-1}=r,\,s_t=s\}\\
&=\frac{1}{n_{r,s}}\sum_{t=2}^{n}X_{i,t}(s_t)X_{i,t-1}(s_{t-1})\,\mathbb{1}\{s_{t-1}=r,\,s_t=s\}
-\frac{1}{n_{r,s}}\,\hat\mu_i(r)\sum_{t=2}^{n}X_{i,t}(s_t)\,\mathbb{1}\{s_t=s\}\\
&\quad-\frac{1}{n_{r,s}}\,\hat\mu_i(s)\sum_{t=2}^{n}X_{i,t-1}(s_{t-1})\,\mathbb{1}\{s_{t-1}=r\}
+\hat\mu_i(r)\,\hat\mu_i(s)\\
&=\frac{1}{n_{r,s}}\sum_{t=2}^{n}X_{i,t}(s_t)X_{i,t-1}(s_{t-1})\,\mathbb{1}\{s_{t-1}=r,\,s_t=s\}
-\hat\mu_i(r)\,\hat\mu_i(s)-\hat\mu_i(r)\,\hat\mu_i(s)+\hat\mu_i(r)\,\hat\mu_i(s)\\
&=\frac{1}{n_{r,s}}\sum_{t=2}^{n}X_{i,t}(s_t)X_{i,t-1}(s_{t-1})\,\mathbb{1}\{s_{t-1}=r,\,s_t=s\}-\hat\mu_i(r)\,\hat\mu_i(s).
\end{aligned}
$$
Defining $Y_{i,t}=X_{i,t}(s_t)\,X_{i,t-1}(s_{t-1})$, our next step is to derive an explicit expression for $E[Y_{i,t}\,Y_{i,t+\tau}]$. Here, we omit the argument $(s_t,s_{t-1})$ of $\varepsilon_{i,t}$ for simplicity. Using $X_{i,t+\tau}(s_{t+\tau})=\alpha_i^{\tau}\circ X_{i,t}(s_t)+\sum_{j=0}^{\tau-1}\alpha_i^{j}\circ\varepsilon_{i,t+\tau-j}$ and conditioning on $\mathcal{F}_t$:
$$
\begin{aligned}
E[Y_{i,t}\,Y_{i,t+\tau}]&=E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\,X_{i,t+\tau}(s_{t+\tau})\,X_{i,t+\tau-1}(s_{t+\tau-1})\big]\\
&=E\bigg[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\cdot E\bigg[\Big(\alpha_i^{\tau}\circ X_{i,t}(s_t)+\sum_{j=0}^{\tau-1}\alpha_i^{j}\circ\varepsilon_{i,t+\tau-j}\Big)\Big(\alpha_i^{\tau}\circ X_{i,t-1}(s_{t-1})+\sum_{q=0}^{\tau-1}\alpha_i^{q}\circ\varepsilon_{i,t+\tau-1-q}\Big)\,\bigg|\,\mathcal{F}_t\bigg]\bigg]\\
&=\alpha_i^{2\tau}\,E\big[X_{i,t-1}^2(s_{t-1})\,X_{i,t}^2(s_t)\big]
+\alpha_i^{\tau}\,E\big[X_{i,t}^2(s_t)\,X_{i,t-1}(s_{t-1})\big]\sum_{q=0}^{\tau-1}\alpha_i^{q}\,E[\varepsilon_{i,t+\tau-1-q}]\\
&\quad+\alpha_i^{\tau}\,E\big[X_{i,t-1}^2(s_{t-1})\,X_{i,t}(s_t)\big]\sum_{j=0}^{\tau-1}\alpha_i^{j}\,E[\varepsilon_{i,t+\tau-j}]
+E\big[X_{i,t}(s_t)\,X_{i,t-1}(s_{t-1})\big]\sum_{q=0}^{\tau-1}\alpha_i^{q}\,E[\varepsilon_{i,t+\tau-1-q}]\,\sum_{j=0}^{\tau-1}\alpha_i^{j}\,E[\varepsilon_{i,t+\tau-j}].
\end{aligned}
$$
As the Poisson distribution has all moments, there are bounds $M_4$, $M_{\varepsilon_i}$ such that $E\big[X_{i,t}^4(s_t)\big]<M_4^4<\infty$ and $E[\varepsilon_{i,t}]\le M_{\varepsilon_i}=\max\{\lambda_i(s)\mid s=1,\ldots,S\}$. Then, we have
$$
\sum_{j=0}^{\tau-1}\alpha_i^{j}\,E[\varepsilon_{i,t+\tau-j}]\le M_{\varepsilon_i}\,\frac{1-\alpha_i^{\tau}}{1-\alpha_i}\le\frac{M_{\varepsilon_i}}{1-\alpha_i},
\qquad
\sum_{q=0}^{\tau-1}\alpha_i^{q}\,E[\varepsilon_{i,t+\tau-1-q}]\le\frac{M_{\varepsilon_i}}{1-\alpha_i}.
$$
According to the Hölder inequality, we obtain
$$
\begin{aligned}
0&<E\big[X_{i,t-1}^2(s_{t-1})\,X_{i,t}^2(s_t)\big]\le\big(E[X_{i,t-1}^4(s_{t-1})]\big)^{1/2}\,\big(E[X_{i,t}^4(s_t)]\big)^{1/2}\le M_4^4,\\
0&<E\big[X_{i,t-1}^2(s_{t-1})\,X_{i,t}(s_t)\big]\le\big(E[X_{i,t-1}^4(s_{t-1})]\big)^{1/2}\,\big(E[X_{i,t}^2(s_t)]\big)^{1/2}\le M_4^3,\\
0&<E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}^2(s_t)\big]\le\big(E[X_{i,t-1}^2(s_{t-1})]\big)^{1/2}\,\big(E[X_{i,t}^4(s_t)]\big)^{1/2}\le M_4^3,\\
0&<E\big[X_{i,t}(s_t)\,X_{i,t-1}(s_{t-1})\big]\le\big(E[X_{i,t-1}^2(s_{t-1})]\big)^{1/2}\,\big(E[X_{i,t}^2(s_t)]\big)^{1/2}\le M_4^2.
\end{aligned}
$$
In addition, $E[Y_{i,t}]=E\big[X_{i,t}(s_t)\,X_{i,t-1}(s_{t-1})\big]$ and, by the same conditioning argument,
$$
\begin{aligned}
E[Y_{i,t+\tau}]&=E\big[X_{i,t+\tau}(s_{t+\tau})\,X_{i,t+\tau-1}(s_{t+\tau-1})\big]\\
&=\alpha_i^{2\tau}\,E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\big]+\alpha_i^{\tau}\,E\big[X_{i,t}(s_t)\big]\sum_{q=0}^{\tau-1}\alpha_i^{q}\,E[\varepsilon_{i,t+\tau-1-q}]\\
&\quad+\alpha_i^{\tau}\,E\big[X_{i,t-1}(s_{t-1})\big]\sum_{j=0}^{\tau-1}\alpha_i^{j}\,E[\varepsilon_{i,t+\tau-j}]
+\sum_{j=0}^{\tau-1}\alpha_i^{j}\,E[\varepsilon_{i,t+\tau-j}]\,\sum_{q=0}^{\tau-1}\alpha_i^{q}\,E[\varepsilon_{i,t+\tau-1-q}].
\end{aligned}
$$
Then, since the fourth terms cancel,
$$
\begin{aligned}
\mathrm{cov}(Y_{i,t},Y_{i,t+\tau})&=E[Y_{i,t}\,Y_{i,t+\tau}]-E[Y_{i,t}]\cdot E[Y_{i,t+\tau}]\\
&=\alpha_i^{2\tau}\Big(E\big[X_{i,t-1}^2(s_{t-1})\,X_{i,t}^2(s_t)\big]-E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\big]^2\Big)\\
&\quad+\alpha_i^{\tau}\sum_{q=0}^{\tau-1}\alpha_i^{q}\,E[\varepsilon_{i,t+\tau-1-q}]\Big(E\big[X_{i,t}^2(s_t)\,X_{i,t-1}(s_{t-1})\big]-E\big[X_{i,t}(s_t)\big]\,E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\big]\Big)\\
&\quad+\alpha_i^{\tau}\sum_{j=0}^{\tau-1}\alpha_i^{j}\,E[\varepsilon_{i,t+\tau-j}]\Big(E\big[X_{i,t-1}^2(s_{t-1})\,X_{i,t}(s_t)\big]-E\big[X_{i,t-1}(s_{t-1})\big]\,E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\big]\Big)\\
&<\alpha_i^{2\tau}\,M_4^4+\alpha_i^{\tau}\,\frac{M_{\varepsilon_i}}{1-\alpha_i}\,M_4^3+\alpha_i^{\tau}\,\frac{M_{\varepsilon_i}}{1-\alpha_i}\,M_4^3.
\end{aligned}
$$
Altogether, we have
$$
\mathrm{cov}(Y_{i,t},Y_{i,t+\tau})<\alpha_i^{\tau}\Big(M_4^4+\frac{2\,M_{\varepsilon_i}\,M_4^3}{1-\alpha_i}\Big).
$$
Thus,
$$
\sum_{\tau=0}^{\infty}\rho_\tau^{Y_i}=\sum_{\tau=0}^{\infty}\mathrm{corr}(Y_{i,t},Y_{i,t+\tau})=\sum_{\tau=0}^{\infty}\frac{\mathrm{cov}(Y_{i,t},Y_{i,t+\tau})}{\sqrt{\mathrm{Var}(Y_{i,t})\,\mathrm{Var}(Y_{i,t+\tau})}}
<\frac{1}{\sqrt{\mathrm{Var}(Y_{i,t})\,\mathrm{Var}(Y_{i,t+\tau})}}\sum_{\tau=0}^{\infty}\alpha_i^{\tau}\Big(M_4^4+\frac{2\,M_{\varepsilon_i}\,M_4^3}{1-\alpha_i}\Big)
=\frac{M_4^4+2\,M_{\varepsilon_i}\,M_4^3/(1-\alpha_i)}{(1-\alpha_i)\sqrt{\mathrm{Var}(Y_{i,t})\,\mathrm{Var}(Y_{i,t+\tau})}}<\infty.
$$
It is obvious that $\mathrm{Var}(Y_{i,t})<\infty$ for all $t$. Hence, $\{Y_{i,t}\}$ is an asymptotically uncorrelated sequence. Using Theorem 3.57 in [28], we have
$$
\frac{1}{n_s}\sum_{t=1}^{n}X_{i,t}(s_t)\,\mathbb{1}\{s_t=s\}\ \xrightarrow{d}\ E\big[X_{i,t}(s_t)\mid s_t=s\big],
$$
which means that $\hat\mu_i(s)\xrightarrow{d}\mu_i(s)$. Analogously,
$$
\frac{1}{n_{r,s}}\sum_{t=2}^{n}X_{i,t}(s_t)\,X_{i,t-1}(s_{t-1})\,\mathbb{1}\{s_{t-1}=r,\,s_t=s\}\ \xrightarrow{d}\ E\big[X_{i,t-1}(s_{t-1})\,X_{i,t}(s_t)\mid s_{t-1}=r,\,s_t=s\big],
$$
which means that $\hat\gamma_i(r,s)\xrightarrow{d}\gamma_i(r,s)$. Thus, the consistency of the estimator $\hat\lambda_i^{yw}(s)$ follows. The proof that $\hat\gamma_{ii,0}(r)\xrightarrow{d}\gamma_{ii,0}(r)$ is analogous to that for $\hat\mu_i(s)$, so we omit it here.
According to the expression for $\alpha_i$ in Remark 4, together with Slutsky's theorem, the consistency of the estimators $\hat\alpha_i^{yw}$ and $\hat\phi^{yw}$ follows, which completes the proof of Theorem 1.

References

  1. Weiß, C. An Introduction to Discrete-Valued Time Series; Wiley: Chichester, UK, 2018. [Google Scholar]
  2. McKenzie, E. Some simple models for discrete variate time series. JAWRA J. Am. Water Resour. Assoc. 1985, 21, 645–650. [Google Scholar] [CrossRef]
  3. Al-Osh, M.A.; Alzaid, A.A. First-order integer-valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 1987, 8, 314–324. [Google Scholar] [CrossRef]
  4. Weiß, C. Thinning operations for modeling time series of counts—A survey. AStA Adv. Stat. Anal. 2008, 92, 319–334. [Google Scholar] [CrossRef]
  5. Thyregod, P.; Carstensen, N.; Madsen, H.; Arnbjerg-Nielsen, K. Integer-valued autoregressive models for tipping bucket rainfall measurements. Environmetrics 1999, 10, 395–411. [Google Scholar] [CrossRef]
  6. Monteiro, M.; Scotto, M.G.; Pereira, I. Integer-valued self-exciting threshold autoregressive processes. Commun. Stat.-Theory Methods 2012, 41, 2717–2737. [Google Scholar] [CrossRef]
  7. Möller, T.; Weiß, C. Threshold models for integer-valued time series with infinite or finite range. In Stochastic Models, Statistics and Their Applications; Steland, A., Rafajłowicz, E., Szajowski, K., Eds.; Springer: Wrocław, Poland, 2015; pp. 327–334. [Google Scholar]
  8. Kim, H.; Park, Y. A non-stationary integer-valued autoregressive model. Stat. Pap. 2008, 49, 485–502. [Google Scholar] [CrossRef]
  9. Nastić, A.; Laketa, P.; Ristić, M. Random environment integer-valued autoregressive process. J. Time Ser. Anal. 2016, 37, 267–287. [Google Scholar] [CrossRef]
  10. Laketa, P.; Nastić, A.; Ristić, M. Generalized random environment INAR models of higher order. Mediterr. J. Math. 2018, 15, 9. [Google Scholar] [CrossRef]
  11. Pedeli, X.; Karlis, D. A bivariate INAR(1) process with application. Stat. Model. 2011, 11, 325–349. [Google Scholar] [CrossRef]
  12. Latour, A. The multivariate GINAR(p) process. Adv. Appl. Probab. 1997, 29, 228–248. [Google Scholar] [CrossRef]
  13. Pedeli, X.; Karlis, D. Some properties of multivariate INAR(1) processes. Comput. Stat. Data Anal. 2013, 67, 213–225. [Google Scholar] [CrossRef]
  14. Karlis, D.; Pedeli, X. Flexible Bivariate INAR(1) Processes Using Copulas. Commun. Stat.-Theory Methods 2013, 42, 723–740. [Google Scholar] [CrossRef]
  15. Santos, C.; Pereira, I.; Scotto, M. On the theory of periodic multivariate INAR processes. Stat. Pap. 2019, 69, 1291–1348. [Google Scholar]
  16. Khan, N.; Sunecher, Y.; Jowaheer, V. Modelling a non-stationary BINAR(1) Poisson process. J. Stat. Comput. Simul. 2016, 86, 3106–3126. [Google Scholar] [CrossRef]
  17. Sunecher, Y.; Khan, N.; Jowaheer, V. BINMA(1) model with COM-Poisson innovations: Estimation and application. Commun. Stat.-Simul. Comput. 2018, 49, 1631–1652. [Google Scholar] [CrossRef]
  18. Silva, I.; Silva, M.E.; Torres, C. Inference for bivariate integer-valued moving average models based on binomial thinning operation. J. Appl. Stat. 2020, 47, 2546–2564. [Google Scholar] [CrossRef] [PubMed]
  19. Scotto, M.; Weiß, C.; Silva, M.; Pereira, I. Bivariate binomial autoregressive models. J. Multivariate Anal. 2014, 125, 233–251. [Google Scholar] [CrossRef]
  20. Yu, M.; Wang, D.; Yang, K.; Liu, Y. Bivariate first-order random coefficient integer-valued autoregressive processes. J. Stat. Plan. Inference 2020, 204, 153–176. [Google Scholar] [CrossRef]
  21. Cui, Y.; Zhu, F. A new bivariate integer-valued GARCH model allowing for negative cross-correlation. Test 2018, 27, 428–452. [Google Scholar] [CrossRef]
  22. Silva, R.B.; Barreto-Souza, W. Flexible and robust mixed Poisson INGARCH models. J. Time Ser. Anal. 2019, 40, 788–814. [Google Scholar] [CrossRef]
  23. Piancastelli, L.S.C.; Barreto-Souza, W.; Ombao, H. Flexible bivariate INGARCH process with a broad range of contemporaneous correlation. J. Time Ser. Anal. 2023, 44, 206–222. [Google Scholar] [CrossRef]
  24. Livsey, J.; Lund, R.; Kechagias, S.; Pipiras, V. Multivariate integer-valued time series with flexible autocovariances and their application to major hurricane counts. Ann. Appl. Stat. 2018, 12, 408–431. [Google Scholar] [CrossRef]
  25. Popović, P.; Laketa, P.; Nastić, A. Forecasting with two generalized integer-valued autoregressive processes of order one in the mutual random environment. SORT-Stat. Oper. Res. Trans. 2019, 43, 355–384. [Google Scholar]
  26. MacDonald, I.; Zucchini, W. Hidden Markov models for discrete-valued time series. In Handbook of Discrete-Valued Time Series; Davis, R.A., Holan, S.H., Lund, R., Ravishanker, N., Eds.; CRC Press: Boca Raton, FL, USA, 2016; pp. 267–286. [Google Scholar]
  27. Contreras-Reyes, J.E. Information quantity evaluation of multivariate SETAR processes of order one and applications. Stat. Pap. 2023, in press. [CrossRef]
  28. White, H. Asymptotic Theory For Econometricians; Academic Press: London, UK, 2001. [Google Scholar]
Figure 1. Bivariate sales counts from Section 5: time series plots, sample PACFs, and cross-correlations of both sub-series. The dots in the time series plots are printed in gray (black) color if the state equals 1 (2).
Figure 2. Bivariate sales counts from Section 5: sample means, variances, and ACFs of Pearson residuals with respect to fitted CuBINAR(1) model.
Figure 3. Bivariate sales counts from Section 5: PIT histograms with respect to fitted CuBINAR(1) model.
Table 1. State-dependent sample means and variances of sales counts data.
            State 1            State 2
            Mean      Var      Mean      Var
x_{1,t}     3.111     3.046    2.132     2.063
x_{2,t}     4.500     3.206    1.553     1.876
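State-dependent means and variances as in Table 1 can be computed with a simple group-by. The counts and states below are invented stand-ins for the sales data, shown only to illustrate the computation:

```python
import pandas as pd

# Made-up bivariate counts with a two-state circumstance sequence
df = pd.DataFrame({
    "x1": [3, 2, 4, 1, 2, 3, 5, 2],
    "x2": [5, 1, 4, 2, 1, 6, 4, 2],
    "state": [1, 2, 1, 2, 2, 1, 1, 2],
})

# State-dependent sample mean and (unbiased) sample variance per series
summary = df.groupby("state").agg(["mean", "var"])
print(summary)
```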
Table 2. CML parameter estimates of sales counts data.
Model:    CuBINAR(1)          RE-BINAR(1)         BINAR(1)
          α̂_1 = 0.265         α̂_1 = 0.609         α̂_1 = 0.288
          α̂_2 = 0.215         α̂_2 = 0.619         α̂_2 = 0.396
          λ̂_1(1) = 2.929      λ̂_1(1) = 1.823      λ̂_1 = 2.480
          λ̂_1(2) = 2.266      λ̂_1(2) = 2.090
          λ̂_2(1) = 4.187      λ̂_2(1) = 3.589      λ̂_2 = 2.467
          λ̂_2(2) = 1.586      λ̂_2(2) = 1.322
          φ̂ = 0.250           φ̂ = 0.312
Table 3. AIC, logarithmic score, and RMSE of sales counts data.
Model           AIC      Logarithmic Score             in-RMSE    out-RMSE
CuBINAR(1)      399.8    3.507              x_{1,t}:   1.468      1.849
                                            x_{2,t}:   1.456      1.864
RE-BINAR(1)     427.8    3.780              x_{1,t}:   1.607      2.163
                                            x_{2,t}:   1.566      2.139
BINAR(1)        418.1    3.710              x_{1,t}:   1.504      1.887
                                            x_{2,t}:   1.761      2.408
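The criteria reported in Table 3 can all be derived from one-step-ahead predictive distributions, since the conditional log-likelihood of a count time series equals the sum of log predictive probabilities. The probabilities, point forecasts, and parameter count below are invented for illustration, not the fitted CuBINAR(1) values:

```python
import math

obs       = [3, 2, 4, 1]              # observed counts
pred_prob = [0.20, 0.25, 0.15, 0.30]  # P(Y_t = obs_t | past) under the model (hypothetical)
pred_mean = [2.6, 2.1, 3.2, 1.4]      # conditional means used as point forecasts (hypothetical)
k = 6                                 # number of free parameters (hypothetical)

loglik = sum(math.log(p) for p in pred_prob)
log_score = -loglik / len(obs)        # mean negative log predictive probability
aic = 2 * k - 2 * loglik              # Akaike information criterion
rmse = math.sqrt(sum((y - m) ** 2 for y, m in zip(obs, pred_mean)) / len(obs))

print(round(log_score, 3), round(aic, 2), round(rmse, 3))
```

Computed in-sample, `rmse` corresponds to the in-RMSE column; applied to a hold-out segment, it yields the out-RMSE.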

Share and Cite

Wang, H.; Weiß, C.H. The Circumstance-Driven Bivariate Integer-Valued Autoregressive Model. Entropy 2024, 26, 168. https://doi.org/10.3390/e26020168
