Article

Partial Autocorrelation Diagnostics for Count Time Series

by Christian H. Weiß 1,*, Boris Aleksandrov 1, Maxime Faymonville 2 and Carsten Jentsch 2

1 Department of Mathematics and Statistics, Helmut Schmidt University, 22043 Hamburg, Germany
2 Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
* Author to whom correspondence should be addressed.
Entropy 2023, 25(1), 105; https://doi.org/10.3390/e25010105
Submission received: 6 December 2022 / Revised: 23 December 2022 / Accepted: 28 December 2022 / Published: 4 January 2023
(This article belongs to the Special Issue Discrete-Valued Time Series)

Abstract: In a time series context, the study of the partial autocorrelation function (PACF) is helpful for model identification. Especially in the case of autoregressive (AR) models, it is widely used for order selection. During the last decades, the use of AR-type count processes, i.e., processes which also fulfil the Yule–Walker equations and thus provide the same PACF characterization as AR models, has increased considerably. This motivates the use of the PACF test also for such count processes. When the sample PACF is computed from the raw data or from the Pearson residuals, respectively, the findings are usually evaluated based on well-known asymptotic results. However, the conditions for these asymptotics are generally not fulfilled for AR-type count processes, which deteriorates the performance of the PACF test in such cases. Thus, we present different implementations of the PACF test for AR-type count processes, which rely on several bootstrap schemes for count time series. We compare them in simulations with the asymptotic results, and we illustrate them with an application to a real-world data example.

1. Introduction

Autoregressive (AR) models for time series date back to Walker [1] and Yule [2]; they assume that the current observation of the considered process is generated from its own past by a linear scheme. The ordinary pth-order AR model for a real-valued process (Z_t)_{t∈ℤ}, where ℤ = {…, −1, 0, 1, …}, abbreviated as the AR(p) model, is defined by the recursive scheme
$$Z_t \;=\; \alpha_1 \cdot Z_{t-1} + \cdots + \alpha_p \cdot Z_{t-p} + \varepsilon_t \qquad (\alpha_p \neq 0), \tag{1}$$
where the innovations (ε_t)_{t∈ℤ} are independent and identically distributed (i.i.d.) real-valued random variables (rvs), which are also assumed to be square-integrable ("white noise"). To ensure a (weakly) stationary and causal solution of the AR(p) recursion (1), the AR parameters α_1, …, α_p ∈ ℝ have to be chosen such that the roots of the characteristic polynomial α(z) = 1 − α_1 z − ⋯ − α_p z^p lie outside the unit circle. Then, if the innovations (ε_t)_{t∈ℤ} follow a normal distribution, the observations (Z_t)_{t∈ℤ} are normal as well, leading to the Gaussian AR(p) process.
A characteristic property of the AR(p) process is given by the fact that its autocorrelation function (ACF), ρ(h) = Corr[Z_t, Z_{t−h}] with h ∈ ℕ = {1, 2, …} and ρ(0) = 1, satisfies the following set of linear equations:
$$\rho(h) \;=\; \sum_{i=1}^{p} \alpha_i\, \rho\big(|h-i|\big) \quad \text{for } h = 1, 2, \ldots \tag{2}$$
These Yule–Walker (YW) equations, in turn, give rise to the definition of the partial autocorrelation function (PACF), ρ_part(h) with time lags h ∈ ℕ, in the following way (see Appendix A for further details): if R_k := (ρ(|i−j|))_{i,j=1,…,k} and r_k := (ρ(1), …, ρ(k))^⊤ ∈ ℝ^k for k = 1, 2, …, and if a_k ∈ ℝ^k denotes the solution of the equation R_k a_k = r_k, then the PACF at lag k is defined as the last component of a_k, i.e., ρ_part(k) := a_{k,k}. Hence, if the YW equations (2) hold, it follows that
$$\rho_{\mathrm{part}}(p) = \alpha_p, \qquad \rho_{\mathrm{part}}(h) = 0 \ \text{ for all } h > p. \tag{3}$$
This characteristic abrupt drop of the PACF towards zero after lag h = p is commonly used for model identification in practice, namely by inspecting the sample PACF for such a pattern; see the Box–Jenkins program dating back to Box & Jenkins [3]. Details on the PACF's computation are summarized in Appendix A. There, we also provide a brief discussion of some equivalences between the ACF, the PACF, and the AR coefficients, in the sense that the AR(p) model (1) is characterized equivalently by either α_1, …, α_p, or ρ(1), …, ρ(p), or ρ_part(1), …, ρ_part(p).
Since the introduction of the ordinary AR(p) model, several other AR-type models have been proposed in the literature, not only for real-valued processes, but also for other types of quantitative processes such as count processes (and even for categorical processes); see the surveys by Holan et al. [4] and Weiß [5]. In the present work, the focus is on (stationary and square-integrable) AR-type count processes (X_t)_{t∈ℤ}, i.e., where the X_t have a quantitative range contained in ℕ_0 = {0, 1, …}. Here, the AR(p) structure is implied by requiring the conditional mean at each time t to be linear in the last p observations [6], i.e.,
$$E[X_t \mid X_{t-1}, \ldots] \;=\; \alpha_0 + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p}, \tag{4}$$
because then, the YW equations (2) immediately follow by using the law of total covariance. Note that one also has to require α_0 > 0 and α_1, …, α_p ≥ 0, as the counts X_t are non-negative rvs having a strictly positive mean, computed as μ = α_0/(1 − α_1 − ⋯ − α_p). The considered class of count processes satisfying (4) covers many popular special cases, such as the INAR(p) model (integer-valued AR) by Du & Li [7], the INARCH(p) model ('CH' = conditional heteroscedasticity) by Ferland et al. [8], or their bounded-counts counterparts discussed in Kim et al. [9]; see Section 2 for further details. These count processes satisfying (4), however, are not truly linear processes: in contrast to (1), there is no linear relation between their observations.
As all these AR(p)-like count processes satisfy the YW equations (2) and, thus, the PACF characterization (3), it is common practice to employ the sample PACF (SPACF) for model identification given a count time series X_1, …, X_n. More precisely, one commonly computes the SPACF values ρ̂_part(h) for h = 1, 2, … from X_1, …, X_n and checks for the pattern (3) among those SPACF values that are classified as being significantly different from zero. An analogous procedure is common during a later step of the Box–Jenkins program. After having fitted a model to the data, one commonly computes the Pearson residuals to check the model adequacy; see Weiß [5], Jung & Tremayne [10] as well as Section 2. While, for an adequate model fit, the Pearson residuals are expected to be uncorrelated, significant SPACF values computed from them would indicate that the fitted model does not adequately capture the true dependence structure. In both cases, practitioners usually evaluate the significance of ρ̂_part(h) based on the following asymptotic result (see [11] (Theorem 8.1.2)):
$$\sqrt{n}\,\hat{\rho}_{\mathrm{part}}(h) \;\overset{a}{\sim}\; \mathrm{N}(0,1) \quad \text{for lags } h > p, \tag{5}$$
i.e., the value ρ̂_part(h) is compared to the critical values ±z_{1−α/2}/√n to test the null hypothesis of an AR(h−1) process on level α. Here, N(μ, σ²) denotes the normal distribution with mean μ and variance σ², and z_γ abbreviates the γ-quantile of N(0, 1). The aforementioned critical values are automatically plotted in SPACF plots by common statistical software, e.g., if one uses the command pacf in R. However, Theorem 8.1.2 in Brockwell & Davis [11] assumes that the SPACF is computed from a truly linear AR(p) process as in (1), which is neither the case for the aforementioned AR-type count processes, nor for the Pearson residuals computed from them. Thus, it is not clear whether the approximation (5) is asymptotically correct and sufficiently precise in finite samples. In fact, some special asymptotic results in Kim & Weiß [12] and Mills & Seneta [13] (see Section 3 for further details), as well as some simulation results for Pearson residuals in Weiß et al. [14], indicate that this is generally indeed not the case.
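As a concrete illustration, the following minimal R sketch computes the SPACF of a count series and compares it with the ±z_{1−α/2}/√n bounds of (5), i.e., the dashed bounds that R's pacf command draws by default; the i.i.d. Poisson data and the chosen lag range are illustrative assumptions only.

```r
## Minimal sketch: the simple asymptotic PACF test (5), i.e., the default bounds of R's pacf().
set.seed(1)
n <- 100
x <- rpois(n, lambda = 5)                      # stand-in count series (i.i.d. Poisson here)

spacf <- pacf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]
crit  <- qnorm(1 - 0.05 / 2) / sqrt(n)         # +/- z_{1-alpha/2} / sqrt(n)

data.frame(lag = 1:10, spacf = round(spacf, 3), significant = abs(spacf) > crit)
```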
Therefore, several alternative ways of implementing the PACF-test are presented in Section 4, namely relying on different types of bootstrap schemes for count time series. The performance of these bootstrap implementations compared to the asymptotic ones is analyzed in a comprehensive simulation study. In Section 5, we start with the case where the SPACF is applied to the original count time series ( X t ) with the aim of identifying the AR model order. Afterwards in Section 6, we consider the case of applying the SPACF to the (non-integer) Pearson residuals computed based on a model fit, i.e., the SPACF is used for checking the model adequacy. Our findings are also illustrated by a real-data example on claims counts in Section 7. Here, the computations and simulations in Section 5, Section 6 and Section 7 have been performed with the software R, and the documented R-code for Section 7 is provided in the Supplementary Materials to this article. Further R-codes can be obtained from the corresponding author upon request. We conclude the article in Section 8.

2. On AR-Type Count Time Series and Pearson Residuals

Several (stationary and square-integrable) AR-type count processes (X_t)_{t∈ℤ}, which also have a conditional linear mean according to (4), have been discussed in the literature. Most of these processes either follow a model recursion using so-called thinning operators (typically referred to as INAR models), or they are defined by specifying the conditional distribution of X_t | X_{t−1}, … together with condition (4), leading to INARCH models; see Weiß [5] for a survey. For this research, we focus on the most popular instance of each of these two classes, namely the INAR(p) model of Du & Li [7] on the one hand, and the INARCH(p) model of Ferland et al. [8] on the other hand.
The INAR(p) model of Du & Li [7] makes use of the binomial thinning operator "∘" introduced by Steutel & van Harn [15]. Having the parameter α ∈ (0; 1) and being applied to a count rv X, it is defined by the conditional binomial distribution α∘X | X ∼ Bin(X, α), where the boundary cases are included as 0∘X = 0 and 1∘X = X. Let (ϵ_t)_{t∈ℤ} be square-integrable i.i.d. count rvs, and denote μ_ϵ = E[ϵ_t] and σ_ϵ² = V[ϵ_t]. Then, the INAR(p) process (X_t)_{t∈ℤ} is defined by the recursion
$$X_t \;=\; \alpha_1 \circ X_{t-1} + \cdots + \alpha_p \circ X_{t-p} + \epsilon_t, \tag{6}$$
where all thinnings are executed independently of each other, and where Σ_{j=1}^{p} α_j < 1 is assumed to ensure a stationary solution. The INAR(p) process (6) constitutes a pth-order Markov process, the transition probabilities of which are a convolution of the p binomial distributions Bin(X_{t−1}, α_1), …, Bin(X_{t−p}, α_p) and the innovations' distribution [16] (p. 469). The conditional mean satisfies (4) with α_0 = μ_ϵ, and the conditional variance is given by
$$V[X_t \mid X_{t-1}, \ldots] \;=\; \sigma_\epsilon^2 + \sum_{j=1}^{p} \alpha_j (1-\alpha_j)\, X_{t-j}, \tag{7}$$
see Drost et al. [16] (p. 469). The default choice for ϵ_t in the literature is a Poisson (Poi) distribution (which is the integer counterpart to the normal distribution), leading to the Poi-INAR(p) process. However, any other (non-degenerate) count distribution for ϵ_t might be used as well, such as the negative-binomial (NB) distribution for increased dispersion, leading to the NB-INAR(p) process. In the case of such a parametric specification for ϵ_t, one computes the moments μ_ϵ, σ_ϵ² according to this model, and then the conditional mean and variance according to (4) and (7), respectively.
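To make the model recursion (6) concrete, here is a minimal R sketch that simulates a Poi-INAR(1) path via binomial thinning; the function name rpoinar1, the burn-in length, and the parameter values are our own illustrative choices.

```r
## Minimal sketch: simulate a Poi-INAR(1) path X_t = alpha o X_{t-1} + eps_t with Poi(mu_eps) innovations.
rpoinar1 <- function(n, alpha, mu_eps, burnin = 200) {
  x <- numeric(n + burnin)
  x[1] <- rpois(1, mu_eps / (1 - alpha))             # start near the stationary mean
  for (t in 2:(n + burnin)) {
    surv <- rbinom(1, size = x[t - 1], prob = alpha) # binomial thinning alpha o X_{t-1}
    x[t] <- surv + rpois(1, mu_eps)                  # add the Poisson innovation
  }
  x[(burnin + 1):(burnin + n)]
}

set.seed(2)
x <- rpoinar1(n = 100, alpha = 0.5, mu_eps = 2.5)    # marginal mean mu = 2.5 / (1 - 0.5) = 5
```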
The INARCH(p) model of Ferland et al. [8] directly assumes the conditional linear mean (4) to hold, and then specifies the conditional distribution of X_t | X_{t−1}, …. In Ferland et al. [8], the case of a conditional Poi-distribution is assumed, i.e., altogether
$$X_t \mid X_{t-1}, \ldots \;\sim\; \mathrm{Poi}\big(\alpha_0 + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p}\big), \tag{8}$$
such that the conditional variance of this Poi-INARCH(p) process equals V[X_t | X_{t−1}, …] = E[X_t | X_{t−1}, …]. However, other choices for the conditional distribution of X_t | X_{t−1}, … have been investigated in the literature; see [5] (Section 4.2).
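Analogously, a Poi-INARCH(1) path can be simulated directly from the conditional distribution (8); again, the function name and the parameter values below are illustrative only.

```r
## Minimal sketch: simulate a Poi-INARCH(1) path with conditional mean alpha0 + alpha1 * X_{t-1}.
rpoinarch1 <- function(n, alpha0, alpha1, burnin = 200) {
  x <- numeric(n + burnin)
  x[1] <- rpois(1, alpha0 / (1 - alpha1))            # start near the stationary mean
  for (t in 2:(n + burnin)) {
    x[t] <- rpois(1, alpha0 + alpha1 * x[t - 1])     # conditional Poisson draw
  }
  x[(burnin + 1):(burnin + n)]
}

set.seed(3)
x <- rpoinarch1(n = 100, alpha0 = 2.5, alpha1 = 0.5) # same mean and ACF as the INAR(1) example above
```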
For parameter estimation, one commonly uses either simple method-of-moments (MM) estimators (i.e., derived from marginal sample moments and the sample ACF; also see Appendix A), or the more advanced conditional maximum likelihood (CML) estimators, which are computed by using a numerical optimization routine (see [5] (Section 2.2)). It should be noted that for the INAR(p) model, a semi-parametric specification also exists (where the innovations' distribution is left unspecified). The corresponding semi-parametric CML estimator was analyzed by Drost et al. [16]; see also the small-sample refinement by Faymonville et al. [17]. It leads to non-parametric estimates of the probabilities p_{ϵ,k} = P(ϵ_t = k) for k between some finite bounds 0 ≤ l < u < ∞ (and p_{ϵ,k} = 0 for k ∉ {l, …, u}), which can then be used for computing μ_ϵ, σ_ϵ² as required for the conditional moments (4) and (7). More precisely, the rth moment, r ∈ ℕ, is given by E[ϵ_t^r] = Σ_{k=l}^{u} k^r p_{ϵ,k}.
After having fitted a model to the count time series X_1, …, X_n, a widely used approach for checking the model adequacy is to investigate the corresponding (standardized) Pearson residuals [5,10,14,18,19]. Let the parameters of the considered AR(p)-type model be collected in the vector θ, and let θ̂ denote the estimated parameters of the fitted model. Furthermore, let us write the conditional mean as E[X_t | X_{t−1}, …; θ] and the conditional variance as V[X_t | X_{t−1}, …; θ] to express their dependence on the actual parameter values. Then, the Pearson residuals are defined as
$$R_t \;:=\; R_t(\hat{\theta}) \;=\; \frac{X_t - E\big[X_t \mid X_{t-1}, \ldots;\, \hat{\theta}\big]}{\sqrt{V\big[X_t \mid X_{t-1}, \ldots;\, \hat{\theta}\big]}} \quad \text{for } t = p+1, \ldots, n. \tag{9}$$
If the fitted AR ( p ) -type model is adequate for X 1 , , X n , then R p + 1 , , R n should have a sample mean (variance) close to zero (one), and they should be uncorrelated. These necessary criteria are then used as adequacy checks. In the present research, our focus is on the SPACF computed from R p + 1 , , R n , which, for an adequate model fit, should not have values being significantly different from zero.
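As an illustration, the following R sketch computes the Pearson residuals (9) for a Poi-INAR(1) model with MM-type parameter estimates; the conditional moments follow (4) and (7) with Poisson innovations (so σ_ϵ² = μ_ϵ). The function name and the stand-in data are our own assumptions.

```r
## Minimal sketch: Pearson residuals (9) under a fitted Poi-INAR(1) model.
pearson_resid_poinar1 <- function(x, alpha_hat, mu_eps_hat) {
  n     <- length(x)
  xlag  <- x[1:(n - 1)]
  cmean <- mu_eps_hat + alpha_hat * xlag                     # E[X_t | X_{t-1}; theta_hat]
  cvar  <- mu_eps_hat + alpha_hat * (1 - alpha_hat) * xlag   # V[X_t | X_{t-1}; theta_hat], Poisson innovations
  (x[2:n] - cmean) / sqrt(cvar)
}

set.seed(4)
x <- rpois(100, lambda = 5)                              # stand-in count series; in practice the observed data
alpha_hat  <- acf(x, lag.max = 1, plot = FALSE)$acf[2]   # MM: lag-1 sample ACF
mu_eps_hat <- mean(x) * (1 - alpha_hat)                  # MM: innovation mean
r <- pearson_resid_poinar1(x, alpha_hat, mu_eps_hat)
c(mean(r), var(r))                                       # should be close to 0 and 1 for an adequate fit
```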

3. Some Asymptotic Results for the Sample PACF

The basic asymptotic result (5), which has been shown for the SPACF computed from a true AR(p) process, has been extended in several directions. First, some refinements have been derived by Anderson [20,21] and further investigated by Kwan [22], who, however, assume the data-generating process (DGP) to be i.i.d. Gaussian, i.e., neither AR dependence nor count rvs are covered by their results. More precisely, Anderson [20] complements the asymptotic variance 1/n in (5) by the following O(n^{−2}) approximation of the mean:
$$E\big[\hat{\rho}_{\mathrm{part}}(h)\big] \;\overset{a}{=}\; \begin{cases} -1/n + O(n^{-2}) & \text{if } h \text{ odd}, \\ -2/n + O(n^{-2}) & \text{if } h \text{ even}. \end{cases} \tag{10}$$
While the Gaussian assumption is weakened by the statement that the result (10) “seems likely to have some validity for many non-Gaussian distributions” [20] (p. 406), the i. i. d.-assumption is not relaxed.
The O(n^{−2}) approximation in (10) is extended to a corresponding O(n^{−3}) approximation in Anderson [21] (pp. 565–566):
$$E\big[\hat{\rho}_{\mathrm{part}}(h)\big] \;\overset{a}{=}\; \begin{cases} -\dfrac{1}{n} - \dfrac{h-1}{n^{2}} + O(n^{-3}) & \text{if } h \text{ odd}, \\[6pt] -\dfrac{2}{n} - \dfrac{h/2-2}{n^{2}} + O(n^{-3}) & \text{if } h \text{ even}, \end{cases} \qquad V\big[\hat{\rho}_{\mathrm{part}}(h)\big] \;\overset{a}{=}\; \frac{1}{n} - \frac{h+2}{n^{2}} + O(n^{-3}). \tag{11}$$
While the O(n^{−3}) extension in (11) seems relevant only for very small sample sizes n, the alternating pattern for the mean in (10) might affect the performance of the normal approximation also for larger n.
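For reference, the following R helper turns the approximations in (11) into two-sided critical values for the PACF-test; since the exact form of (11) shown above is our reconstruction of Anderson's result, treat this only as an illustrative sketch under that assumption.

```r
## Minimal sketch: two-sided PACF critical values based on the refined asymptotics (11).
refined_crit <- function(h, n, level = 0.05) {
  m <- ifelse(h %% 2 == 1,
              -1 / n - (h - 1) / n^2,        # approximate mean, h odd
              -2 / n - (h / 2 - 2) / n^2)    # approximate mean, h even
  s <- sqrt(1 / n - (h + 2) / n^2)           # approximate standard deviation
  cbind(lag = h, lower = m - qnorm(1 - level / 2) * s, upper = m + qnorm(1 - level / 2) * s)
}

refined_crit(h = 1:4, n = 100)   # compare with the symmetric bounds +/- 1.96 / sqrt(100)
```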
Another extension of the basic asymptotic result (5) is due to Kim & Weiß [12] and Mills & Seneta [13]. These authors consider two particular types of AR(1) count process, namely a Poi-INAR(1) and a binomial AR(1) process, respectively, and derive an O(n^{−2}) approximation of V[ρ̂_part(h)] for h ≥ 2. While their exact formulae are not relevant for the present research, the crucial point is as follows: in both cases, the approximate variance is of the form (1 + c)/n, where c is inversely proportional to the mean μ and also depends on the value of ρ(1). Especially for low means μ, the numerator 1 + c deviates notably from 1. Hence, the basic asymptotics (5) do not hold for these types of count process. An analogous conclusion can be drawn from the simulation results in Weiß et al. [14] (Table 1), where the rejection rate for the SPACF of the Pearson residuals (with CML-fitted Poi-INAR(p) model) under the basic asymptotic critical values (5) is analyzed. These rejection rates are often below the intended level, which indicates that (5) does not hold here.
These possible drawbacks of existing asymptotic results are illustrated by Figure 1. The upper panel refers to the mean of SPACF(h), which is either computed from 10^4 simulated Poi-INAR(1) time series (black and dark grey bars), or according to the refined asymptotic result (11) (light grey bars). Note that the sample size n = 1000 was chosen rather large such that sample properties and (true) asymptotic properties should agree reasonably well. In Figure 1a, where the SPACF is computed from the raw counts (X_t), we omit plotting the mean at h = 1 as this would violate the graphic's Y-range (recall that ρ_part(1) = α). From (a) and (b), we recognize that the simple asymptotics (5), where the mean of the SPACF is approximated by zero, would be misleading in practice, because a negative bias with an oscillating pattern (odd vs. even lags) is observed. As a consequence, if testing the PACF based on (5) and thus ignoring the bias, we may get unreliable sizes, which is also observed later in our simulation studies. The alternating pattern of the bias in (a) and (b) is similar to the refined asymptotics (11). However, we do not observe an exact agreement with (11), as the simulated means seem to depend on the actual value of the AR parameter α. The effect of α gets much stronger in (c), where even positive bias values for low h are observed, contradicting (11). This is caused by the use of the MM estimator, which is known to be increasingly biased with increasing α [23]; a possible solution for practice could be to use a bias-corrected version of the MM estimator. The lower panel in Figure 1 shows the corresponding standard deviations (SDs). The strongest deviation between simulated and asymptotic results is observed for lag h = 1, followed by lag h = 2. In particular, for both types of Pearson residuals and both h = 1, 2, the asymptotic SD from (11) is too large (and the asymptotic SD according to (5) would be even larger) such that a corresponding PACF-test is expected to be conservative (which is later confirmed by our simulation study). Therefore, it seems advisable to look for other ways of implementing the PACF-test, relying neither on (5) nor on (11). An approximation based on asymptotic results does not look promising in general, as we expect the asymptotics to depend strongly on the actual DGP; recall the aforementioned results by Kim & Weiß [12] and Mills & Seneta [13]. Thus, in what follows, our idea is to try out different types of bootstrap implementations, i.e., the true distribution of the SPACF is approximated by appropriate resampling schemes. This might also allow us to account for the effect of the selected estimator when computing the Pearson residuals.

4. Bootstrap Approaches for the Sample PACF

Let ϑ denote the parameter of interest for the actual DGP (Y_t), and let ϑ̂ = T(Y_1, …, Y_n) denote an estimate of it (in the present research, this parameter is the (S)PACF at some lag h ∈ ℕ). Analogously, let (Y_t*) denote a corresponding bootstrap DGP, and let ϑ̂* = T(Y_1*, …, Y_n*) be the estimator obtained from a bootstrap sample. If E*[·] denotes the expectation operator of the bootstrap DGP, that is, conditional on the data X_1, …, X_n, then the centered bootstrap estimate is given by ϑ̂*_cent := ϑ̂* − E*[ϑ̂*]. A common approach for constructing a two-sided bootstrap confidence interval (CI) for ϑ with confidence level 1 − α ∈ (0; 1) is given by
$$\Big[\hat{\vartheta} - q_{1-\alpha/2}\big(\hat{\vartheta}^{*}_{\mathrm{cent}}\big)\,;\ \hat{\vartheta} - q_{\alpha/2}\big(\hat{\vartheta}^{*}_{\mathrm{cent}}\big)\Big], \tag{12}$$
where q_γ(·) denotes the γ-quantile; see Hall [24]. The bootstrap CI (12) is used for testing the null hypothesis "H_0: ϑ = ϑ_0" on level α by applying the following decision rule: reject H_0 if ϑ_0 is not contained in the CI (12). This implies the equivalent decision rule to reject H_0 if
$$\hat{\vartheta} - \vartheta_0 \;<\; q_{\alpha/2}\big(\hat{\vartheta}^{*}_{\mathrm{cent}}\big) \qquad \text{or} \qquad \hat{\vartheta} - \vartheta_0 \;>\; q_{1-\alpha/2}\big(\hat{\vartheta}^{*}_{\mathrm{cent}}\big). \tag{13}$$
In the present article, ϑ refers to the PACF at lag h, computed from either the original count process (X_t) or from the Pearson residuals (R_t) obtained after model fitting. In both cases, the PACF at lag h is tested against the hypothetical value ϑ_0 = 0, as would be the case for an AR-type process of order < h.
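In R, the decision rule (13) becomes a one-liner once centered bootstrap replicates are available; the following generic helper (our own naming, with toy replicates) illustrates the rule.

```r
## Minimal sketch: bootstrap test decision (13) for H0: theta = theta0,
## given the observed estimate and centered bootstrap replicates.
boot_reject <- function(theta_hat, theta_cent, theta0 = 0, level = 0.05) {
  q <- quantile(theta_cent, probs = c(level / 2, 1 - level / 2), names = FALSE)
  (theta_hat - theta0) < q[1] | (theta_hat - theta0) > q[2]
}

set.seed(5)
boot_reject(theta_hat = 0.25, theta_cent = rnorm(1000, sd = 0.1))   # toy replicates: rejects H0
```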
If we apply the PACF to the original count time series X 1 , , X n , then the following setups are considered:
  • fully parametric setup: a fully parametric count AR(p) model with p ≤ 2 is fitted to the data and then used as the bootstrap DGP; the PACF at certain lags h > p is tested against zero. Here, we focus on the Poi-INAR(p) model, and we use the parametric INAR-bootstrap of Jentsch & Weiß [25].
  • semi-parametric setup: a semi-parametric count AR(p) model is fitted to the data [16] and then used as the bootstrap DGP; the PACF at lags h > p is tested against zero. Here, we focus on the INAR(p) model with unspecified innovations, and we use the semi-parametric INAR-bootstrap of Jentsch & Weiß [25].
  • non-parametric setup: we use the circular block bootstrap as considered by Politis & White [26], where an automatic block-length selection might be done by using the function b.star in the R package "np" (https://CRAN.R-project.org/package=np, accessed on 31 March 2022).
In the case of an INAR(p) bootstrap DGP, the centering at lag h is done with the lag-h PACF corresponding to the fitted model, i.e., the PACF satisfying the YW equations (2) under the estimated parameters; see Appendix A for computational details. In the case of the non-parametric block bootstrap, the sample PACF at lag h is used for centering the bootstrap values. A minimal sketch of the parametric variant is given below.
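The following R sketch combines these ingredients into an MM-based parametric Poi-INAR(1) bootstrap test of the PACF at lags h > 1 applied to the raw counts; since the fitted model's PACF is zero at these lags, no additional centering term appears. All function names, the number of replications B, and the parameter values are illustrative assumptions, not the authors' implementation.

```r
## Minimal sketch: MM-based parametric Poi-INAR(1) bootstrap PACF-test for the raw counts.
poinar1_sim <- function(n, alpha, mu_eps, burnin = 200) {
  x <- numeric(n + burnin)
  x[1] <- rpois(1, mu_eps / (1 - alpha))
  for (t in 2:(n + burnin))
    x[t] <- rbinom(1, x[t - 1], alpha) + rpois(1, mu_eps)   # binomial thinning + innovation
  x[(burnin + 1):(burnin + n)]
}

pacf_boot_test <- function(x, lags = 2:4, B = 500, level = 0.05) {
  n         <- length(x)
  alpha_hat <- acf(x, lag.max = 1, plot = FALSE)$acf[2]     # MM: lag-1 sample ACF
  mu_eps    <- mean(x) * (1 - alpha_hat)                    # MM: innovation mean
  spacf     <- pacf(x, lag.max = max(lags), plot = FALSE)$acf[lags, 1, 1]

  ## bootstrap SPACF values; the fitted INAR(1) has PACF zero at lags h > 1,
  ## so the centered values of (12) coincide with the raw bootstrap values here
  boot <- replicate(B, {
    xb <- poinar1_sim(n, alpha_hat, mu_eps)
    pacf(xb, lag.max = max(lags), plot = FALSE)$acf[lags, 1, 1]
  })

  q <- apply(boot, 1, quantile, probs = c(level / 2, 1 - level / 2))
  data.frame(lag = lags, spacf = spacf, reject = spacf < q[1, ] | spacf > q[2, ])
}

set.seed(6)
x <- poinar1_sim(200, alpha = 0.5, mu_eps = 2.5)
pacf_boot_test(x)            # should rarely reject at lags 2-4 for this INAR(1) DGP
```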
If we apply the PACF to the Pearson residuals (R_t), then again (semi-)parametric setups are considered, where the model fitting is also replicated based on each bootstrap time series, as is the subsequent computation of Pearson residuals based on the bootstrap model fit. This time, a centering is not necessary. Non-parametric bootstrap schemes can be applied directly to the original Pearson residuals (without the need for model fitting during bootstrap replication). Under the null of model adequacy, we expect the available Pearson residuals to be uncorrelated. Thus, a first idea is to apply the classical Efron bootstrap [27], although this bootstrap scheme actually requires i.i.d. data. Therefore, as a second idea, we also apply the aforementioned block bootstrap to (R_t) to account for possible non-linear dependencies.
Remark 1.
For implementing the (semi-)parametric INAR bootstraps, or for computing the Pearson residuals with respect to an INAR model, the model parameters have to be estimated. The following approaches are used for this purpose:
  • If the fully parametric Poi-INAR(p) model is fitted, we use either the MM estimator of θ = (α_1, …, α_p, μ_ϵ), which is obtained by solving the mean equation μ = μ_ϵ/(1 − α_1 − ⋯ − α_p) as well as the YW equations (2) for h = 1, …, p in μ_ϵ, α_1, …, α_p and by plugging in the sample counterparts of μ, ρ(1), …, ρ(p), or the CML estimator of θ. The latter is obtained by numerically maximizing the conditional log-likelihood function ℓ(θ | x_p, …, x_1) = Σ_{t=p+1}^{n} ln p(x_t | x_{t−1}, …, x_{t−p}; θ), where the transition probabilities p(x_t | x_{t−1}, …, x_{t−p}) are computed by evaluating the convolution of the p thinnings' binomial distributions and the innovations' Poisson distribution, i.e., Bin(x_{t−1}, α_1) ∗ ⋯ ∗ Bin(x_{t−p}, α_p) ∗ Poi(μ_ϵ).
  • If the semi-parametric INAR(p) model is fitted, then the innovations' distribution is not specified. As a result, the parameter vector now equals θ_sp = (α_1, …, α_p, p_{ϵ,0}, p_{ϵ,1}, …), and we use the semi-parametric CML approach of Drost et al. [16] for estimation. In this case, the transition probabilities for the log-likelihood function ℓ(θ_sp | x_p, …, x_1) are obtained from the convolution Bin(x_{t−1}, α_1) ∗ ⋯ ∗ Bin(x_{t−p}, α_p) ∗ G_ϵ, where G_ϵ denotes the unspecified innovations' distribution with probability masses p_{ϵ,0}, p_{ϵ,1}, …. A minimal sketch of the first two estimation approaches for the Poi-INAR(1) case follows.
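The following R sketch implements the MM estimator and the fully parametric CML estimator for the simplest case p = 1, using the Bin–Poi transition convolution described above; the function names and the unconstrained reparametrization inside optim are our own choices, not the authors' code.

```r
## Minimal sketch: MM and CML estimation for the Poi-INAR(1) model.
poinar1_mm <- function(x) {
  alpha <- acf(x, lag.max = 1, plot = FALSE)$acf[2]   # YW/MM: alpha_hat = lag-1 sample ACF
  c(alpha = alpha, mu_eps = mean(x) * (1 - alpha))    # mean equation mu = mu_eps / (1 - alpha)
}

poinar1_trans <- function(xt, xtm1, alpha, mu_eps) {  # transition probability p(x_t | x_{t-1})
  j <- 0:min(xt, xtm1)                                # possible numbers of "survivors"
  sum(dbinom(j, xtm1, alpha) * dpois(xt - j, mu_eps)) # Bin(x_{t-1}, alpha) * Poi(mu_eps) convolution
}

poinar1_cml <- function(x) {
  negll <- function(par) {
    alpha <- plogis(par[1]); mu_eps <- exp(par[2])    # map to (0,1) and (0,Inf)
    -sum(log(mapply(poinar1_trans, x[-1], x[-length(x)],
                    MoreArgs = list(alpha = alpha, mu_eps = mu_eps))))
  }
  par <- optim(c(0, log(mean(x))), negll)$par
  c(alpha = plogis(par[1]), mu_eps = exp(par[2]))
}

set.seed(7)
x <- rpois(200, lambda = 5)       # stand-in count series (alpha close to 0 expected here)
rbind(MM = poinar1_mm(x), CML = poinar1_cml(x))
```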

5. PACF Diagnostics for Raw Counts

In the first part of our simulation study, we analyze the performance of the asymptotic and (semi-)parametric implementations of the PACF-tests if these are applied to the raw counts (X_t) (the results of the non-parametric bootstrap schemes are discussed separately in Remark 2). We consider 1st- and 2nd-order AR-type DGPs, where the aim of applying the PACF-tests (nominal level 0.05) is the identification of the correct AR order p. As the bootstrap versions of these tests are computationally very demanding (especially the semi-parametric INAR bootstrap), we use the warp-speed method of Giacomini et al. [28] for executing the simulations. This, in turn, allows us to use 10^4 replications throughout our simulation study. We also cross-checked that the achieved rejection rates are close to those obtained by a traditional bootstrap implementation with B = 500 bootstrap replications per simulation run. All simulations have been done with the software R, and R-codes can be obtained from the corresponding author upon request.
Table 1 shows the rejection rates of the PACF-tests for different types of AR(1)-like count DGP; recall Section 2. There, the PACFs are computed from a simulated count time series x_1, …, x_n of length n, where the choice n = 100 (n = 1000) represents the small (large) sample behaviour. The results refer to the medium autocorrelation case ρ(1) = 0.5, but further results for ρ(1) ∈ {0.25, 0.75} are summarized in Appendix B; see Table A1. Five implementations of the PACF-test are considered: using the simple asymptotic approximation (5) or the refined one (11) (recall Section 3), using the parametric Poi-INAR(1) bootstrap with either MM or CML estimates, and using the semi-parametric INAR(1) bootstrap with CML estimates (recall Section 4). If first looking at the block "Poi-INAR(1) DGP" in Table 1, we recognize that all implementations perform roughly the same, i.e., the rejection rate at lag h = 1 (expressing the power of the PACF-test) is close to 1, and the rejection rates at lags h ≥ 2 (expressing the size) are close to the 0.05-level. It should be noted, however, that for ρ(1) = 0.25 (see Table A1), the asymptotic implementations have notably less power at lag h = 1. An analogous conclusion holds for the NB-INAR(1) block in Table 1, although now the model behind the parametric Poi-INAR(1) bootstrap is misspecified. So the parametric bootstrap exhibits robustness properties in finite samples. In the third block, "Poi-INARCH(1)", the semi-parametric bootstrap is also misspecified, but again the rejection rates are robust for ρ(1) = 0.5. For ρ(1) = 0.75 in Table A1, however, we observe size exceedances for lags h ≥ 2, i.e., the misspecification of Poi-INARCH(1) as Poi-INAR(1) is not negligible anymore for this DGP. This is plausible in view of Remark 4.1.7 in Weiß [5], where it is shown that these models lead to different sample paths for high autocorrelation. Much more surprisingly, both asymptotic implementations also deteriorate (even more severely) for a Poi-INARCH(1) DGP with ρ(1) = 0.75 (see Table A1), i.e., we get too many false rejections in any case. Thus, if one anticipates that the data are generated by an INARCH process, a tailor-made parametric bootstrap implementation of the PACF-tests should be used.
Let us continue our performance analyses by turning to 2nd-order DGPs. In Table 2, the (semi-)parametric bootstrap schemes are still executed by (erroneously) assuming a 1st-order INAR DGP (as in Table 1), i.e., they are affected by a (further) source of model misspecification. But as seen from the rejection rates in Table 2, we still have good size (h ≥ 3) and power values (h = 1, 2), comparable to those of the refined asymptotic implementation (11). By contrast, the simple asymptotics (5) lead to a clearly reduced power at lag h = 2. Finally, in Table 3, the bootstrap schemes now correctly assume a 2nd-order INAR DGP, i.e., we only have the following misspecifications left: the parametric Poi-INAR(2) bootstrap applied to an NB-INAR(2) or Poi-INARCH(2) DGP, and the semi-parametric INAR(2) bootstrap applied to a Poi-INARCH(2) DGP. It can be seen that the parametric bootstrap using MM estimates as well as the semi-parametric bootstrap lead to improved power at lag h = 2, whereas the parametric CML setup even deteriorates (especially under Poi-INARCH(2) misspecification). The latter observation fits well with later results in Section 6, where the parametric bootstrap with CML estimates again does worse than its MM- or semi-CML-counterparts. This can be explained by the fact that for a fully parametric CML approach, model misspecification affects the estimates of all parameters simultaneously, while for the MM approach, for example, the estimation of the mean and dependence parameters coincides across all three types of DGP. So it does not seem advisable to use a fully parametric bootstrap in combination with CML estimation for PACF diagnostics.
To sum up, if computing the SPACF from the raw counts (X_t) with the aim of identifying the AR order of the given count DGP, the overall best performance is shown by the MM-based parametric and the CML-based semi-parametric bootstrap implementations of the PACF-test, but the refined asymptotic implementation relying on (11) also does reasonably well. The latter is remarkable as these asymptotics are not the correct ones for the considered count DGPs (also recall the discussion of Figure 1), but it appears that their approximation quality is sufficient anyway. The simple asymptotic implementation (5), by contrast, as used by default in statistical software packages, leads to reduced power in some cases. From a practical point of view, as the additional benefit of the (semi-)parametric bootstrap schemes compared to the refined asymptotic implementation (11) is not that large, especially in view of the necessary computational effort, it seems advisable in practice to use (11) for the PACF-test. Recall that this recommendation refers to the case where the SPACF is computed from the raw counts (X_t) to identify the DGP's AR order. The case of applying the PACF-test to Pearson residuals for checking the model adequacy is analyzed in the following Section 6.

6. PACF Diagnostics for Pearson Residuals

While the raw counts' SPACF is typically computed before model fitting (namely for identifying appropriate candidate models), the PACF analysis of the Pearson residuals is relevant after model fitting, namely for checking the fitted model's adequacy. Thus, the main difference between the simulations in the present section and those of Section 5 is that this time, we first fit a (Poi-)INAR model to the data, and then we apply the SPACF to the Pearson residuals computed from this fit. For Poi-INAR model fitting, we again use either MM or CML estimation, and then we apply the asymptotic or the corresponding parametric bootstrap implementations (as before, we use the warp-speed method). An exception is given by the semi-parametric CML estimation, as in this case, the semi-parametric bootstrap is also used for methodological consistency (and the Pearson residuals are computed with respect to an unspecified INAR model). We also consider the same scenarios of model orders as before, i.e., 1st-order DGPs and INAR(1) fit (Table 4 and Table 7), 2nd-order DGPs but still an INAR(1) fit (Table 5 and Table 8), and 2nd-order DGPs and INAR(2) fit (Table 6 and Table 9). Recall that the fitted model is now used both for the computation of the Pearson residuals and for the implementation of the (semi-)parametric bootstrap schemes.
Table 4. Rejection rates of PACF-tests applied to Pearson residuals using MM estimates (DGPs with μ = 5 and ρ(1) = 0.5), where both residuals and parametric bootstrap rely on the null of a Poi-INAR(1) process. Entries are rejection rates at PACF lags h = 1, 2, 3, 4 (from left to right within each DGP block).

| Method | n | Poi-INAR(1) | NB-INAR(1), σ²/μ = 1.5 | Poi-INARCH(1) |
|---|---|---|---|---|
| asym. (5) | 100 | 0.000 0.026 0.040 0.043 | 0.001 0.026 0.038 0.044 | 0.000 0.031 0.041 0.044 |
| | 1000 | 0.000 0.029 0.044 0.053 | 0.000 0.030 0.043 0.052 | 0.001 0.033 0.044 0.050 |
| asym. (11) | 100 | 0.001 0.030 0.046 0.050 | 0.001 0.032 0.042 0.049 | 0.001 0.037 0.049 0.049 |
| | 1000 | 0.000 0.030 0.047 0.048 | 0.000 0.030 0.047 0.051 | 0.000 0.032 0.047 0.050 |
| param. MM | 100 | 0.056 0.051 0.052 0.048 | 0.050 0.049 0.049 0.045 | 0.066 0.050 0.047 0.051 |
| | 1000 | 0.051 0.053 0.049 0.045 | 0.050 0.051 0.053 0.051 | 0.060 0.046 0.050 0.044 |
Let us start with the case of fitting a Poi-INAR model by MM estimation; see Table 4, Table 5 and Table 6. In Table 4 (1st-order models and DGPs; also see Table A3 in Appendix B), we recognize that both asymptotic implementations lead to undersizing at lags h = 1, 2 (particularly severe at h = 1). This is in close agreement with our conclusions drawn from Figure 1 as well as with the findings of Weiß et al. [14]. An analogous observation can be made in Table 6 (2nd-order models and DGPs), but now for lags h = 1, 2, 3 (particularly severe at h = 1, 2). In both cases, however, the MM-based parametric bootstrap holds the nominal 0.05-level reasonably well. The drawback resulting from this undersizing becomes clear in Table 5, where the wrong AR order was selected during model fitting: the asymptotic implementations lead to a very low power for sample size n = 100, implying that one will hardly recognize the inadequate model choice. Thus, if model assumptions are used anyway for computing the Pearson residuals, the asymptotic implementations should be avoided, and the model assumptions should also be utilized for executing the PACF-test by using the parametric bootstrap scheme. As a final remark, strictly speaking, we are always concerned with model misspecification whenever the DGP is NB-INAR or Poi-INARCH. However, all three DGPs per table have the same conditional mean and, thus, the same autocorrelation structure; only their conditional variances differ. Also, the MM estimates required for computing the conditional mean are identical across all models. Thus, it is not surprising that the rejection rates of the PACF-tests do not differ much among these three types of DGP (but again with slight oversizing for the Poi-INARCH DGPs).
Finally, we did the same simulations again, but using CML instead of MM estimation. Table 7 (as well as Table A5 in Appendix B) refers to the case of both 1st-order models and 1st-order DGPs. In the first block, where the parametric Pearson residuals are computed by correctly assuming a Poi-INAR(1) DGP, we again have strong undersizing at lag 1 for the asymptotic implementations, but a close agreement with the nominal 0.05-level for the parametric bootstrap. The remaining blocks with NB-INAR(1) and Poi-INARCH(1) DGP, however, differ notably from the corresponding blocks of the MM-based Table 4 and Table A3, respectively. This is plausible as the parametric CML approach for a misspecified model leads to misleading estimates for all parameters. While MM estimation leads to the same estimates for the dependence parameters across the three 1st-order models, these differ for parametric CML estimation. Therefore, we have high rejection rates especially at lag 1 (especially if using the parametric bootstrap), which is desirable on the one hand as the fitted model is indeed not adequate. On the other hand, we did not misspecify the (P)ACF structure (a 1st-order model is correct for all DGPs) but the actual data-generating mechanism, i.e., a user might draw the wrong conclusion from this rejection based on the lag-1 PACF. At this point, it is interesting to look at the semi-parametric model fit and bootstrap in Table 7. For both INAR(1) DGPs, the rejection rates are close to the 0.05-level, which is the desirable result as we are concerned with an adequate model fit. For the Poi-INARCH(1) DGP, by contrast, we get moderately increased rejection rates at lag 1, which again has to be assessed ambivalently: on the one hand, the fitted INAR(1) model is indeed not adequate, but on the other hand, the inadequacy does not refer to the autocorrelation structure.
Table 5. Like Table 4, but for 2nd-order DGPs with α_2 = 0.2.

| Method | n | Poi-INAR(2) | NB-INAR(2), σ²/μ = 1.5 | Poi-INARCH(2) |
|---|---|---|---|---|
| asym. (5) | 100 | 0.016 0.264 0.079 0.043 | 0.014 0.266 0.077 0.043 | 0.018 0.275 0.080 0.045 |
| | 1000 | 0.975 0.998 0.648 0.191 | 0.958 0.998 0.625 0.192 | 0.966 0.999 0.642 0.191 |
| asym. (11) | 100 | 0.013 0.359 0.104 0.061 | 0.010 0.356 0.100 0.059 | 0.014 0.369 0.105 0.060 |
| | 1000 | 0.972 0.998 0.662 0.212 | 0.954 0.998 0.640 0.210 | 0.962 0.999 0.655 0.210 |
| param. MM | 100 | 0.395 0.365 0.102 0.063 | 0.369 0.356 0.094 0.064 | 0.396 0.378 0.092 0.069 |
| | 1000 | 1.000 0.999 0.664 0.201 | 1.000 0.999 0.646 0.204 | 1.000 0.999 0.662 0.211 |
Table 6. Rejection rates of PACF-tests applied to Pearson residuals using MM estimates (DGPs with μ = 5, ρ(1) = 0.5, and α_2 = 0.2), where both residuals and parametric bootstrap rely on the null of a Poi-INAR(2) process. Entries are rejection rates at PACF lags h = 1, 2, 3, 4.

| Method | n | Poi-INAR(2) | NB-INAR(2), σ²/μ = 1.5 | Poi-INARCH(2) |
|---|---|---|---|---|
| asym. (5) | 100 | 0.000 0.001 0.033 0.034 | 0.000 0.002 0.035 0.038 | 0.000 0.002 0.034 0.036 |
| | 1000 | 0.000 0.001 0.036 0.042 | 0.000 0.001 0.033 0.042 | 0.000 0.001 0.036 0.040 |
| asym. (11) | 100 | 0.000 0.003 0.041 0.043 | 0.000 0.003 0.040 0.044 | 0.000 0.002 0.042 0.046 |
| | 1000 | 0.000 0.001 0.036 0.042 | 0.000 0.001 0.035 0.044 | 0.000 0.001 0.036 0.045 |
| param. MM | 100 | 0.050 0.037 0.044 0.048 | 0.053 0.038 0.047 0.050 | 0.052 0.038 0.059 0.050 |
| | 1000 | 0.049 0.049 0.046 0.052 | 0.059 0.049 0.052 0.052 | 0.067 0.054 0.055 0.053 |
Table 7. Rejection rates of PACF-tests applied to Pearson residuals using CML estimates (DGPs with μ = 5 and ρ(1) = 0.5), where both residuals and bootstrap rely on the null of a Poi-INAR(1) process (parametric bootstrap) or of an unspecified INAR(1) process (semi-parametric bootstrap), respectively. Entries are rejection rates at PACF lags h = 1, 2, 3, 4.

| Method | n | Poi-INAR(1) | NB-INAR(1), σ²/μ = 1.5 | Poi-INARCH(1) |
|---|---|---|---|---|
| asym. (5) | 100 | 0.009 0.035 0.041 0.046 | 0.028 0.036 0.041 0.039 | 0.023 0.038 0.045 0.042 |
| | 1000 | 0.008 0.035 0.045 0.048 | 0.902 0.182 0.069 0.053 | 0.745 0.148 0.072 0.051 |
| asym. (11) | 100 | 0.009 0.041 0.046 0.048 | 0.043 0.055 0.047 0.050 | 0.034 0.053 0.048 0.051 |
| | 1000 | 0.009 0.039 0.046 0.053 | 0.909 0.198 0.077 0.058 | 0.753 0.167 0.072 0.055 |
| param. CML | 100 | 0.052 0.049 0.049 0.045 | 0.238 0.062 0.048 0.049 | 0.209 0.064 0.053 0.050 |
| | 1000 | 0.049 0.053 0.049 0.053 | 0.993 0.226 0.075 0.051 | 0.963 0.188 0.082 0.048 |
| semi-p. CML | 100 | 0.050 0.051 0.054 0.053 | 0.057 0.048 0.052 0.044 | 0.070 0.052 0.049 0.051 |
| | 1000 | 0.039 0.053 0.056 0.048 | 0.052 0.053 0.055 0.049 | 0.225 0.067 0.058 0.050 |
Table 8. Like Table 7, but for 2nd-order DGPs with α_2 = 0.2.

| Method | n | Poi-INAR(2) | NB-INAR(2), σ²/μ = 1.5 | Poi-INARCH(2) |
|---|---|---|---|---|
| asym. (5) | 100 | 0.026 0.301 0.084 0.043 | 0.001 0.403 0.090 0.044 | 0.001 0.404 0.092 0.045 |
| | 1000 | 0.522 0.999 0.696 0.192 | 0.001 1.000 0.718 0.178 | 0.000 1.000 0.726 0.174 |
| asym. (11) | 100 | 0.020 0.391 0.110 0.061 | 0.002 0.502 0.114 0.059 | 0.001 0.492 0.125 0.064 |
| | 1000 | 0.508 0.999 0.709 0.212 | 0.001 1.000 0.733 0.193 | 0.000 1.000 0.727 0.193 |
| param. CML | 100 | 0.099 0.399 0.114 0.059 | 0.031 0.514 0.105 0.062 | 0.028 0.495 0.131 0.063 |
| | 1000 | 0.840 1.000 0.710 0.235 | 0.041 1.000 0.755 0.183 | 0.023 1.000 0.737 0.194 |
| semi-p. CML | 100 | 0.222 0.350 0.106 0.063 | 0.172 0.384 0.098 0.054 | 0.134 0.410 0.109 0.066 |
| | 1000 | 0.999 0.999 0.664 0.222 | 0.976 0.999 0.665 0.204 | 0.917 1.000 0.720 0.208 |
Table 9. Rejection rates of PACF-tests applied to Pearson residuals using CML estimates (DGPs with μ = 5, ρ(1) = 0.5, and α_2 = 0.2), where both residuals and bootstrap rely on the null of a Poi-INAR(2) process (parametric bootstrap) or of an unspecified INAR(2) process (semi-parametric bootstrap), respectively. Entries are rejection rates at PACF lags h = 1, 2, 3, 4.

| Method | n | Poi-INAR(2) | NB-INAR(2), σ²/μ = 1.5 | Poi-INARCH(2) |
|---|---|---|---|---|
| asym. (5) | 100 | 0.002 0.002 0.033 0.036 | 0.003 0.001 0.037 0.038 | 0.001 0.001 0.037 0.036 |
| | 1000 | 0.000 0.000 0.037 0.044 | 0.303 0.021 0.082 0.063 | 0.158 0.005 0.063 0.047 |
| asym. (11) | 100 | 0.002 0.002 0.039 0.041 | 0.005 0.004 0.047 0.049 | 0.002 0.002 0.042 0.046 |
| | 1000 | 0.000 0.001 0.038 0.044 | 0.332 0.028 0.089 0.070 | 0.174 0.006 0.074 0.055 |
| param. CML | 100 | 0.002 0.002 0.037 0.043 | 0.003 0.003 0.051 0.047 | 0.002 0.002 0.048 0.051 |
| | 1000 | 0.000 0.001 0.038 0.044 | 0.344 0.023 0.086 0.067 | 0.186 0.007 0.065 0.058 |
| semi-p. CML | 100 | 0.043 0.038 0.050 0.044 | 0.044 0.036 0.047 0.043 | 0.038 0.039 0.052 0.050 |
| | 1000 | 0.035 0.045 0.053 0.053 | 0.051 0.054 0.055 0.051 | 0.149 0.056 0.054 0.053 |
Essentially analogous conclusions can be drawn from Table 9, where we are concerned with both 2nd-order models and 2nd-order DGPs. So let us turn to Table 8, where 1st-order models are fitted to 2nd-order DGPs. Thus, we are concerned with at least an inadequate autocorrelation structure (and sometimes also further model misspecification), such that high rejection rates are desirable. Let us start with the first block about the Poi-INAR(2) DGP. As a consequence of the strong undersizing at lag 1, the parametric bootstrap, and especially the asymptotic implementations, show relatively low power values, especially for the small sample size n = 100. The semi-parametric bootstrap, by contrast, has substantially higher power at lag 1. For lags h ≥ 2, the rejection rates are similar across the different implementations, with a slight advantage for the refined asymptotics as well as the parametric bootstrap. The discrepancy at lag 1 gets even more extreme for the NB-INAR(2) and Poi-INARCH(2) DGPs, where all implementations other than the semi-parametric one lead to power close to zero. For lags 2 and 3, by contrast, the refined asymptotics as well as the parametric bootstrap are again more powerful. However, looking back at Table 5, it seems that the overall most appealing power is shown by the MM-based parametric bootstrap. This type of bootstrap also has the advantage that the necessary computational effort is much lower than for the CML-based bootstraps. Thus, altogether, while we recommended using the refined asymptotics (11) when testing the PACF computed from the raw counts, the PACF analysis of Pearson residuals should be done with the MM-based parametric bootstrap: if computing the Pearson residuals from an MM-fitted Poi-INAR model, and if using this model fit for the parametric bootstrap, one has good size properties and an appealing power performance at the same time. Certainly, this recommendation does not preclude CML fitting in a second step, once the correct AR order has been identified. But during the phase of model diagnostics, at least if n is not particularly large, the parametric MM solution seems to be best suited.
Remark 2.
As mentioned in Section 4, we also tried out fully non-parametric bootstrap schemes. For the case where the PACF-tests are applied to the raw counts (X_t), as discussed in Section 5, the circular block bootstrap was used as a fully non-parametric setup; see Table A2 in Appendix B for the obtained results. While these implementations lead to an appealing power at lag h = 1, strong size deteriorations are observed for h ≥ 2. The strongest deviations are observed for the fixed block length b = 5. Increasing b, the low-lag rejection rates first stabilize at 0.05, while we have undersizing for large lags. For b = 20, 25, we have good sizes for h = 5, 6, but now the low lags lead to exceedances of 0.05. Thus, tailor-made block lengths would be required for different lags h. The automatic block-length selection via b.star typically leads to block lengths between 5 and 10 (depending on the actual extent of ρ(1)), but this causes undersizing throughout, getting more severe with increasing h. The reason why b.star tends to pick block lengths that are too small to capture the dependence at larger lags is that it is designed to select a block length suitable for inference about the sample mean, but not about the sample PACF. In view of the aforementioned size problems and the unclear choice of block lengths, we discourage the use of block-bootstrap implementations of the PACF-test for analyzing the raw counts data.
For a PACF analysis of the Pearson residuals, as investigated in the present Section 6, besides block-bootstrap implementations, the Efron bootstrap also appears reasonable for this task. For the case where the Pearson residuals rely on MM estimates, simulation results are summarized in Table A4 in Appendix B. If doing an automatic block-length selection via b.star, we often end up with block length 1 (as the Pearson residuals are uncorrelated under model adequacy). Thus, the b.star block bootstrap shows nearly the same rejection rates as the Efron bootstrap, but these are too low at lags h = 1, 2, as for the asymptotic implementations. Increasing the block length to the fixed values b = 5 or b = 10, we get an even further decrease in size. Therefore, neither the Efron nor the block bootstrap offers any advantage compared to the asymptotic implementations. Analogous conclusions hold if model fitting is done by CML estimation (see Table A6 in Appendix B), so we also discourage the use of the Efron and block bootstraps when doing a PACF-test of the Pearson residuals.
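For completeness, the following R sketch shows how such a non-parametric implementation can be set up: automatic block-length selection via the function b.star from the "np" package, and a hand-rolled circular block bootstrap of the SPACF with a fixed block length. It assumes the np package is installed; the stand-in data, the fixed choice b = 5, and the helper names are illustrative only.

```r
## Minimal sketch: block-length selection via np::b.star and a circular block bootstrap of the SPACF.
library(np)

set.seed(8)
x <- rpois(200, lambda = 5)              # stand-in count series
print(b.star(x))                         # suggested block lengths (stationary/circular bootstrap)

cbb_pacf <- function(x, b, lags = 1:4, B = 500) {
  n <- length(x)
  replicate(B, {
    starts <- sample.int(n, ceiling(n / b), replace = TRUE)
    idx    <- unlist(lapply(starts, function(s) s + 0:(b - 1)))   # contiguous blocks of length b
    xb     <- x[((idx - 1) %% n) + 1][1:n]                        # wrap around circularly, trim to n
    pacf(xb, lag.max = max(lags), plot = FALSE)$acf[lags, 1, 1]
  })
}

boot_spacf <- cbb_pacf(x, b = 5)                        # fixed block length for illustration
apply(boot_spacf, 1, quantile, probs = c(0.025, 0.975)) # bootstrap quantiles per lag
```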

7. Real-Data Application

For illustration, we pick up a widely discussed data example from the literature, namely the claims counts data introduced by Freeland [29]. These counts express the monthly number of claims caused by burn-related injuries in the heavy manufacturing industry for the period 1987–1994, i.e., the count time series is of length n = 96; see Figure 2. Recall that the R-code used for the subsequent computations is provided in the Supplementary Materials. Freeland [29] suggested modelling these data by a Poi-INAR(1) model, but following the discussions of subsequent authors, this model choice is not without controversy. For example, the marginal distribution exhibits moderate overdispersion, as the sample variance 11.357 exceeds the mean 8.604. Therefore, some authors suggested considering an NB-INAR(1) or Poi-INARCH(1) model instead. Furthermore, one may doubt the 1st-order AR structure (see Weiß et al. [30]), as the SPACF in Figure 2 is only slightly non-significant at lag h = 2, where the plotted critical values (dashed lines) refer to the PACF-test on level 0.05 based on the simple asymptotic implementation (5). Thus, altogether, we are concerned with a scenario that fits very well to our simulation study in Section 5 and Section 6: the null hypothesis for the data is that of a Poi-INAR(1) model, but this model might be misspecified in terms of the marginal distribution, the model order, or the actual AR-type data-generating mechanism. Moreover, the sample size n = 96 and the lag-1 sample ACF of 0.452 are close to the parametrizations considered there. In what follows, we apply the different implementations of the PACF-test to (the Pearson residuals computed from) the claims data. Certainly, as we do not know the true model behind the data, we are not in a position to pass definitive judgement on whether a test led to the correct or the wrong decision. But we shall discuss the PACF-tests with respect to our simulation results.
Let us start with an analysis of the raw counts' SPACF, in analogy to Section 5. Table 10 summarizes the SPACF(h) values for h = 1, …, 5 (bold font) as well as the corresponding critical values (level 0.05). The latter are computed by the five methods considered in Section 5, with the number of bootstrap replications chosen as B = 1000. For the simple asymptotic implementation (5), as we have already seen in Figure 2, we get a rejection only at lag 1, whereas the remaining methods also reject at lag 2. Thus, there is indeed evidence that the data might stem from a higher-order model. In addition, the different lag-2 decisions for (5) vs. the remaining implementations appear plausible in view of Table 2, where we found clearly lower power for (5) at h = 2. Note that all critical values except those of (5) are visibly asymmetric, so the SPACF appears rather biased for n = 96. Furthermore, all bootstrap implementations lead to quite similar critical values, and the refined asymptotic implementation (11) is also similar to them, except for the upper critical value at h = 1.
Next, we fit either a Poi-INAR(1) model to the claims counts (via MM or CML), or an unspecified INAR(1) model by the semi-parametric CML approach. Using the resulting model fits, we first compute a set of Pearson residuals for each model, and then the SPACF thereof, as in Section 6. The critical values are determined by both asymptotic approaches as well as by the bootstrap approach corresponding to the respective estimation method. Results are summarized in Table 11. Only a few rejections remain, namely for the CML fit of the Poi-INAR(1) model at lag h = 2, both for the refined asymptotics and the parametric bootstrap. The remaining model fits do not lead to a rejection, and one might ask why. The reason seems to be the respective estimate of the AR(1) parameter α_1 = ρ(1), which equals only 0.396 for CML, but 0.452 for MM and 0.434 for semi-CML. So the CML fit explains less of the dependence in the data. The deeper reason for this ambiguous outcome seems to be the low sample size n = 96; according to Section 6, we can generally expect only mild power values. It is again interesting to compare the different critical values. For the Poi-INAR(1) CML fit, the bootstrap and the refined asymptotics lead to rather similar critical values, in agreement with our simulation results in Section 6, where a similar performance of both methods was observed. For the remaining estimation approaches, the bootstrap critical values tend to be narrower than the asymptotic ones, especially at lags 1 and 2. The strongest "shrinkage" of the critical values is observed for h = 1, which goes along with our findings in Section 6, where the asymptotic implementations led to severe undersizing at lag 1, whereas the bootstrap approaches held the nominal level quite well. Furthermore, due to the narrower critical values, the MM and semi-CML bootstraps are also more powerful at lags 1 and 2.

8. Conclusions

In this paper, we considered PACF model diagnostics for AR-type count processes based on raw data and on Pearson residuals, respectively. At first, we illustrated the limitations of the widely used and well-known asymptotic distribution result (as well as some refinements thereof) for the sample PACF values. Then, we introduced appropriate bootstrap schemes for the approximation of the correct sample PACF distribution. We considered a fully parametric bootstrap combined with MM and CML estimation, a semi-parametric bootstrap combined with CML estimation, and a fully non-parametric bootstrap scheme. We compared the performance of the different procedures for first- and second-order AR-type count processes. In the case where we apply the PACF test directly to the raw count data, the best performance was observed for the MM-based parametric bootstrap, CML-based semi-parametric bootstrap, and the refined asymptotic results, where the latter are preferable for computing time reasons. By contrast, when applying the PACF test to the Pearson residuals, we advise using the MM-based parametric bootstrap procedure which simultaneously provides good size properties and power performance. Finally, we applied our different PACF procedures to a well-known data set on claims counts and found some evidence for a higher-order model.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e25010105/s1.

Author Contributions

Conceptualization (all authors); Funding acquisition (C.H.W. and C.J.); Methodology (all authors); Software (B.A.); Supervision (C.H.W. and C.J.); Writing—original draft preparation (C.H.W. and B.A.); Writing—review and editing (C.H.W., M.F. and C.J.). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)–Projektnummer 437270842.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the supplementary material.

Acknowledgments

The authors thank the two referees for their useful comments on an earlier draft of this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. On the Equivalence of ACF, PACF, and AR Coefficients

The first p Yule–Walker (YW) equations in
$$\rho(h) \;=\; \sum_{i=1}^{p} \alpha_i\, \rho\big(|h-i|\big) \quad \text{for } h = 1, 2, \ldots \tag{A1}$$
can be rewritten in vector-matrix notation as follows: for k ∈ ℕ, let α_k := (α_1, …, α_k)^⊤ ∈ ℝ^k with α_i = 0 for i > p, let r_k := (ρ(1), …, ρ(k))^⊤ ∈ ℝ^k, and
$$\boldsymbol{R}_k \;:=\; \big(\rho(|i-j|)\big)_{i,j=1,\ldots,k} \;=\; \begin{pmatrix} 1 & \rho(1) & \cdots & \rho(k-1) \\ \rho(1) & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho(1) \\ \rho(k-1) & \cdots & \rho(1) & 1 \end{pmatrix} \;\in\; \mathbb{R}^{k \times k}. \tag{A2}$$
Then, (A1) implies that the AR ( p ) process satisfies the linear equation
$$\boldsymbol{R}_p\, \boldsymbol{\alpha}_p \;=\; \boldsymbol{r}_p. \tag{A3}$$
Note that R_k constitutes a symmetric Toeplitz matrix, i.e., it is characterized by having constant diagonals. This type of matrix structure was first considered by Toeplitz [31,32], and it is crucial for efficiently solving (A3) for α_p (see the details below).
Assume that R_k is invertible, and let a_k ∈ ℝ^k be the unique solution of the equation
$$\boldsymbol{R}_k\, \boldsymbol{a}_k \;=\; \boldsymbol{r}_k, \quad \text{i.e.,} \quad \boldsymbol{a}_k = \boldsymbol{R}_k^{-1}\, \boldsymbol{r}_k. \tag{A4}$$
Then, the PACF at lag k is defined by ρ_part(k) := a_{k,k} (the last component of a_k); let us denote π_k := (ρ_part(1), …, ρ_part(k))^⊤ ∈ ℝ^k.
If (X_t)_{t∈ℤ} follows an AR(p) model, then (A3) implies that
$$\rho_{\mathrm{part}}(p) = \alpha_p, \qquad \rho_{\mathrm{part}}(h) = 0 \ \text{ for all } h > p \tag{A5}$$
holds; in particular, we have a_p = α_p. Because of the Toeplitz structure of R_k, the YW equations (A3) can be solved recursively for k = 1, 2, …, which was first recognized by Durbin [33] and Levinson [34]. The recursive scheme, which is commonly referred to as the Durbin–Levinson (DL) algorithm, can be expressed as
$$a_{k+1,k+1} \;=\; \frac{\rho(k+1) - \sum_{i=1}^{k} a_{k,i}\,\rho(k+1-i)}{1 - \sum_{i=1}^{k} a_{k,i}\,\rho(i)}, \qquad \begin{pmatrix} a_{k+1,1} \\ \vdots \\ a_{k+1,k} \end{pmatrix} \;=\; \begin{pmatrix} a_{k,1} \\ \vdots \\ a_{k,k} \end{pmatrix} \;-\; a_{k+1,k+1} \begin{pmatrix} a_{k,k} \\ \vdots \\ a_{k,1} \end{pmatrix}. \tag{A6}$$
Given the (sample) ACF, the DL algorithm (A6) is used to recursively compute the (sample) PACF for k = 1, 2, …, where ρ_part(1) = a_{1,1} = ρ(1).
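A direct R implementation of the recursion (A6) may help to make this computation explicit; the function name dl_pacf is our own, and the cross-check against R's ARMAacf uses an arbitrary AR(2) example.

```r
## Minimal sketch: Durbin-Levinson recursion (A6), computing the PACF from ACF values rho(1),...,rho(K).
dl_pacf <- function(rho) {
  K <- length(rho)
  pacf_vals <- numeric(K)
  a <- rho[1]                                      # a_{1,1} = rho(1)
  pacf_vals[1] <- rho[1]
  if (K > 1) for (k in 1:(K - 1)) {
    phi <- (rho[k + 1] - sum(a * rho[k:1])) /      # numerator:   rho(k+1) - sum_i a_{k,i} rho(k+1-i)
           (1 - sum(a * rho[1:k]))                 # denominator: 1 - sum_i a_{k,i} rho(i)
    a <- c(a - phi * rev(a), phi)                  # update to (a_{k+1,1},...,a_{k+1,k+1})
    pacf_vals[k + 1] <- phi
  }
  pacf_vals
}

## cross-check against R's ARMAacf for an AR(2) with coefficients 0.3 and 0.2
rho <- ARMAacf(ar = c(0.3, 0.2), lag.max = 5)[-1]  # ACF at lags 1..5 (drop lag 0)
rbind(DL = dl_pacf(rho),
      R  = ARMAacf(ar = c(0.3, 0.2), lag.max = 5, pacf = TRUE))
```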
Furthermore, applying the DL algorithm to (A3), we can compute the AR parameters α_1, …, α_p corresponding to the ACF values ρ(1), …, ρ(p) (or, if using the sample ACF, we end up with moment estimates of the AR parameters, referred to as YW estimates). In R, this is readily implemented via acf2AR. Given the AR parameters, in turn, (A1) or (A3) can also be solved for the ACF; see Section 3.3 in Brockwell & Davis [11] as well as the R command ARMAacf.
The previous discussion shows that an AR(p) model can be characterized equivalently by either α_p or r_p. According to Barndorff-Nielsen & Schou [35], this type of "equivalent parametrization" can be further extended by the one-to-one relationship between α_p and π_p, i.e., we have one-to-one relations r_p ↔ α_p ↔ π_p. For computing α_p from π_p, Barndorff-Nielsen & Schou [35] suggest using the DL algorithm (A6) together with (A4) as follows:
$$\begin{pmatrix} a_{k+1,1} \\ \vdots \\ a_{k+1,k} \\ a_{k+1,k+1} \end{pmatrix} \;=\; \begin{pmatrix} a_{k,1} \\ \vdots \\ a_{k,k} \\ 0 \end{pmatrix} \;-\; \rho_{\mathrm{part}}(k+1) \begin{pmatrix} a_{k,k} \\ \vdots \\ a_{k,1} \\ -1 \end{pmatrix} \qquad \text{for } k = 1, 2, \ldots, \tag{A7}$$
which is initialized by setting $a_{1,1} = \rho_{\mathrm{part}}(1)$. Then, $\boldsymbol{\alpha}_p = \boldsymbol{a}_p$. Altogether, the application of the DL algorithm allows the transformations
$$\boldsymbol{r}_p \;\xrightarrow{\text{(A3)}}\; \boldsymbol{\alpha}_p, \qquad \boldsymbol{r}_p \;\xrightarrow{\text{(A6)}}\; \boldsymbol{\pi}_p \;\xrightarrow{\text{(A7)}}\; \boldsymbol{\alpha}_p.$$
By contrast, recall that $\boldsymbol{\alpha}_p \to \boldsymbol{r}_p$ (and thus $\boldsymbol{\pi}_p \to \boldsymbol{\alpha}_p \to \boldsymbol{r}_p$) is done by solving (A1) or (A3) for the ACF, e.g., by using the "third method" in Brockwell & Davis [11] (Section 3.3) or R's ARMAacf.
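A base-R sketch of recursion (A7) reads as follows; the function name pacf2ar and the PACF values used for the consistency check are ours and purely illustrative.

```r
## Minimal sketch: recursion (A7), mapping rho_part(1), ..., rho_part(p)
## to the AR coefficients alpha_1, ..., alpha_p (all |rho_part(k)| < 1 assumed).
pacf2ar <- function(pi_vals) {
  a <- pi_vals[1]                                  # a_{1,1} = rho_part(1)
  p <- length(pi_vals)
  if (p >= 2) for (k in 1:(p - 1)) {
    a <- c(a, 0) - pi_vals[k + 1] * c(rev(a), -1)  # recursion (A7)
  }
  a                                                # alpha_p = a_p
}

## Consistency check: map the PACF to AR coefficients and back again.
(alpha <- pacf2ar(c(0.5, 0.2)))                    # yields c(0.4, 0.2)
ARMAacf(ar = alpha, lag.max = 2, pacf = TRUE)      # returns 0.5 and 0.2 again
```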

Appendix B. Further Simulation Results

Table A1. Rejection rates of PACF-test applied to DGP with μ = 5, where semi-parametric (parametric) bootstrap relies on null of (Poi-)INAR(1) process.
Columns: Method; ρ(1); n; then the rejection rates for the PACF at lags h = 1, 2, 3, 4 under each true DGP, in the order Poi-INAR(1), NB-INAR(1) with σ²/μ = 1.5, and Poi-INARCH(1).
asym.0.251000.6290.0500.0470.0430.6460.0500.0420.0440.6390.0490.0460.045
(1.5) 10001.0000.0490.0500.0461.0000.0540.0520.0491.0000.0540.0470.050
0.51000.9980.0530.0400.0440.9980.0540.0430.0460.9950.0540.0460.044
10001.0000.0560.0470.0491.0000.0570.0510.0531.0000.0560.0530.050
0.751001.0000.0500.0420.0431.0000.0570.0480.0451.0000.0620.0530.047
10001.0000.0500.0480.0541.0000.0600.0560.0551.0000.0710.0630.061
asym.0.251000.6920.0450.0460.0540.6880.0510.0480.0520.6900.0540.0520.047
(3.2) 10001.0000.0530.0500.0521.0000.0530.0510.0471.0000.0510.0500.049
0.51000.9980.0540.0470.0480.9970.0520.0500.0510.9970.0470.0490.048
10001.0000.0500.0490.0511.0000.0610.0530.0491.0000.0600.0510.048
0.751001.0000.0470.0510.0461.0000.0540.0540.0501.0000.0600.0610.053
10001.0000.0550.0540.0541.0000.0620.0580.0561.0000.0730.0660.060
param.0.251000.7420.0460.0460.0500.7360.0450.0540.0510.7470.0530.0480.048
MM 10001.0000.0540.0480.0431.0000.0480.0550.0541.0000.0500.0480.050
0.51001.0000.0520.0550.0560.9990.0530.0520.0551.0000.0480.0530.049
10001.0000.0550.0520.0491.0000.0470.0520.0461.0000.0460.0560.046
0.751001.0000.0500.0570.0561.0000.0520.0470.0471.0000.0640.0670.061
10001.0000.0610.0490.0491.0000.0600.0590.0541.0000.0670.0620.059
param.0.251000.7470.0490.0490.0540.7350.0460.0490.0510.7260.0530.0520.055
CML 10001.0000.0510.0490.0471.0000.0490.0580.0491.0000.0440.0510.048
0.51001.0000.0540.0510.0540.9990.0490.0530.0500.9990.0590.0460.049
10001.0000.0480.0490.0571.0000.0500.0550.0511.0000.0520.0520.047
0.751001.0000.0520.0500.0511.0000.0510.0510.0501.0000.0610.0550.053
10001.0000.0500.0500.0521.0000.0520.0580.0531.0000.0690.0660.066
semi-p.0.251000.7360.0430.0460.0480.7230.0490.0520.0510.7330.0530.0530.056
CML 10001.0000.0480.0520.0471.0000.0460.0530.0571.0000.0540.0590.046
0.51001.0000.0530.0530.0511.0000.0540.0510.0490.9990.0440.0480.054
10001.0000.0470.0540.0541.0000.0510.0490.0571.0000.0520.0540.052
0.751001.0000.0520.0540.0541.0000.0510.0510.0411.0000.0640.0630.057
10001.0000.0480.0460.0511.0000.0510.0480.0481.0000.0600.0570.060
Table A2. Rejection rates of PACF-test applied to Poi-INAR(1) DGP with μ = 5, where circular block bootstrap with automatically selected ("b.star") or fixed block length b is used.
Columns: Method; ρ(1); n; then the rejection rates for the PACF at lags h = 1, …, 6.
b.star0.251000.9220.0100.0060.0070.0070.007
10001.0000.0370.0250.0160.0110.006
0.51001.0000.0240.0120.0080.0080.007
10001.0000.0500.0410.0320.0320.023
0.751001.0000.0410.0270.0190.0150.010
10001.0000.0450.0500.0420.0370.035
b = 5 0.251000.8570.0310.0240.0110.0050.005
10001.0000.0380.0230.0110.0030.004
0.51001.0000.0360.0200.0140.0040.008
10001.0000.0420.0260.0130.0040.006
0.751001.0000.0280.0210.0080.0020.008
10001.0000.1250.0740.0380.0160.029
b = 10 0.251000.8330.0520.0470.0360.0260.022
10001.0000.0450.0410.0360.0310.025
0.51001.0000.0530.0420.0320.0260.020
10001.0000.0470.0400.0340.0270.024
0.751001.0000.0450.0410.0350.0230.020
10001.0000.0540.0400.0340.0290.019
b = 15 0.251000.8310.0580.0500.0510.0370.031
10001.0000.0430.0480.0450.0370.030
0.51001.0000.0580.0540.0510.0420.029
10001.0000.0580.0450.0460.0380.031
0.751001.0000.0540.0420.0350.0360.034
10001.0000.0530.0440.0390.0310.036
b = 20 0.251000.8130.0640.0600.0550.0510.047
10001.0000.0530.0460.0450.0460.041
0.51001.0000.0670.0650.0560.0500.044
10001.0000.0520.0480.0380.0450.039
0.751001.0000.0590.0640.0510.0430.041
10001.0000.0490.0480.0470.0340.038
b = 25 0.251000.8190.0800.0720.0580.0580.056
10001.0000.0540.0580.0510.0470.044
0.51001.0000.0780.0650.0650.0520.048
10001.0000.0490.0600.0500.0460.052
0.751001.0000.0700.0640.0580.0480.051
10001.0000.0540.0500.0500.0480.040
Table A3. Rejection rates of PACF-test applied to Pearson residuals using MM estimates (DGPs with μ = 5), where both residuals and parametric bootstrap rely on null of Poi-INAR(1) process.
Columns: Method; ρ(1); n; then the rejection rates for the PACF at lags h = 1, 2, 3, 4 under each true DGP, in the order Poi-INAR(1), NB-INAR(1) with σ²/μ = 1.5, and Poi-INARCH(1).
asym.0.251000.0000.0380.0440.0470.0000.0410.0400.0460.0000.0420.0460.046
(1.5) 10000.0000.0430.0500.0500.0000.0430.0490.0500.0000.0450.0470.051
0.51000.0000.0260.0400.0430.0010.0260.0380.0440.0000.0310.0410.044
10000.0000.0290.0440.0530.0000.0300.0430.0520.0010.0330.0440.050
0.751000.0110.0220.0340.0370.0100.0220.0300.0350.0130.0260.0360.041
10000.0110.0260.0360.0420.0090.0240.0350.0440.0170.0280.0390.045
asym.0.251000.0000.0390.0460.0480.0000.0430.0490.0480.0000.0460.0490.045
(3.2) 10000.0000.0460.0520.0550.0000.0440.0480.0480.0000.0470.0470.053
0.51000.0010.0300.0460.0500.0010.0320.0420.0490.0010.0370.0490.049
10000.0000.0300.0470.0480.0000.0300.0470.0510.0000.0320.0470.050
0.751000.0140.0310.0380.0470.0150.0300.0370.0450.0190.0360.0450.053
10000.0110.0260.0360.0420.0100.0240.0340.0430.0180.0290.0400.045
param.0.251000.0290.0500.0460.0520.0300.0410.0520.0470.0250.0500.0460.052
MM 10000.0470.0470.0490.0500.0470.0530.0470.0460.0570.0530.0570.049
0.51000.0560.0510.0520.0480.0500.0490.0490.0450.0660.0500.0470.051
10000.0510.0530.0490.0450.0500.0510.0530.0510.0600.0460.0500.044
0.751000.0590.0420.0500.0530.0580.0440.0460.0450.0710.0560.0550.050
10000.0510.0510.0470.0490.0430.0460.0460.0450.0640.0540.0610.053
Table A4. Rejection rates of PACF-test applied to Pearson residuals using MM estimates (Poi-INAR(1) DGPs with μ = 5), where circular block bootstrap with automatically selected ("b.star") or fixed block length b is used.
Columns: Method; ρ(1); n; then the rejection rates for the PACF at lags h = 1, …, 6.
Efron0.251000.0000.0470.0460.0470.0510.049
10000.0000.0400.0470.0510.0480.046
0.51000.0010.0330.0540.0480.0440.050
10000.0000.0330.0430.0500.0500.055
0.751000.0140.0300.0410.0440.0470.053
10000.0110.0280.0320.0440.0430.046
b.star0.251000.0000.0410.0410.0480.0450.054
10000.0000.0490.0520.0470.0490.053
0.51000.0000.0340.0440.0460.0450.054
10000.0010.0300.0480.0480.0470.054
0.751000.0080.0370.0400.0440.0440.049
10000.0070.0270.0330.0410.0440.049
b = 5 0.251000.0000.0240.0350.0480.0440.047
10000.0000.0230.0350.0440.0550.053
0.51000.0010.0220.0340.0440.0470.042
10000.0000.0170.0330.0470.0460.049
0.751000.0030.0170.0350.0420.0500.047
10000.0030.0130.0280.0390.0440.051
b = 10 0.251000.0000.0150.0190.0270.0320.035
10000.0000.0100.0170.0250.0280.033
0.51000.0000.0160.0200.0300.0330.043
10000.0000.0080.0170.0210.0280.037
0.751000.0030.0120.0180.0230.0300.034
10000.0020.0080.0130.0200.0250.035
Table A5. Rejection rates of PACF-test applied to Pearson residuals using CML estimates (DGPs with μ = 5), where both residuals and bootstrap rely on null of Poi-INAR(1) process (parametric bootstrap) or unspecified INAR(1) process (semi-parametric bootstrap), respectively.
Columns: Method; ρ(1); n; then the rejection rates for the PACF at lags h = 1, 2, 3, 4 under each true DGP, in the order Poi-INAR(1), NB-INAR(1) with σ²/μ = 1.5, and Poi-INARCH(1).
asym.0.251000.0010.0440.0430.0470.0010.0420.0400.0460.0010.0450.0470.047
(1.5) 10000.0000.0450.0510.0520.2360.0570.0490.0500.0000.0480.0480.052
0.51000.0090.0350.0410.0460.0280.0360.0410.0390.0230.0380.0450.042
10000.0080.0350.0450.0480.9020.1820.0690.0530.7450.1480.0720.051
0.751000.0320.0420.0430.0420.0570.0410.0440.0400.3640.1130.0700.047
10000.0330.0400.0480.0500.5810.2880.1620.0941.0000.9130.4880.204
asym.0.251000.0010.0430.0470.0490.0020.0490.0500.0480.0000.0460.0490.049
(3.2) 10000.0000.0420.0490.0480.2700.0600.0490.0480.0000.0450.0500.050
0.51000.0090.0410.0460.0480.0430.0550.0470.0500.0340.0530.0480.051
10000.0090.0390.0460.0530.9090.1980.0770.0580.7530.1670.0720.055
0.751000.0340.0430.0460.0480.0750.0620.0520.0550.4250.1650.0940.067
10000.0350.0400.0460.0460.6090.3100.1730.1121.0000.9210.5020.221
param.0.251000.0400.0520.0490.0480.2620.0500.0510.0560.0460.0510.0510.056
CML 10000.0490.0460.0530.0491.0000.0650.0510.0540.1940.0470.0500.047
0.51000.0520.0490.0490.0450.2380.0620.0480.0490.2090.0640.0530.050
10000.0490.0530.0490.0530.9930.2260.0750.0510.9630.1880.0820.048
0.751000.0460.0460.0500.0510.1230.0790.0540.0520.6060.1900.0970.068
10000.0480.0530.0480.0470.7090.3380.1690.1211.0000.9360.5220.221
semi-p.0.251000.0370.0450.0470.0500.0490.0440.0500.0540.0450.0510.0520.051
CML 10000.0370.0540.0500.0470.0370.0470.0520.0520.0310.0510.0540.054
0.51000.0500.0510.0540.0530.0570.0480.0520.0440.0700.0520.0490.051
10000.0390.0530.0560.0480.0520.0530.0550.0490.2250.0670.0580.050
0.751000.0510.0440.0490.0500.0490.0490.0460.0470.2230.1040.0720.061
10000.0460.0490.0490.0510.0550.0530.0500.0500.3770.2170.1290.080
Table A6. Rejection rates of PACF-test applied to Pearson residuals using CML estimates (Poi-INAR(1) DGPs with μ = 5), where circular block bootstrap with automatically selected ("b.star") or fixed block length b is used.
Columns: Method; ρ(1); n; then the rejection rates for the PACF at lags h = 1, …, 6.
b.star0.251000.0010.0460.0500.0500.0520.049
10000.0000.0500.0530.0500.0500.051
0.51000.0070.0360.0460.0540.0450.052
10000.0050.0340.0460.0470.0500.050
0.751000.0230.0380.0430.0450.0420.058
10000.0220.0490.0460.0470.0500.052
b = 5 0.251000.0010.0240.0370.0480.0460.048
10000.0000.0200.0330.0480.0450.056
0.51000.0030.0200.0380.0390.0500.050
10000.0020.0180.0350.0450.0510.049
0.751000.0100.0230.0420.0470.0500.048
10000.0090.0180.0320.0450.0490.044
b = 10 0.251000.0010.0160.0200.0300.0330.042
10000.0000.0120.0180.0280.0290.034
0.51000.0030.0140.0190.0300.0310.036
10000.0010.0100.0150.0200.0270.031
0.751000.0100.0160.0180.0250.0400.039
10000.0060.0110.0150.0230.0230.032

References

  1. Walker, G.T. On periodicity in series of related terms. Proc. R. Soc. Lond. Ser. A 1931, 131, 518–532.
  2. Yule, G.U. On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers. Philos. Trans. R. Soc. Lond. Ser. A 1927, 226, 267–298.
  3. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control, 1st ed.; Holden-Day: San Francisco, CA, USA, 1970.
  4. Holan, S.H.; Lund, R.; Davis, G. The ARMA alphabet soup: A tour of ARMA model variants. Stat. Surv. 2010, 4, 232–274.
  5. Weiß, C.H. An Introduction to Discrete-Valued Time Series; John Wiley & Sons, Inc.: Chichester, UK, 2018.
  6. Grunwald, G.; Hyndman, R.J.; Tedesco, L.; Tweedie, R.L. Non-Gaussian conditional linear AR(1) models. Aust. N. Z. J. Stat. 2000, 42, 479–495.
  7. Du, J.-G.; Li, Y. The integer-valued autoregressive (INAR(p)) model. J. Time Ser. Anal. 1991, 12, 129–142.
  8. Ferland, R.; Latour, A.; Oraichi, D. Integer-valued GARCH processes. J. Time Ser. Anal. 2006, 27, 923–942.
  9. Kim, H.-Y.; Weiß, C.H.; Möller, T.A. Models for autoregressive processes of bounded counts: How different are they? Comput. Stat. 2020, 35, 1715–1736.
  10. Jung, R.C.; Tremayne, A.R. Useful models for time series of counts or simply wrong ones? AStA Adv. Stat. Anal. 2011, 95, 59–91.
  11. Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods, 2nd ed.; Springer: New York, NY, USA, 1991.
  12. Kim, H.-Y.; Weiß, C.H. Goodness-of-fit tests for binomial AR(1) processes. Statistics 2015, 49, 291–315.
  13. Mills, T.M.; Seneta, E. Independence of partial autocorrelations for a classical immigration branching process. Stoch. Process. Their Appl. 1991, 37, 275–279.
  14. Weiß, C.H.; Scherer, L.; Aleksandrov, B.; Feld, M.H.-J.M. Checking model adequacy for count time series by using Pearson residuals. J. Time Ser. Econom. 2020, 12, 20180018.
  15. Steutel, F.W.; van Harn, K. Discrete analogues of self-decomposability and stability. Ann. Probab. 1979, 7, 893–899.
  16. Drost, F.C.; van den Akker, R.; Werker, B.J.M. Efficient estimation of auto-regression parameters and innovation distributions for semiparametric integer-valued AR(p) models. J. R. Stat. Soc. Ser. B 2009, 71, 467–485.
  17. Faymonville, M.; Jentsch, C.; Weiß, C.H.; Aleksandrov, B. Semiparametric estimation of INAR models using roughness penalization. Stat. Methods Appl. 2022, 1–36.
  18. Harvey, A.C.; Fernandes, C. Time series models for count or qualitative observations. J. Bus. Econ. Stat. 1989, 7, 407–417.
  19. Zhu, F.; Wang, D. Diagnostic checking integer-valued ARCH(p) models using conditional residual autocorrelations. Comput. Stat. Data Anal. 2010, 54, 496–508.
  20. Anderson, O.D. Approximate moments to O(n−2) for the sampled partial autocorrelations from a white noise process. Comput. Stat. Data Anal. 1993, 16, 405–421.
  21. Anderson, O.D. Exact general-lag serial correlation moments and approximate low-lag partial correlation moments for Gaussian white noise. J. Time Ser. Anal. 1993, 14, 551–574.
  22. Kwan, A.C.C. Sample partial autocorrelations and portmanteau tests for randomness. Appl. Econ. Lett. 2003, 10, 605–609.
  23. Weiß, C.H.; Schweer, S. Bias corrections for moment estimators in Poisson INAR(1) and INARCH(1) processes. Stat. Probab. Lett. 2016, 112, 124–130.
  24. Hall, P. The Bootstrap and Edgeworth Expansion; Springer: New York, NY, USA, 1992.
  25. Jentsch, C.; Weiß, C.H. Bootstrapping INAR models. Bernoulli 2019, 25, 2359–2408.
  26. Politis, D.N.; White, H. Automatic block-length selection for the dependent bootstrap. Econom. Rev. 2004, 23, 53–70; Correction in Econom. Rev. 2009, 28, 372–375.
  27. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
  28. Giacomini, R.; Politis, D.N.; White, H. A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econom. Theory 2013, 29, 567–589.
  29. Freeland, R.K. Statistical Analysis of Discrete Time Series with Applications to the Analysis of Workers Compensation Claims Data. Ph.D. Thesis, University of British Columbia, Vancouver, BC, Canada, 1998. Available online: https://open.library.ubc.ca/cIRcle/collections/ubctheses/831/items/1.0088709 (accessed on 17 January 2019).
  30. Weiß, C.H.; Feld, M.H.-J.M.; Mamode Khan, N.; Sunecher, Y. INARMA modeling of count time series. Stats 2019, 2, 284–320.
  31. Toeplitz, O. Zur Transformation der Scharen bilinearer Formen von unendlichvielen Veränderlichen. Nachrichten von der Ges. der Wiss. Göttingen Math.-Phys. Kl. 1907, 1907, 110–115. (In German)
  32. Toeplitz, O. Zur Theorie der quadratischen und bilinearen Formen von unendlichvielen Veränderlichen. I. Teil: Theorie der L-Formen. Mathematische Annalen 1911, 70, 351–376. (In German)
  33. Durbin, J. The fitting of time-series models. Rev. Int. Stat. Inst. 1960, 28, 233–244.
  34. Levinson, N. The Wiener (root mean square) error criterion in filter design and prediction. J. Math. Phys. 1946, 25, 261–278.
  35. Barndorff-Nielsen, O.; Schou, G. On the parametrization of autoregressive models by partial autocorrelations. J. Multivar. Anal. 1973, 3, 408–419.
Figure 1. Means in (a–c) and SDs in (d–f) of SPACF(h) for sample size n = 1000, either simulated values for Poi-INAR(1) DGP with μ = 5 and AR-parameter α, or asymptotic values from (11). SPACF computed from raw counts, and from Pearson residuals with CML or MM estimation.
Figure 2. Time series plot and SPACF(h) of claims counts, see Section 7.
Table 1. Rejection rates of PACF-tests applied to DGP with μ = 5 and ρ(1) = 0.5, where semi-parametric (parametric) bootstrap relies on null of (Poi-)INAR(1) process.
True DGP:            Poi-INAR(1)                  NB-INAR(1), σ²/μ = 1.5       Poi-INARCH(1)
                     PACF at lag h =              PACF at lag h =              PACF at lag h =
Method        n      1      2      3      4       1      2      3      4       1      2      3      4
asym. (5)     100    0.998  0.053  0.040  0.044   0.998  0.054  0.043  0.046   0.995  0.054  0.046  0.044
              1000   1.000  0.056  0.047  0.049   1.000  0.057  0.051  0.053   1.000  0.056  0.053  0.050
asym. (11)    100    0.998  0.054  0.047  0.048   0.997  0.052  0.050  0.051   0.997  0.047  0.049  0.048
              1000   1.000  0.050  0.049  0.051   1.000  0.061  0.053  0.049   1.000  0.060  0.051  0.048
param. MM     100    1.000  0.052  0.055  0.056   0.999  0.053  0.052  0.055   1.000  0.048  0.053  0.049
              1000   1.000  0.055  0.052  0.049   1.000  0.047  0.052  0.046   1.000  0.046  0.056  0.046
param. CML    100    1.000  0.054  0.051  0.054   0.999  0.049  0.053  0.050   0.999  0.059  0.046  0.049
              1000   1.000  0.048  0.049  0.057   1.000  0.050  0.055  0.051   1.000  0.052  0.052  0.047
semi-p. CML   100    1.000  0.053  0.053  0.051   1.000  0.054  0.051  0.049   0.999  0.044  0.048  0.054
              1000   1.000  0.047  0.054  0.054   1.000  0.051  0.049  0.057   1.000  0.052  0.054  0.052
Table 2. Like Table 1, but 2nd-order DGPs with α₂ = 0.2.
True DGP:            Poi-INAR(2)                  NB-INAR(2), σ²/μ = 1.5       Poi-INARCH(2)
                     PACF at lag h =              PACF at lag h =              PACF at lag h =
Method        n      1      2      3      4       1      2      3      4       1      2      3      4
asym. (5)     100    0.984  0.383  0.048  0.047   0.983  0.390  0.047  0.048   0.983  0.384  0.049  0.048
              1000   1.000  1.000  0.053  0.053   1.000  1.000  0.056  0.052   1.000  1.000  0.055  0.053
asym. (11)    100    0.990  0.478  0.048  0.049   0.987  0.480  0.053  0.047   0.986  0.480  0.053  0.054
              1000   1.000  1.000  0.052  0.051   1.000  1.000  0.056  0.051   1.000  1.000  0.056  0.062
param. MM     100    0.996  0.470  0.055  0.058   0.995  0.465  0.053  0.054   0.995  0.482  0.048  0.054
              1000   1.000  1.000  0.056  0.058   1.000  1.000  0.053  0.051   1.000  1.000  0.055  0.054
param. CML    100    0.995  0.470  0.044  0.050   0.994  0.465  0.054  0.051   0.994  0.471  0.052  0.054
              1000   1.000  1.000  0.052  0.053   1.000  1.000  0.051  0.057   1.000  1.000  0.055  0.058
semi-p. CML   100    0.995  0.475  0.055  0.054   0.996  0.472  0.047  0.053   0.994  0.485  0.049  0.058
              1000   1.000  1.000  0.054  0.052   1.000  1.000  0.054  0.056   1.000  1.000  0.054  0.053
Table 3. Rejection rates of PACF-tests applied to DGP with μ = 5, ρ(1) = 0.5 and α₂ = 0.2, where semi-parametric (parametric) bootstrap relies on null of (Poi-)INAR(2) process.
True DGP:            Poi-INAR(2)                  NB-INAR(2), σ²/μ = 1.5       Poi-INARCH(2)
                     PACF at lag h =              PACF at lag h =              PACF at lag h =
Method        n      1      2      3      4       1      2      3      4       1      2      3      4
asym. (5)     100    0.984  0.383  0.048  0.047   0.983  0.390  0.047  0.048   0.983  0.384  0.049  0.048
              1000   1.000  1.000  0.053  0.053   1.000  1.000  0.056  0.052   1.000  1.000  0.055  0.053
asym. (11)    100    0.990  0.478  0.048  0.049   0.987  0.480  0.053  0.047   0.986  0.480  0.053  0.054
              1000   1.000  1.000  0.052  0.051   1.000  1.000  0.056  0.051   1.000  1.000  0.056  0.062
param. MM     100    0.992  0.510  0.044  0.050   0.992  0.516  0.049  0.048   0.992  0.531  0.054  0.056
              1000   1.000  1.000  0.046  0.051   1.000  1.000  0.055  0.054   1.000  1.000  0.058  0.052
param. CML    100    0.977  0.447  0.057  0.048   0.994  0.478  0.052  0.050   0.991  0.446  0.055  0.056
              1000   1.000  1.000  0.050  0.053   1.000  1.000  0.053  0.051   1.000  1.000  0.054  0.051
semi-p. CML   100    0.993  0.548  0.053  0.047   0.990  0.521  0.053  0.054   0.992  0.498  0.051  0.049
              1000   1.000  1.000  0.055  0.047   1.000  1.000  0.049  0.047   1.000  1.000  0.051  0.051
Table 10. SPACF(h) of claims counts (bold font), lower and upper critical values (level 0.05) by different methods, where italic font indicates that critical value is violated.
Lag h:                      1        2        3        4        5
Upper critical value by method …
  asym. (5)                 0.200    0.200    0.200    0.200    0.200
  asym. (11)                0.186    0.175    0.184    0.173    0.182
  param., MM                0.134    0.175    0.184    0.181    0.186
  param., CML               0.148    0.167    0.179    0.182    0.185
  semi-p., CML              0.138    0.171    0.166    0.162    0.192
SPACF(h)                    0.452    0.198    −0.010   −0.038   0.040
Lower critical value by method …
  asym. (5)                 −0.200   −0.200   −0.200   −0.200   −0.200
  asym. (11)                −0.207   −0.217   −0.205   −0.215   −0.203
  param., MM                −0.211   −0.224   −0.206   −0.202   −0.199
  param., CML               −0.224   −0.223   −0.218   −0.204   −0.197
  semi-p., CML               −0.213   −0.213   −0.198   −0.220   −0.199
Table 11. SPACF(h) of Pearson residuals after fitting a (Poi-)INAR(1) model to the claims counts (bold font). Lower and upper critical values (level 0.05) by different methods, where italic font indicates that critical value is violated.
Poi-INAR(1), MM; Lag h:     1        2        3        4        5
Upper critical value by method …
  asym. (5)                 0.201    0.201    0.201    0.201    0.201
  asym. (11)                0.187    0.176    0.185    0.174    0.183
  param., MM                0.108    0.167    0.195    0.184    0.189
SPACF(h)                    −0.060   0.156    0.061    −0.032   −0.007
Lower critical value by method …
  asym. (5)                 −0.201   −0.201   −0.201   −0.201   −0.201
  asym. (11)                −0.208   −0.218   −0.206   −0.216   −0.205
  param., MM                −0.076   −0.190   −0.195   −0.195   −0.202
Poi-INAR(1), CML; Lag h:    1        2        3        4        5
Upper critical value by method …
  asym. (5)                 0.201    0.201    0.201    0.201    0.201
  asym. (11)                0.187    0.176    0.185    0.174    0.183
  param., CML               0.172    0.166    0.183    0.180    0.193
SPACF(h)                    0.009    0.185    0.062    −0.031   −0.002
Lower critical value by method …
  asym. (5)                 −0.201   −0.201   −0.201   −0.201   −0.201
  asym. (11)                −0.208   −0.218   −0.206   −0.216   −0.205
  param., CML               −0.208   −0.213   −0.219   −0.205   −0.201
INAR(1), semi-CML; Lag h:   1        2        3        4        5
Upper critical value by method …
  asym. (5)                 0.201    0.201    0.201    0.201    0.201
  asym. (11)                0.187    0.176    0.185    0.174    0.183
  semi-p., CML              0.158    0.171    0.178    0.162    0.203
SPACF(h)                    −0.041   0.165    0.064    −0.029   −0.006
Lower critical value by method …
  asym. (5)                 −0.201   −0.201   −0.201   −0.201   −0.201
  asym. (11)                −0.208   −0.218   −0.206   −0.216   −0.205
  semi-p., CML              −0.142   −0.204   −0.196   −0.215   −0.210
