Article

A Conway–Maxwell–Poisson-Binomial AR(1) Model for Bounded Time Series Data

1 School of Mathematics and Statistics, Henan University, Kaifeng 475004, China
2 School of Mathematics, Jilin University, Changchun 130012, China
3 College of Mathematics, Taiyuan University of Technology, Taiyuan 030024, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(1), 126; https://doi.org/10.3390/e25010126
Submission received: 28 November 2022 / Revised: 6 January 2023 / Accepted: 6 January 2023 / Published: 7 January 2023
(This article belongs to the Special Issue Discrete-Valued Time Series)

Abstract

Binomial autoregressive models are frequently used for modeling bounded time series counts. However, they are not well developed for more complex bounded time series counts of the occurrence of $n$ exchangeable and dependent units, which are becoming increasingly common in practice. To fill this gap, this paper first constructs an exchangeable Conway–Maxwell–Poisson-binomial (CMPB) thinning operator and then establishes the Conway–Maxwell–Poisson-binomial AR (CMPBAR) model. We establish its stationarity and ergodicity, discuss the conditional maximum likelihood (CML) estimation of the model's parameters, and establish the asymptotic normality of the CML estimator. In a simulation study, boxplots illustrate the consistency of the CML estimator and Q-Q plots illustrate its asymptotic normality. In the real data example, our model attains smaller AIC and BIC values than its main competitors.

1. Introduction

Bounded time series of counts are commonly observed in real-world applications. For a count random variable $X$ with range $\{0, 1, \ldots, n\}$, the binomial index of dispersion (as a function of $n$, $\mu$ and $\sigma^2$) is defined by $\mathrm{BID}(X) = n\sigma^2 / [\mu(n - \mu)]$, where $n$ is the predetermined upper limit of the range, $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. If $\mathrm{BID}(X) < 1$, then $X$ is under-dispersed; if $\mathrm{BID}(X) = 1$, it is equi-dispersed; and if $\mathrm{BID}(X) > 1$, it is over-dispersed (it exhibits extra-binomial variation).
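As a small illustration of the definition above, the BID can be estimated from data by plugging in the sample mean and variance. The sketch below is not taken from the paper; the function name and the use of the population variance (divisor equal to the sample size) are assumptions made for the example.

```python
from statistics import fmean, pvariance


def binomial_index_of_dispersion(counts, n):
    """Plug-in estimate of BID = n * sigma^2 / (mu * (n - mu)).

    `counts` are observations in {0, ..., n}; the variance uses the
    population form (an assumption; the paper does not state the divisor).
    """
    mu = fmean(counts)
    sigma2 = pvariance(counts, mu)
    return n * sigma2 / (mu * (n - mu))


# Values above 1 point to over-dispersion, values below 1 to under-dispersion.
example_bid = binomial_index_of_dispersion([0, 1, 2, 3, 4], n=4)
```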
A popular tool to establish a binomial autoregressive (BAR) model is the binomial thinning operator “$\circ$” [1], which is defined by

$$\alpha \circ X := \sum_{i=1}^{X} W_i, \tag{1}$$

where $X$ is a non-negative integer-valued random variable and $\{W_i, i = 1, 2, \ldots, n\}$ is a sequence of i.i.d. Bernoulli random variables with $P(W_i = 1) = 1 - P(W_i = 0) = \alpha$ that is independent of $X$. McKenzie [2] used the binomial thinning operator given in (1) to establish the binomial AR(1) model, a popular model for bounded time series, which is defined as follows:

$$X_t = \alpha \circ X_{t-1} + \beta \circ (n - X_{t-1}), \tag{2}$$

where $n \in \mathbb{N}$ is the predetermined upper limit of the range; $X_0$ follows the binomial distribution with $P(X_0 = k) = \binom{n}{k} \pi^k (1 - \pi)^{n-k}$; $\alpha = \beta + \rho$ and $\beta = (1 - \rho)\pi$ with $\rho \in \left(\max\{-\pi/(1 - \pi), -(1 - \pi)/\pi\}, 1\right)$ and $\pi \in (0, 1)$; the counting series at time $t$ are independent of the random variables $X_s$, $s < t$; and all the counting series in “$\alpha \circ$” and “$\beta \circ$” are mutually independent sequences of independent Bernoulli random variables with parameters $\alpha$ and $\beta$, respectively. The binomial AR(1) process given in (2) is now well understood: it is an ergodic Markov chain with stationary distribution $\mathrm{Bin}(n, \pi)$, where $\pi = \beta/(1 - \rho)$ and $\rho = \alpha - \beta$. Hence, $\mathrm{BID}(X_t) = 1$, i.e., the BAR model given in (2) applies to equi-dispersed time series with finite range; see [3,4,5,6,7] for more discussion of the BAR(1) model.
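The recursion (2) is straightforward to simulate. The following is a minimal sketch under the parametrization $\beta = (1 - \rho)\pi$, $\alpha = \beta + \rho$ given above; the function names, the seed handling, and the parameter values in the usage line are illustrative assumptions, not part of the paper.

```python
import random


def binomial_thinning(prob, count, rng):
    """Return prob o count: the sum of `count` i.i.d. Bernoulli(prob) draws."""
    return sum(rng.random() < prob for _ in range(count))


def simulate_bar1(n, pi, rho, length, seed=0):
    """Simulate X_0, ..., X_length from the binomial AR(1) model (2)."""
    rng = random.Random(seed)
    beta = (1 - rho) * pi
    alpha = beta + rho
    x = binomial_thinning(pi, n, rng)  # X_0 ~ Bin(n, pi)
    path = [x]
    for _ in range(length):
        survivors = binomial_thinning(alpha, x, rng)    # units staying in state 1
        revivals = binomial_thinning(beta, n - x, rng)  # units moving from 0 to 1
        x = survivors + revivals
        path.append(x)
    return path


bar1_path = simulate_bar1(n=7, pi=0.4, rho=0.3, length=200)
```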
Weiß and Pollett [8] extended the binomial AR(1) model to the density-dependent BAR(1) model (denoted as the DDBAR(1) model), whose thinning probabilities vary over time by assuming $\alpha_t = \alpha(X_{t-1}/n)$ and $\beta_t = \beta(X_{t-1}/n)$. In particular, for given $n$, if $\alpha_t = (1 - \rho)(a + b X_{t-1}/n)$ and $\beta_t = (1 - \rho)(a + b X_{t-1}/n) + \rho$, the DDBAR(1) model allows one to analyze bounded integer-valued time series with under-dispersion, equi-dispersion and over-dispersion; see Section 4 in [8] for more details. To model extra-binomial variation for time series counts, Weiß and Kim [9] proposed the beta-binomial AR (BBAR) model based on the beta-binomial thinning operator “$\alpha_\phi \circ$”, which is defined by

$$\alpha_\phi \circ X = \sum_{i=1}^{X} B_i,$$

where $X$ is a non-negative integer-valued random variable, $\{B_i, i = 1, 2, \ldots, n\}$ is a sequence of Bernoulli random variables that are i.i.d. given $\alpha_\phi$, with $P(B_i = 1 \mid \alpha_\phi) = 1 - P(B_i = 0 \mid \alpha_\phi) = \alpha_\phi$ and $\alpha_\phi \sim \mathrm{Beta}(\tau\alpha, \tau(1 - \alpha))$, $\tau = (1 - \phi)/\phi$, and $\{B_i, i = 1, 2, \ldots, X\}$ is independent of $X$.
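The two-stage structure of the beta-binomial thinning operator (first draw a random success probability, then thin) can be sketched as follows. This is an illustrative sketch only; the function name and the parameter values in the usage line are assumptions.

```python
import random


def beta_binomial_thinning(alpha, phi, count, rng):
    """Return alpha_phi o count for 0 < alpha < 1 and 0 < phi < 1."""
    tau = (1 - phi) / phi
    # Stage 1: random success probability alpha_phi ~ Beta(tau*alpha, tau*(1-alpha)).
    prob = rng.betavariate(tau * alpha, tau * (1 - alpha))
    # Stage 2: given alpha_phi, the B_i are independent Bernoulli(alpha_phi).
    return sum(rng.random() < prob for _ in range(count))


rng = random.Random(1)
bb_draws = [beta_binomial_thinning(0.4, 0.2, 6, rng) for _ in range(50)]
```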
As discussed in Weiß [10], the BAR(1), DDBAR(1), and BBAR(1) models can be interpreted as a system of $n$ mutually independent units, each being either in state “1” or in state “0”. Let $X_t$ be the number of units in state “1” at time $t$. Then $\alpha \circ X_{t-1}$ ($\alpha_t \circ X_{t-1}$ or $\alpha_\phi \circ X_{t-1}$) is the number of units still in state “1” at time $t$, with survival probability $\alpha$ (random survival probability $\alpha_t$ or $\alpha_\phi$), and $\beta \circ (n - X_{t-1})$ ($\beta_t \circ (n - X_{t-1})$ or $\beta_\phi \circ (n - X_{t-1})$) is the number of units that moved from state “0” to state “1” at time $t$, with revival probability $\beta$ (random revival probability $\beta_t$ or $\beta_\phi$). It is worth mentioning that the BAR(1), DDBAR(1), and BBAR(1) models all target a system with $n$ independent units rather than a system with $n$ dependent units, i.e., the counting series in “$\circ$” is independent and identically distributed rather than dependent. To address this limitation, Kang et al. [11] proposed a generalized binomial AR (GBAR) model based on the generalized binomial thinning operator “$\alpha \circ_\theta$”, which was proposed by Ristić et al. [12] and is given as follows:

$$\alpha \circ_\theta X = \sum_{i=1}^{X} U_i,$$

where $U_i = (1 - V_i) W_i + V_i Z$; $\{W_i\}$ and $\{V_i\}$ are two independent sequences of i.i.d. random variables with Bernoulli($\alpha$) and Bernoulli($\theta$) distributions, respectively; $Z$ is a Bernoulli($\alpha$) random variable that is responsible for the cross-dependence; and $\{W_i\}$, $\{V_j\}$ ($i, j = 1, 2, \ldots, X$) and $Z$ are mutually independent and each of them is independent of $X$.
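The role of the shared variable $Z$ in the generalized binomial thinning operator may be easier to see in code. The sketch below follows the construction $U_i = (1 - V_i) W_i + V_i Z$ literally; the function name and the usage values are assumptions made for illustration.

```python
import random


def generalized_binomial_thinning(alpha, theta, count, rng):
    """Return alpha o_theta count with U_i = (1 - V_i) * W_i + V_i * Z."""
    z = rng.random() < alpha  # shared Bernoulli(alpha) component Z
    total = 0
    for _ in range(count):
        v = rng.random() < theta  # V_i ~ Bernoulli(theta)
        w = rng.random() < alpha  # W_i ~ Bernoulli(alpha)
        total += z if v else w    # U_i copies Z when V_i = 1, else W_i
    return total


# With theta = 1 every U_i equals Z, so the result is either 0 or `count`.
rng = random.Random(2)
fully_dependent = [generalized_binomial_thinning(0.5, 1.0, 5, rng) for _ in range(50)]
```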
Unfortunately, the GBAR model [11] cannot be used to analyze under-dispersed or equi-dispersed bounded data. To fill this gap, we are inspired by the Conway–Maxwell–Poisson-binomial (CMPB) distribution [13] and construct the Conway–Maxwell–Poisson-binomial thinning operator, whose counting series is exchangeable. Furthermore, we propose a new Conway–Maxwell–Poisson-binomial autoregressive (CMPBAR) model, which allows us to analyze bounded data not only with over-dispersion but also with equi-dispersion or under-dispersion. The second contribution of this paper is that we discuss the CML estimation of the parameters of the new model and establish the asymptotic normality of the CML estimator. To illustrate the flexibility of the new model, we apply it to the weekly rainy days at Hamburg–Neuwiedenthal in Germany.
The paper is organized as follows. Section 2 first gives a brief review of the Conway–Maxwell–Poisson-binomial distribution, and then defines the exchangeable Conway–Maxwell–Poisson-binomial thinning operator and the Conway–Maxwell–Poisson-binomial AR model. The conditional maximum likelihood estimation and its asymptotic properties are established in Section 3. Section 4 presents a simulation study and Section 5 presents a real data example to show the better performance of the new model. Conclusions are given in Section 6.

2. Model Formulation and Stability Properties

2.1. Conway–Maxwell–Poisson-Binomial Distribution

For readability, we first give a brief review of the CMPB distribution introduced by Shmueli et al. [13].
A random variable $X$ taking values in $\{0, 1, 2, \ldots, n\}$ is said to follow the Conway–Maxwell–Poisson-binomial distribution with parameters $(\alpha, \nu)$ if its probability mass function (pmf) takes the form

$$P(X = x \mid \alpha, \nu, n) = \binom{n}{x}^{\nu} \alpha^x (1 - \alpha)^{n-x} \Big/ Z(\alpha, \nu),$$

where $Z(\alpha, \nu) = \sum_{x=0}^{n} \binom{n}{x}^{\nu} \alpha^x (1 - \alpha)^{n-x}$, $0 < \alpha < 1$, $\nu \in \mathbb{R}$ and $n \in \mathbb{N}$ is the predetermined upper limit of the range.
For simplicity, we write $X \sim \mathrm{CMPB}(n, \alpha, \nu)$. With $\theta = \alpha/(1 - \alpha)$, the pmf of $X$ can be rewritten as

$$P(X = x \mid \theta, \nu, n) = \frac{1}{S(\theta, \nu)} \binom{n}{x}^{\nu} \theta^x, \tag{3}$$

where $S(\theta, \nu) = \sum_{x=0}^{n} \binom{n}{x}^{\nu} \theta^x$, $\theta > 0$ and $n \in \mathbb{N}$ is the predetermined upper limit of the range. Therefore, the moment-generating function of $X$ is $M_X(s) = E(e^{sX}) = S(\theta e^s, \nu)/S(\theta, \nu)$. Furthermore, writing $S$, $S'$ and $S''$ for $S(\theta, \nu)$, $S'(\theta, \nu)$ and $S''(\theta, \nu)$,

$$E(X) = \frac{\theta S'}{S}, \qquad \mathrm{Var}(X) = \frac{\theta S'}{S} + \theta^2 \left[ \frac{S''}{S} - \left( \frac{S'}{S} \right)^2 \right],$$

$$\mathrm{BID} = \frac{n \, \mathrm{Var}(X)}{E(X)\,[n - E(X)]} = \frac{n \left[ S S' + \theta S S'' - \theta (S')^2 \right]}{n S S' - \theta (S')^2}, \tag{4}$$

where $S'(\theta, \nu) = \partial S(\theta, \nu)/\partial \theta$ and $S''(\theta, \nu) = \partial S'(\theta, \nu)/\partial \theta$ (see Shmueli et al. [13], Borges et al. [14], Daly and Gaunt [15], and Kadane [16] for a more detailed discussion).
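Because the support is finite, the pmf (3) and the BID can be evaluated numerically by direct summation. The sketch below is an illustration rather than the authors' code; the function names are assumptions, and the moments are computed from the pmf instead of through $S'$ and $S''$.

```python
from math import comb


def cmpb_pmf(n, alpha, nu):
    """Return [P(X = 0), ..., P(X = n)] for X ~ CMPB(n, alpha, nu), via (3)."""
    theta = alpha / (1 - alpha)
    weights = [comb(n, x) ** nu * theta ** x for x in range(n + 1)]
    total = sum(weights)  # this is S(theta, nu)
    return [w / total for w in weights]


def cmpb_bid(n, alpha, nu):
    """Binomial index of dispersion n * Var(X) / (E(X) * (n - E(X)))."""
    pmf = cmpb_pmf(n, alpha, nu)
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return n * var / (mean * (n - mean))


# nu = 1 recovers the binomial case, for which the BID equals 1.
bid_binomial_case = cmpb_bid(7, 0.3, 1.0)
```

Evaluating `cmpb_bid` on a grid of $\alpha$ and $\nu$ values gives a numerical counterpart to the kind of plot described next.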
Unfortunately, the exact range of the BID of the CMPB distribution cannot be obtained from (4). To get around this difficulty, we give an example in Figure 1 with $n = 7$, where $\alpha$ and $\nu$ vary over $\{0.1, 0.2, 0.3, \ldots, 0.9\}$ and $\{-2, -1.5, -0.5, 0, 0.5, 1, 1.5, 2, 2.5\}$, respectively.
From Figure 1, the BID of the CMPB distribution may be less than 1, equal to 1, or greater than 1, depending on the values of $\alpha$ and $\nu$. This implies that the CMPB distribution allows us to analyze bounded time series counts with under-dispersion, equi-dispersion, and over-dispersion.
To further explore how the BID changes as $\alpha$ varies over $\{0.1, 0.2, \ldots, 0.9\}$ for given $n = 7$ and $\nu = -0.5$, 0, 0.5, 1, 1.5, or 2, we present plots of the BID in Figure 2.
From Figure 2, we obtain the following observations. First, if $\nu < 1$, the BID is no less than 1. To be precise, the BID increases to its maximum as $\alpha$ varies from 0 to 0.5, and then decreases to 1 as $\alpha$ varies from 0.5 to 1. Second, if $\nu = 1$, the BID equals 1 for all $\alpha \in (0, 1)$. Third, if $\nu > 1$, the BID is no more than 1. Precisely, the BID decreases to its minimum as $\alpha$ varies from 0 to 0.5, and then increases to 1 as $\alpha$ varies from 0.5 to 1. To sum up, the Conway–Maxwell–Poisson-binomial distribution allows under-dispersion, equi-dispersion, and over-dispersion for bounded time series data.
Remark 1.
By (3), the pmf of $\mathrm{CMPB}(n, \alpha, \nu)$ has the form of a power series distribution. If $\nu = 0$, then $P(X = x \mid \theta, \nu, n) = \theta^x / \sum_{x=0}^{n} \theta^x$ with $\theta = \alpha/(1 - \alpha)$; if $\nu = 1$, $\mathrm{CMPB}(n, \alpha, \nu)$ reduces to the binomial distribution with parameter $\alpha$.
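The $\nu = 1$ case in Remark 1 can be checked in one line from (3). The short derivation below is included as an illustrative check and uses only the binomial theorem:

$$S(\theta, 1) = \sum_{x=0}^{n} \binom{n}{x} \theta^x = (1 + \theta)^n, \qquad P(X = x \mid \theta, 1, n) = \binom{n}{x} \frac{\theta^x}{(1 + \theta)^n} = \binom{n}{x} \alpha^x (1 - \alpha)^{n-x},$$

since $\theta/(1 + \theta) = \alpha$ and $1/(1 + \theta) = 1 - \alpha$. In the same way, the mean is $n\theta(1 + \theta)^{n-1}/(1 + \theta)^n = n\alpha$, the binomial mean.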

2.2. Conway–Maxwell–Poisson-Binomial Thinning Operator

By Shmueli et al. [13], the CMPB distribution is a distribution of the sum of $n$ dependent Bernoulli components without specifying anything else about the joint distribution of those components. Precisely, if $X \sim \mathrm{CMPB}(n, \alpha, \nu)$, there exists a sequence of Bernoulli variables $\{Z_i\}$ such that $X = \sum_{i=1}^{n} Z_i$, where

$$P_{z_1, \ldots, z_n} := P(Z_1 = z_1, \ldots, Z_n = z_n) = \frac{1}{\sum_{z_1=0}^{1} \cdots \sum_{z_n=0}^{1} \binom{n}{x}^{\nu-1} \theta^x} \binom{n}{x}^{\nu-1} \theta^x \tag{5}$$

with $\theta = \alpha/(1 - \alpha)$, $x = \sum_{i=1}^{n} z_i$ and $(z_1, z_2, \ldots, z_n) \in \{0, 1\}^n$.
Definition 1.
Let $\theta = \alpha/(1 - \alpha)$. Then the exchangeable Conway–Maxwell–Poisson-binomial thinning operator is defined by

$$\alpha \circ_\nu X := \sum_{i=1}^{X} Z_i,$$

where $X$ is a non-negative integer-valued random variable and $\{Z_i, i = 1, 2, \ldots, X\}$ is an exchangeable sequence of Bernoulli variables whose joint pmf takes the form (5) and which is independent of $X$.
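To generate a random number from “$\alpha \circ_\nu X$”, we first let $X = n$; then $\alpha \circ_\nu X \mid (X = n) \sim \mathrm{CMPB}(n, \alpha, \nu)$. Therefore, writing $S$, $S'$ and $S''$ for $S(\theta, \nu)$, $S'(\theta, \nu)$ and $S''(\theta, \nu)$,

$$E(\alpha \circ_\nu X \mid X = n) = \frac{\theta S'}{S}, \qquad \mathrm{Var}(\alpha \circ_\nu X \mid X = n) = \frac{\theta S'}{S} + \theta^2 \left[ \frac{S''}{S} - \left( \frac{S'}{S} \right)^2 \right],$$

and the conditional binomial index of dispersion (CBID) is

$$\mathrm{CBID} = \frac{n \left[ S S' + \theta S S'' - \theta (S')^2 \right]}{n S S' - \theta (S')^2},$$

where $S(\theta, \nu) = \sum_{x=0}^{n} \binom{n}{x}^{\nu} \theta^x$, $S'(\theta, \nu) = \partial S(\theta, \nu)/\partial \theta$, and $S''(\theta, \nu) = \partial S'(\theta, \nu)/\partial \theta$.
Second, we let $\theta = \alpha/(1 - \alpha)$; then the pmf of $\alpha \circ_\nu n$ takes the form (3). Third, we let $\theta = \lambda^{\nu}$, $\lambda > 0$. By (3), the pmf of $\alpha \circ_\nu n$ can be rewritten as

$$P(\alpha \circ_\nu n = x) = \frac{1}{U(\lambda, \nu)} \left[ \binom{n}{x} \lambda^x \right]^{\nu} \quad \text{with} \quad U(\lambda, \nu) = \sum_{x=0}^{n} \left[ \binom{n}{x} \lambda^x \right]^{\nu}.$$

Furthermore,

$$P(\alpha \circ_\nu n = x + 1) = \left( \frac{n - x}{x + 1} \lambda \right)^{\nu} P(\alpha \circ_\nu n = x), \tag{7}$$

from which an algorithm for generating a random number from $\alpha \circ_\nu X$ with $X = n$ can be derived; see Algorithm 1.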
Remark 2.
By Kadane [16], the counting series $\{Z_i\}$ in Definition 1 is a dependent sequence of Bernoulli variables that is exchangeable of order 2. To make the concept of exchangeability precise, let $\pi$ be a permutation of $(z_1, z_2, \ldots, z_n)$. Then $P_{z_1, \ldots, z_n} = P_{\pi(z_1, \ldots, z_n)}$. By the definition of exchangeability in Section 6 in Kadane [16], $\sum_{i=1}^{n} Z_i$ is $n$-exchangeable. Kadane [16] stated that “de Finetti’s Theorem shows that sums of exchangeable random variables are mixtures of Binomial random variables. Because the marginal distribution of each component is Bernoulli, interest centers on the joint distribution of pairs of such variables”. By Theorem 4 in Kadane [16], $n$-exchangeability implies $n'$-exchangeability for each $n' < n$. Hence, $\{Z_i\}$ is exchangeable of order 2, i.e., every pair of $\{Z_1, Z_2, \ldots, Z_n\}$ has the same distribution as every other pair, and for any pair $(Z_i, Z_j)$, $i, j = 1, 2, \ldots, n$, $i \neq j$, we have $P(Z_i = 0, Z_j = 1) = P(Z_i = 1, Z_j = 0) > 0$, $P(Z_i = 0, Z_j = 0) + 2P(Z_i = 0, Z_j = 1) + P(Z_i = 1, Z_j = 1) = 1$, $P(Z_i = 1, Z_j = 1) > 0$, and $P(Z_i = 0, Z_j = 0) > 0$; see [16] for more discussion.

2.3. Binomial Autoregressive Model with the CMPB Operator

Now, we define the BAR(1) model with the CMPB operator by

$$X_t = \alpha \circ_\nu X_{t-1} + \beta \circ_\nu (n - X_{t-1}), \tag{8}$$

where $0 < \alpha < 1$, $0 < \beta < 1$, $n \in \mathbb{N}$ and $\nu \in \mathbb{R}$; both $\alpha \circ_\nu X_{t-1} = \sum_{i=1}^{X_{t-1}} Z_i$ and $\beta \circ_\nu (n - X_{t-1}) = \sum_{i=1}^{n - X_{t-1}} W_i$ are CMPB thinning operators as given in Definition 1; their counting series $\{Z_i\}$ and $\{W_i\}$ are exchangeable sequences of Bernoulli variables with pmfs of the form (5); $\{Z_i\}$ is independent of $\{W_j\}$, $i = 1, 2, \ldots, X_{t-1}$, $j = 1, 2, \ldots, (n - X_{t-1})$; and all the thinnings at time $t$ are independent of $\{X_s, s < t\}$.
For simplicity, we denote the new model as the CMPBAR(1) model. By (8), $\{X_t\}_{t \in \mathbb{N}}$ is a Markov chain and its one-step transition probability takes the form

$$P_\eta(k \mid l) = P(X_t = k \mid X_{t-1} = l) = \frac{1}{S(\theta_1, \nu) S(\theta_2, \nu)} \sum_{i=0}^{\min\{k, l\}} \binom{l}{i}^{\nu} \binom{n - l}{k - i}^{\nu} \theta_1^{i} \theta_2^{k-i}, \tag{9}$$

where $S(\theta_1, \nu) = \sum_{i=0}^{l} \binom{l}{i}^{\nu} \theta_1^{i}$ and $S(\theta_2, \nu) = \sum_{i=0}^{n-l} \binom{n - l}{i}^{\nu} \theta_2^{i}$ with $\eta = (\theta_1, \theta_2, \nu)$, $\theta_1 = \alpha/(1 - \alpha)$ and $\theta_2 = \beta/(1 - \beta)$.
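The transition probabilities (9) can be tabulated as an $(n + 1) \times (n + 1)$ matrix. The following sketch is illustrative and not the authors' implementation; the function name is an assumption. One deliberate choice: the inner sum starts at $\max\{0, k - (n - l)\}$ so that terms with a zero binomial coefficient are skipped, which avoids raising zero to a non-positive power $\nu$.

```python
from math import comb


def cmpbar_transition_matrix(n, theta1, theta2, nu):
    """Entry [l][k] holds P(X_t = k | X_{t-1} = l) as in Eq. (9)."""
    matrix = []
    for l in range(n + 1):
        s1 = sum(comb(l, i) ** nu * theta1 ** i for i in range(l + 1))
        s2 = sum(comb(n - l, j) ** nu * theta2 ** j for j in range(n - l + 1))
        row = []
        for k in range(n + 1):
            lo, hi = max(0, k - (n - l)), min(k, l)  # indices with nonzero terms
            total = sum(
                comb(l, i) ** nu * comb(n - l, k - i) ** nu
                * theta1 ** i * theta2 ** (k - i)
                for i in range(lo, hi + 1)
            )
            row.append(total / (s1 * s2))
        matrix.append(row)
    return matrix


transition = cmpbar_transition_matrix(7, 0.25, 1.5, 0.5)
row_sums = [sum(row) for row in transition]  # each should be 1 up to rounding
```

Every entry is strictly positive, which is the property used in the ergodicity argument that follows.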
Theorem 1.
If $\{X_t\}$ satisfies (8), then $\{X_t\}$ is ergodic and strictly stationary.
Proof. 
The proof is similar to that of Theorem 1 in Kang et al. [11]. The state space of $\{X_t\}$ is $\{0, 1, \ldots, n\}$. Because $P(X_t = i \mid X_{t-1} = j) > 0$ for all $i, j \in \{0, 1, \ldots, n\}$, all states communicate, so the state space of $\{X_t\}$ is a single equivalence class. Hence, $\{X_t\}$ is an irreducible and aperiodic Markov chain and is therefore ergodic with a unique stationary distribution by [17]. □
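By Definition 1 and (8), for given $X_{t-1}$, $X_t$ in (8) consists of two independent parts $\alpha \circ_\nu X_{t-1}$ and $\beta \circ_\nu (n - X_{t-1})$, where $\alpha \circ_\nu X_{t-1} \sim \mathrm{CMPB}(X_{t-1}, \alpha, \nu)$ and $\beta \circ_\nu (n - X_{t-1}) \sim \mathrm{CMPB}(n - X_{t-1}, \beta, \nu)$. Denote $\theta_1 = \alpha/(1 - \alpha)$ and $\theta_2 = \beta/(1 - \beta)$. Then

$$E(X_t \mid X_{t-1}) = \frac{\theta_1 S_1'}{S_1} + \frac{\theta_2 S_2'}{S_2}, \qquad \mathrm{Var}(X_t \mid X_{t-1}) = \frac{\theta_1 S_1'}{S_1} + \frac{\theta_2 S_2'}{S_2} + \theta_1^2 \left[ \frac{S_1''}{S_1} - \left( \frac{S_1'}{S_1} \right)^2 \right] + \theta_2^2 \left[ \frac{S_2''}{S_2} - \left( \frac{S_2'}{S_2} \right)^2 \right],$$

and the conditional binomial index of dispersion (CBID) is

$$\mathrm{CBID} = \frac{n \left\{ \theta_1^2 S_2^2 \left[ S_1 S_1'' - (S_1')^2 \right] + \theta_2^2 S_1^2 \left[ S_2 S_2'' - (S_2')^2 \right] + \theta_1 S_1 S_1' S_2^2 + \theta_2 S_2 S_2' S_1^2 \right\}}{\left( n S_1 S_2 - \theta_1 S_1' S_2 - \theta_2 S_1 S_2' \right) \left( \theta_1 S_2 S_1' + \theta_2 S_1 S_2' \right)},$$

where $S_1 := S_1(\theta_1, \nu) = \sum_{x=0}^{X_{t-1}} \binom{X_{t-1}}{x}^{\nu} \theta_1^x$, $S_1' := \partial S_1(\theta_1, \nu)/\partial \theta_1$, $S_1'' := \partial S_1'(\theta_1, \nu)/\partial \theta_1$, $S_2 := S_2(\theta_2, \nu) = \sum_{x=0}^{n - X_{t-1}} \binom{n - X_{t-1}}{x}^{\nu} \theta_2^x$, $S_2' := \partial S_2(\theta_2, \nu)/\partial \theta_2$, and $S_2'' := \partial S_2'(\theta_2, \nu)/\partial \theta_2$.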
Unfortunately, because of the complexity of $S_1(\theta_1, \nu)$ and $S_2(\theta_2, \nu)$, we cannot obtain the marginal distribution of $\{X_t\}$ or its autocorrelation structure in closed form, including $E(X_t)$, $\mathrm{Var}(X_t)$, and the BID. To get around this difficulty, for given $n = 10$, we create some plots of the BID (in Figure 3) by generating samples from the CMPBAR(1) model with $\nu \in \{-5, -4.5, -4, \ldots, 4.5, 5\}$ and sample size $T = 500$, when $(\alpha, \beta) =$ (0.2, 0.2), (0.2, 0.5), (0.2, 0.6), (0.5, 0.6), i.e., $(\theta_1, \theta_2) =$ (0.25, 0.25), (0.25, 1), (0.25, 1.5), (1, 1.5).
From Figure 3, we have the following observations. First, if $\nu < 1$, the BID of the CMPBAR(1) model is greater than 1, i.e., the CMPBAR(1) model allows us to analyze bounded integer-valued time series with over-dispersion. Second, if $\nu > 1$, the BID of the CMPBAR(1) model is less than 1, i.e., the CMPBAR(1) model allows us to analyze bounded integer-valued time series with under-dispersion. Third, if $\nu = 1$, the CMPBAR(1) model reduces to the BAR(1) model given in (2) and its BID is equal to 1, i.e., equi-dispersed bounded integer-valued time series are also covered.
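As a numerical complement to the simulation-based plots, the stationary distribution of the chain can also be approximated by repeatedly applying the transition matrix from (9), since the state space is finite. The sketch below swaps in this power-iteration approach, which is not the procedure used in the paper; the function names and the fixed iteration count are assumptions.

```python
from math import comb


def transition_row(l, n, theta1, theta2, nu):
    """Row l of the transition matrix: convolution of two CMPB pmfs, as in (9)."""
    a = [comb(l, i) ** nu * theta1 ** i for i in range(l + 1)]
    b = [comb(n - l, j) ** nu * theta2 ** j for j in range(n - l + 1)]
    row = [0.0] * (n + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            row[i + j] += ai * bj
    norm = sum(a) * sum(b)  # S(theta1, nu) * S(theta2, nu)
    return [r / norm for r in row]


def stationary_distribution(n, theta1, theta2, nu, iterations=500):
    """Approximate the stationary law by iterating dist <- dist * P."""
    rows = [transition_row(l, n, theta1, theta2, nu) for l in range(n + 1)]
    dist = [1.0 / (n + 1)] * (n + 1)
    for _ in range(iterations):
        dist = [sum(dist[l] * rows[l][k] for l in range(n + 1)) for k in range(n + 1)]
    return dist


def marginal_bid(n, theta1, theta2, nu):
    """BID of the (approximate) stationary marginal distribution."""
    dist = stationary_distribution(n, theta1, theta2, nu)
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return n * var / (mean * (n - mean))


# For nu = 1 the model is the BAR(1) model, whose marginal is binomial (BID = 1).
bid_nu_one = marginal_bid(10, 0.25, 1.0, 1.0)
```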

3. Parameter Estimation

In this section, we use the conditional maximum likelihood method to estimate the parameters $\eta = (\theta_1, \theta_2, \nu)$ of the CMPBAR(1) model. Let $\{X_0, X_1, \ldots, X_T\}$ be a realization of $\{X_t\}$ generated by the CMPBAR(1) process based on Algorithm 1, where $T \in \mathbb{N}$ is the sample size.
Algorithm 1: Random number generation algorithm for the CMPB distribution
Step 1. Generate a random number $u \sim \mathrm{Uniform}(0, 1)$;
Step 2. Set $x = 0$, $p = P(\alpha \circ_\nu n = 0 \mid \theta, \nu, n)$, $F = p$, where $P(\alpha \circ_\nu n = 0 \mid \theta, \nu, n)$ is given in (3);
Step 3. If $u < F$, set $\alpha \circ_\nu n = x$ and stop;
Step 4. Else set $p = p \times \left( \frac{n - x}{x + 1} \lambda \right)^{\nu}$ by (7), $F = F + p$, $x = x + 1$;
Step 5. Go to Step 3.
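Algorithm 1 is an inversion sampler driven by the recursion (7). A minimal Python sketch is given below; it is an illustration, not the authors' code. Two choices are assumptions of this sketch: the recursion factor is written as $\theta \left( \frac{n - x}{x + 1} \right)^{\nu}$, which equals $\left( \frac{n - x}{x + 1} \lambda \right)^{\nu}$ when $\theta = \lambda^{\nu}$ but also works for $\nu = 0$, and a guard `x < n` protects against floating-point round-off in the cumulative sum.

```python
import random
from math import comb


def sample_cmpb(n, alpha, nu, rng):
    """Draw one value of alpha o_nu n by inversion (Steps 1 to 5)."""
    theta = alpha / (1 - alpha)
    norm = sum(comb(n, k) ** nu * theta ** k for k in range(n + 1))
    u = rng.random()              # Step 1
    x, p = 0, 1.0 / norm          # Step 2: P(alpha o_nu n = 0) = 1 / S(theta, nu)
    cdf = p
    while u >= cdf and x < n:     # Step 3 stops the loop once u < F
        p *= ((n - x) / (x + 1)) ** nu * theta  # Step 4, recursion (7)
        cdf += p
        x += 1
    return x


def simulate_cmpbar1(n, alpha, beta, nu, length, x0, seed=0):
    """Simulate a CMPBAR(1) path: survivors plus revivals at each step."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(length):
        prev = path[-1]
        path.append(sample_cmpb(prev, alpha, nu, rng)
                    + sample_cmpb(n - prev, beta, nu, rng))
    return path


cmpbar_path = simulate_cmpbar1(n=10, alpha=0.2, beta=0.5, nu=0.5, length=500, x0=5)
```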
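By using (9), the conditional log-likelihood function can be written as

$$\ell(\eta) = \sum_{t=1}^{T} \log P_\eta(X_t \mid X_{t-1}) = \sum_{t=1}^{T} \left\{ \log \left[ \sum_{i=0}^{m} \binom{X_{t-1}}{i}^{\nu} \binom{n - X_{t-1}}{X_t - i}^{\nu} \theta_1^{i} \theta_2^{X_t - i} \right] - \log S(\theta_1, \nu) - \log S(\theta_2, \nu) \right\}, \tag{10}$$

where $S(\theta_1, \nu) = \sum_{i=0}^{X_{t-1}} \binom{X_{t-1}}{i}^{\nu} \theta_1^{i}$ and $S(\theta_2, \nu) = \sum_{i=0}^{n - X_{t-1}} \binom{n - X_{t-1}}{i}^{\nu} \theta_2^{i}$ with $m = \min\{X_t, X_{t-1}\}$, $\theta_1 > 0$, $\theta_2 > 0$, and $\nu \in \mathbb{R}$. Then the CML estimate $\hat{\eta}_{cml}$ is obtained by maximizing (10).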
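The conditional log-likelihood (10) can be evaluated directly from an observed path. The sketch below is an illustration with an assumed function name; as a deliberate choice, the inner sum starts at $\max\{0, X_t - (n - X_{t-1})\}$ so that terms with a zero binomial coefficient are skipped. The resulting function (or its negative) could be handed to any general-purpose numerical optimizer over $\theta_1 > 0$, $\theta_2 > 0$, $\nu \in \mathbb{R}$.

```python
from math import comb, log


def cmpbar_loglik(path, n, theta1, theta2, nu):
    """Conditional log-likelihood (10) of X_1, ..., X_T given X_0."""
    total = 0.0
    for prev, cur in zip(path, path[1:]):
        s1 = sum(comb(prev, i) ** nu * theta1 ** i for i in range(prev + 1))
        s2 = sum(comb(n - prev, j) ** nu * theta2 ** j for j in range(n - prev + 1))
        lo, hi = max(0, cur - (n - prev)), min(cur, prev)  # nonzero terms only
        conv = sum(
            (comb(prev, i) * comb(n - prev, cur - i)) ** nu
            * theta1 ** i * theta2 ** (cur - i)
            for i in range(lo, hi + 1)
        )
        total += log(conv) - log(s1) - log(s2)
    return total


toy_path = [3, 4, 2, 5, 7, 0, 1]  # made-up counts with n = 7, for illustration only
toy_loglik = cmpbar_loglik(toy_path, 7, 0.4 / 0.6, 0.3 / 0.7, 1.0)
```

With $\nu = 1$ the value coincides with the conditional log-likelihood of the BAR(1) model (2), which gives a convenient sanity check.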
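Assumption 1.
If there exists a $t \geq 1$ such that $X_t(\eta) = X_t(\eta_0)$, $P_{\eta_0}$-a.s., then $\eta = \eta_0$, where $P_{\eta_0}$ is the probability measure under the true parameter $\eta_0$ with $\eta_0 = (\theta_1^0, \theta_2^0, \nu^0)$.
Theorem 2.
Let $\{X_t\}$ be generated by the CMPBAR(1) model. If Assumption 1 holds, there exists an estimator $\hat{\eta}_{cml}$ such that

$$\hat{\eta}_{cml} \xrightarrow{a.s.} \eta_0 \quad \text{and} \quad \sqrt{T} (\hat{\eta}_{cml} - \eta_0) \xrightarrow{d} N\left( 0, J^{-1}(\eta_0) I(\eta_0) J^{-1}(\eta_0) \right), \quad T \to \infty,$$

where $I(\eta_0) = E\left[ \frac{\partial \log P_{\eta_0}(X_t \mid X_{t-1})}{\partial \eta} \frac{\partial \log P_{\eta_0}(X_t \mid X_{t-1})}{\partial \eta^{\top}} \right]$ and $J(\eta_0) = E\left[ \frac{\partial^2 \ell_t(\eta_0)}{\partial \eta \, \partial \eta^{\top}} \right]$, with $\ell_t(\eta) = \log P_\eta(X_t \mid X_{t-1})$.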
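Proof. 
To prove the consistency of $\hat{\eta}_{cml}$, we denote $\ell_t(\eta) = \log P_\eta(X_t \mid X_{t-1})$. Hence, $\ell(\eta) = \sum_{t=1}^{T} \ell_t(\eta)$. Similar to the first item of Theorem 4 in Chen et al. [18], we can verify that the assumptions of Theorem 4.1.2 in Amemiya [19] hold under Assumption 1, i.e., $E \ell_t(\eta)$ attains a strict local maximum at $\eta_0$; therefore, there exists an estimator $\hat{\eta}_{cml}$ such that $\hat{\eta}_{cml} \xrightarrow{a.s.} \eta_0$.
In the following, we prove the asymptotic normality of $\hat{\eta}_{cml}$. It is easy to see that $\partial \ell_t(\eta)/\partial \theta_1$, $\partial \ell_t(\eta)/\partial \theta_2$, and $\partial \ell_t(\eta)/\partial \nu$ exist and are three times continuously differentiable in $\Theta$. Thus, there exists a neighborhood $N(\eta_0)$ of $\eta_0$ such that $\partial^2 \ell_t(\eta)/(\partial \eta \, \partial \eta^{\top})$ attains its maximum value at some $\tilde{\eta} \in N(\eta_0)$. Therefore,

$$E \sup_{\eta \in N(\eta_0)} \left\| \frac{\partial^2 \ell_t(\eta)}{\partial \eta \, \partial \eta^{\top}} \right\| = E \left| \frac{\partial^2 \ell_t(\tilde{\eta})}{\partial \eta_i \, \partial \eta_j} \right| < \infty.$$

Similar to the second item of Theorem 4 in [18], we can prove that

$$T^{-1} \sum_{t=1}^{T} \frac{\partial^2 \ell_t(\eta)}{\partial \eta \, \partial \eta^{\top}} \xrightarrow{p} E \, \frac{\partial^2 \ell_t(\eta_0)}{\partial \eta \, \partial \eta^{\top}}$$

by Theorem 4.1.3 in Amemiya [19]. Furthermore,

$$T^{-1} \sum_{t=1}^{T} \partial \ell_t(\eta_0)/\partial \eta \xrightarrow{p} E\left( \partial \ell_t(\eta_0)/\partial \eta \right)$$

by the ergodic theorem. Using the martingale central limit theorem and the Cramér–Wold device, it is straightforward to show that

$$T^{-1/2} \, \partial \ell(\eta_0)/\partial \eta \xrightarrow{d} N(0, I(\eta_0)).$$

Then the asymptotic normal distribution of $\hat{\eta}_{cml}$ is obtained from the Taylor series expansion of $\partial \ell(\hat{\eta}_{cml})/\partial \eta$ around $\eta_0$. □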

4. Simulation

In this section, we conduct a simulation study to illustrate the large-sample properties of the CML estimator for the CMPBAR(1) model.
In the simulation, we fix $n = 10$, let the sample size be $T = 100, 300, 500$, and use the optim function in R to optimize $\ell(\eta)$ in (10). To check the finite sample performance, we use the following parameter combinations of $(\theta_1, \theta_2, \nu)$:

(A1) = (0.25, 0.25, 0.5), (A2) = (0.25, 1, 0.5), (A3) = (0.25, 1.5, 0.5), (A4) = (1, 1.5, 0.5),
(B1) = (0.25, 0.25, 1), (B2) = (0.25, 1, 1), (B3) = (0.25, 1.5, 1), (B4) = (1, 1.5, 1),
(C1) = (0.25, 0.25, 1.5), (C2) = (0.25, 1, 1.5), (C3) = (0.25, 1.5, 1.5), (C4) = (1, 1.5, 1.5),

where $\nu = 0.5$, 1 and 1.5 reflect over-dispersion, equi-dispersion, and under-dispersion, respectively.
For the simulated samples, the mean and standard deviation (sd) of the estimates are reported. For a scalar parameter $\varphi$, $\mathrm{sd} = \sqrt{\frac{1}{m - 1} \sum_{i=1}^{m} (\hat{\varphi}_i - \varphi)^2}$, where $\hat{\varphi}_i$ is the estimate of $\varphi$ in the $i$th replication and $m = 10{,}000$. Summaries of the simulation results are given in Table 1, Table 2 and Table 3.
To illustrate the consistency and the asymptotic normality of the CML estimators, we present the boxplots of the CML estimates for (A1), (B1), and (C1) in Figure 4, Figure 5, and Figure 6, and their Q-Q plots with $T = 500$ in Figure 7, Figure 8, and Figure 9, respectively. The results for the other settings are similar and are omitted.
These studies indicate that the CML method performs reasonably well. First, Table 1, Table 2 and Table 3 show that the standard deviation of the CML estimator decreases as the sample size increases and that the mean of the CML estimator generally moves closer to the true parameter value. Second, Figure 4, Figure 5 and Figure 6 show the location and dispersion of the estimates, which are in line with the consistency of the estimators. Third, Figure 7, Figure 8 and Figure 9 support the asymptotic normality of the CML estimator.

5. Real Data Example

In this section, we consider the number of weekly rainy days for the period from 1 January 2005 to 31 December 2010 at Hamburg–Neuwiedenthal in Germany, where a week is defined as being from Saturday to Friday and n = 7 . The data were collected from the German Weather Service (http://www.dwd.de/, accessed on 12 December 2018). The sample path and the ACF and PACF plots of the observations are given in Figure 10 and Figure 11, respectively.
By computation, the sample mean and variance are 3.8371 and 3.6753, and the BID of the data is 1.2371, which implies the data exhibits extra-binomial variation. Hence, we use the CMPBAR(1) model, BAR(1) model [2], BBAR(1) model [9], and GBAR(1) model [11] to fit data by the CML method. We compare the estimated standard error (SE), −log-likelihood (−log-lik), Akaike’s information criterion (AIC) and Bayesian information criterion (BIC), which are summarized in Table 4, including the fitted results of the CML estimate.
From Table 4, the CMPBAR(1) model takes the smallest values of the −log-lik, AIC, and BIC. Hence, the CMPBAR(1) model might be more appropriate for the weekly rainy days.
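For reference, the two information criteria used in this comparison can be computed from the maximized conditional log-likelihood with the standard definitions. The sketch below uses those textbook formulas; the function names and the numbers in the usage lines are hypothetical and are not values from Table 4.

```python
from math import log


def aic(loglik, num_params):
    """Akaike's information criterion: 2k - 2 * log-likelihood."""
    return 2 * num_params - 2 * loglik


def bic(loglik, num_params, num_obs):
    """Bayesian information criterion: k * log(T) - 2 * log-likelihood."""
    return num_params * log(num_obs) - 2 * loglik


# Hypothetical inputs for illustration only (not taken from the paper).
example_aic = aic(-100.0, 3)
example_bic = bic(-100.0, 3, 100)
```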
To illustrate the adequacy of the CMPBAR(1) model, we consider the fitted Pearson residual analysis of the CMPBAR(1) model. By computation, the mean and variance of the fitted Pearson residual are 0.0760 and 1.0500 , respectively. The residual analysis in Figure 12 shows that this model performs rather well.
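The Pearson residuals referred to above standardize each observation by its conditional mean and variance given the previous observation. A minimal sketch is shown below, assuming the usual definition $(X_t - E(X_t \mid X_{t-1}))/\sqrt{\mathrm{Var}(X_t \mid X_{t-1})}$; the function names are assumptions, and the conditional moments are obtained by summing over the two CMPB pmfs rather than through $S'$ and $S''$.

```python
from math import comb, sqrt


def cmpb_mean_var(size, theta, nu):
    """Mean and variance of CMPB(size, alpha, nu) with theta = alpha / (1 - alpha)."""
    weights = [comb(size, x) ** nu * theta ** x for x in range(size + 1)]
    total = sum(weights)
    mean = sum(x * w for x, w in enumerate(weights)) / total
    second = sum(x * x * w for x, w in enumerate(weights)) / total
    return mean, second - mean ** 2


def pearson_residuals(path, n, theta1, theta2, nu):
    """Standardized one-step-ahead residuals of a fitted CMPBAR(1) model."""
    residuals = []
    for prev, cur in zip(path, path[1:]):
        m1, v1 = cmpb_mean_var(prev, theta1, nu)      # survivors part
        m2, v2 = cmpb_mean_var(n - prev, theta2, nu)  # revivals part
        residuals.append((cur - (m1 + m2)) / sqrt(v1 + v2))
    return residuals


# Toy check with nu = 1 and theta1 = theta2 = 1 (alpha = beta = 0.5), n = 7:
# the conditional law is Bin(7, 0.5) with mean 3.5 and variance 1.75.
toy_residuals = pearson_residuals([3, 4], 7, 1.0, 1.0, 1.0)
```

For an adequate model, such residuals should have mean close to 0, variance close to 1, and no significant autocorrelation.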
In addition, to further check the adequacy of the CMPBAR(1) model, we present the probability integral transform (PIT) histogram based on the fitted CMPBAR(1) model in Figure 13. If the fitted model is adequate, its PIT histogram should look like that of a uniform distribution; see [10] for more discussion.
As can be seen in Figure 13, the PIT histogram of the CMPBAR(1) model is close to uniformity, i.e., the PIT histogram confirms that the fitted CMPBAR(1) model works reasonably well for the weekly rainy days.

6. Concluding Remarks

This paper considers a new CMPB thinning operator and proposes a new CMPBAR(1) model, which provides an available method to model bounded data with under-dispersion, equi-dispersion, and over-dispersion. We discuss some properties of the new model, the estimate of the parameters, and its large-sample properties. Simulations are conducted to examine the finite sample performance of estimators. A real data example is provided to illustrate the applicability of the CMPBAR(1) model.
There are several directions in which we plan to take this work forward. First, the random coefficient CMPBAR(1) model can be introduced by

$$X_t = \alpha_t \circ_\nu X_{t-1} + \beta_t \circ_\nu (n - X_{t-1}),$$

where $\alpha_t = \alpha(X_{t-1}/n)$ and $\beta_t = \beta(X_{t-1}/n)$, “$\circ_\nu$” is the CMPB thinning operator, the counting series in “$\alpha_t \circ_\nu$” and that in “$\beta_t \circ_\nu$” are independent, and all of the counting series at time $t$ are independent of $\{X_s, s < t\}$; see Weiß and Pollett [8] for the random coefficient BAR(1) model. Second, a correlated sign-thinning operator can be established by

$$\alpha \circ_\nu X = \mathrm{sign}(\alpha) \, \mathrm{sign}(X) \sum_{i=1}^{|X|} Z_i,$$

where $\mathrm{sign}(x) = 1$ if $x \geq 0$ and $\mathrm{sign}(x) = -1$ if $x < 0$, and $\{Z_i, i = 1, 2, \ldots, |X|\}$ is an exchangeable sequence of Bernoulli variables whose pmf takes the form (5). Based on the correlated sign-thinning operator, one can construct a $\mathbb{Z}$-valued autoregressive model to analyze data with range $\mathbb{Z}$ that are under-dispersed, equi-dispersed, or over-dispersed. Third, a class of Conway–Maxwell–Poisson-binomial generalized autoregressive conditional heteroskedasticity models can be considered by

$$Z_t \mid \mathcal{F}_{t-1} \sim \mathrm{CMPB}(n, \alpha_t, \nu), \qquad \alpha_t = g_\eta(Z_{t-1}/n, \alpha_{t-1}),$$

where $\eta$ is the parameter vector of the model (see Ristić et al. [20] and Chen et al. [18] for ARCH-type models, and Lee and Lee [21] and Chen et al. [22] for GARCH-type models for bounded data). In addition, a semi-parametric version can be considered by

$$Z_t \mid \mathcal{F}_{t-1} \sim \mathrm{CMPB}(n, \alpha_t, \nu), \qquad \alpha_t = g_\eta(Z_{t-1}/n, \alpha_{t-1}) + f_\gamma(X_t),$$

where $\eta$ is the parameter vector of the model, $\{X_t\}$ is the covariate process for the observed process $\{Z_t\}$, and $\gamma$ is the parameter vector of $f(\cdot)$.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e25010126/s1.

Author Contributions

Conceptualization, H.C.; methodology, H.C. and J.Z.; software, H.C. and J.Z.; validation, H.C., J.Z. and X.L.; formal analysis, H.C.; investigation, H.C.; resources, H.C.; data curation, H.C. and J.Z.; writing—original draft preparation, H.C.; writing—review and editing, H.C.; visualization, H.C., J.Z. and X.L.; supervision, H.C., J.Z. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

Chen’s work is supported by the Natural Science Foundation of Henan Province No. 222300420127 and Postdoctoral research in Henan Province No. 202103051. Liu’s work is supported by the Basic Research Programs of Shanxi Province No. 202103021223084.

Data Availability Statement

The weekly rainy days data for the period from 1 January 2005 to 31 December 2010 at Hamburg–Neuwiedenthal in Germany were collected from the German Weather Service (http://www.dwd.de/, accessed on 12 December 2018), where a week is defined as being from Saturday to Friday and $n = 7$. The data can be found in the Supplementary Materials.

Acknowledgments

The authors thank the Editor-in-Chief and the anonymous referees for their valuable comments and suggestions, which resulted in a substantial improvement of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Steutel, F.W.; van Harn, K. Discrete analogues of self-decomposability and stability. Ann. Probab. 1979, 7, 893–899.
2. McKenzie, E. Some simple models for discrete variate time series. Water Resour. Bull. 1985, 21, 645–650.
3. Weiß, C.H. Monitoring correlated processes with binomial marginals. J. Appl. Stat. 2009, 36, 399–414.
4. Weiß, C.H. Jumps in binomial AR(1) processes. Stat. Probab. Lett. 2009, 79, 2012–2019.
5. Weiß, C.H. A new class of autoregressive models for time series of binomial counts. Commun. Stat.-Theory Methods 2009, 38, 447–460.
6. Weiß, C.H.; Kim, H.Y. Binomial AR(1) processes: Moments, cumulants, and estimation. Statistics 2013, 47, 494–510.
7. Weiß, C.H.; Kim, H.Y. Parameter estimation for binomial AR(1) models with applications in finance and industry. Stat. Pap. 2013, 54, 563–590.
8. Weiß, C.H.; Pollett, P.K. Binomial autoregressive processes with density dependent thinning. J. Time Ser. Anal. 2014, 35, 115–132.
9. Weiß, C.H.; Kim, H.Y. Diagnosing and modeling extra-binomial variation for time-dependent counts. Appl. Stoch. Model. Bus. Ind. 2014, 30, 588–608.
10. Weiß, C.H. An Introduction to Discrete-Valued Time Series; John Wiley & Sons: Chichester, UK, 2018.
11. Kang, Y.; Wang, D.H.; Yang, K. Extended binomial AR(1) processes with generalized binomial thinning operator. Commun. Stat.-Theory Methods 2020, 49, 3498–3520.
12. Ristić, M.M.; Nastić, A.S.; Ilić, A.V.M. A geometric time series model with dependent Bernoulli counting series. J. Time Ser. Anal. 2013, 34, 466–476.
13. Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. Appl. Stat. 2005, 54, 127–142.
14. Borges, P.; Rodrigues, J.; Balakrishnan, N.; Bazán, J. A COM-Poisson type generalization of the binomial distribution and its properties and applications. Stat. Probab. Lett. 2014, 87, 158–166.
15. Daly, F.; Gaunt, R.E. The Conway-Maxwell-Poisson distribution: Distributional theory and approximation. ALEA Lat. Am. J. Probability Math. Stat. 2016, 13, 635–658.
16. Kadane, J.B. Sums of possibly associated Bernoulli variables: The Conway-Maxwell-Binomial distribution. Bayesian Anal. 2016, 11, 363–374.
17. Seneta, E. Non-Negative Matrices and Markov Chains, 2nd ed.; Springer: New York, NY, USA, 1983.
18. Chen, H.; Li, Q.; Zhu, F. Two classes of dynamic binomial integer-valued ARCH models. Braz. J. Probab. Stat. 2020, 34, 685–711.
19. Amemiya, T. Advanced Econometrics; Harvard University Press: Cambridge, MA, USA, 1985; pp. 110–112.
20. Ristić, M.M.; Weiß, C.H.; Janjić, A.D. A binomial integer-valued ARCH model. Int. J. Biostat. 2016, 12, 20150051.
21. Lee, Y.; Lee, S. CUSUM test for general nonlinear integer-valued GARCH models: Comparison study. Ann. Inst. Stat. Math. 2019, 71, 1033–1057.
22. Chen, H.; Li, Q.; Zhu, F. A new class of integer-valued GARCH models for time series of bounded counts with extra-binomial variation. AStA Adv. Stat. Anal. 2022, 106, 243–270.
Figure 1. Plot of the BID of the CMPB distribution for different choices of α and ν.
Figure 2. Plots of the BID of the CMPB distribution for different choices of α.
Figure 3. Plots of BID of the CMPBAR model.
Figure 4. Boxplots of the CML estimates for (A1).
Figure 5. Boxplots of the CML estimates for (B1).
Figure 6. Boxplots of the CML estimates for (C1).
Figure 7. Q-Q plots of the CML estimates for (A1) with T = 500.
Figure 8. Q-Q plots of the CML estimates for (B1) with T = 500.
Figure 9. Q-Q plots of the CML estimates for (C1) with T = 500.
Figure 10. Path of the weekly rainy days.
Figure 11. ACF and PACF plots of the weekly rainy days. Panel (1) shows that the ACF is significant at lag 1; panel (2) shows that the PACF indicates an AR(1)-like autocorrelation structure.
Figure 12. Pearson residual analysis of the weekly rainy days: (1) ACF; (2) PACF.
Figure 13. PIT histogram based on the fitted CMPBAR(1) model.
Table 1. Mean (sd in parentheses) of the estimates for (A1)–(A4).
Parameter   T = 100            T = 300            T = 500
(A1): (θ₁, θ₂, ν) = (0.25, 0.25, 0.5)
θ₁          0.2336 (0.1425)    0.2435 (0.0881)    0.2471 (0.0683)
θ₂          0.2408 (0.0829)    0.2467 (0.0498)    0.2479 (0.0391)
ν           0.5682 (0.2484)    0.5231 (0.1371)    0.5135 (0.1065)
(A2): (θ₁, θ₂, ν) = (0.25, 1, 0.5)
θ₁          0.2420 (0.0847)    0.2471 (0.0477)    0.2483 (0.0369)
θ₂          1.0058 (0.0935)    1.0022 (0.0510)    1.0010 (0.0390)
ν           0.5236 (0.1353)    0.5070 (0.0742)    0.5044 (0.0567)
(A3): (θ₁, θ₂, ν) = (0.25, 1.5, 0.5)
θ₁          0.2450 (0.0644)    0.2483 (0.0374)    0.2490 (0.0288)
θ₂          1.5283 (0.1677)    1.5092 (0.0936)    1.5053 (0.0710)
ν           0.5269 (0.1505)    0.5072 (0.0821)    0.5046 (0.0628)
(A4): (θ₁, θ₂, ν) = (1, 1.5, 0.5)
θ₁          1.0032 (0.1132)    1.0002 (0.0622)    1.0005 (0.0481)
θ₂          1.5446 (0.2335)    1.5176 (0.1389)    1.5097 (0.1066)
ν           0.5246 (0.1336)    0.5087 (0.0755)    0.5052 (0.0585)
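As a rough reading aid for Table 1, the reported standard deviations can be rescaled by √T. If the CML estimator is root-T consistent, these rescaled values should level off as T grows. The sketch below uses only the (A1) rows and the sample sizes 100, 300, 500 read from the table header; the variable and function names are illustrative and not taken from the paper.

```python
import math

# Sample sizes and sd values copied from the (A1) rows of Table 1.
sample_sizes = [100, 300, 500]
sd_by_param = {
    "theta1": [0.1425, 0.0881, 0.0683],
    "theta2": [0.0829, 0.0498, 0.0391],
    "nu": [0.2484, 0.1371, 0.1065],
}


def root_t_scaled(sds, sizes):
    """Return sd * sqrt(T) for each sample size T."""
    return [sd * math.sqrt(t) for sd, t in zip(sds, sizes)]


# Hypothetical helper output: one list of rescaled sds per parameter.
scaled = {name: root_t_scaled(sds, sample_sizes)
          for name, sds in sd_by_param.items()}

for name, values in scaled.items():
    print(name, [round(v, 3) for v in values])
```

For all three parameters the rescaled values at T = 300 and T = 500 differ by only a few percent, which is in line with the asymptotic normality result, although it is of course not a formal test.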
Table 2. Mean (sd in parentheses) of the estimates for (B1)–(B4).
Parameter   T = 100            T = 200            T = 500
(B1): (θ₁, θ₂, ν) = (0.25, 0.25, 1)
θ₁          0.2442 (0.1286)    0.2475 (0.0755)    0.2487 (0.0586)
θ₂          0.2484 (0.0693)    0.2497 (0.0402)    0.2496 (0.0313)
ν           1.0484 (0.2317)    1.0152 (0.1288)    1.0094 (0.0997)
(B2): (θ₁, θ₂, ν) = (0.25, 1, 1)
θ₁          0.2483 (0.0906)    0.2491 (0.0508)    0.2496 (0.0393)
θ₂          1.0114 (0.1667)    1.0033 (0.0906)    1.0016 (0.0692)
ν           1.0390 (0.2070)    1.0130 (0.1140)    1.0083 (0.0873)
(B3): (θ₁, θ₂, ν) = (0.25, 1.5, 1)
θ₁          0.2507 (0.0770)    0.2497 (0.0440)    0.2499 (0.0339)
θ₂          1.5215 (0.2412)    1.5097 (0.1417)    1.5053 (0.1082)
ν           1.0409 (0.2167)    1.0128 (0.1201)    1.0084 (0.0922)
(B4): (θ₁, θ₂, ν) = (1, 1.5, 1)
θ₁          1.0219 (0.1985)    1.0042 (0.1127)    1.0028 (0.0876)
θ₂          1.5420 (0.3113)    1.5251 (0.2070)    1.5151 (0.1632)
ν           1.0318 (0.1883)    1.0114 (0.1057)    1.0067 (0.0820)
Table 3. Mean (sd in parentheses) of the estimates for (C1)–(C4).
Parameter   T = 100            T = 200            T = 500
(C1): (θ₁, θ₂, ν) = (0.25, 0.25, 1.5)
θ₁          0.2563 (0.1402)    0.2517 (0.0784)    0.2514 (0.0611)
θ₂          0.2550 (0.0732)    0.2513 (0.0435)    0.2506 (0.0336)
ν           1.5431 (0.2529)    1.5169 (0.1553)    1.5103 (0.1191)
(C2): (θ₁, θ₂, ν) = (0.25, 1, 1.5)
θ₁          0.2586 (0.1141)    0.2524 (0.0620)    0.2515 (0.0479)
θ₂          1.0332 (0.2637)    1.0094 (0.1449)    1.0052 (0.1120)
ν           1.5408 (0.2482)    1.5157 (0.1497)    1.5100 (0.1153)
(C3): (θ₁, θ₂, ν) = (0.25, 1.5, 1.5)
θ₁          0.2625 (0.1000)    0.2523 (0.0559)    0.2515 (0.0433)
θ₂          1.5186 (0.3340)    1.5169 (0.2200)    1.5100 (0.1730)
ν           1.5383 (0.2512)    1.5167 (0.1531)    1.5103 (0.1180)
(C4): (θ₁, θ₂, ν) = (1, 1.5, 1.5)
θ₁          1.0528 (0.2914)    1.0134 (0.1701)    1.0075 (0.1329)
θ₂          1.5339 (0.3820)    1.5310 (0.2724)    1.5221 (0.2243)
ν           1.5398 (0.2350)    1.5161 (0.1396)    1.5100 (0.1082)
Table 4. Estimates for the weekly rainy days data, with standard errors (SE) in parentheses.
Model       Parameter   Estimate (SE)      −log-lik    AIC          BIC
BAR(1)      π̂           0.5476 (0.0122)    691.5400    1387.0800    1394.5720
            ρ̂           0.1323 (0.0325)
BBAR(1)     π̂           0.5475 (0.0177)    623.6617    1253.33233   1264.5619
            ρ̂           0.1408 (0.0507)
            ϕ̂           0.2827 (0.0320)
GBAR(1)     π̂           0.5493 (0.0169)    625.4958    1256.9916    1268.2303
            ρ̂           0.1396 (0.0492)
            ϕ̂           0.5209 (0.0279)
CMPBAR(1)   θ̂₁          1.2313 (0.0627)    622.6669    1251.3337    1262.5723
            θ̂₂          0.9547 (0.0532)
            ν̂           0.0995 (0.0681)
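The AIC column of Table 4 can be cross-checked against the −log-lik column with the standard definition AIC = 2·(−log-lik) + 2k, where k is the number of estimated parameters. The sketch below assumes k equals the number of estimates listed for each model (2 for BAR(1), 3 for the others); this count is an assumption read off the table, not a statement from the paper. The BIC is not recomputed because the sample size is not given in this part of the text.

```python
# (model, assumed number of parameters k, -log-lik, reported AIC),
# with the numeric values copied verbatim from Table 4.
rows = [
    ("BAR(1)", 2, 691.5400, 1387.0800),
    ("BBAR(1)", 3, 623.6617, 1253.33233),
    ("GBAR(1)", 3, 625.4958, 1256.9916),
    ("CMPBAR(1)", 3, 622.6669, 1251.3337),
]


def aic(neg_loglik, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * n_params


recomputed = {model: aic(nll, k) for model, k, nll, _ in rows}

# Model with the smallest recomputed AIC.
best = min(recomputed, key=recomputed.get)

for model, k, nll, reported in rows:
    print(f"{model:10s} recomputed={recomputed[model]:.4f} reported={reported}")
print("smallest AIC:", best)
```

Under this parameter count the recomputed values agree with the reported AIC column to within rounding, and the CMPBAR(1) model has the smallest AIC among the four fitted models.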

MDPI and ACS Style

Chen, H.; Zhang, J.; Liu, X. A Conway–Maxwell–Poisson-Binomial AR(1) Model for Bounded Time Series Data. Entropy 2023, 25, 126. https://doi.org/10.3390/e25010126
