Article

Estimation and Hypothesis Testing for the Parameters of Multivariate Zero Inflated Generalized Poisson Regression Model

by Dewi Novita Sari 1,2, Purhadi Purhadi 1,*, Santi Puteri Rahayu 1 and Irhamah Irhamah 1
1 Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2 BPS-Statistics of Banyumas Regency, Purwokerto 53114, Indonesia
* Author to whom correspondence should be addressed.
Symmetry 2021, 13(10), 1876; https://doi.org/10.3390/sym13101876
Submission received: 13 August 2021 / Revised: 23 September 2021 / Accepted: 28 September 2021 / Published: 5 October 2021
(This article belongs to the Section Mathematics)

Abstract

We propose a multivariate regression model called Multivariate Zero Inflated Generalized Poisson Regression (MZIGPR) type II, which extends the Bivariate Zero Inflated Generalized Poisson Regression (BZIGPR) type II model. This study develops parameter estimation, test statistics, and hypothesis testing, both simultaneous and partial, for the parameters of the MZIGPR model. The steps of the EM algorithm for obtaining the parameter estimator are also described in this article. We use Berndt–Hall–Hall–Hausman (BHHH) numerical iteration to optimize the EM algorithm. Simultaneous testing is carried out using the maximum likelihood ratio test (MLRT), and partial testing using the Wald test. The proposed MZIGPR model is then used to model three response variables, namely the number of maternal childbirth deaths, the number of postpartum maternal deaths, and the number of stillbirths, with four predictors. The units of observation are the sub-districts of the Pekalongan Residency, Indonesia. The results indicate overdispersion in the data on the number of maternal childbirth deaths and stillbirths, and underdispersion in the data on the number of postpartum maternal deaths. The empirical study shows that the three response variables are significantly affected by all the predictor variables.

1. Introduction

The Poisson regression model is commonly used to analyze data in which the response variable follows a Poisson distribution. Even though the Poisson regression model has been widely applied in various disciplines, it involves an assumption that, in some cases, is difficult to realize—namely, equidispersion. Equidispersion is the condition wherein sample variance is equal to the mean. When the sample variance is greater (less) than the sample mean, it is called overdispersion (underdispersion). The use of Poisson regression on over- or underdispersed data tends to make the standard error and test statistics derived from the model inaccurate, resulting in invalid conclusions [1,2].
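As a quick illustration of the equidispersion check described above, the following minimal Python sketch compares the sample variance with the sample mean. The function name `dispersion_index` and the sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance divided by sample mean (about 1 under equidispersion)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical count sample with many zeros and one large value.
sample = [0, 0, 0, 1, 0, 2, 0, 7, 0, 0]
ratio = dispersion_index(sample)

if ratio > 1:
    label = "overdispersed"
elif ratio < 1:
    label = "underdispersed"
else:
    label = "equidispersed"
```

A ratio well above one, as in this made-up sample, is the situation in which Poisson standard errors tend to be too small.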
As alternatives, several distributions have been developed, such as the negative binomial distribution (NBD) and the log-normal Poisson distribution, which can be used to resolve overdispersion. Alternative models for count data that can handle both over- and underdispersion include the double Poisson distribution, the gamma count distribution, and the generalized Poisson distribution (GPD). The first two have the drawback that their probability functions are complex and their variance and mean have no explicit form [3]. Research on the GPD has developed it into generalized Poisson regression (GPR), known as the GP-1 regression model (or the classic GPR model [4]), GP-2 regression (or the restricted GPR model [5]), the bivariate GPR (BGPR) model [6], and the multivariate GPR (MGPR) model [7].
Another problem with Poisson regression is a large proportion of zero values in the response data (excess zeros). Lambert [8] developed the zero inflated Poisson distribution (ZIPD) to resolve this problem. The ZIP model assumes that the population, or the set of observations, consists of two groups. An observation belongs, with probability p, to a group whose value is always zero (the zero state) and, with probability 1 − p, to a group whose zero and positive counts are generated by a Poisson distribution (the Poisson state). The development of the ZIP model has led to several studies on bivariate ZIP (BZIP) and multivariate ZIP (MZIP) models [9,10,11].
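The two-state structure of the ZIP model can be written as a short function. This is only a sketch of the standard univariate ZIP probability mass function; the name `zip_pmf` and the parameter values are illustrative assumptions.

```python
import math


def zip_pmf(y, p, lam):
    """P(Y = y) when Y is zero with probability p (zero state)
    and Poisson(lam) with probability 1 - p (Poisson state)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    zero_state = p if y == 0 else 0.0
    return zero_state + (1.0 - p) * poisson


# Hypothetical parameter values.
p, lam = 0.3, 2.0
prob_zero = zip_pmf(0, p, lam)                      # zeros come from both states
total = sum(zip_pmf(y, p, lam) for y in range(60))  # should be very close to 1
```

Note that a zero can come from either state, which is why `prob_zero` exceeds the plain Poisson probability of zero.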
The ZIPR model is the proper way to handle excess zero in the Poisson model, but is not suitable for controlling overdispersion. A study by Ridout et al. in 1998 showed that the results of ZIP parameter estimation are biased when there is an overdispersion of excess zero data. In such cases, the Zero Inflated Negative Binomial (ZINB) model performs better than the ZIP model with smaller AIC and BIC values [12]. However, the iteration technique used for parameter estimation in the ZINB model often fails to converge. Therefore, Famoye and Singh (2006) introduced the Zero Inflated Generalized Poisson (ZIGP) model as an alternative to the ZINB model [13]. Czado et al. [14] defined ZIGP distribution as an analog of ZIP distribution with additional zero inflation parameters.
The ZIGP regression model was formed based on ZIGP distribution. Thus, the ZIGP regression model can model count data with over/underdispersion and excess zero. The ZIGP model for univariate response has been applied in many aspects, such as domestic violence, entrepreneurship, biology, health, and zoology [1,13,14,15,16].
Zhang and Huang (2015) introduced the bivariate ZIGP (BZIGP) distribution, which is an extension of the univariate ZIGP. They proposed type I and type II BZIGP distributions with a flexible correlation structure that can be used whether the correlation is positive or negative, and whether there is over- or underdispersion. The two types differ in their zero inflation parameters: BZIGP type I has the same zero inflation parameter on both marginals, whereas the two marginals of BZIGP type II have different zero inflation parameters [17].
BZIGP type I distribution was later developed into MZIGP type I distribution by Huang et al. in 2017. MZIGP type I was applied to data with two responses: the number of days children were inactive in the previous four weeks due to sickness, and the number of days children were bedridden in the previous four weeks due to sickness [18]. MZIGP type I distribution only considers zero and non-zero response pairs. This study proposes a multivariate regression model that can be used on count data with excess zeros and several types of dispersion, and allows for several combinations of response pairs. The proposed model is called a multivariate Zero Inflated Generalized Poisson Regression (MZIGPR) type II (henceforth MZIGPR(II)).
Based on this background, the aims of this study are: (i) to construct an MZIGPR(II) model, (ii) to obtain the parameter estimators, (iii) to test the significance of the model as well as the significance of the individual parameters, and (iv) to apply the MZIGPR(II) model to real data. The case study concerns factors that influence the number of maternal childbirth deaths (Y1), the number of postpartum maternal deaths (Y2), and the number of stillbirths (Y3). The units of observation are 91 sub-districts in Pekalongan Residency, Indonesia. The predictor variables are the percentage of childbirths assisted by health workers, the percentage of TT2+ vaccination in pregnant women, the percentage of obstetric complications that were handled, and the ratio of midwives in the population (per 10,000).
Reducing maternal and neonatal mortality is a global issue, because it concerns all countries worldwide and not only developing countries. The first and second targets of the third goal of the Sustainable Development Goals (SDGs) are to reduce the maternal mortality ratio to less than 70 deaths per 100,000 live births, and to reduce neonatal mortality to at least as low as 12 deaths per 1000 live births, by 2030 [19]. Despite showing a decline, Indonesia's maternal and neonatal mortality rates are still relatively high, at 305 maternal deaths per 100,000 live births and 15 neonatal deaths per 1000 live births [20,21]. Pekalongan Residency has more maternal deaths and stillbirths than other residencies in Central Java Province, Indonesia (24.0% and 20.79%, respectively) [22]. To achieve the SDGs by 2030, greater efforts are required.
The discussion in this study is divided into several sections. In Section 2, we discuss the MZIGPR(II) model. Section 3 discusses the estimation of the parameters of the MZIGPR(II) model using the EM algorithm method, and the testing of the hypothesis of the MZIGPR(II) model using maximum likelihood ratio test (MLRT). The proposed approach is used to model data on the numbers of maternal childbirth deaths, postpartum maternal deaths, and stillbirths. The discussion and conclusions can be found in the last two sections.

2. Multivariate Zero Inflated Generalized Poisson Regression Type II (MZIGPR(II)) Model

A k-dimensional multivariate ZIGP (MZIGP(II)) distribution is constructed via a mixture of multivariate GP distributions, univariate GP distributions, and a distribution degenerate at the origin $(0, 0, \ldots, 0)$. Suppose Y1 is the number of maternal childbirth deaths and Y2 is the number of stillbirths. The sub-districts can then be categorized as follows, based on death cases: (i) sub-districts that have no cases of maternal childbirth deaths and no stillbirths (Y1 = 0, Y2 = 0), (ii) sub-districts with cases of maternal childbirth deaths, but no stillbirths (Y1 > 0, Y2 = 0), (iii) sub-districts with no cases of maternal childbirth deaths, but with stillbirths (Y1 = 0, Y2 > 0), and (iv) sub-districts with cases of both maternal childbirth deaths and stillbirths (Y1 > 0, Y2 > 0). To obtain a k-dimensional multivariate ZIGP model, we use a mixture of: a distribution degenerate at the zero point $(0, 0, \ldots, 0)$; $k$ distributions with a univariate GP for one type of death and $k-1$ zeros (for example, $(\mathrm{GP}(\lambda_1, \varphi_1), 0, \ldots, 0), \ldots, (0, \ldots, 0, \mathrm{GP}(\lambda_k, \varphi_k))$); $\frac{k(k-1)}{2} = C_2^k$ distributions with a bivariate GP for two types of death and $k-2$ zeros; and so on up to $k$ distributions, each with a $(k-1)$-dimensional GP for $k-1$ types of death and one zero; and finally, a k-dimensional GP for all k types of death. Since the number of mixture components depends on the number of responses used, the following discussion focuses on a model with three responses.
This MZIGP(II) distribution is used mainly for situations in which the observed events are mostly zero. Therefore, if the observed events are mostly non-zero, then the k-dimensional $Y > 0$ will represent all mixtures of $l$-dimensional $Y > 0$ and $(k-l)$-dimensional $Y = 0$, where $k \geq l \geq 2$ [10]. Suppose $(Y_1, Y_2, Y_3) \sim \mathrm{MZIGP(II)}(p_1, p_2, p_3; \lambda_1, \lambda_2, \lambda_3, \varphi_1, \varphi_2, \varphi_3)$. Thus, the pmf of MZIGPR(II) can be written as follows:
$$
P(Y_{1i} = y_{1i}, Y_{2i} = y_{2i}, Y_{3i} = y_{3i}) =
\begin{cases}
A_i, & \text{if } y_{1i} = 0,\ y_{2i} = 0,\ y_{3i} = 0 \\
B_i, & \text{if } y_{1i} > 0,\ y_{2i} = 0,\ y_{3i} = 0 \\
C_i, & \text{if } y_{1i} = 0,\ y_{2i} > 0,\ y_{3i} = 0 \\
D_i, & \text{if } y_{1i} = 0,\ y_{2i} = 0,\ y_{3i} > 0 \\
E_i, & \text{if } y_{1i} > 0,\ y_{2i} > 0,\ y_{3i} > 0
\end{cases}
\tag{1}
$$
The joint pmf $P(Y_{1i} = y_{1i}, Y_{2i} = y_{2i}, Y_{3i} = y_{3i})$ in Equation (1) (which can be seen in detail in Appendix A), with functions $\mu_{ki}$ and $p_{ki}$, satisfies
$$
\mu_{ki} = \exp(x_i^T \beta_k), \qquad
p_{ki} = \frac{\exp(x_i^T \gamma_k)}{1 + \exp(x_i^T \gamma_k)} \quad \text{and} \quad
1 - p_{ki} = \frac{1}{1 + \exp(x_i^T \gamma_k)}, \qquad k = 1, 2, 3;\ i = 1, 2, \ldots, n
\tag{2}
$$
where $\varphi$ is the dispersion parameter, $\gamma$ is the zero inflation parameter, $x_i$ is the i-th row of the covariate matrix $X$, $\beta_k$ is the q-dimensional regression parameter column vector, and $\gamma_k$ is the q-dimensional zero inflation parameter column vector.
The MZIGPR(II) model reduces to the MGPR model when the zero inflation parameter $\gamma = 0$, and it reduces to the MZIPR model when the dispersion parameter $\varphi = 0$.
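To make the log and logit links of the model concrete, the sketch below evaluates the mean and the zero inflation probability for one observation and one response. The function name `link_functions` and all numeric values are hypothetical, not estimates from this study.

```python
import math


def link_functions(x_i, beta_k, gamma_k):
    """Return (mu_ki, p_ki) for one covariate row x_i.

    mu_ki uses the log link:   mu = exp(x_i' beta_k)
    p_ki  uses the logit link: p  = exp(x_i' gamma_k) / (1 + exp(x_i' gamma_k))
    """
    eta_mu = sum(x * b for x, b in zip(x_i, beta_k))
    eta_p = sum(x * g for x, g in zip(x_i, gamma_k))
    mu = math.exp(eta_mu)
    p = 1.0 / (1.0 + math.exp(-eta_p))  # algebraically equal to exp(eta) / (1 + exp(eta))
    return mu, p


# Hypothetical covariate row (first entry is the intercept) and coefficients.
x_i = [1.0, 0.5, 2.0]
beta_k = [0.2, -0.1, 0.3]
gamma_k = [-1.0, 0.4, 0.1]
mu_ki, p_ki = link_functions(x_i, beta_k, gamma_k)
```

The log link keeps the mean positive and the logit link keeps the zero inflation probability strictly between 0 and 1.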

3. Inference

3.1. Parameter Estimation of MZIGPR(II) Model

Let the random vectors $(Y_{1i}, Y_{2i}, Y_{3i})^T \overset{iid}{\sim} \mathrm{ZIGP}(\gamma_1, \gamma_2, \gamma_3, \beta_1, \beta_2, \beta_3, \varphi_1, \varphi_2, \varphi_3)$ for $i = 1, 2, 3, \ldots, n$. Define
$$
\begin{aligned}
I_0 &= \{ i : y_{1i} = 0,\ y_{2i} = 0,\ y_{3i} = 0;\ i = 1, 2, 3, \ldots, n \} \\
I_1 &= \{ i : y_{1i} > 0,\ y_{2i} = 0,\ y_{3i} = 0;\ i = 1, 2, 3, \ldots, n \} \\
I_2 &= \{ i : y_{1i} = 0,\ y_{2i} > 0,\ y_{3i} = 0;\ i = 1, 2, 3, \ldots, n \} \\
I_3 &= \{ i : y_{1i} = 0,\ y_{2i} = 0,\ y_{3i} > 0;\ i = 1, 2, 3, \ldots, n \} \\
I_4 &= \{ i : y_{1i} > 0,\ y_{2i} > 0,\ y_{3i} > 0;\ i = 1, 2, 3, \ldots, n \}
\end{aligned}
$$
The numbers of elements of $I_0, I_1, I_2, I_3, I_4$ are $n_0, n_1, n_2, n_3$ and $n_4$, where $n_4 = n - n_0 - n_1 - n_2 - n_3$. There is no restriction on these sizes; they may be equal or different. In this study, the sizes $n_0, n_1, n_2, n_3$ and $n_4$ are different.
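The partition of the observations into index sets can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `classify` and the data are hypothetical, and the last group is treated as the remainder, which is an assumption based on the relation $n_4 = n - n_0 - n_1 - n_2 - n_3$.

```python
def classify(y):
    """Map a response triple (y1, y2, y3) to the label of its index set I_0..I_4."""
    y1, y2, y3 = y
    if y1 == 0 and y2 == 0 and y3 == 0:
        return 0
    if y1 > 0 and y2 == 0 and y3 == 0:
        return 1
    if y1 == 0 and y2 > 0 and y3 == 0:
        return 2
    if y1 == 0 and y2 == 0 and y3 > 0:
        return 3
    return 4  # assumption: every remaining pattern is counted in n_4


# Hypothetical observations, one triple per sub-district.
data = [(0, 0, 0), (2, 0, 0), (0, 1, 0), (0, 0, 3), (1, 1, 2), (0, 0, 0), (1, 0, 4)]

counts = [0] * 5
for triple in data:
    counts[classify(triple)] += 1

n0, n1, n2, n3, n4 = counts
```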
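Suppose $\theta = (\beta_1, \beta_2, \beta_3, \gamma_1, \gamma_2, \gamma_3, \varphi_1, \varphi_2, \varphi_3, \eta_{12}, \eta_{13}, \eta_{23})$; then, the likelihood function of the observed data can be written as follows:
$$
L(\theta \mid Y_{obs}) = \prod_{i=1}^{n} f(y_{1i}, y_{2i}, y_{3i}) = \prod_{i \in I_0} A_i \prod_{i \in I_1} B_i \prod_{i \in I_2} C_i \prod_{i \in I_3} D_i \prod_{i \in I_4} E_i
$$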
Parameter estimation is carried out using the EM algorithm. We employ a, b, c, and d as unobserved variables or latent variables.
The latent variable a divides $n_0$ into $A_0 + A_1 + A_2 + A_3 + A_4 + A_5 + A_6 + A_7$, where $A_7 = n_0 - A_0 - A_1 - A_2 - A_3 - A_4 - A_5 - A_6$; the latent variable b divides $n_1$ into $B_0 + B_1 + B_2 + B_3$, where $B_3 = n_1 - B_0 - B_1 - B_2$; the latent variable c divides $n_2$ into $C_0 + C_1 + C_2 + C_3$, where $C_3 = n_2 - C_0 - C_1 - C_2$; and the latent variable d divides $n_3$ into $D_0 + D_1 + D_2 + D_3$, where $D_3 = n_3 - D_0 - D_1 - D_2$. Thus, the distribution of a, b, c and d is
$$
\begin{aligned}
a \mid Y_{obs}, \theta &\sim \mathrm{Multinomial}\left(n_0;\ \frac{f_0}{f}, \frac{f_1}{f}, \frac{f_2}{f}, \frac{f_3}{f}, \frac{f_4}{f}, \frac{f_5}{f}, \frac{f_6}{f}, \frac{f_7}{f}\right) \\
b \mid Y_{obs}, \theta &\sim \mathrm{Multinomial}\left(n_1;\ \frac{h_0}{h}, \frac{h_1}{h}, \frac{h_2}{h}, \frac{h_3}{h}\right) \\
c \mid Y_{obs}, \theta &\sim \mathrm{Multinomial}\left(n_2;\ \frac{j_0}{j}, \frac{j_1}{j}, \frac{j_2}{j}, \frac{j_3}{j}\right) \\
d \mid Y_{obs}, \theta &\sim \mathrm{Multinomial}\left(n_3;\ \frac{l_0}{l}, \frac{l_1}{l}, \frac{l_2}{l}, \frac{l_3}{l}\right)
\end{aligned}
$$
where $f = \sum_{s=0}^{7} f_s$, $h = \sum_{s=0}^{3} h_s$, $j = \sum_{s=0}^{3} j_s$, $l = \sum_{s=0}^{3} l_s$, and $f_s, h_s, j_s, l_s$ are given in Appendix A (i)–(iv).
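The likelihood function for the complete data, denoted by $Y_{com} = (Y_{obs}, f_0, f_1, f_2, f_3, f_4, f_5, f_6, f_7, h_0, h_1, h_2, h_3, j_0, j_1, j_2, j_3, l_0, l_1, l_2, l_3)$, is the function that will be maximized using the EM algorithm. This likelihood function is proportional to
$$
\begin{aligned}
L(\theta \mid Y_{com}) \propto{} & f_0^{a_0} f_1^{a_1} f_2^{a_2} f_3^{a_3} f_4^{a_4} f_5^{a_5} f_6^{a_6} f_7^{a_7} \\
& \times \prod_{i \in I_1} (1 - p_1) \frac{\lambda_1^{y_1} (1 + \varphi_1 y_1)^{y_1 - 1}}{y_1!} \exp\left(-\lambda_1 (1 + \varphi_1 y_1)\right) h_0^{b_0} h_1^{b_1} h_2^{b_2} h_3^{b_3} \\
& \times \prod_{i \in I_2} (1 - p_2) \frac{\lambda_2^{y_2} (1 + \varphi_2 y_2)^{y_2 - 1}}{y_2!} \exp\left(-\lambda_2 (1 + \varphi_2 y_2)\right) j_0^{c_0} j_1^{c_1} j_2^{c_2} j_3^{c_3} \\
& \times \prod_{i \in I_3} (1 - p_3) \frac{\lambda_3^{y_3} (1 + \varphi_3 y_3)^{y_3 - 1}}{y_3!} \exp\left(-\lambda_3 (1 + \varphi_3 y_3)\right) l_0^{d_0} l_1^{d_1} l_2^{d_2} l_3^{d_3} \\
& \times \prod_{i \in I_4} (1 - p_1)(1 - p_2)(1 - p_3) \prod_{k=1}^{3} \frac{\lambda_k^{y_k} (1 + \varphi_k y_k)^{y_k - 1}}{y_k!} \exp\left(-\lambda_k (1 + \varphi_k y_k)\right) Q
\end{aligned}
$$
The ln-likelihood function $\ln L(\theta \mid Y_{com}) = \ell(\theta \mid Y_{com})$ is stated as follows:
$$
\begin{aligned}
\ell(\theta \mid Y_{com}) ={} & a_0 \ln f_0 + a_1 \ln f_1 + a_2 \ln f_2 + a_3 \ln f_3 + a_4 \ln f_4 + a_5 \ln f_5 + a_6 \ln f_6 + a_7 \ln f_7 \\
& + \sum_{i \in I_1} \ln\left[ (1 - p_1) \frac{\lambda_1^{y_1} (1 + \varphi_1 y_1)^{y_1 - 1}}{y_1!} \exp\left(-\lambda_1 (1 + \varphi_1 y_1)\right) h_0^{b_0} h_1^{b_1} h_2^{b_2} h_3^{b_3} \right] \\
& + \sum_{i \in I_2} \ln\left[ (1 - p_2) \frac{\lambda_2^{y_2} (1 + \varphi_2 y_2)^{y_2 - 1}}{y_2!} \exp\left(-\lambda_2 (1 + \varphi_2 y_2)\right) j_0^{c_0} j_1^{c_1} j_2^{c_2} j_3^{c_3} \right] \\
& + \sum_{i \in I_3} \ln\left[ (1 - p_3) \frac{\lambda_3^{y_3} (1 + \varphi_3 y_3)^{y_3 - 1}}{y_3!} \exp\left(-\lambda_3 (1 + \varphi_3 y_3)\right) l_0^{d_0} l_1^{d_1} l_2^{d_2} l_3^{d_3} \right] \\
& + \sum_{i \in I_4} \ln\left[ (1 - p_1)(1 - p_2)(1 - p_3) \prod_{k=1}^{3} \frac{\lambda_k^{y_k} (1 + \varphi_k y_k)^{y_k - 1}}{y_k!} \exp\left(-\lambda_k (1 + \varphi_k y_k)\right) Q \right]
\end{aligned}
$$
where $\lambda_{ki} = \dfrac{\mu_{ki}}{1 + \varphi_k \mu_{ki}}$, in which $\mu_{ki}$ and $p_{ki}$ are given in Equation (2).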
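The univariate GP factor that appears repeatedly in the likelihood can be checked numerically. The sketch below, with the hypothetical name `gp_pmf` and made-up parameter values, evaluates the pmf in terms of the mean $\mu$ and dispersion $\varphi$ through $\lambda = \mu / (1 + \varphi\mu)$. It assumes $\varphi \geq 0$; the truncation needed for negative $\varphi$ is only handled crudely by returning zero when $1 + \varphi y \leq 0$.

```python
import math


def gp_pmf(y, mu, phi):
    """Generalized Poisson pmf: lam^y (1 + phi*y)^(y-1) exp(-lam (1 + phi*y)) / y!."""
    lam = mu / (1.0 + phi * mu)
    base = 1.0 + phi * y
    if base <= 0.0:
        return 0.0  # simplification for underdispersion (phi < 0)
    log_p = (y * math.log(lam) + (y - 1) * math.log(base)
             - lam * base - math.lgamma(y + 1))
    return math.exp(log_p)


# Hypothetical mean and dispersion values.
mu, phi = 2.0, 0.3
support = range(400)  # the tail beyond 400 is negligible for these values
total = sum(gp_pmf(y, mu, phi) for y in support)
mean_gp = sum(y * gp_pmf(y, mu, phi) for y in support)
var_gp = sum((y - mean_gp) ** 2 * gp_pmf(y, mu, phi) for y in support)
```

For this parameterization the mean is $\mu$ and the variance is $\mu(1 + \varphi\mu)^2$, so a positive $\varphi$ gives overdispersion; the sums above reproduce both values numerically.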
Setting the first derivatives of $\ell(\theta \mid Y_{com})$ with respect to the parameters in $\theta$ equal to zero does not yield a solution in explicit form. Therefore, the maximum likelihood estimator for the MZIGPR(II) model's parameters is obtained through the EM algorithm with the Berndt–Hall–Hall–Hausman (BHHH) iteration. The E-step in the EM algorithm involves replacing $a_s$, $b_s$, $c_s$ and $d_s$ with $a_s^{(c)}$, $b_s^{(c)}$, $c_s^{(c)}$ and $d_s^{(c)}$, where
$$
\begin{aligned}
a_s^{(c)} &= E\left(a_s \mid y_i, \theta^{(c)}\right) = \frac{n_0 f_s}{f}, \quad s = 0, 1, 2, 3, 4, 5, 6, 7 \\
b_s^{(c)} &= E\left(b_s \mid y_i, \theta^{(c)}\right) = \frac{n_1 h_s}{h}, \quad s = 0, 1, 2, 3 \\
c_s^{(c)} &= E\left(c_s \mid y_i, \theta^{(c)}\right) = \frac{n_2 j_s}{j}, \quad s = 0, 1, 2, 3 \\
d_s^{(c)} &= E\left(d_s \mid y_i, \theta^{(c)}\right) = \frac{n_3 l_s}{l}, \quad s = 0, 1, 2, 3
\end{aligned}
$$
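The E-step for the all-zero cell can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a single set of hypothetical values for $p_k$ and $\lambda_k$ (in the regression model these vary with $i$), and it treats $Q$ as a given number `q`, set to 1 here as an assumption.

```python
import math


def zero_cell_weights(p, lam, q=1.0):
    """Return [f_0, ..., f_7], the mixture weights of the cell (0, 0, 0)."""
    p1, p2, p3 = p
    l1, l2, l3 = lam
    return [
        p1 * p2 * p3,
        p1 * p2 * (1 - p3) * math.exp(-l3),
        p1 * p3 * (1 - p2) * math.exp(-l2),
        p2 * p3 * (1 - p1) * math.exp(-l1),
        p1 * (1 - p2) * (1 - p3) * math.exp(-l2 - l3) * q,
        p2 * (1 - p1) * (1 - p3) * math.exp(-l1 - l3) * q,
        p3 * (1 - p1) * (1 - p2) * math.exp(-l1 - l2) * q,
        (1 - p1) * (1 - p2) * (1 - p3) * math.exp(-l1 - l2 - l3) * q,
    ]


def e_step_counts(n_cell, weights):
    """Expected latent counts: n_cell * f_s / f for each component s."""
    total = sum(weights)
    return [n_cell * w / total for w in weights]


# Hypothetical current parameter values and cell size n_0.
p = (0.4, 0.3, 0.2)
lam = (0.5, 0.8, 1.5)
f = zero_cell_weights(p, lam)
a_expected = e_step_counts(40, f)
```

With `q = 1` the weights sum to the product of the three marginal zero probabilities $p_k + (1 - p_k)e^{-\lambda_k}$, which gives a simple sanity check on the eight terms.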
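The M-step in the EM algorithm updates the estimate by $\hat{\theta}^{(c+1)} = \hat{\theta}^{(c)} - H^{-1}(\hat{\theta}^{(c)})\, g(\hat{\theta}^{(c)})$, where $\hat{\theta}^{(c)}$ is the estimator vector at the c-th iteration and $g(\hat{\theta})$ is the gradient vector. $H(\theta)$ is defined as $H(\theta) = -\sum_{i=1}^{n} g_i(\theta)\, g_i^T(\theta)$, where $g_i(\theta)$ is the gradient vector at the i-th observation. The iteration stops when $\hat{\theta}^{(c)}$ converges, that is, when $\|\hat{\theta}^{(c+1)} - \hat{\theta}^{(c)}\| < \varepsilon$, where $\varepsilon$ is a very small positive real number.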
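The BHHH update can be illustrated on a much simpler problem than the MZIGPR(II) likelihood. The sketch below is an assumption-laden toy example: it estimates a single parameter $\beta$ in $Y_i \sim \mathrm{Poisson}(\exp(\beta))$, so the outer product of gradients is a scalar. The function name and the data are hypothetical.

```python
import math


def bhhh_poisson_intercept(y, beta=0.0, tol=1e-10, max_iter=500):
    """Estimate beta in Y_i ~ Poisson(exp(beta)) using BHHH steps."""
    for _ in range(max_iter):
        mu = math.exp(beta)
        scores = [yi - mu for yi in y]    # g_i: derivative of ln f(y_i) w.r.t. beta
        gradient = sum(scores)            # g: sum of the per-observation scores
        opg = sum(s * s for s in scores)  # sum of g_i * g_i (outer product, scalar here)
        step = gradient / opg             # equals -H^{-1} g with H = -opg
        beta += step
        if abs(step) < tol:               # stop when the update is below epsilon
            return beta
    raise RuntimeError("BHHH iteration did not converge")


# Hypothetical counts with sample mean 2, so the MLE is ln(2).
counts = [0, 1, 1, 2, 3, 5]
beta_hat = bhhh_poisson_intercept(counts)
```

The appeal of BHHH is that it needs only first derivatives: the Hessian is approximated by the negative sum of outer products of the per-observation gradients.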

3.2. Hypothesis Testing of the MZIGPR(II) Model

Hypothesis testing of the MZIGPR(II) model was conducted via both simultaneous and partial parameter testing. Simultaneous parameter testing is used to determine the significance of parameters simultaneously in the model. The hypothesis for the simultaneous testing of β and γ is as follows:
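$$
\begin{aligned}
H_0 &: \beta_{k1} = \cdots = \beta_{kq} = \gamma_{k1} = \cdots = \gamma_{kq} = 0 \\
H_1 &: \text{at least one } \beta_{kr} \neq 0 \text{ and } \gamma_{kr} \neq 0;\ k = 1, 2, \ldots, m;\ r = 1, 2, \ldots, q
\end{aligned}
$$
The test statistic is $G^2 = -2 \ln \dfrac{L(\hat{\omega})}{L(\hat{\Omega})} = 2\left(\ln L(\hat{\Omega}) - \ln L(\hat{\omega})\right)$. Reject the null hypothesis at significance level $\alpha$ if $G^2 > \chi^2_{\alpha, v}$, where $v = n_\Omega - n_\omega$ is the difference between the numbers of parameters under $\Omega$ and under $\omega$.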
A partial test is carried out to determine which parameters have a significant effect on the model. The hypothesis for the partial testing of β and γ is
  • Partial testing of β
    $H_0: \beta_{kr} = 0$ versus $H_1: \beta_{kr} \neq 0$; $k = 1, 2, \ldots, m$; $r = 1, 2, \ldots, q$
    The test statistic is $Z_b = \dfrac{\hat{\beta}_{kr}}{se(\hat{\beta}_{kr})}$. Reject the null hypothesis if $|Z_b| > Z_{\alpha/2}$.
  • Partial testing of γ
    $H_0: \gamma_{kr} = 0$ versus $H_1: \gamma_{kr} \neq 0$; $k = 1, 2, \ldots, m$; $r = 1, 2, \ldots, q$
    The test statistic is $Z_g = \dfrac{\hat{\gamma}_{kr}}{se(\hat{\gamma}_{kr})}$. Reject the null hypothesis if $|Z_g| > Z_{\alpha/2}$.
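The partial (Wald) test above amounts to a few lines of arithmetic. The following sketch uses only the Python standard library; the function name `wald_test` and the standard error in the example are hypothetical, since standard errors are not reproduced in this text.

```python
import math


def wald_test(estimate, std_error, alpha=0.05):
    """Return (z, two-sided p-value, reject) for H0: parameter = 0."""
    z = estimate / std_error
    # Two-sided standard normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < alpha


# Hypothetical coefficient estimate and standard error.
z_stat, p_val, reject_h0 = wald_test(0.062, 0.02)
```

Rejecting when the p-value is below $\alpha$ is equivalent to rejecting when $|Z| > Z_{\alpha/2}$.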

4. Application of Data

The proposed MZIGPR(II) model is applied to model the numbers of maternal childbirth deaths, postpartum maternal deaths, and stillbirths. The data used in this study are secondary data collected by the Central Java Provincial Health Office in 2017. There are seven regencies and municipalities with 91 sub-districts in the Pekalongan Residency (Karesidenan Pekalongan), Central Java.
This study employs three response variables (Y) and four predictor variables (X). The three response variables are the number of maternal childbirth deaths (Y1), the number of postpartum maternal deaths (Y2), and the number of stillbirths (Y3). The four predictor variables are the percentage of childbirths assisted by medical personnel in a sub-district (X1), the percentage of TT2+ immunization in pregnant women in a sub-district (X2), the percentage of obstetric complications handled in a sub-district (X3), and the ratio of midwives in the population (per 10,000) in a sub-district (X4).
Table 1 shows that the mean values of the response variables Y1, Y2, and Y3 are 0.18, 0.67, and 2.76, respectively. As regards the data spread, measured by the coefficient of variation (CoV), the number of maternal childbirth deaths (Y1) shows the largest CoV, 300.88. This indicates that Y1 is more heterogeneous than Y2 and Y3. The heterogeneity of Y1 is shown in Figure 1. Even though the histograms of Y1, Y2, and Y3 all show asymmetrical curves, the curve of Y1 is more skewed to the right than those of the other two response variables.
In the assumed MZIGP(II) distribution, response data with a value of zero arise from two states, namely, the zero state and the generalized Poisson (GP) state. Thus, the zero-valued responses $Y_{ki} = 0$ ($k = 1, 2, 3$; $i = 1, \ldots, n$) of Y1, Y2, and Y3 also come from two states. In the zero state, $Y_{ki} = 0$ means that a sub-district has no maternal childbirth deaths, postpartum maternal deaths, or stillbirths. In the GP state, $Y_{ki} = 0$ means that such deaths do occur in the sub-district, but that none occurred during the year in which the data were collected. The zero value in the GP state is assumed to derive from a certain distribution; in this case, the MGP distribution.
MZIGPR(II) modeling with all predictor variables yields the parameter estimates shown in Table 2. Simultaneous hypothesis testing on these parameters determines whether the predictors jointly affect the response variables. The test gives $G^2 = 89092.513$, which is greater than $\chi^2_{0.05, 24} = 36.415$, so $H_0$ is rejected. The decision to reject $H_0$ indicates that at least one predictor variable affects the number of maternal deaths during childbirth and postpartum, as well as stillbirths.
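The comparison of $G^2$ with the chi-square critical value can be reproduced without external libraries, because the chi-square survival function has a closed form when the degrees of freedom are even. The sketch below is illustrative; the function name is hypothetical, and reading the 24 degrees of freedom as 3 responses × 4 predictors × 2 coefficient sets is an interpretation of the reported value rather than a statement from the source.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with even df.

    For df = 2m: sf(x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


alpha = 0.05
tail_at_critical = chi2_sf_even_df(36.415, 24)  # close to 0.05 by construction
p_value = chi2_sf_even_df(89092.513, 24)        # underflows to 0.0 for this G^2
reject_h0 = p_value < alpha
```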
Partial tests were performed to determine which variables had a significant influence on the model. Table 2 shows that all predictors significantly affect the three responses at a significance level of 0.05.

5. Discussion

There are two types of MZIGPR(II) models: the log model and the logit model. The log model states that $Y_i$ in the GP state is affected by the significant variables. In contrast, the logit model states that the probability of $Y_i$ being in the zero state is affected by the significant variables. The GP state yields zero observations and positive integers generated by the GP distribution, whereas the zero state produces zero observations only.
The MZIGPR(II) model for the number of maternal deaths during childbirth or postpartum, and the number of stillbirths, based on the data of Table 2, is constructed as follows:
  • Regression model of maternal childbirth deaths:
    $\log \hat{\mu}_{1i} = 4.496 + 0.062 X_{1i} - 0.020 X_{2i} - 0.029 X_{3i} - 0.908 X_{4i}$
    $\operatorname{logit} \hat{p}_{1i} = 2.091 - 0.012 X_{1i} + 0.010 X_{2i} + 0.008 X_{3i} + 0.362 X_{4i}$    (6)
  • Regression model for postpartum maternal deaths:
    $\log \hat{\mu}_{2i} = 5.448 - 0.020 X_{1i} - 0.026 X_{2i} + 0.062 X_{3i} + 0.819 X_{4i}$
    $\operatorname{logit} \hat{p}_{2i} = 1.424 - 0.124 X_{1i} - 0.051 X_{2i} + 0.011 X_{3i} + 0.197 X_{4i}$    (7)
  • Regression model for stillbirths:
    $\log \hat{\mu}_{3i} = 0.880 + 0.029 X_{1i} + 0.014 X_{2i} + 0.013 X_{3i} - 0.723 X_{4i}$
    $\operatorname{logit} \hat{p}_{3i} = 1.207 - 0.087 X_{1i} - 0.058 X_{2i} - 0.016 X_{3i} - 0.064 X_{4i}$    (8)
The estimated dispersion parameters $\varphi_1$, $\varphi_2$, and $\varphi_3$ are 1.091, −0.148 and 0.516, respectively, which means that the dispersion is not zero. The values of $\varphi_1$ and $\varphi_3$ indicate overdispersion in the data on the number of maternal childbirth deaths and stillbirths. In contrast, the value of $\varphi_2$ indicates underdispersion in the data on the number of postpartum maternal deaths.
The log and logit models in Equations (6)–(8) show that the variables that significantly affect the GP state also affect the zero state. Based on the regression coefficients of the MZIGPR model, some relationships between predictor variables and responses contradict the existing theory. For example, in the log model the coefficient of X1 (the percentage of childbirths assisted by medical personnel) is positive for Y1 and Y3. This means that, when the other predictor variables are held constant, every one percentage point increase in childbirths assisted by medical personnel multiplies the average number of maternal childbirth deaths (Y1) by exp(0.062) ≈ 1.06 and the average number of stillbirths (Y3) by exp(0.029) ≈ 1.03, but multiplies the average number of postpartum maternal deaths (Y2) by exp(−0.020) ≈ 0.98. Thus, X1 shows an inappropriate relationship pattern with Y1 and Y3.
The coefficient of X2 (the percentage of TT2+ vaccination in pregnant women) is negative for Y1 and Y2. If all other variables are held constant, every one percentage point increase in pregnant women vaccinated with TT2+ multiplies the average number of maternal childbirth deaths (Y1) by about 0.98 and the average number of postpartum maternal deaths (Y2) by about 0.97, whereas the average number of stillbirths (Y3) is multiplied by about 1.01. In this case, X2 has an inappropriate relationship with Y3. The remaining predictor variables in the log model are interpreted in the same way as X1 and X2.
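The factors 1.06, 0.98, 1.03, 0.97 and 1.01 quoted in the interpretation are the exponentials of the log-model coefficients, so each acts multiplicatively on the mean. A small sketch of this arithmetic follows; the dictionary layout is an illustrative choice, and the coefficient signs follow the directions described in the text.

```python
import math

# Log-model coefficients of X1 and X2 for each response, signs as described in the text.
log_coefs = {
    "Y1": {"X1": 0.062, "X2": -0.020},
    "Y2": {"X1": -0.020, "X2": -0.026},
    "Y3": {"X1": 0.029, "X2": 0.014},
}

# exp(coefficient) is the factor by which the mean changes per one-unit increase.
rate_ratios = {
    response: {name: round(math.exp(b), 2) for name, b in coefs.items()}
    for response, coefs in log_coefs.items()
}
```

A ratio above one means the expected count rises with the predictor, and a ratio below one means it falls.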
In the logit model, the coefficient of X1 is negative for all three responses. When the other predictor variables are held constant, every one percentage point increase in childbirths assisted by medical personnel multiplies the odds of the zero state by about 0.99 for maternal childbirth deaths (Y1), 0.88 for postpartum maternal deaths (Y2), and 0.92 for stillbirths (Y3). In other words, a higher percentage of medically assisted births is associated with lower odds that a sub-district has no such deaths.
The relationship pattern between X1 and the three responses is therefore inappropriate. The remaining variables in the logit model are interpreted in the same way as X1.

6. Conclusions

In this study, we developed an MZIGPR(II) model, along with its parameter estimation and hypothesis testing. The parameter estimation is performed using the EM algorithm, followed by the BHHH iteration. The maximum likelihood ratio test (MLRT) and the Wald test are used to test the significance of the model and of the individual parameters, respectively. The proposed MZIGPR(II) model is applied to model the number of maternal childbirth deaths, the number of postpartum maternal deaths, and the number of stillbirths.
MZIGPR(II) can be used in cases of both overdispersion and underdispersion. We found underdispersion in the data on the number of postpartum maternal deaths and overdispersion in the data on the number of maternal childbirth deaths and stillbirths. The empirical results show that all four predictors affect the three response variables.
The main limitation of this study is that the data are not fully representative, because they were collected only at public health centers (puskesmas). Several variables thought to affect the numbers of maternal deaths during childbirth and postpartum, as well as stillbirths, were not used due to limited data availability at the sub-district level. In further research, the predictor variables used in this study can be replaced by, or supplemented with, other more relevant variables, according to the existing theory.
The MZIGPR(II) model yields global parameters that apply to all observation locations. Differences in regional characteristics and geographical conditions between observation locations make this global model less accurate. MZIGPR(II) modeling with spatial aspects can be undertaken in further research.

Author Contributions

Conceptualization, D.N.S., P.P., S.P.R. and I.I.; methodology, P.P. and S.P.R.; software, D.N.S. and I.I.; validation, P.P., S.P.R. and I.I.; formal analysis, D.N.S. and P.P.; investigation, D.N.S.; data curation, D.N.S.; writing—original draft preparation, D.N.S.; writing—review and editing, P.P., S.P.R. and I.I.; supervision, P.P., S.P.R. and I.I.; project administration, P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Deputi Bidang Penguatan Riset dan Pengembangan, Ministry of Research and Technology/National Research and Innovation Agency (Kemenristek or RISTEK-BRIN), the Republic of Indonesia, via grant Penelitian Disertasi Doktor (PDD) in 2021 with grant number 3/E1/KP.PTNBH/2021.

Data Availability Statement

Dataset not publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Let $Z_k \sim \mathrm{Bernoulli}(1 - p_k)$, $k = 1, 2, 3$, and $(U_1, U_2, U_3)^T \sim \mathrm{MGP}(\lambda_1, \lambda_2, \lambda_3, \varphi_1, \varphi_2, \varphi_3)$. The joint pmf of $(Y_1, Y_2, Y_3)^T$ is given by $P(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3) = P(Z_1 U_1 = y_1, Z_2 U_2 = y_2, Z_3 U_3 = y_3)$, so that,
  • If $Y_1 = 0$, $Y_2 = 0$, and $Y_3 = 0$, we have
    $$
    \begin{aligned}
    P(Y_1 = 0, Y_2 = 0, Y_3 = 0) ={} & P(Z_1 = 0, Z_2 = 0, Z_3 = 0) + P(Z_1 = 0, Z_2 = 0, Z_3 = 1, U_3 = 0) \\
    & + P(Z_1 = 0, Z_2 = 1, U_2 = 0, Z_3 = 0) + P(Z_1 = 1, U_1 = 0, Z_2 = 0, Z_3 = 0) \\
    & + P(Z_1 = 0, Z_2 = 1, U_2 = 0, Z_3 = 1, U_3 = 0) + P(Z_1 = 1, U_1 = 0, Z_2 = 0, Z_3 = 1, U_3 = 0) \\
    & + P(Z_1 = 1, U_1 = 0, Z_2 = 1, U_2 = 0, Z_3 = 0) + P(Z_1 = 1, U_1 = 0, Z_2 = 1, U_2 = 0, Z_3 = 1, U_3 = 0) \\
    ={} & f_0 + f_1 + f_2 + f_3 + f_4 + f_5 + f_6 + f_7
    \end{aligned}
    $$
    where
    $$
    \begin{aligned}
    f_0 &= p_1 p_2 p_3, & f_1 &= p_1 p_2 (1 - p_3) e^{-\lambda_3}, \\
    f_2 &= p_1 p_3 (1 - p_2) e^{-\lambda_2}, & f_3 &= p_2 p_3 (1 - p_1) e^{-\lambda_1}, \\
    f_4 &= p_1 (1 - p_2)(1 - p_3) e^{-\lambda_2 - \lambda_3} Q, & f_5 &= p_2 (1 - p_1)(1 - p_3) e^{-\lambda_1 - \lambda_3} Q, \\
    f_6 &= p_3 (1 - p_1)(1 - p_2) e^{-\lambda_1 - \lambda_2} Q, & f_7 &= (1 - p_1)(1 - p_2)(1 - p_3) e^{-\lambda_1 - \lambda_2 - \lambda_3} Q
    \end{aligned}
    $$
  • If $Y_1 > 0$, $Y_2 = 0$, and $Y_3 = 0$, we have
    $$
    \begin{aligned}
    P(Y_1 = y_1, Y_2 = 0, Y_3 = 0) ={} & P(Z_1 = 1, U_1 = y_1, Z_2 = 0, Z_3 = 0) + P(Z_1 = 1, U_1 = y_1, Z_2 = 1, U_2 = 0, Z_3 = 0) \\
    & + P(Z_1 = 1, U_1 = y_1, Z_2 = 0, Z_3 = 1, U_3 = 0) + P(Z_1 = 1, U_1 = y_1, Z_2 = 1, U_2 = 0, Z_3 = 1, U_3 = 0) \\
    ={} & (1 - p_1) \frac{\lambda_1^{y_1} (1 + \varphi_1 y_1)^{y_1 - 1}}{y_1!} \exp\left(-\lambda_1 (1 + \varphi_1 y_1)\right) (h_0 + h_1 + h_2 + h_3)
    \end{aligned}
    $$
    where $h_0 = p_2 p_3$, $h_1 = p_3 (1 - p_2) \exp(-\lambda_2) Q$, $h_2 = p_2 (1 - p_3) \exp(-\lambda_3) Q$, $h_3 = (1 - p_2)(1 - p_3) \exp(-\lambda_2 - \lambda_3) Q$
  • If $Y_1 = 0$, $Y_2 > 0$, and $Y_3 = 0$, we have
    $$
    \begin{aligned}
    P(Y_1 = 0, Y_2 = y_2, Y_3 = 0) ={} & P(Z_1 = 0, Z_2 = 1, U_2 = y_2, Z_3 = 0) + P(Z_1 = 1, U_1 = 0, Z_2 = 1, U_2 = y_2, Z_3 = 0) \\
    & + P(Z_1 = 0, Z_2 = 1, U_2 = y_2, Z_3 = 1, U_3 = 0) + P(Z_1 = 1, U_1 = 0, Z_2 = 1, U_2 = y_2, Z_3 = 1, U_3 = 0) \\
    ={} & (1 - p_2) \frac{\lambda_2^{y_2} (1 + \varphi_2 y_2)^{y_2 - 1}}{y_2!} \exp\left(-\lambda_2 (1 + \varphi_2 y_2)\right) (j_0 + j_1 + j_2 + j_3)
    \end{aligned}
    $$
    where $j_0 = p_1 p_3$, $j_1 = p_3 (1 - p_1) \exp(-\lambda_1) Q$, $j_2 = p_1 (1 - p_3) \exp(-\lambda_3) Q$, $j_3 = (1 - p_1)(1 - p_3) \exp(-\lambda_1 - \lambda_3) Q$
  • If $Y_1 = 0$, $Y_2 = 0$, and $Y_3 > 0$, we have
    $$
    \begin{aligned}
    P(Y_1 = 0, Y_2 = 0, Y_3 = y_3) ={} & P(Z_1 = 0, Z_2 = 0, Z_3 = 1, U_3 = y_3) + P(Z_1 = 1, U_1 = 0, Z_2 = 0, Z_3 = 1, U_3 = y_3) \\
    & + P(Z_1 = 0, Z_2 = 1, U_2 = 0, Z_3 = 1, U_3 = y_3) + P(Z_1 = 1, U_1 = 0, Z_2 = 1, U_2 = 0, Z_3 = 1, U_3 = y_3) \\
    ={} & (1 - p_3) \frac{\lambda_3^{y_3} (1 + \varphi_3 y_3)^{y_3 - 1}}{y_3!} \exp\left(-\lambda_3 (1 + \varphi_3 y_3)\right) (l_0 + l_1 + l_2 + l_3)
    \end{aligned}
    $$
    where $l_0 = p_1 p_2$, $l_1 = p_2 (1 - p_1) \exp(-\lambda_1) Q$, $l_2 = p_1 (1 - p_2) \exp(-\lambda_2) Q$, $l_3 = (1 - p_1)(1 - p_2) \exp(-\lambda_1 - \lambda_2) Q$
  • If $Y_1 > 0$, $Y_2 > 0$, and $Y_3 > 0$, we have
    $$
    \begin{aligned}
    P(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3) &= P(Z_1 = 1, U_1 = y_1, Z_2 = 1, U_2 = y_2, Z_3 = 1, U_3 = y_3) \\
    &= (1 - p_1)(1 - p_2)(1 - p_3) \prod_{k=1}^{3} \frac{\lambda_k^{y_k} (1 + \varphi_k y_k)^{y_k - 1}}{y_k!} \exp\left(-\lambda_k (1 + \varphi_k y_k)\right) Q
    \end{aligned}
    $$
    for the case in which at least two of the $y_k$'s are greater than zero,
    where $Q = 1 + \eta_{12} \left(\exp(-y_1) - g_1\right)\left(\exp(-y_2) - g_2\right) + \eta_{13} \left(\exp(-y_1) - g_1\right)\left(\exp(-y_3) - g_3\right) + \eta_{23} \left(\exp(-y_2) - g_2\right)\left(\exp(-y_3) - g_3\right)$, $g_k = E\left(\exp(-Y_k)\right) = \exp\left(\lambda_k (t_k - 1)\right)$ with $\ln t_k - \varphi_k \lambda_k (t_k - 1) + 1 = 0$; $k = 1, 2, 3$, and $\mu_{ki} = \dfrac{\lambda_{ki}}{1 - \varphi_k \lambda_{ki}}$ or $\lambda_{ki} = \dfrac{\mu_{ki}}{1 + \varphi_k \mu_{ki}}$.
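The quantity $g_k$ has no closed form because $t_k$ is defined implicitly. The sketch below shows one possible way to compute it, using a fixed-point iteration on $t = \exp(-1 + \varphi\lambda(t - 1))$, and checks the result against a direct summation of $E(\exp(-Y))$ over the GP pmf. The function names and parameter values are hypothetical, and the iteration is assumed to converge only for moderate values of $\varphi\lambda$ (well below one).

```python
import math


def solve_t(lam, phi, tol=1e-14, max_iter=1000):
    """Solve ln t - phi*lam*(t - 1) + 1 = 0 by fixed-point iteration."""
    t = math.exp(-1.0)
    for _ in range(max_iter):
        t_new = math.exp(-1.0 + phi * lam * (t - 1.0))
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    raise RuntimeError("fixed-point iteration did not converge")


def g_closed(lam, phi):
    """g = E[exp(-Y)] = exp(lam * (t - 1)) for a GP(lam, phi) variable."""
    return math.exp(lam * (solve_t(lam, phi) - 1.0))


def g_direct(lam, phi, y_max=300):
    """Same expectation by summing exp(-y) * pmf(y) term by term."""
    total = 0.0
    for y in range(y_max):
        log_p = (y * math.log(lam) + (y - 1) * math.log(1.0 + phi * y)
                 - lam * (1.0 + phi * y) - math.lgamma(y + 1))
        total += math.exp(log_p - y)
    return total


def q_factor(y, g, eta12, eta13, eta23):
    """Multiplicative factor Q for a response triple y and constants g = (g1, g2, g3)."""
    d = [math.exp(-yk) - gk for yk, gk in zip(y, g)]
    return 1.0 + eta12 * d[0] * d[1] + eta13 * d[0] * d[2] + eta23 * d[1] * d[2]


# Hypothetical parameter values.
lam, phi = 0.8, 0.3
g_value = g_closed(lam, phi)
g_check = g_direct(lam, phi)
```

When all $\eta$ parameters are zero, `q_factor` returns 1 and the three responses are independent.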

References

  1. Ozmen, I.; Famoye, F. Count Regression Models with an Application to Zoological Data Containing Structural Zeros. Data Sci. J. 2007, 5, 491–502. [Google Scholar] [CrossRef]
  2. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989. [Google Scholar]
  3. Famoye, F. A New Bivariate Generalized Poisson Distribution. Stat. Neerl. 2010, 64, 112–124. [Google Scholar] [CrossRef]
  4. Consul, P.C.; Famoye, F. Generalized Poisson Regression Model. Commun. Stat. Theory Methods. 1992, 21, 89–109. [Google Scholar] [CrossRef]
  5. Wang, W.; Famoye, F. Modeling Household Fertility Decisions with Generalized Poisson Regression. J. Popul. Econ. 1997, 10, 273–283. [Google Scholar] [CrossRef] [PubMed]
  6. Famoye, F. Comparisons of Some Bivariate Regression Models. J. Stat. Comput. Simul. 2012, 82, 937–949. [Google Scholar] [CrossRef]
  7. Famoye, F. A Multivariate Generalized Poisson Regression Model. Commun. Stat. Theory Methods. 2015, 44, 497–511. [Google Scholar] [CrossRef]
  8. Jiang, X.; Paul, S.R. A Zero-Inflated Bivariate Poisson Regression Model and Application to Some Dental Epidemiological Data. Calcutta Stat. Assoc. Bull. 2009, 61, 241–244. [Google Scholar] [CrossRef]
  9. Dong, C.; Richards, S.H.; Clarke, D.B.; Zhou, X.; Ma, Z. Examining Signalized Intersection Crash Frequency Using Multivariate Zero-Inflated Poisson Regression. Saf. Sci. 2014, 70, 63–69. [Google Scholar] [CrossRef]
  10. Li, C.; Lu, J.; Park, J.; Kim, K.; Brinkley, P.A.; Peterson, J.P. Multivariate Zero-Inflated Poisson Models and Their Application. Technometrics 1999, 41, 29–38. [Google Scholar] [CrossRef]
  11. Liu, Y.; Tian, G.L. Type I Multivariate Zero-Inflated Poisson Distribution with Applications. Comput. Stat. Data Anal. 2015, 83, 200–222. [Google Scholar] [CrossRef]
  12. Ridout, M.; Demetrio, C.G.B.; Hinde, J. Models for Count Data with Many Zeros. In Proceedings of the XIXth International Biometric Conference, Cape Town, South Africa, 13–18 December 1998; pp. 179–192. [Google Scholar]
  13. Famoye, F.; Singh, K.P. Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data. Data Sci. J. 2006, 4, 117–130. [Google Scholar] [CrossRef]
  14. Czado, C.; Erhardt, V.; Min, A.; Wagner, S. Zero-inflated Generalized Poisson Models with Regression Effects on the Mean, Dispersion and Zero-inflation Level Applied to Patent Outsourcing Rates. Stat. Methodol. 2007, 7, 125–153. [Google Scholar] [CrossRef] [Green Version]
  15. Cui, Y.; Yang, W. Zero-inflated Generalized Poisson Regression Mixture Model for Mapping Quantitative Trait Loci Underlying Count Trait with Many Zeros. J. Theor. Biol. 2009, 256, 276–285. [Google Scholar] [CrossRef] [PubMed]
  16. Almasi, A.; Eshragian, M.R.; Moghimbeigi, A.; Rahimi, A.; Mohammad, K.; Fallahigilan, S. Multilevel Zero-inflated Generalized Poisson Regression Modelling for Dispersed Correlated Count Data. Stat. Methodol. 2016, 30, 1–14. [Google Scholar] [CrossRef]
  17. Zhang, C.; Tian, G.; Huang, X. Two New Bivariate Zero-Inflated Generalized Poisson Distribution with a Flexible Correlation Structure. Stat. Optim. Inf. Comput. 2015, 3, 105–137. [Google Scholar] [CrossRef] [Green Version]
  18. Huang, X.F.; Tian, G.L.; Zhang, C.; Jiang, X. Type I Multivariate Zero-Inflated Generalized Poisson Distribution with Applications. Stat. Its Interface 2017, 10, 291–311. [Google Scholar] [CrossRef]
  19. United Nations Development Programme. Indicators and Data Mapping to Measure Sustainable Development Goals (SDGs) Targets: Case of Indonesia 2015; UNDP: Jakarta, Indonesia, 2016. [Google Scholar]
  20. BPS-Statistics Indonesia. Profil Penduduk Indonesia Hasil SUPAS 2015; BPS-Statistics Indonesia: Jakarta, Indonesia, 2016.
  21. BKKBN. BPS-Statistics Indonesia, Kementerian Kesehatan and USAID. Survei Kesehatan dan Demografi Indonesia 2017; BKKBN: Jakarta, Indonesia, 2018.
  22. Dinas Kesehatan. Profil Kesehatan Provinsi Jawa Tengah Tahun 2017; Dinas Kesehatan: Semarang, Indonesia, 2018.
Figure 1. Histogram of Y1, Y2, and Y3.
Table 1. Variable description.

| Variables (n = 91) | Mean | Standard Deviation | Coefficient of Variation (%) | Min. | Max. |
|---|---|---|---|---|---|
| The number of maternal childbirth deaths (Y1) | 0.18 | 0.53 | 300.88 | 0 | 4 |
| The number of postpartum maternal deaths (Y2) | 0.67 | 0.96 | 142.49 | 0 | 3 |
| The number of stillbirths (Y3) | 2.76 | 2.60 | 94.13 | 0 | 10 |
| The percentage of medically assisted births (X1) | 97.75 | 4.01 | 4.11 | 79.59 | 100 |
| The percentage of TT2+ vaccination in pregnant women (X2) | 78.22 | 22.51 | 28.78 | 0.64 | 100 |
| The percentage of obstetric complications handling (X3) | 30.17 | 9.6 | 31.82 | 10.57 | 61.61 |
| The ratio of midwives per 10,000 population (X4) | 4.87 | 1.79 | 36.68 | 1.81 | 13.32 |
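Table 2. The results of the estimation of the MZIGPR(II) model's parameters.

| Parameter | Estimate | Standard Error | Z | p-Value |
|---|---|---|---|---|
| The number of maternal childbirth deaths (Y1) | | | | |
| $\beta_{01}$ | −4.496 | $8.04 \times 10^{-8}$ | $-5.59 \times 10^{7}$ | < 0.001 |
| $\beta_{11}$ | 0.062 | $8.05 \times 10^{-6}$ | $7.73 \times 10^{3}$ | < 0.001 |
| $\beta_{21}$ | −0.020 | $3.97 \times 10^{-6}$ | $-5.07 \times 10^{3}$ | < 0.001 |
| $\beta_{31}$ | −0.029 | $1.80 \times 10^{-6}$ | $-1.63 \times 10^{4}$ | < 0.001 |
| $\beta_{41}$ | −0.908 | $2.35 \times 10^{-7}$ | $-3.87 \times 10^{6}$ | < 0.001 |
| $\gamma_{01}$ | 2.091 | $2.13 \times 10^{-7}$ | $9.84 \times 10^{6}$ | < 0.001 |
| $\gamma_{11}$ | −0.012 | $2.08 \times 10^{-5}$ | $-5.64 \times 10^{2}$ | < 0.001 |
| $\gamma_{21}$ | 0.010 | $1.63 \times 10^{-5}$ | $6.06 \times 10^{2}$ | < 0.001 |
| $\gamma_{31}$ | 0.008 | $7.94 \times 10^{-6}$ | $1.01 \times 10^{3}$ | < 0.001 |
| $\gamma_{41}$ | 0.362 | $1.10 \times 10^{-6}$ | $3.30 \times 10^{5}$ | < 0.001 |
| The number of postpartum deaths (Y2) | | | | |
| $\beta_{02}$ | −5.448 | $8.36 \times 10^{-8}$ | $-6.51 \times 10^{7}$ | < 0.001 |
| $\beta_{12}$ | −0.020 | $5.61 \times 10^{-7}$ | $-3.57 \times 10^{4}$ | < 0.001 |
| $\beta_{22}$ | −0.026 | $1.36 \times 10^{-6}$ | $-1.94 \times 10^{4}$ | < 0.001 |
| $\beta_{32}$ | 0.062 | $9.51 \times 10^{-6}$ | $6.53 \times 10^{3}$ | < 0.001 |
| $\beta_{42}$ | −1.424 | $4.87 \times 10^{-13}$ | $-2.93 \times 10^{12}$ | < 0.001 |
| $\gamma_{02}$ | −1.424 | $4.87 \times 10^{-13}$ | $-2.93 \times 10^{12}$ | < 0.001 |
| $\gamma_{12}$ | −0.124 | $4.58 \times 10^{-11}$ | $-2.71 \times 10^{9}$ | < 0.001 |
| $\gamma_{22}$ | −0.051 | $2.10 \times 10^{-11}$ | $-2.41 \times 10^{9}$ | < 0.001 |
| $\gamma_{32}$ | 0.011 | $1.89 \times 10^{-11}$ | $5.81 \times 10^{8}$ | < 0.001 |
| $\gamma_{42}$ | 0.197 | $2.35 \times 10^{-12}$ | $8.37 \times 10^{10}$ | < 0.001 |
| The number of stillbirths (Y3) | | | | |
| $\beta_{03}$ | −0.880 | $2.71 \times 10^{-6}$ | $-3.25 \times 10^{5}$ | < 0.001 |
| $\beta_{13}$ | 0.029 | $2.64 \times 10^{-4}$ | $1.10 \times 10^{2}$ | < 0.001 |
| $\beta_{23}$ | 0.014 | $1.91 \times 10^{-4}$ | $7.24 \times 10^{1}$ | < 0.001 |
| $\beta_{33}$ | 0.013 | $8.70 \times 10^{-5}$ | $1.44 \times 10^{2}$ | < 0.001 |
| $\beta_{43}$ | −0.723 | $1.38 \times 10^{-5}$ | $-5.24 \times 10^{4}$ | < 0.001 |
| $\gamma_{03}$ | 1.207 | $5.33 \times 10^{-11}$ | $2.27 \times 10^{10}$ | < 0.001 |
| $\gamma_{13}$ | −0.087 | $5.17 \times 10^{-9}$ | $-1.68 \times 10^{7}$ | < 0.001 |
| $\gamma_{23}$ | −0.058 | $2.44 \times 10^{-9}$ | $-2.37 \times 10^{7}$ | < 0.001 |
| $\gamma_{33}$ | −0.016 | $1.75 \times 10^{-9}$ | $-8.96 \times 10^{6}$ | < 0.001 |
| $\gamma_{43}$ | −0.064 | $2.67 \times 10^{-10}$ | $-2.38 \times 10^{8}$ | < 0.001 |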
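
The Z column in Table 2 is the ratio of each estimate to its standard error, which can be checked directly from the reported values. The short Python sketch below recomputes Z for the first row and converts it to a two-sided p-value. The helper names are hypothetical, and the standard normal reference distribution is an assumption about how the p-values were obtained, not something stated in the table.

```python
import math

def wald_z(estimate, std_error):
    """Wald statistic for H0: parameter = 0, computed as estimate / standard error."""
    return estimate / std_error

def two_sided_p(z):
    """Two-sided p-value P(|N(0,1)| > |z|), assuming a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# First row of Table 2: estimate -4.496 with standard error 8.04e-8.
z = wald_z(-4.496, 8.04e-8)
print(f"Z = {z:.3e}, p = {two_sided_p(z):.3g}")
```

Because the table reports rounded estimates and standard errors, the recomputed ratios agree with the Z column only to about three significant figures.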
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

