A New Effective Jackknifing Estimator in the Negative Binomial Regression Model

Koç, Tuba; Koç, Haydar

doi:10.3390/sym15122107

Open AccessArticle

A New Effective Jackknifing Estimator in the Negative Binomial Regression Model

by

Tuba Koç

^*

and

Haydar Koç

Department of Statistics, Faculty of Science, Cankiri Karatekin University, Cankiri 18100, Turkey

^*

Author to whom correspondence should be addressed.

Symmetry 2023, 15(12), 2107; https://doi.org/10.3390/sym15122107

Submission received: 27 September 2023 / Revised: 15 November 2023 / Accepted: 21 November 2023 / Published: 23 November 2023

(This article belongs to the Section Mathematics)

Download Versions Notes

Abstract

:

The negative binomial regression model is a widely adopted approach when dealing with dependent variables that consist of non-negative integers or counts. This model serves as an alternative regression technique for addressing issues related to overdispersion in count data. Typically, the maximum likelihood estimator is employed to estimate the parameters of the negative binomial regression model. However, the maximum likelihood estimator can be highly sensitive to multicollinearity, leading to unreliable results. To eliminate the adverse effects of multicollinearity in the negative binomial regression model, we propose the use of a jackknife version of the Kibria–Lukman estimator. In this study, we conducted a theoretical comparison between the proposed jackknife Kibria–Lukman negative binomial regression estimator and several existing estimators documented in the literature. To assess the performance of the proposed estimator, we conducted two simulation studies and performed a real data application. The results from both the simulation studies and the real data application consistently demonstrated that the proposed jackknife Kibria–Lukman negative binomial regression estimator outperforms other estimators.

Keywords:

jackknife Kibria–Lukman estimator; Kibria–Lukman estimator; Liu-type estimator; ridge estimator; multicollinearity; negative binomial regression model

1. Introduction

Count data refers to data that represent the occurrence of events within a given period, taking on positive values indicating the frequency of those events. In the case of count data, the distribution typically exhibits right skewness. Due to the violation of normality assumptions, classical regression methods are unsuitable for count data analysis [1]. The use of classical methods often leads to biased parameter estimates. Consequently, researchers have focused on developing regression models specifically tailored for count data analysis, aiming to enhance data interpretation. Several regression models are commonly employed to analyze count data, including the Poisson regression, negative binomial regression (NBR), zero-inflated regression, zero-truncated regression, and hurdle regression models [2]. Count data often exhibit overdispersion, which violates the assumptions of the Poisson regression model. Using the Poisson regression model for overdispersed data results in biased parameter estimates and unreliable outcomes. To address this, the negative binomial regression model is frequently utilized as it assumes a negative binomial distribution for the response variable. The logarithm of the expected value of the response variable can be modeled using a linear combination of unknown parameters [3]. The negative binomial regression model finds application across various fields, such as health, actuarial science, education, economics, biostatistics, and healthcare. The issue of multicollinearity arises when there is a relationship between some of the variables under study, leading to errors induced by the researcher [4]. Maximum likelihood estimation (MLE) applied to a model with multicollinearity can result in the incorrect estimation of regression coefficients, sign mismatches between correlation coefficients and regression coefficients, inflated standard error values, and decreased model power. Various approaches can be employed to mitigate multicollinearity, such as adding new variables or removing multicollinear variables from the model. However, removing variables from the model risks discarding variables that truly contribute to the model, resulting in a loss of information. Alternative methods, such as the ridge and Liu estimators, are commonly employed to estimate regression coefficients without removing variables, albeit at the expense of introducing bias to the estimates. The ridge and Liu estimators have been developed as an alternative to MLE. Both estimators have a smaller mean square error (MSE) than MLE [5]. Månsson [6] suggested the negative binomial ridge regression estimator (RNBR) as an alternative to the MLE method in the NBR. Månsson [7] proposed a Liu estimator to solve an inflated mean square error problem of the maximum likelihood method in the presence of multicollinearity in the NBR. Asar [8] proposed a new biased estimator for the negative binomial regression model, which is a generalization of the Liu-type estimator. Türkan and Özel [9] proposed the modified jackknife estimator in the NBR. Alobaidi et al. [10] proposed a new ridge estimator in the NBR to address the multicollinearity problem. Duran and Akdeniz [11] proposed a new estimator named the modified jackknife generalized Liu-type estimator. This estimator is based on a combination of both the generalized Liu estimator and the jackknife generalized Liu estimator. Arum et al. [12] combined the modified jackknife ridge estimator with the transformed M-estimator to solve the problem of both multicollinearity and outliers in the Poisson regression model. As a result, they showed that the method they proposed was dominated by the existing methods. Akram et al. [13] proposed a new estimator, the zero-inflated negative binomial Stein estimator, to alleviate the effects of the multicollinearity problem in zero-inflated negative binomial regression models. According to the simulation and application results, they showed that the proposed new estimator is superior to the existing estimators. Dawoud et al. [14] proposed new robust estimators to handle multicollinearity and outliers in the Poisson Model. Akram et al. [15] proposed a new Conway–Maxwell–Poisson (COMP) Liu estimator for the COMP regression model to address the problem of multicollinearity. This estimator covers situations of overdispersion, equal dispersion, and underdispersion. Amin et al. [16] proposed a new adjusted Poisson Liu estimator as a robust solution for dealing with multicollinear explanatory variables in the Poisson regression. The adjusted Poisson Liu estimator addresses the problem of negative shrinkage parameter values encountered by the traditional Liu estimator. Through simulation studies and empirical applications, the adjusted Poisson Liu estimator is shown to outperform the MLE and other competing estimators in terms of robustness and consistency. Sami et al. [17] proposed several ridge regression estimators in the context of the COMP regression model to overcome the effects of multicollinearity, considering overdispersion, equidispersion, and underdispersion. Amin et al. [18] proposed a Poisson James–Stein estimator as a solution to the problems of increased variance and standard error associated with the MLE when dealing with multicollinear explanatory variables in the Poisson regression. Amin et al. [19] introduced the Bell Ridge regression estimator as a solution to mitigate the multicollinearity issue in the Bell regression model. Batool et al. [20] suggested a ridge estimator as a method to address the issue of overdispersion in the Poisson inverse Gaussian regression model. Abonazel [21] proposed a new modified Liu estimator for the COMP regression model based on two shrinkage parameters to reduce the multicollinearity effect in the COMP and used the mean square error criterion to evaluate the performance of the proposed estimator. Algamal et al. [22] proposed a jackknife version of the ridge estimator for the Bell regression model by jackknifing the ridge estimator to reduce bias. As a result, they showed that their proposed estimator improved the performance of the ridge estimator. Algamal et al. [23] proposed a modified jackknife ridge estimator for the COMP model. As a result, they found that their proposed estimator had minimum bias and minimum mean square error.

In recent years, Kibria and Lukman [24] developed the Kibria–Lukman (KL) estimator, which was found to outperform the ridge and Liu estimators in the linear regression model. Lukman et al. [25] proposed a new biased estimator called the KL estimator for linear mixed models. Aladeitan et al. [26] proposed a modified KL estimator in the Poisson regression model. Ugwuowo et al. [27] proposed the jackknife version of the KL estimator in the linear regression model. Rasheed et al. [28] proposed the jackknife approach and its modified version to reduce the effects of multicollinearity and the bias of using the Liu-type estimator simultaneously in the COMP regression model. Jabur et al. [29] proposed a jackknife Liu-type negative binomial estimator. Dawoud et al. [30] proposed a generalized version of the KL estimator along with the optimal biasing parameter of their proposed estimator derived by minimizing the scalar mean square error. Abonazel et al. [31] proposed the Kibria–Lukman estimator to overcome the multicollinearity problem in the COMP regression model.

In this study, we introduce a modification to the KL estimator proposed by Kibria and Lukman [24] for the NBR model. We employ the jackknife method to address the issue of multicollinearity and reduce the MSE in the NBR model. To evaluate the effectiveness of our proposed estimator, we conduct a comprehensive comparison with existing estimators through theoretical analysis, simulation studies, and real data application.

The article is organized as follows: In Section 2, we present the existing estimators. Section 3 provides a detailed discussion of the proposed estimator. A theoretical comparison of efficiency is presented in Section 4. In Section 5, we conduct two simulation studies and analyze a real dataset example. Finally, in Section 6, we provide concluding remarks.

2. Existing Estimators

2.1. Negative Binomial Regression Model MLE Estimator

The NBR model is widely used to model the response variable as count or non-negative integers. The negative binomial probability density function is given as

f (y_{i}; μ_{i}, θ) = \frac{Γ (θ + y_{i})}{Γ (θ) Γ (1 + y_{i})} {(\frac{θ}{θ + μ_{i}})}^{θ} {(\frac{μ_{i}}{θ + μ_{i}})}^{y_{i}}

(1)

where

Γ (.)

is the gamma function and

θ

is the overdispersion parameter. The mean and variance of the negative binomial probability distribution are, respectively,

E (y_{i}) = μ_{i}

and

V a r (y_{i}) = μ_{i} + \frac{{μ_{i}}^{2}}{θ}

[32].

The NBR model parameter can be estimated using the MLE method or it can be estimated as a member of the generalized linear models [32]. The log-likelihood function of the negative binomial regression model is given by:

L (θ, γ) = \sum_{i = 1}^{n} [(\sum_{t = 0}^{y_{i} - 1} l o g (t + θ) - l o g y_{i}! - (y_{i} + θ) l o g (1 + \frac{1}{θ} μ_{i}) + y_{i} l o g (\frac{1}{θ}) + y_{i} μ_{i}],

(2)

where

μ_{i} = e x p (x_{i}^{'} γ)

,

x_{i}

is a

p \times 1

vector of the covariates, and

γ

is a

p \times 1

vector of the parameters. Differentiating Equation (2) with respect to

γ

using the iteratively weighted least squared algorithm or Fisher’s scoring algorithm, we have

{\hat{γ}}_{M L E} = {(X^{T} \hat{W} X)}^{- 1} X^{T} \hat{W} \hat{u},

(3)

where

\hat{u}

is a vector where the

i

th elements equals

\log ({\hat{μ}}_{i}) + (y_{i} - {\hat{μ}}_{i}) / {\hat{μ}}_{i}

and

\hat{W} = d i a g ({\hat{μ}}_{i} / (1 + (\frac{1}{θ}) {\hat{μ}}_{i}) .

The MLE estimator of

γ

is normally distributed with the asymptotic mean vector

E ({\hat{γ}}_{M L E}) = 0

and the asymptotic covariance matrix

C o v ({\hat{γ}}_{M L}) = {(X^{T} \hat{W} X)}^{- 1} .

(4)

The MSE based on the asymptotic covariance matrix is as follows:

M S E ({\hat{γ}}_{M L E}) = t r {(X^{T} \hat{W} X)}^{- 1} = \sum_{j = 1}^{p} \frac{1}{ω_{j}},

(5)

where

ω_{j}

is the eigenvalue of the

X^{T} \hat{W} X

matrix.

2.2. Liu-Type Negative Binomial Regression Estimator (LNBR)

Asar [8] extended the Liu-type estimator to the NBR model, as follows:

{\hat{γ}}_{L N B R} = {(X^{T} \hat{W} X + I)}^{- 1} (X^{T} \hat{W} X + d I) {\hat{γ}}_{M L}, 0 < d < 1

(6)

where d is the Liu parameter. The mean square error matrix (MSEM) of the LNBR estimator is

M S E M ({\hat{γ}}_{L N B R}) = (Q {Ω_{k}}^{- 1} Ω_{d} Ω^{- 1} Ω_{d} {Ω_{k}}^{- 1} Q^{T}) + {(d - 1)}^{2} Q {Ω_{k}}^{- 1} γ γ^{T} {Ω_{k}}^{- 1} Q^{T},

(7)

where

Ω_{k} = d i a g (ω_{1} + k, \dots, ω_{p} + k)

and

Ω_{d} = d i a g (ω_{1} + d, \dots, ω_{p} + d)

. Then, the MSE of the LNBR estimator is

M S E ({\hat{γ}}_{L N B R}) = \sum_{j = 1}^{p} \frac{1}{ω_{j}} {(\frac{ω_{j} + d}{ω_{j} + 1})}^{2} + {(d - 1)}^{2} \sum_{j = 1}^{p} \frac{{γ_{j}}^{2}}{{(ω_{j} + 1)}^{2}} .

(8)

2.3. Ridge Negative Binomial Regression Estimator (RNBR)

The ridge negative binomial regression (RNBR) estimator was presented by Manson and Shukur [33] and Jabur et al. [29], as shown below:

{\hat{γ}}_{R N B R} = {(X^{T} \hat{W} X + k I)}^{- 1} X^{T} \hat{W} X {\hat{γ}}_{M L},

(9)

where

0 < k < 1

.

The MSEM for the RNBR estimator is as follows:

\begin{matrix} M S E M ({\hat{γ}}_{R N B R}) & = C o v {(\hat{γ}}_{R N B R}) + B i a s {(\hat{γ}}_{R N B R}) B i a s {{(\hat{γ}}_{R N B R})}^{T}, \\ = ({Q Ω}_{k}^{- 1} Ω {Ω_{k}}^{- 1} Q^{T}) + k^{2} (Q {Ω_{k}}^{- 1} γ γ^{T} {Ω_{k}}^{- 1} Q^{T}), \end{matrix}

(10)

where k > 0,

Ω = d i a g (ω_{1}, \dots, ω_{p}) = Q^{T} X^{T} \hat{W} X Q

,

Ω_{k} = d i a g (ω_{1 + k}, ω_{2 + k}, \dots, ω_{p + k})

and

Q

is the matrix whose columns are eigenvectors of the

X^{T} \hat{W} X

. The MSE for the RNBR estimator is as follows:

M S E {(\hat{γ}}_{R N B R}) = \sum_{j = 1}^{P} \frac{ω_{j}}{{(ω_{j} + k)}^{2}} + k^{2} \sum_{j = 1}^{P} \frac{{γ_{j}}^{2}}{{(ω_{j} + k)}^{2}} .

(11)

3. Proposed Estimator

Considering the Kibria and Lukman [24] and Algamal and Abonazel [34], we defined the KL negative binomial regression (KLNBR) estimator as follows:

{\hat{γ}}_{K L N B R} = {(X^{'} \hat{W} X + k I)}^{- 1} (X^{'} \hat{W} X - k I) {\hat{γ}}_{M L} .

(12)

The canonical form of the KL negative binomial regression estimator is given as:

{\hat{γ}}_{K L N B R} = {(Ω + k I)}^{- 1} (Ω - k I) {\hat{γ}}_{M L},

(13)

where

Ω = d i a g (ω_{1}, ω_{2}, \dots, ω_{p}) = T^{'} \hat{W} T

is a diagonal matrix, whose columns are the eigenvectors of

X^{'} \hat{W} X

,

T = X Q

such that

Q

is an orthogonal matrix whose rows are normalized eigenvectors of the X′WX matrix. We rewrite the negative binomial regression ML in Equation (3)

{\hat{γ}}_{M L E} = Ω^{- 1} T^{'} \hat{W} \hat{u},

(14)

{\hat{γ}}_{K L N B R} = {(Ω + k I)}^{- 1} (Ω - k I) Ω^{- 1} T^{'} \hat{W} \hat{u},

(15)

with

C^{- 1} = {(Ω + k I)}^{- 1} (Ω - k I) Ω^{- 1}

,

{\hat{γ}}_{K L N B R} = C^{- 1} T^{'} \hat{W} \hat{u}

.

Singh et al. [35] used the jackknife procedure to reduce the bias of the estimator. We applied the jackknife procedure to the KL negative binomial regression estimator, and we derived the jackknife KL negative binomial regression estimator, as follows:

{\hat{γ}}_{K L N B R_i} = {(Ω_{- i} + k I)}^{- 1} (Ω_{- i} - k I) {\hat{γ}}_{M L E} .

(16)

Considering Ugwuowo et al. [27], we defined the jackknife KL negative binomial regression (JKLNBR) estimator, as follows:

{\hat{γ}}_{K L N B R_i} = {(C - t_{i} {\hat{w}}_{i} t_{i})}^{- 1} \cdot (T^{'} \hat{W} \hat{u} - t_{i} {\hat{w}}_{i} {\hat{u}}_{i}),

(17)

{\hat{γ}}_{K L N B R_i} = (C^{- 1} + \frac{C^{- 1} t_{i} {\hat{w}}_{i} t_{i} C^{- 1}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}}) (T^{'} \hat{W} \hat{u} - t_{i} {\hat{w}}_{i} {\hat{u}}_{i}),

(18)

\begin{array}{l} {\hat{γ}}_{K L N B R_i} & = C^{- 1} T^{'} \hat{W} \hat{u} - C^{- 1} t_{i} {\hat{w}}_{i} {\hat{u}}_{i} + \frac{C^{- 1} t_{i} {\hat{w}}_{i} t_{i^{'}} C^{- 1} t_{i^{'}} {\hat{w}}_{i} {\hat{u}}_{i}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}} - \frac{C^{- 1} t_{i} {\hat{w}}_{i} t_{i^{'}} C^{- 1}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}} t_{i} {\hat{w}}_{i} {\hat{u}}_{i}, \\ = {\hat{γ}}_{K L N B R} - C^{- 1} {\hat{w}}_{i} {\hat{u}}_{i} (1 + \frac{t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i^{'}}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}}) + \frac{C^{- 1} t_{i} {\hat{w}}_{i} t_{i^{'}} C^{- 1} t_{i} {\hat{w}}_{i} {\hat{u}}_{i}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}}, \\ = {\hat{γ}}_{K L N B R} - C^{- 1} t_{i} {\hat{w}}_{i} {\hat{u}}_{i} (\frac{1}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}}) + \frac{C^{- 1} t_{i} {\hat{w}}_{i} t_{i^{'}} {\hat{γ}}_{K L N B}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}}, \\ = {\hat{γ}}_{K L N B R} - C^{- 1} t_{i} {\hat{w}}_{i} (\frac{{\hat{u}}_{i} - t_{i^{'}} {\hat{w}}_{i} {\hat{γ}}_{K L N B}}{1 - t_{i^{'}} {\hat{w}}_{i} C^{- 1} t_{i}}), \\ = {\hat{γ}}_{K L N B R} - (\frac{C^{- 1} t_{i} {\hat{w}}_{i} e_{i}}{1 - h_{i}}), \end{array}

(19)

where

h_{i} = 1 - {t_{i}}^{'} {\hat{w}}_{i} C^{- 1} t_{i}

,

{t_{i}}^{'}

is the ith row of

T

, and

e_{i}

is the ith KLNBR residual.

Using the weighted pseudo-values defined by Khurana et al. [36], the jackknife KL negative binomial regression estimator is obtained as

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{K L N B R} + C^{- 1} (1 - [{(Ω + k I)}^{- 1} (Ω - k I)]) T^{'} \hat{W} \hat{u},

(20)

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{K L N B R} + C^{- 1} T^{'} \hat{W} \hat{u} - C^{- 1} (1 - [{(Ω + k I)}^{- 1} (Ω - k I)]) T^{'} \hat{W} \hat{u},

(21)

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{K L N B R} + {\hat{γ}}_{K L N B R} - {(Ω + k I)}^{- 1} (Ω - k I) {\hat{γ}}_{K L N B R},

(22)

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{K L N B R} (2 I - {(Ω + k I)}^{- 1} (Ω - k I),

(23)

{\hat{γ}}_{J K L N B R} = {(Ω + k I)}^{- 1} [2 (Ω + k I) (Ω - k I)] {\hat{γ}}_{K L N B R} .

(24)

Consequently, the jackknife KL negative binomial regression estimator is

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{M L E} [I - {(2 k {(Ω + k I)}^{- 1})}^{2}] .

(25)

Following Duran and Akdeniz [11], subsequently, we modified Equation (25)

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{K L N B R} [I - {(2 k {(Ω + k I)}^{- 1})}^{2}],

(26)

where

{\hat{γ}}_{K L N B R} = {\hat{γ}}_{M L} [I - 2 k {(Ω + k I)}^{- 1}]

.

Therefore,

{\hat{γ}}_{J K L N B R} = {\hat{γ}}_{M L E} [I + 2 k {(Ω + k I)}^{- 1}] [I - {(2 k {(Ω + k I)}^{- 1})}^{2}],

(27)

The expectation of

{\hat{γ}}_{J K L N B R}

is as follows:

E ({\hat{γ}}_{J K L N B R}) = [I - {(2 k {(Ω + k I)}^{- 1})}^{2}] [I + 2 k {(Ω + k I)}^{- 1}] γ .

(28)

The bias of

{\hat{γ}}_{J K L N B R}

is given by

B i a s ({\hat{γ}}_{J K L N B R}) = \{[I - {(2 k {(Ω + k I)}^{- 1})}^{2}] [I + 2 k {(Ω + k I)}^{- 1}] - I\} γ .

(29)

The MSEM for the JKLNBR estimator is

M S E M ({\hat{γ}}_{J K L N B R}) = C o v ({\hat{γ}}_{J K L N B R}) + B i a s ({\hat{γ}}_{J K L N B R}) \cdot B i a s {({\hat{γ}}_{J K L N B R})}^{T},

(30)

M S E M ({\hat{γ}}_{J K L N B R}) = {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2} Ω [I - {(2 k {(Ω + k I)}^{- 1})}^{2}] B i a s ({\hat{γ}}_{J K L N B R}) \cdot B i a s {({\hat{γ}}_{J K L N B R})}^{T}

(31)

The MSE of the JKLNBR estimator is

\begin{array}{l} M S E ({\hat{γ}}_{J K L N B R}) & = t r (M S E M ({\hat{γ}}_{J K L N B R})), \\ = \sum_{j = 1}^{p} \frac{{({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2}}{ω_{j} {(ω_{j} + k)}^{6}} + \sum_{j = 1}^{p} \frac{{({(ω_{j} - k)}^{2} (ω_{j} + 3 k) - {(ω_{j} + k)}^{3})}^{2} {\hat{γ}}^{2}}{{(ω_{j} + k)}^{6}} . \end{array}

(32)

4. Comparison of Efficiency

Following Oranye and Ugwuowo [37], Algamal and Abonazel [34], and Ugwuowo et al. [27], we provide the efficiency of the proposed jackknife KL negative binomial regression estimator over the existing estimators. The Farebrother [38] and Trenkler and Toutenburg [39] lemmas are adopted for theoretical comparisons.

The suggested estimator ${\hat{γ}}_{J K L N B R}$ is more efficient than to estimate ${\hat{γ}}_{M L}$ if and only if $γ^{'} {[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I] [Ω^{- 1} - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2} Ω {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{- 1}]$ ${[I - (2 k {(Ω + k I)}^{- 1})]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I] γ < 1$ for the NBR model.

Proof.

The difference between the MSEM functions of the MLE and JKLNBR estimators are obtained as follows:

M S E M ({\hat{γ}}_{M L E}) - M S E M ({\hat{γ}}_{J K L N B R}) = d i a g {\{\frac{1}{ω_{j}} - \frac{{({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2}}{ω_{j} {(ω_{j} + k)}^{6}}\}}_{j = 1}^{p} .

(33)

Therefore,

Ω^{- 1} - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2} Ω {[I - (2 k {(Ω + k I)}^{- 1})]}^{2}

is positive definite provided

ω_{j} {(ω_{j} + k)}^{6} > {ω_{j} ({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2} .

Thus, the proof is finished. □

ii.: The suggested estimator ${\hat{γ}}_{J K L N B R}$ is more efficient than to estimate ${\hat{γ}}_{L N B R}$ if and only if $γ^{'} {[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I] [{(Ω + d I)}^{2} Ω^{- 1} {(Ω + I)}^{2} - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2}$ $Ω {[I - (2 k {(Ω + k I)}^{- 1})]}^{2} + {(d - 1)}^{2} {(Ω + I)}^{- 2} γ^{'} γ]^{- 1} [{[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I]] γ < 1$ under the NRB model.

Proof.

The MSEM of the LNBR estimator

M S E M ({\hat{γ}}_{L N B R}) = (Q {Ω_{k}}^{- 1} Ω_{d} Ω^{- 1} Ω_{d} {Ω_{k}}^{- 1} Q^{T}) + {(d - 1)}^{2} Q {Ω_{k}}^{- 1} γ γ^{T} {Ω_{k}}^{- 1} Q^{T},

(34)

where d is the Liu parameter,

Ω_{k} = d i a g (ω_{1} + k, \dots, ω_{p} + k)

and

Ω_{d} = d i a g (ω_{1} + d, \dots, ω_{p} + d)

.

The difference between the MSEM functions of the LNBR and JKLNBR estimators are obtained as follows:

M S E M ({\hat{γ}}_{L N B R}) - M S E M ({\hat{γ}}_{J K L N B R}) = d i a g {\{\frac{1}{ω_{j}} {(\frac{ω_{j} + d}{ω_{j} + 1})}^{2} - \frac{{({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2}}{ω_{j} {(ω_{j} + k)}^{6}}\}}_{j = 1}^{p} .

(35)

Thus,

[{(Ω + d I)}^{2} Ω^{- 1} {(Ω + I)}^{- 2}] - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2} Ω {[I - 2 k {(Ω + k I)}^{- 1}]}^{2}

is positive definite provided

{(ω_{j} + d)}^{2} {(ω_{j} + k)}^{6} > {({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2} {(ω_{j} + 1)}^{2}

. □

iii.: The suggested estimator ${\hat{γ}}_{J K L N B R}$ is more efficient than to estimate ${\hat{γ}}_{R N B R}$ if and only if $γ^{'} {[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I] [Ω^{2} {(Ω + k I)}^{- 2} - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2}$ $Ω {[I - (2 k {(Ω + k I)}^{- 1})]}^{2} + k^{2} {(Ω + k I)}^{- 2} γ^{'} γ]^{- 1} [{[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I]] γ < 1$ under the NBR model.

Proof.

The MSEM of the RNBR estimator is

M S E M ({\hat{γ}}_{R N B R}) = ({Q Ω}_{k}^{- 1} Ω {Ω_{k}}^{- 1} Q^{T}) + k^{2} (Q {Ω_{k}}^{- 1} γ γ^{T} {Ω_{k}}^{- 1} Q^{T}),

(36)

where k > 0 and

Ω_{k} = d i a g (ω_{1 + k}, ω_{2 + k}, \dots, ω_{p + k})

The difference between the MSEM functions of the RNBR and JKLNBR estimators are obtained as follows:

M S E M ({\hat{γ}}_{R N B R}) - M S E M ({\hat{γ}}_{J K L N B R}) = d i a g {\{\frac{{ω_{j}}^{2}}{{(ω_{j} + k)}^{2}} - \frac{{({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2}}{ω_{j} {(ω_{j} + k)}^{6}}\}}_{j = 1}^{p} .

(37)

Therefore,

[Ω^{2} {(Ω + k I)}^{- 2} - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2} Ω [I - {(2 k {(Ω + k I)}^{- 1})}^{2}]]

is positive definite provided

{ω_{j}}^{3} {(ω_{j} + k)}^{4} > {({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2} .

□

iv.: The suggested estimator ${\hat{γ}}_{J K L N B R}$ is more efficient than to estimate ${\hat{γ}}_{K L N B R}$ if and only if $γ^{'} {[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I] [{(Ω - k I)}^{2} Ω^{- 1} {(Ω + k I)}^{- 2} - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2}$ $Ω {[I - (2 k {(Ω + k I)}^{- 1})]}^{2} + 4 k^{2} {(Ω + k I)}^{- 2} γ^{'} γ]^{- 1} [{[I - 2 k {(Ω + k I)}^{- 1}]}^{2} [I + 2 k {(Ω + k I)}^{- 1} - I]] γ < 1$ under the NBR model.

Proof.

The MSE of the KLNBR estimator is

M S E ({\hat{γ}}_{K L N B R}) = \sum_{j = 1}^{p} (\frac{{(ω_{j} - k)}^{2}}{ω_{j} {(ω_{j} + k)}^{2}}) + 4 k^{2} \sum_{j = 1}^{p} (\frac{{γ_{j}}^{2}}{{(ω_{j} + k)}^{2}}) .

(38)

The difference between the MSEM functions of the KLNBR and JKLNBR estimators is obtained as follows:

M S E M ({\hat{γ}}_{K L N B R}) - M S E M ({\hat{γ}}_{J K L N B R}) = d i a g {\{\frac{1}{ω_{j}} {(\frac{ω_{j} - k}{ω_{j} + k})}^{2} - \frac{{({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2}}{ω_{j} {(ω_{j} + k)}^{6}}\}}_{j = 1}^{p} .

(39)

Consequently,

[{(Ω - k I)}^{2} Ω^{- 1} {(Ω + k I)}^{- 2}] - {[I - {(2 k {(Ω + k I)}^{- 1})}^{2}]}^{2} Ω {[I - 2 k {(Ω + k I)}^{- 1}]}^{2}

is positive definite provided

{(ω_{j} - k)}^{2} {(ω_{j} + k)}^{4} > {({(ω_{j} + k)}^{2} - 4 k^{2})}^{2} {(ω_{j} - k)}^{2}

. □

There is no definite rule for estimating the k and d parameters. In this paper, we use the parameter d derived by [27]. We consider one method to estimate the ridge parameter k. However, in practice, there are countless ways available to estimate the ridge parameter k. For different methods of estimating the ridge parameter k for both linear and non-linear regression models, we refer to Kibria [40], Kibria and Banik [41], and, very recently, Kibria [42], among others. The biasing parameter for the ridge and KL estimators is as follows:

\hat{k} = \sqrt{\frac{p}{\sum_{j = 1}^{p} {\hat{γ}}_{j}^{2}}} j = 1,2, \dots, p,

(40)

where p is the number of estimated coefficients.

\hat{d} = m i n [\frac{{\hat{γ}}_{j}^{2}}{\frac{1}{ω_{j}} + {\hat{γ}}_{j}^{2}}] .

(41)

5. Application

In this section, we conducted three numerical studies to evaluate the performance of the proposed jackknife KL negative binomial regression estimator in comparison to traditional estimators. These numerical studies consisted of two Monte Carlo simulations and an analysis of a real dataset. For a good-quality MC simulation, the use of a random number generator alone is not good enough [43]. So, in the simulation designs, we generated multiple datasets for negative binomial regression models according to the following procedure:

μ = e x p (β_{1} X_{1} + \dots + β_{p} X_{p}) .

(42)

The response variable is simulated as

Y ~ N B (μ, μ + θ^{- 1} μ)

, and

θ = 1

is a suitable parameter value [44].

We calculate the mean square error of the negative binomial regression coefficient estimates in the simulation parts as follows:

M S E (β) = \frac{1}{100} \sum_{i = 1}^{100} {(β_{i} - β_{r})}^{T} (β_{i} - β_{r}),

(43)

where

β_{r}

are the true regression coefficients and

β_{i}

are the estimated negative binomial regression coefficients. The simulation settings aim to denote the precision of the estimates

β

values. The number of runs is 100, and the sample size is n = 20, 25, 30, 50,100, and 300 in the simulation settings.

In the real data analysis part, we divided the data as test–train and evaluated the predictive performance of the negative binomial estimators. The prediction performances of the estimators are computed using the following MSE formula:

M S E (Y) = \frac{1}{n} \sum_{i = 1}^{n} {(y_{p r e d (i)} - y_{r e a l (i)})}^{2},

(44)

where

y_{p r e d}

shows the predictions and

y_{r e a l}

shows the real response values. This formula is used to demonstrate the predictive performance of the negative binomial regression estimators for both the test and train sets in the real data analysis part.

Both the simulation parts and real dataset analysis were chosen to have multicollinearity among the variables. We obtained the entire results using R software 4.3.2 [45].

5.1. Simulation Studies

In this section, we present two distinct simulation designs that incorporate different structures for the independent variables. Each simulation design aims to assess the performance of the proposed jackknife KL negative binomial regression estimator and traditional estimators under these specific structures.

5.1.1. Simulation Study 1

In this setting, we generate the independent variables (X) as follows:

X_{i j} = \sqrt{1 - r^{2}} Z_{i j} + r Z_{i (p + 1)} i = 1, 2, \dots n j = 1, 2, \dots, p

(45)

where r controls the correlation value among the predictors, and Z~N(0,1) shows the standard normally distributed variables [46,47]. We set the correlation as r = 0.90, 0.95, 0.99 with p = 3, 6.

Table 1 and Table 2 show the results of the first simulation for p = 3, 6, respectively. The MSE values decrease as the sample size increases for all the estimators. When looking at the results, we see that our proposed estimators are superior to the maximum likelihood, Liu-type, ridge, and KL negative binomial estimators. In all cases, the JKLNBR has the lowest MSE values. Even in a very high correlation (r = 0.99), the JKLNBR achieves the best for all the sample sizes.

5.1.2. Simulation Study 2

In this subsection, we generated highly correlated predictors by following the method of Kartal Koc and Bozdogan [48] and Dünder [49,50]. The multicollinearity level is controlled using the c parameter. In total, we simulated five different predictors, as follows:

X_{1} = 10 + ε_{1}

(46)

X_{2} = 10 + {0.3 ε}_{1} + c ε_{2}

(47)

X_{3} = 10 + {0.3 ε}_{1} + 0.5604 c ε_{2} + 0.8282 c ε_{2}

(48)

X_{4} = - 8 + X_{1} + 0.5 X_{2} 0.3 X_{3} + 0.5 ε_{4}

(49)

X_{5} = - 5 + {0.5 X}_{1} + X_{2} + 0.5 ε_{5}

(50)

where

ε_{1}

,

ε_{2}

,

ε_{3}

,

ε_{4}

, and

ε_{5} ~

N(0,1) are the standard normally distributed variables. To introduce strong increasing multicollinearity, we set the parameter ‘c’ to three different values: c = 0.3, 0.6, and 0.9. These values ensure a progressively stronger correlation among the independent variables in the simulation designs.

We report the results of the second simulation design in Table 3. Based on the Mean Square Error (MSE) values, the JKLNBR estimator consistently demonstrates the lowest prediction error across all the cases in the negative binomial regression models. On average, the JKLNBR estimator outperforms the other estimators in terms of the MSE values, indicating superior performance.

5.2. Real Dataset Example

In this section, we evaluate the performance of the negative binomial estimators using a real dataset called the “docvisit” data. The “docvisit” dataset is suitable for count data analysis as it contains the number of doctor visits as the response variable. The description of the regressors of the docvisits dataset is given in Table 4. The dataset consists of 14 different regressors and a sample size of n = 1812. The “docvisit” dataset can be accessed from the “zic” package in R [51].

We divided the “docvisit” data into two parts: a training set and a test set. The training set comprised the first 1500 observations, while the remaining 318 observations were allocated to the test set. Using the training set, we estimated negative binomial regression models employing five different estimators. Subsequently, we evaluated the predictive performance of these models by examining their ability to generalize to unseen data, specifically the test set. This evaluation allows us to assess the models’ effectiveness in predicting the number of doctor visits based on data not used for the model estimation.

To investigate the existence of multicollinearity, we computed the condition index via the following formulation:

κ = \sqrt{\frac{λ_{m a x}}{λ_{m i n}}}

(51)

where

λ

values represent the eigenvalues of

X^{T} X

[50]. The regressors have the multicollinearity problem due to the high condition index = 362.881, which is greater than 100 [52]. Besides this fact, there are two correlations greater than 0.90 (correlation = 0.993, 0.913) between the regressors. These indicators demonstrate the existence of the multicollinearity problem in our dataset.

We also investigate the overdispersion problem using the chi-squared statistic (CHISQ) and the overdispersion parameter (OP), as follows:

C H I S Q = \sum_{i = 1}^{n} \frac{{(y_{i} - μ_{i})}^{2}}{μ_{i}}

(52)

O P = \sum_{i = 1}^{n} \frac{C H I S Q}{n - p}

(53)

where

μ_{i}

denotes the predictions and

y_{i}

shows the actual vales. In the formulation, df = n − p shows the degrees of freedom of the residuals. Based on the overdispersion test, we conclude that the model is overdispersed (CHISQ = 2850.934, p < 0.001). The overdispersion parameter is equal to 1.918, which is also greater than 1, and it indicates the existence of the overdispersion problem as well [53]. The correlation matrix of the predictor variables is given in Table 5.

Table 6 shows the regression coefficients of the negative binomial regression estimators. The regression coefficients have different signs, and the intercept terms slightly differ in the KL estimators. The estimator with the highest mean square error is the MLE due to the presence of multicollinearity. The proposed estimator, JKLNBR, has outperformed the other estimators because it has the smallest MSE.

Table 7 reports the average MSE(Y) values (given in Equation (46)) for the test and train datasets for five type negative binomial regression estimators. We divided the dataset for training and testing with the proportions 20–80%, respectively. In both the train and test data, the proposed JKLNBR estimators apparently have the lowest MSE values. Even the classical KL estimator suffers; our proposed one is better than the other competitors.

6. Conclusions

The NBR model, which is a mixed distribution, is a solution to the overdispersion problem in modeling overdispersed count data. However, when faced with the problem of multicollinearity, the variance of the MLE of the NBR model tends to increase, leading to unreliable results. To address this problem, various biased estimators have been proposed in the literature. In this study, we introduce a new estimator for NBR models using the jackknife approach. Our proposed estimator effectively reduces the negative impact of multicollinearity on the standard error of the estimates, resulting in a smaller MSE with minimal bias. The theoretical results show that our proposed estimators outperform several existing estimators, including the MLE, LNBR, RNBR, and KLNBR. Furthermore, we evaluate the performance of our proposed JKLNBR estimator through two simulation studies and a real data application. The empirical results strongly support the effectiveness of the proposed JKLNBR estimator on several fronts. In our simulation results, the JKLNBR estimator exhibits the highest estimation accuracy and consistently provides the smallest MSE values when assessing the correctness of the regression coefficient estimates. In addition, the JKLNBR estimator outperforms other estimators in the real data example, showing the lowest prediction error on the test data. Even in the presence of high multicollinearity, our new estimator is the best when compared to the classical MLE, Liu-type, ridge, and classical KL estimators in terms of the NBRM. In future studies, we aim to investigate the performance of the Kibria–Lukman estimator in other types of regression models where the response variable is in the form of count data. In addition, we will investigate the behavior of the shrinkage parameters under different selection procedures and develop estimators that can deal not only with the multicollinearity problem but also with the outlier problem.

Author Contributions

T.K. and H.K. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Pittman, B.; Buta, E.; Krishnan-Sarin, S.; O’Malley, S.S.; Liss, T.; Gueorguieva, R. Models for analyzing zero-inflated and overdispersed count data: An application to cigarette and marijuana use. Nicotine Tob. Res. 2020, 22, 1390–1398. [Google Scholar] [CrossRef] [PubMed]
Alrumayh, A.; Khogeer, H.A. A New Two-Parameter Discrete Distribution for overdispersed and Asymmetric Data: Its Properties, Estimation, Regression Model, and Applications. Symmetry 2023, 15, 1289. [Google Scholar] [CrossRef]
Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
Mamun, A.; Paul, S. Model Selection in Generalized Linear Models. Symmetry 2023, 15, 1905. [Google Scholar] [CrossRef]
Liu, K. A new class of biased estimate in linear regression. Commun. Stat.-Theory Methods 1993, 22, 393–402. [Google Scholar]
Månsson, K. On ridge estimators for the negative binomial regression model. Econ. Model. 2012, 29, 178–184. [Google Scholar] [CrossRef]
Månsson, K. Developing a Liu estimator for the negative binomial regression model: Method and application. J. Stat. Comput. Simul. 2013, 83, 1773–1780. [Google Scholar] [CrossRef]
Asar, Y. Liu-type negative binomial regression: A comparison of recent estimators and applications. In Trends and Perspectives in Linear Statistical Inference; Springer: Cham, Switzerland, 2018; pp. 23–39. [Google Scholar]
Türkan, S.; Özel, G. A Jackknifed estimator for the negative binomial regression model. Commun. Stat.-Simul. Comput. 2018, 47, 1845–1865. [Google Scholar] [CrossRef]
Alobaidi, N.N.; Shamany, R.E.; Algamal, Z.Y. A new ridge estimator for the negative binomial regression model. Thail. Stat. 2021, 19, 116–125. [Google Scholar]
Akdeniz Duran, E.; Akdeniz, F. Efficiency of the modified jackknifed Liu-type estimator. Stat. Pap. 2012, 53, 265–280. [Google Scholar] [CrossRef]
Arum, K.C.; Ugwuowo, F.I.; Oranye, H.E. Robust Modified Jackknife Ridge Estimator for the Poisson Regression Model with Multicollinearity and outliers. Sci. Afr. 2022, 17, e01386. [Google Scholar] [CrossRef]
Akram, M.N.; Abonazel, M.R.; Amin, M.; Kibria, B.M.G.; Afzal, N. A new Stein estimator for the zero-inflated negative binomial regression model. Concurr. Comput. Pract. Exp. 2022, 34, e7045. [Google Scholar] [CrossRef]
Dawoud, I.; Awwad, F.A.; Tag Eldin, E.; Abonazel, M.R. New Robust Estimators for Handling Multicollinearity and Outliers in the Poisson Model: Methods, Simulation and Applications. Axioms 2022, 11, 612. [Google Scholar] [CrossRef]
Akram, M.N.; Amin, M.; Sami, F.; Mastor, A.B.; Egeh, O.M.; Muse, A.H. A new Conway Maxwell–Poisson Liu regression estimator method and application. J. Math. 2022, 2022, 3323955. [Google Scholar] [CrossRef]
Amin, M.; Akram, M.N.; Kibria, B.G. A new adjusted Liu estimator for the Poisson regression model. Concurr. Comput. Pract. Exp. 2021, 33, e6340. [Google Scholar] [CrossRef]
Sami, F.; Amin, M.; Butt, M.M. On the ridge estimation of the Conway-Maxwell Poisson regression model with multicollinearity: Methods and applications. Concurr. Comput. Pract. Exp. 2022, 34, e6477. [Google Scholar] [CrossRef]
Amin, M.; Akram, M.N.; Amanullah, M. On the James-Stein estimator for the Poisson regression model. Commun. Stat.-Simul. Comput. 2022, 51, 5596–5608. [Google Scholar] [CrossRef]
Amin, M.; Akram, M.N.; Majid, A. On the estimation of Bell regression model using ridge estimator. Commun. Stat.-Simul. Comput. 2023, 52, 854–867. [Google Scholar] [CrossRef]
Batool, A.; Amin, M.; Elhassanein, A. On the performance of some new ridge parameter estimators in the Poisson-inverse Gaussian ridge regression. Alex. Eng. J. 2023, 70, 231–245. [Google Scholar] [CrossRef]
Abonazel, M.R. New modified two-parameter Liu estimator for the Conway–Maxwell Poisson regression model. J. Stat. Comput. Simul. 2023, 93, 1976–1996. [Google Scholar] [CrossRef]
Algamal, Z.; Lukman, A.; Golam, B.K.; Taofik, A. Modified Jackknifed Ridge Estimator in Bell Regression Model: Theory, Simulation and Applications. Iraqi J. Comput. Sci. Math. 2023, 4, 146–154. [Google Scholar]
Algamal, Z.Y.; Abonazel, M.R.; Awwad, F.A.; Eldin, E.T. Modified Jackknife Ridge Estimator for the Conway-Maxwell-Poisson Model. Sci. Afr. 2023, 19, e01543. [Google Scholar] [CrossRef]
Kibria, B.M.G.; Lukman, A.F. A new ridge-type estimator for the linear regression model: Simulations and applications. Scientifica 2020, 2020, 9758378. [Google Scholar] [CrossRef] [PubMed]
Lukman, A.F.; Amin, M.; Kibria, B.G. K-L estimator for the linear mixed models: Computation and simulation. Concurr. Comput. Pract. Exp. 2022, 34, e6780. [Google Scholar] [CrossRef]
Aladeitan, B.B.; Adebimpe, O.; Lukman, A.F.; Oludoun, O.; Abiodun, O.E. Modified Kibria-Lukman (MKL) estimator for the Poisson Regression Model: Application and simulation. F1000Research 2021, 10, 548. [Google Scholar] [CrossRef]
Ugwuowo, F.I.; Oranye, H.E.; Arum, K.C. On the jackknife Kibria-Lukman estimator for the linear regression model. Commun. Stat.-Simul. Comput. 2021, 1–13. [Google Scholar] [CrossRef]
Rasheed, H.A.; Sadik, N.J.; Algamal, Z.Y. Jackknifed Liu-type estimator in the Conway-Maxwell Poisson regression model. Int. J. Nonlinear Anal. Appl. 2022, 13, 3153–3168. [Google Scholar]
Jabur, D.M.; Rashad, N.K.; Algamal, Z.Y. Jackknifed Liu-type estimator in the negative binomial regression model. Int. J. Nonlinear Anal. Appl. 2022, 13, 2675–2684. [Google Scholar]
Dawoud, I.; Abonazel, M.R.; Awwad, F.A. Generalized Kibria-Lukman estimator: Method, simulation, and application. Front. Appl. Math. Stat. 2022, 8, 31. [Google Scholar] [CrossRef]
Abonazel, M.R.; Saber, A.A.; Awwad, F.A. Kibria–Lukman estimator for the Conway–Maxwell Poisson regression model: Simulation and applications. Sci. Afr. 2023, 19, e01553. [Google Scholar] [CrossRef]
Hilbe, J.M. Negative Binomial Regression; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Månsson, K.; Shukur, G. A Poisson ridge regression estimator. Econ. Model. 2011, 28, 1475–1481. [Google Scholar] [CrossRef]
Algamal, Z.Y.; Abonazel, M.R. Developing a Liu-type estimator in beta regression model. Concurr. Comput. Pract. Exp. 2022, 34, e6685. [Google Scholar] [CrossRef]
Singh, B.; Chaubey, Y.P.; Dwivedi, T.D. An almost unbiased ridge estimator. Sankhyā Indian J. Stat. Ser. B 1986, 48, 342–346. [Google Scholar]
Khurana, M.; Chaubey, Y.P.; Chandra, S. Jackknifing the ridge regression estimator: A revisit. Commun. Stat.-Theory Methods 2014, 43, 5249–5262. [Google Scholar] [CrossRef]
Oranye, H.E.; Ugwuowo, F.I. Modified jackknife Kibria–Lukman estimator for the Poisson regression model. Concurr. Comput. Pract. Exp. 2022, 34, e6757. [Google Scholar] [CrossRef]
Farebrother, R.W. Further results on the mean square error of ridge regression. J. R. Stat. Soc. Ser. B 1976, 38, 248–250. [Google Scholar] [CrossRef]
Trenkler, G.; Toutenburg, H. Mean squared error matrix comparisons between biased estimators—An overview of recent results. Stat. Pap. 1990, 31, 165–179. [Google Scholar] [CrossRef]
Kibria, B.M.G. Performance of some new ridge regression estimators. Commun. Stat.-Simul. Comput. 2003, 32, 419–435. [Google Scholar] [CrossRef]
Kibria, B.M.G.; Banik, S. Some Ridge Regression Estimators and Their Performances. J. Mod. Appl. Stat. Methods 2016, 15, 206–238. [Google Scholar] [CrossRef]
Kibria, B.M.G. More than hundred (100) estimators for estimating the shrinkage parameter in a linear and generalized linear ridge regression models. J. Econom. Stat. 2022, 2, 233–252. [Google Scholar]
Jäntschi, L. Detecting extreme values with order statistics in samples from continuous distributions. Mathematics 2020, 8, 216. [Google Scholar] [CrossRef]
Huang, J.; Yang, H. A two-parameter estimator in the negative binomial regression model. J. Stat. Comput. Simul. 2014, 84, 124–134. [Google Scholar] [CrossRef]
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2022. Available online: https://www.R-project.org/ (accessed on 25 January 2022).
Lukman, A.F.; Kibria, B.M.G.; Nziku, C.K.; Amin, M.; Adewuyi, E.T.; Farghali, R. KL estimator: Dealing with multicollinearity in the logistic regression model. Mathematics 2023, 11, 340. [Google Scholar] [CrossRef]
Koç, T.; Dünder, E. Jackknife Kibria-Lukman estimator for the beta regression model. Commun. Stat.-Theory Methods 2023, 1–17. [Google Scholar] [CrossRef]
Kartal Koc, E.; Bozdogan, H. Model selection in multivariate adaptive regression splines (MARS) using information complexity as the fitness function. Mach. Learn. 2015, 101, 35–58. [Google Scholar] [CrossRef]
Dünder, E. A hybridized consistent Akaike type information criterion for regression models in the presence of multicollinearity. Commun. Stat.-Simul. Comput. 2023, 1–10. [Google Scholar] [CrossRef]
Dünder, E.; Gümüştekin, S.; Murat, N.; Cengiz, M.A. Variable selection in linear regression analysis with alternative Bayesian information criteria using differential evaluation algorithm. Commun. Stat.-Simul. Comput. 2018, 47, 605–614. [Google Scholar] [CrossRef]
Jochmann, M. zic: Bayesian Inference for Zero-Inflated Count Models. R Package Version 0.9, 1. England. 2017. Available online: https://cran.r-project.org/web/packages/zic/zic.pdf (accessed on 25 January 2022).
Alin, A. Multicollinearity. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 370–374. [Google Scholar] [CrossRef]
Payne, E.H.; Gebregziabher, M.; Hardin, J.W.; Ramakrishnan, V.; Egede, L.E. An empirical approach to determine a threshold for assessing overdispersion in Poisson and negative binomial models for count data. Commun. Stat.-Simul. Comput. 2018, 47, 1722–1738. [Google Scholar] [CrossRef]

Table 1. Simulated MSE values for the first simulation design when p = 3.

		$θ = 1$					$θ = 2$
r	n	MLE	LNBR	RNBR	KLNBR	JKLNBR	MLE	LNBR	RNBR	KLNBR	JKLNBR
r = 0.90	20	0.82281	0.80917	0.73124	0.61829	0.59465	0.52961	0.51995	0.48750	0.41220	0.39651
	25	0.57873	0.54248	0.51045	0.43776	0.40874	0.36949	0.34634	0.32589	0.27948	0.26096
	30	0.35849	0.33246	0.32299	0.28693	0.25846	0.22667	0.21021	0.20422	0.18147	0.16342
	50	0.19054	0.18237	0.18016	0.16644	0.14805	0.12097	0.11578	0.11438	0.10567	0.09399
	100	0.08354	0.08036	0.08084	0.07586	0.06927	0.05342	0.05139	0.05170	0.04851	0.04430
	300	0.02408	0.02368	0.02382	0.02324	0.02249	0.01541	0.01516	0.01525	0.01487	0.01439
r = 0.95	20	0.98488	0.89712	0.73530	0.62715	0.61519	0.62685	0.57381	0.47031	0.40113	0.39343
	25	0.76901	0.69927	0.60834	0.52946	0.49644	0.47459	0.43155	0.37543	0.32675	0.30638
	30	0.55637	0.50926	0.46567	0.40748	0.35669	0.35479	0.32475	0.29695	0.25984	0.22746
	50	0.34771	0.32462	0.30729	0.27100	0.22617	0.21601	0.21409	0.20167	0.16835	0.14050
	100	0.14216	0.13461	0.13320	0.12002	0.10488	0.09171	0.08684	0.08593	0.07743	0.06766
	300	0.03964	0.03872	0.03899	0.03761	0.03583	0.02518	0.02459	0.02476	0.02389	0.02276
r = 0.99	20	6.78309	5.53115	2.25137	2.25111	1.72358	4.52296	3.68817	1.50121	1.50104	1.14928
	25	4.85253	4.07813	1.97189	1.92584	1.43208	2.87613	2.41714	1.16875	1.14146	0.84880
	30	3.17543	2.75877	1.57596	1.49806	1.08658	1.92094	1.66888	0.95335	0.90623	0.65731
	50	1.75179	1.57307	1.06358	0.97317	0.69003	1.08999	0.97878	0.66177	0.60552	0.42935
	100	0.58161	0.52103	0.43475	0.35117	0.24148	0.37236	0.33365	0.27833	0.22482	0.15460
	300	0.13537	0.12723	0.12440	0.10977	0.09105	0.08632	0.08133	0.07932	0.07000	0.05809
Average		1.23209	1.06463	0.64223	0.59502	0.47787	0.77630	0.67125	0.40537	0.37493	0.30162

Table 2. Simulated MSE values for the first simulation design when p = 6.

		$θ = 1$					$θ = 2$
r	n	MLE	LNBR	RNBR	KLNBR	JKLNBR	MLE	LNBR	RNBR	KLNBR	JKLNBR
r = 0.90	20	1.76150	1.74362	1.71281	1.60892	1.46444	1.06560	1.05478	1.03615	0.97330	0.88590
	25	1.15564	1.12731	1.12589	1.08779	0.90920	0.71065	0.69323	0.69235	0.66892	0.55910
	30	0.71259	0.67670	0.64511	0.59164	0.49423	0.44048	0.41830	0.39877	0.36572	0.30551
	50	0.29787	0.28684	0.28329	0.26057	0.21953	0.19296	0.18582	0.18352	0.16880	0.14221
	100	0.10262	0.09947	0.10043	0.09453	0.08510	0.06751	0.06544	0.06607	0.06219	0.05598
	300	0.03446	0.03408	0.03428	0.03369	0.03257	0.02144	0.02121	0.02133	0.02096	0.02027
r = 0.95	20	3.76635	3.46502	2.99998	2.82094	2.28779	2.23235	2.05375	1.77811	1.67200	1.35599
	25	2.27831	2.10201	1.85031	1.71650	1.37987	1.58131	1.45895	1.28425	1.19138	0.95773
	30	1.18608	1.10448	1.00059	0.90914	0.72249	0.78217	0.72836	0.65985	0.59954	0.47645
	50	0.49967	0.47245	0.45084	0.39886	0.31565	0.32582	0.30807	0.29397	0.26008	0.20582
	100	0.20908	0.20175	0.20106	0.18566	0.15935	0.13429	0.12958	0.12914	0.11925	0.10235
	300	0.06094	0.06010	0.06043	0.05890	0.05605	0.03609	0.03559	0.03579	0.03488	0.03320
r = 0.99	20	19.08371	17.55004	11.24236	11.10921	10.01920	11.51473	10.58935	6.78342	6.70308	6.04539
	25	11.17668	10.27346	6.67784	6.26168	5.55210	6.87297	6.31754	4.10646	3.85055	3.41420
	30	5.50593	5.06716	3.41039	3.49166	2.47234	3.40347	3.13225	2.10812	2.15836	1.52827
	50	2.08247	1.93114	1.44001	1.29915	0.77994	1.29572	1.20156	0.89598	0.80834	0.48528
	100	0.90630	0.86540	0.76670	0.68415	0.47490	0.58710	0.56061	0.49667	0.44319	0.30764
	300	0.24833	0.24247	0.24001	0.22519	0.19391	0.16335	0.15950	0.15788	0.14813	0.12756
Average		2.83714	2.62797	1.90235	1.82434	1.53437	1.74600	1.61744	1.17377	1.12493	0.94494

Table 3. Simulated MSE values for the second simulation design when p = 5.

		$θ = 1$					$θ = 2$
r	n	MLE	LNBR	RNBR	KLNBR	JKLNBR	MLE	LNBR	RNBR	KLNBR	JKLNBR
c = 0.3	20	1.25489	0.38792	0.29644	0.28027	0.16622	0.75913	0.23467	0.17933	0.16955	0.10055
	25	0.89141	0.36293	0.30554	0.29336	0.16624	0.54816	0.22318	0.18789	0.18040	0.10223
	30	0.63844	0.33503	0.31864	0.26668	0.17622	0.39465	0.20710	0.19697	0.16485	0.10893
	50	0.47549	0.30699	0.33580	0.23874	0.18365	0.30803	0.19887	0.21753	0.15466	0.11897
	100	0.37725	0.27441	0.33559	0.25214	0.20106	0.24817	0.18052	0.22077	0.16587	0.13227
	300	0.33704	0.27174	0.32888	0.27919	0.23617	0.20971	0.16908	0.20463	0.17372	0.14695
c = 0.6	20	1.22176	0.53134	0.38674	0.27531	0.16582	0.72415	0.31493	0.22922	0.16318	0.09828
	25	0.83048	0.42347	0.35960	0.25951	0.17069	0.57641	0.29392	0.24959	0.18012	0.11847
	30	0.57322	0.33557	0.33807	0.24391	0.17606	0.37801	0.22129	0.22294	0.16085	0.11610
	50	0.42685	0.27269	0.32254	0.22820	0.18142	0.27833	0.17781	0.21032	0.14880	0.11830
	100	0.36879	0.27047	0.33222	0.24679	0.19795	0.23687	0.17372	0.21338	0.15851	0.12714
	300	0.33852	0.27097	0.33074	0.27906	0.23501	0.20049	0.16049	0.19588	0.16528	0.13919
c = 0.9	20	0.86408	0.35249	0.39052	0.25198	0.16802	0.52137	0.21269	0.23563	0.15204	0.10138
	25	0.70937	0.34565	0.37079	0.24876	0.17214	0.43622	0.21255	0.22801	0.15297	0.10586
	30	0.57578	0.33235	0.35001	0.24682	0.17845	0.35592	0.20544	0.21636	0.15257	0.11031
	50	0.46243	0.30905	0.35338	0.23802	0.18442	0.28773	0.19229	0.21987	0.14810	0.11475
	100	0.36648	0.26186	0.33526	0.24317	0.19546	0.23741	0.16963	0.21718	0.15753	0.12662
	300	0.33285	0.26179	0.32622	0.27294	0.22891	0.21895	0.17221	0.21459	0.17954	0.15058
Average		0.61362	0.32815	0.33983	0.25805	0.18800	0.38443	0.20669	0.21445	0.16270	0.11871

Table 4. Description of the regressors of the docvisits dataset.

Regressor	Description
age	Age of the people
agesq	Square of the age
health	Health satisfaction point between [0, 10]
handicap	A dummy variable (0–1) the handicapping situation
hdegree	The percentage of the handicap degree
married	A dummy variable (0–1) for the marital status
schooling	The number of years for schooling
hhincome	Household income per month
self	A dummy variable (0–1) for the self-employed status
civil	A dummy variable (0–1) for the civil servant
bluec	A dummy variable (0–1) for the blue-collar employee
employed	A dummy variable (0–1) for the employed status
public	A dummy variable (0–1) for the health insurance
addon	A dummy variable (0–1) for the add-on insurance

Table 5. Correlation matrix.

	Age	Agesq	Health	Handicap	Hdegree	Married	Schooling	Hhincome	Self	Civil	Bluec	Employed	Public	Addon
age	1
agesq	0.993	1
health	−0.242	−0.243	1
handicap	0.290	0.303	−0.313	1
hdegree	0.274	0.290	−0.325	0.913	1
married	0.447	0.414	−0.067	0.084	0.056	1
schooling	−0.078	−0.092	0.097	−0.154	−0.162	−0.055	1
hhincome	0.039	0.020	0.121	−0.113	−0.123	0.132	0.310	1
self	0.005	−0.007	0.048	−0.083	−0.082	0.022	0.075	0.106	1
civil	−0.004	−0.016	0.057	−0.043	−0.049	0.014	0.263	0.146	−0.106	1
bluec	−0.114	−0.122	0.021	−0.069	−0.094	0.040	−0.351	−0.124	−0.210	−0.229	1
employed	−0.207	−0.254	0.200	−0.311	−0.364	0.082	0.112	0.229	0.144	0.157	0.311	1
public	0.001	0.017	−0.077	0.078	0.084	0.010	−0.323	−0.247	−0.169	−0.639	0.275	−0.117	1
addon	0.006	0.003	0.002	−0.006	−0.017	0.024	0.051	0.022	0.028	−0.013	−0.037	0.049	0.056	1

Table 6. Regression coefficients of the negative binomial estimators and MSEs of the docvisit data.

Coefficient	MLE	LNBR	RNBR	KLNBR	JKLNBR
(Intercept)	3.37579	2.08590	2.25278	−0.71307	−0.21224
age	−0.06361	−0.00910	−0.01601	0.10531	0.10237
agesq	0.89263	0.27995	0.35745	−0.99982	−0.50932
health	−0.26084	−0.24793	−0.24971	−0.21675	−0.21561
handicap	0.15441	0.12473	0.12958	0.03661	0.00472
hdegree	0.00191	0.00305	0.00289	0.00606	0.00606
married	−0.11917	−0.16474	−0.15907	−0.25701	−0.23221
schooling	0.00166	0.00670	0.00597	0.01976	0.01963
hhincome	0.00665	0.00743	0.00735	0.00870	0.00867
self	−0.23726	−0.22652	−0.22812	−0.19742	−0.16125
civil	−0.05374	−0.02211	−0.02617	0.04492	0.03463
bluec	0.19122	0.20849	0.20629	0.24496	0.21506
employed	0.03120	−0.01845	−0.01260	−0.11041	−0.09170
public	0.15695	0.21742	0.20944	0.35222	0.28912
addon	0.42515	0.36128	0.37208	0.16040	0.03457
MSE	6.71325	4.13946	3.41933	0.76437	0.25113
K = 1.255031
d = 7.794738e-10

Table 7. Predictive performance of the negative binomial regression estimators.

Data	ML	LNBR	RNBR	KLNBR	JKLNBR
Test	30.6046	30.7806	30.7560	31.2358	26.5837
Train	22.1274	22.3768	22.3426	23.0007	19.4763

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Koç, T.; Koç, H. A New Effective Jackknifing Estimator in the Negative Binomial Regression Model. Symmetry 2023, 15, 2107. https://doi.org/10.3390/sym15122107

AMA Style

Koç T, Koç H. A New Effective Jackknifing Estimator in the Negative Binomial Regression Model. Symmetry. 2023; 15(12):2107. https://doi.org/10.3390/sym15122107

Chicago/Turabian Style

Koç, Tuba, and Haydar Koç. 2023. "A New Effective Jackknifing Estimator in the Negative Binomial Regression Model" Symmetry 15, no. 12: 2107. https://doi.org/10.3390/sym15122107

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A New Effective Jackknifing Estimator in the Negative Binomial Regression Model

Abstract

1. Introduction

2. Existing Estimators

2.1. Negative Binomial Regression Model MLE Estimator

2.2. Liu-Type Negative Binomial Regression Estimator (LNBR)

2.3. Ridge Negative Binomial Regression Estimator (RNBR)

3. Proposed Estimator

4. Comparison of Efficiency

5. Application

5.1. Simulation Studies

5.1.1. Simulation Study 1

5.1.2. Simulation Study 2

5.2. Real Dataset Example

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI