Prediction Interval for Compound Conway–Maxwell–Poisson Regression Model with Application to Vehicle Insurance Claim Data

Merupula, Jahnavi; Vaidyanathan, V. S.; Chesneau, Christophe

doi:10.3390/mca28020039


by Jahnavi Merupula ¹, V. S. Vaidyanathan ¹ and Christophe Chesneau ²,*

¹ Department of Statistics, Pondicherry University, Puducherry 605014, India

² Department of Mathematics, LMNO, University of Caen, 14032 Caen, France

* Author to whom correspondence should be addressed.

Math. Comput. Appl. 2023, 28(2), 39; https://doi.org/10.3390/mca28020039

Submission received: 16 January 2023 / Revised: 15 February 2023 / Accepted: 27 February 2023 / Published: 9 March 2023

(This article belongs to the Special Issue Statistical Inference in Linear Models)


Abstract:

Regression models in which the response variable has a compound distribution have applications in actuarial science. For example, the aggregate claim amount in a vehicle insurance portfolio can be modeled using a compound Poisson distribution. In this paper, we propose a regression model, wherein the response variable is assumed to have a compound Conway–Maxwell–Poisson (CMP) distribution. This distribution is a parsimonious two-parameter Poisson distribution that accounts for both over- and under-dispersed count data, making it more suitable for application in various fields. A two-part methodology in the framework of a generalized linear model is proposed to estimate the parameters. Additionally, a method to obtain the prediction interval of the response variable is developed. The workings of the proposed methodology are illustrated through simulated data. An application of the compound CMP regression model to real-life vehicle insurance claims data is presented.

Keywords:

aggregate claims distribution; compound CMP regression model; generalized linear models; prediction intervals

1. Introduction

Compound regression models have applications in various research fields, including economics and finance. In economic consumer theory, for example, compound Poisson regression models are often used to examine the factors that account for the expenditures incurred by tourists during their stay at a location. The factors may include length of stay, type of holiday accommodations, age, occupation, socio-economic status of the tourist, etc. See Gómez-Déniz and Pérez-Rodríguez [1]. In actuarial risk theory, the aggregate claim amount incurred by the insurance company against the claims made by the policyholders is modeled using compound models. See Klugman et al. [2] and Bahnemann [3] for a detailed discussion on compound models, their distributional properties and applications in insurance claim modeling. Jørgensen and Paes De Souza [4] applied the compound Poisson regression model to determine the impact on the conditional mean of the aggregate claim amount caused by factors such as age and model of the vehicle, exposure, deductibles, etc., in the context of car insurance. In this paper, we propose a compound regression model based on a two-parameter generalization of the Poisson distribution. Some mathematical background is presented below to fix the notation. Let

S = \sum_{j = 1}^{N} Y_{j},

(1)

denote the random sum, where the distributions of the random variables N and

Y_{1}, Y_{2}, \dots, Y_{N}

are assumed to be discrete and continuous, respectively. Moreover,

(Y_{j})

s are assumed to be independent and identically distributed. Therefore, in the sequel, we refer to

Y_{j}

s as Y. Further, N and Y in general are assumed to be independent. The above-mentioned S is a compound random variable. Suppose

Y_{j}

represents the claim amounts on an insurance portfolio and N denotes the number of claims made; then S represents the aggregate claim amount. When N has a Poisson distribution, the distribution of S is known as the compound Poisson distribution. Though the Poisson distribution is often used in constructing compound distributions, it is not suitable for modeling over- or under-dispersed count data. As an alternative to the Poisson distribution, one can use a generalized Poisson distribution (Consul and Jain [5]) to model count data that are either over- or under-dispersed. Recently, Shmueli et al. [6] studied a two-parameter Poisson distribution developed by Conway and Maxwell [7] known as the Conway–Maxwell–Poisson (CMP) distribution. This is a flexible two-parameter generalization of the Poisson distribution that can model both over- and under-dispersed data and includes the Poisson, geometric and Bernoulli distributions as special cases. A detailed discussion on the properties of this distribution and its applications can be found in Sellers et al. [8]. Sellers and Premeaux [9] also provide a detailed review of CMP regression models. In the context of compound distributions, assuming the CMP and binomial distributions for N and Y in Equation (1), a discrete compound CMP-binomial distribution was developed by Saavithri et al. [10].

Compound Poisson regression models, which take the Poisson distribution as the counting distribution, are available in the literature. See Frees et al. [11], Andersen and Bonat [12], and Delong et al. [13]. However, their applicability is limited to data with equi-dispersed counts. To allow for flexibility in the compound regression models in terms of accommodating dispersed counts, a counting distribution that can model both over- and under-dispersed data should be considered. This serves as motivation to use the CMP distribution as the counting distribution to build a compound regression model.

The goal of this work is to create a regression model for S using a CMP distribution for N. The present work is novel because of the distribution used for N and its convolution with the distribution of Y. The problem of obtaining prediction intervals for the response variable S is also addressed. The parameters of the compound regression model are estimated using the generalized linear model (GLM) approach in two cases. In the first case, we assume that data on S are available but not on N and Y. In the second case, we assume that data on both N and Y are available; for this case, a two-part likelihood-based estimation procedure is developed within the framework of the GLM. A methodology to obtain the prediction interval (PI) for the response variable of the proposed compound regression model is also developed.

The rest of the paper is organized as follows: The compound CMP regression model is given in Section 2. In Section 3, the estimation of the parameters of the proposed regression model using the GLM approach is discussed. Section 4 deals with the suggested methodology for obtaining the prediction intervals for the compound CMP regression model. A numerical illustration of the estimation procedure using simulated data and an application to real-life vehicle insurance claims data is presented in Section 5. The conclusion of the paper is given in Section 6.

2. Compound CMP Regression Model

The probability mass function (pmf) of the random variable N having the CMP distribution is given by

P (N = n) = \frac{λ^{n}}{{(n!)}^{ν} Z (λ, ν)}, n = 0, 1, 2, \dots, λ > 0, ν \geq 0,

(2)

where

Z (λ, ν) = \sum_{j = 0}^{\infty} λ^{j} / {(j!)}^{ν}

is the normalizing constant. Some important remarks on this distribution are given below. The parameters

λ

and

ν

are the location and dispersion parameters, respectively. This pmf is not defined for

λ \geq 1

and

ν = 0

. The mean and variance of N are given by

E (N) = λ \frac{\partial ln Z (λ, ν)}{\partial λ}

and

V (N) = λ \frac{\partial E (N)}{\partial λ}

, respectively. When

ν = 1

, the CMP distribution reduces to the Poisson distribution. For

ν > 1

, the distribution is under-dispersed, and for

ν < 1

, it is over-dispersed.
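As a rough numerical illustration of Equation (2) and of the dispersion behaviour described above, the following Python sketch evaluates the pmf with the normalizing constant truncated after a fixed number of terms. It is not part of the original computations; the function names and the truncation point `terms` are assumptions made here, and the truncation is only adequate when the series terms decay quickly (ν > 0 with moderate λ, or ν = 0 with λ < 1).

```python
import math

def cmp_probs(lam, nu, terms=500):
    """Truncated CMP pmf of Eq. (2) for n = 0, ..., terms - 1."""
    # Log scale: log(lam^n / (n!)^nu) = n*log(lam) - nu*log(n!).
    logs = [n * math.log(lam) - nu * math.lgamma(n + 1) for n in range(terms)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)  # proportional to the truncated Z(lam, nu)
    return [w / total for w in weights]

def mean_and_variance(probs):
    """Mean and variance computed directly from a pmf on 0, 1, 2, ..."""
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    return mean, second - mean ** 2

# Compare variance and mean for nu < 1, nu = 1 and nu > 1.
for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(cmp_probs(3.0, nu))
    print(nu, round(mean, 4), round(var, 4))
```

With ν = 1 the printed mean and variance coincide (Poisson case); for ν = 2 the variance falls below the mean and for ν = 0.5 it exceeds it.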

Since the location parameter

λ

of the CMP distribution does not represent its mean, a mean reparameterized form of the distribution is used in building the compound regression model. The pmf of N under the mean-reparametrization is given by

P (N = n) = {(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{n e^{ϕ}} \frac{{(n!)}^{- e^{ϕ}}}{Z (μ_{1}, ϕ)}, n = 0, 1, 2, \dots, μ_{1} > 0, ϕ \in R,

(3)

where

Z (μ_{1}, ϕ) = \sum_{j = 0}^{\infty} {(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{j e^{ϕ}} \frac{1}{{(j!)}^{e^{ϕ}}}

is the normalizing constant. When

ϕ = 0,

the distribution reduces to the Poisson distribution. For

ϕ > 0,

the distribution is under-dispersed, and for

ϕ < 0,

it is over-dispersed. See Ribeiro Jr et al. [14]. Here,

μ_{1} \approx λ^{1 / ν} - \frac{ν - 1}{2 ν}

corresponds to the mean of the distribution and

ϕ = ln (ν)

. This approximation works reasonably well for

ν \leq 1

or

λ > 10^{ν}

. The mean and variance of N are

E (N) = μ_{1}

and

V (N) = μ_{1} e^{- ϕ}

, respectively.
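The quality of the mean approximation can be checked numerically. The Python sketch below (an illustration added here, with hypothetical function names and an assumed truncation point `terms`) evaluates the pmf of Equation (3) and compares the exact mean of the truncated distribution with μ_1. It requires the base μ_1 + (e^ϕ − 1)/(2e^ϕ) to be positive.

```python
import math

def cmp_mu_probs(mu1, phi, terms=500):
    """Truncated mean-parameterized CMP pmf of Eq. (3)."""
    nu = math.exp(phi)
    base = mu1 + (nu - 1.0) / (2.0 * nu)
    if base <= 0.0:
        raise ValueError("mu1 + (e^phi - 1)/(2 e^phi) must be positive")
    # log of base^(n*nu) * (n!)^(-nu)
    logs = [nu * (n * math.log(base) - math.lgamma(n + 1)) for n in range(terms)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_variance(probs):
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    return mean, second - mean ** 2

# The mean should stay close to mu1 = 5 for each value of phi.
for phi in (-0.5, 0.0, 0.5):
    mean, var = mean_and_variance(cmp_mu_probs(5.0, phi))
    print(phi, round(mean, 4), round(var, 4))
```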

Convolutions can be used to obtain the probability density function (pdf) of the random sum S defined in Equation (1). In Equation (1),

N = 0

implies

S = 0

. Let

p_{0}

denote the probability mass at

S = 0

. Since S is not continuous at zero, the pdf of S is represented as a generalized pdf in terms of Dirac delta function as

f (s) = p_{0} δ (s) + \sum_{i = 1}^{\infty} g_{Y}^{* i} (s) P (N = i), s \geq 0,

(4)

where

δ (s)

is the Dirac delta function such that

\int_{0}^{\infty} δ (s) d s = 1 .

Here,

P (N = i)

denotes the pmf of the CMP distribution defined in Equation (3), and

g_{Y}^{* i} (.)

denotes the pdf of the i-fold convolution of Y, whose distribution is assumed to be continuous with support in

R^{+}

. Note that

p_{0} = P (N = 0) = Z {(μ_{1}, ϕ)}^{- 1}

. In this paper, the distribution of Y is considered to be a mean reparameterized gamma distribution. Based on Jorgensen [15] (Chapter 3), the pdf of Y is given by

g_{Y} (y; μ_{2}, ψ) = \frac{1}{Γ (ψ)} {(\frac{ψ}{μ_{2}})}^{ψ} y^{ψ - 1} exp (- \frac{ψ y}{μ_{2}}), y > 0, μ_{2} > 0, ψ > 0,

(5)

where

μ_{2}

denotes the mean of Y,

ψ

denotes the dispersion parameter and

Γ (.)

denotes the gamma function. This form is taken for mathematical convenience and to accommodate asymmetry in the distribution of Y. For example, in the context of insurance claim modeling, the individual claim amounts are always positive and often right-skewed. Since the gamma distribution is closed under convolution, we obtain

g_{Y}^{* i} (y) = \frac{1}{Γ (ψ)} {(\frac{ψ}{i μ_{2}})}^{ψ} y^{ψ - 1} exp (- \frac{ψ y}{i μ_{2}}), y > 0, μ_{2} > 0, ψ > 0 .

(6)

Using Equations (3) and (6) in Equation (4), we obtain

f (s) = p_{0} δ (s) + \frac{s^{ψ - 1} ψ^{ψ}}{Z (μ_{1}, ϕ) μ_{2}^{ψ} Γ (ψ)} \sum_{i = 1}^{\infty} {(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{i e^{ϕ}} \frac{{(i!)}^{- e^{ϕ}}}{i^{ψ}} exp (\frac{- ψ s}{i μ_{2}}), s \geq 0 .

(7)

The pdf of S defined in Equation (7) is called the compound CMP gamma pdf. For the random sum defined in Equation (1), we have

\{\begin{matrix} E (S) = E (N) E (Y), \\ V (S) = E (N) V (Y) + V (N) {[E (Y)]}^{2} . \end{matrix}

(8)

See, for instance, Bahnemann [3] (Chapter 4). Using Equation (8), the mean and variance of the compound CMP gamma distribution given in Equation (7) are obtained as

\{\begin{matrix} E (S) = μ_{1} μ_{2}, \\ V (S) = μ_{2}^{2} μ_{1} [ψ^{- 1} + e^{- ϕ}] . \end{matrix}

(9)
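The step from Equation (8) to Equation (9) uses E(N) = μ_1, V(N) = μ_1 e^{−ϕ}, E(Y) = μ_2 and V(Y) = μ_2²/ψ. A small Python sketch (illustrative only; the function name is an assumption made here) confirms that both forms agree:

```python
import math

def compound_moments(mu1, mu2, phi, psi):
    """Mean and variance of S from Eq. (8) with CMP counts and gamma severities."""
    e_n, v_n = mu1, mu1 * math.exp(-phi)   # moments of N under Eq. (3)
    e_y, v_y = mu2, mu2 ** 2 / psi         # moments of Y under Eq. (5)
    mean_s = e_n * e_y
    var_s = e_n * v_y + v_n * e_y ** 2
    return mean_s, var_s

mean_s, var_s = compound_moments(mu1=2.0, mu2=3.0, phi=0.0, psi=1.5)
# Closed form of Eq. (9) for comparison.
closed_form = 3.0 ** 2 * 2.0 * (1.0 / 1.5 + math.exp(-0.0))
print(mean_s, var_s, closed_form)
```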

To build a compound regression model for S, let

X = (\vec{1}, {\vec{X}}_{1}, {\vec{X}}_{2}, \dots, {\vec{X}}_{p})

denote the design matrix where

{\vec{X}}_{i}, i = 1, 2, \dots, p

are the column vectors corresponding to the covariates

X_{i}, i = 1, 2, \dots, p

and

\vec{1}

is the vector of

1^{'} s

. Following the GLM procedure given in De Jong et al. [16] (Chapter 5), the model is built by regressing S on X using the log-link function. This is because the log-link function guarantees that the expected value of the response variable is positive. Let

μ

denote the expected value of S. Then, the compound CMP gamma regression model is given by

μ = exp (X δ),

(10)

where

δ = {(δ_{0}, δ_{1}, \dots, δ_{p})}^{'}

is a

(p + 1) \times 1

vector of regression parameters. In the context of modeling vehicle insurance claims data, S may denote the aggregate claim amount, and the covariates may denote the driver’s age, vehicle type, and so on. In the sequel, the method of estimating the regression parameters using the likelihood approach is discussed.
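For concreteness, the log-link model of Equation (10) can be evaluated row by row as in the following Python sketch. The design rows and coefficient values are made up for illustration and are not taken from the paper.

```python
import math

def fitted_means(design_rows, delta):
    """Eq. (10): mu_k = exp(x_k . delta) for each row x_k of the design matrix."""
    return [math.exp(sum(x * d for x, d in zip(row, delta))) for row in design_rows]

# Intercept plus one hypothetical covariate.
rows = [[1.0, 0.0], [1.0, 1.0]]
delta = [math.log(2.0), math.log(3.0)]
print(fitted_means(rows, delta))
```

Because the linear predictor is exponentiated, every fitted mean is positive whatever the sign of the coefficients.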

3. Parameter Estimation

Consider a sample

\vec{s} = {(s_{1}, s_{2}, \dots, s_{r})}^{'}

of r observations on S. Suppose that

\vec{s}

contains

D (> 0)

positive values and

r - D

zeros. Note that D can be regarded as random, with

D \sim \mathrm{Binomial} (r, 1 - p_{0})

, where

p_{0} = Z {(μ_{1}, ϕ)}^{- 1}

. Therefore, the likelihood function L based on

\vec{s}

and

D = d

is

\begin{matrix} L & = \binom{r}{d} p_{0}^{r - d} {(1 - p_{0})}^{d} \prod_{k = 1}^{d} f (s_{k}^{+}) \\ & = \binom{r}{d} {(\frac{1}{Z (μ_{1}, ϕ)})}^{r - d} {(1 - \frac{1}{Z (μ_{1}, ϕ)})}^{d} \prod_{k = 1}^{d} f (s_{k}^{+}), \end{matrix}

(11)

where

f (s_{k}^{+}) = \frac{s_{k}^{ψ - 1} ψ^{ψ}}{(Z (μ_{1}, ϕ) - 1) μ_{2}^{ψ} Γ (ψ)} \sum_{i = 1}^{\infty} {(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{i e^{ϕ}} \frac{{(i!)}^{- e^{ϕ}}}{i^{ψ}} exp (\frac{- ψ s_{k}}{i μ_{2}}) .

Thus, the log-likelihood function l based on

\vec{s}

and

D = d

is obtained as

\begin{matrix} l (μ_{1}, μ_{2}, ϕ, ψ; \vec{s}) & = ln \binom{r}{d} - r ln (Z (μ_{1}, ϕ)) + (ψ - 1) \sum_{k = 1}^{d} ln (s_{k}) - \sum_{k = 1}^{d} ψ ln (μ_{2}) + d ψ ln (ψ) \\ & - d ln (Γ (ψ)) + \sum_{k = 1}^{d} ln [\sum_{i = 1}^{\infty} {(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{i e^{ϕ}} \frac{{(i!)}^{- e^{ϕ}}}{i^{ψ}} exp (\frac{- ψ s_{k}}{i μ_{2}})] . \end{matrix}

(12)

Since

E (N) = μ_{1}

and

E (Y) = μ_{2},

from Equation (9), we obtain

μ = μ_{1} μ_{2}

. Let the elements of the design matrix X be

x_{k l}, l = 0, 1, \dots, p; k = 1, 2, \dots, d

with the

k^{\mathrm{th}}

row given by

x_{k} = (1, x_{k 1}, x_{k 2}, \dots, x_{k p}) .

Replacing

μ_{2}

with

\frac{μ}{μ_{1}}

and

μ

with

exp (X δ)

in Equation (12), the log-likelihood function based on

\vec{s}

and

D = d

becomes

\begin{matrix} l (δ, μ_{1}, ϕ, ψ; \vec{s}) & = ln \binom{r}{d} - r ln (Z (μ_{1}, ϕ)) + (ψ - 1) \sum_{k = 1}^{d} ln (s_{k}) - \sum_{k = 1}^{d} ψ ln (\frac{e^{\sum_{l = 0}^{p} x_{k l} δ_{l}}}{μ_{1}}) \\ & + d ψ ln (ψ) - d ln (Γ (ψ)) + \sum_{k = 1}^{d} ln [\sum_{i = 1}^{\infty} \{{(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{i e^{ϕ}} \frac{{(i!)}^{- e^{ϕ}}}{i^{ψ}} exp (\frac{- ψ s_{k} μ_{1}}{i e^{\sum_{l = 0}^{p} x_{k l} δ_{l}}})\}] . \end{matrix}

(13)
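To indicate how the log-likelihood can be evaluated in practice, the Python sketch below implements Equation (12) with both infinite sums truncated after `terms` terms (an assumption made here; the function names are also hypothetical). Equation (13) follows by calling the same function with μ_2 replaced by exp(x_k δ)/μ_1 for each observation.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def compound_loglik(s, mu1, mu2, phi, psi, terms=300):
    """Log-likelihood of Eq. (12) for a sample s of zeros and positive values."""
    nu = math.exp(phi)
    log_base = math.log(mu1 + (nu - 1.0) / (2.0 * nu))
    # log CMP weights: base^(i*nu) * (i!)^(-nu), i = 0, 1, ...
    log_w = [nu * (i * log_base - math.lgamma(i + 1)) for i in range(terms)]
    log_z = log_sum_exp(log_w)
    positives = [v for v in s if v > 0]
    r, d = len(s), len(positives)
    total = math.log(math.comb(r, d)) - r * log_z
    total += d * (psi * math.log(psi) - psi * math.log(mu2) - math.lgamma(psi))
    for sk in positives:
        # Inner sum over i >= 1 of the gamma-convolution terms.
        inner = [log_w[i] - psi * math.log(i) - psi * sk / (i * mu2)
                 for i in range(1, terms)]
        total += (psi - 1.0) * math.log(sk) + log_sum_exp(inner)
    return total

print(compound_loglik([0.0, 1.2, 3.4, 0.0], mu1=2.0, mu2=1.5, phi=0.3, psi=1.5))
```

A function of this kind could be passed to a general-purpose numerical optimizer; the choice of optimizer and starting values is left open here.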

The maximum likelihood (ML) estimates of the parameters in Equation (13) can be obtained by solving the (

p + 4

) log-likelihood equations simultaneously. However, these equations are non-linear, and therefore closed-form solutions cannot be obtained. Hence, iterative algorithms based on numerical methods can be used to solve the equations to get the estimates for the parameters. Let

\hat{δ}

denote the ML estimate of

δ

. By the asymptotic property of the ML estimators, for large r, the following distribution approximation holds:

Σ_{δ}^{- 1 / 2} (\hat{δ} - δ) \sim N_{p + 1} (0, I),

where

δ

and

Σ_{δ}

denote the mean vector and the covariance matrix of

\hat{δ}

, respectively. Using Equation (10), an estimate of the expected value of S given the covariates X can be obtained as

\hat{μ} = exp (X \hat{δ}) .

Now assume that data on S are unavailable, but data on N and Y are available. This happens, for example, when modeling the aggregate claim amount from data on the claim frequency (N) and the individual claim amounts (Y). Using N and Y, we can compute the value of S and then build the regression model using the method described above. However, computing the estimates in this way is challenging because of the infinite sum in the log-likelihood function. To reduce the computational difficulty, we can use N and Y to build two separate regression models to obtain

\hat{μ}

. Towards this, a two-part GLM methodology is proposed to estimate

μ

assuming N and Y to be (1) independent and (2) dependent.

3.1. Independent Compound Regression Model

Using Equation (9), we have

μ = μ_{1} μ_{2}

. The proposed two-part GLM method is implemented by building two separate regression models, namely, the CMP regression model and the gamma regression model, for the means of N and Y, respectively. Given the data on

N, Y

and X, the estimated mean of S is computed as

\hat{μ} = {\hat{μ}}_{1} {\hat{μ}}_{2}

. Here,

{\hat{μ}}_{1}

and

{\hat{μ}}_{2}

are obtained by regressing N and Y separately on X. Using the log-link function, we have

μ_{1} = E (N) = e^{X α}, μ_{2} = E (Y) = e^{X β},

where

α = {(α_{0}, α_{1}, \dots, α_{p})}^{'}

and

β = {(β_{0}, β_{1}, \dots, β_{p})}^{'}

denote the set of regression parameters.

Let

\vec{n} = {(n_{1}, \dots, n_{m})}^{'}

denote m observations on N. For each

n_{k} > 0

, let there be

n_{k}

observations on Y denoted by

y_{k j}, j = 1, 2, \dots, n_{k}, k = 1, 2, \dots, m

. Let

\vec{\bar{y}} = {({\bar{y}}_{1}, {\bar{y}}_{2}, \dots, {\bar{y}}_{m})}^{'}

where

{\bar{y}}_{k} = \{\begin{matrix} \sum_{j = 1}^{n_{k}} y_{k j} / n_{k} & if n_{k} > 0 \\ 0 & if n_{k} = 0 . \end{matrix}

Let the design matrix X be of order

m \times (p + 1)

with elements

x_{k l}, k = 1, 2, \dots, m; l = 0, 1, \dots, p

. Since the distribution of Y has positive support, zeros in

\vec{\bar{y}},

if any, are not to be considered. The corresponding sample observation in

\vec{\bar{y}}

and the observed covariate matrix X are not included when building the gamma regression model. Let q denote the number of observations for which

{\bar{y}}_{k} = 0, k = 1, 2, \dots, m

and let

t = m - q

. Following Garrido et al. [17], assuming

Y \sim \mathrm{gamma} (μ_{2}, ψ)

is equivalent to assuming

\bar{Y} | N \sim \mathrm{gamma} (μ_{2}, \frac{ψ}{N})

for independent and identically distributed

Y_{1}, \dots, Y_{N}

. Using the pmf of N given in Equation (3) with

μ_{1} = e^{X α}

, the corresponding log-likelihood function is given by

l (α, ϕ; \vec{n}) = \sum_{k = 1}^{m} e^{ϕ} [n_{k} ln (e^{\sum_{l = 0}^{p} x_{k l} α_{l}} + \frac{e^{ϕ} - 1}{2 e^{ϕ}}) - ln (n_{k}!)] - \sum_{k = 1}^{m} ln (Z (e^{\sum_{l = 0}^{p} x_{k l} α_{l}}, ϕ)) .

(14)
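Equation (14) can be evaluated directly once the normalizing constant is truncated. The Python sketch below is an illustration added here (hypothetical function names, assumed truncation point `terms`); it is not the routine used by the cmp() function.

```python
import math

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def cmp_reg_loglik(alpha, phi, counts, design_rows, terms=300):
    """Log-likelihood of Eq. (14) for CMP regression with a log link."""
    nu = math.exp(phi)
    total = 0.0
    for n_k, row in zip(counts, design_rows):
        mu_k = math.exp(sum(x * a for x, a in zip(row, alpha)))
        log_base = math.log(mu_k + (nu - 1.0) / (2.0 * nu))
        # log Z(mu_k, phi), truncated after `terms` terms
        log_z = log_sum_exp([nu * (j * log_base - math.lgamma(j + 1))
                             for j in range(terms)])
        total += nu * (n_k * log_base - math.lgamma(n_k + 1)) - log_z
    return total

counts = [0, 2, 3]
rows = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
print(cmp_reg_loglik([0.5, 0.3], 0.0, counts, rows))
```

With ϕ = 0 the value reduces to the ordinary Poisson regression log-likelihood, which gives a convenient sanity check.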

The ML estimates for the

(p + 1)

regression parameters are obtained by simultaneously solving the corresponding log-likelihood equations. Let

\hat{α} = {(\hat{α_{0}}, \hat{α_{1}}, \dots, \hat{α_{p}})}^{'}

denote the ML estimate of

α

. Then the ML estimate of

μ_{1}

is obtained as

{\hat{μ}}_{1} = e^{X \hat{α}}

. Along similar lines, the ML estimate of

β

, namely,

\hat{β} = {(\hat{β_{0}}, \hat{β_{1}}, \dots, \hat{β_{p}})}^{'}

, is obtained using the likelihood function corresponding to the conditional pdf of

\bar{Y}

given

N = n

. The conditional pdf is given by

f (\bar{y} | n; μ_{2}, ψ) = \frac{1}{Γ (ψ / n)} {(\frac{ψ / n}{μ_{2}})}^{ψ / n} {\bar{y}}^{(ψ / n) - 1} exp (- \frac{ψ \bar{y}}{n μ_{2}}), \bar{y} > 0 .

(15)

Taking

μ_{2} = e^{X β}

in Equation (15), the log-likelihood function is obtained as

l (β, ψ; \vec{\bar{y}}) = - t ln (Γ (\frac{ψ}{n})) + \frac{t ψ}{n} ln (\frac{ψ}{n}) + \sum_{k = 1}^{t} [(\frac{ψ}{n} - 1) ln ({\bar{y}}_{k}) - \frac{ψ {\bar{y}}_{k}}{n e^{\sum_{l = 0}^{p} x_{k l} β_{l}}} - \frac{ψ}{n} \sum_{l = 0}^{p} x_{k l} β_{l}] .

(16)

The likelihood equations for

α

and

β

are, respectively, given by

\sum_{k = 1}^{m} x_{k l} (n_{k} - e^{\sum_{l = 0}^{p} x_{k l} α_{l}}) = 0

(17)

and

\sum_{k = 1}^{t} \frac{x_{k l} n_{k}}{e^{\sum_{l = 0}^{p} x_{k l} β_{l}}} ({\bar{y}}_{k} - e^{\sum_{l = 0}^{p} x_{k l} β_{l}}) = 0, l = 0, 1, \dots, p .

(18)
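As written, Equation (17) has the form of Poisson-type score equations, so a Newton-Raphson iteration is one possible iterative procedure. The Python sketch below illustrates this for an intercept and a single covariate; the function names, the starting value and the fixed iteration count are assumptions made here, and the data are invented.

```python
import math

def score_eq17(counts, x, a0, a1):
    """Left-hand side of Eq. (17) for l = 0 (intercept) and l = 1 (covariate)."""
    mu = [math.exp(a0 + a1 * xk) for xk in x]
    u0 = sum(n - m for n, m in zip(counts, mu))
    u1 = sum(xk * (n - m) for xk, n, m in zip(x, counts, mu))
    return u0, u1

def solve_eq17(counts, x, iterations=25):
    """Newton-Raphson solution of Eq. (17) with an intercept and one covariate."""
    a0, a1 = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(iterations):
        mu = [math.exp(a0 + a1 * xk) for xk in x]
        u0, u1 = score_eq17(counts, x, a0, a1)
        # Negative Jacobian of the score (symmetric 2 x 2 matrix).
        h00 = sum(mu)
        h01 = sum(xk * m for xk, m in zip(x, mu))
        h11 = sum(xk * xk * m for xk, m in zip(x, mu))
        det = h00 * h11 - h01 * h01
        a0 += (h11 * u0 - h01 * u1) / det
        a1 += (h00 * u1 - h01 * u0) / det
    return a0, a1

a0, a1 = solve_eq17([1, 2, 2, 5], [-1.0, 0.0, 1.0, 2.0])
print(a0, a1, score_eq17([1, 2, 2, 5], [-1.0, 0.0, 1.0, 2.0], a0, a1))
```

Equation (18) can be handled in the same way after replacing the score and its Jacobian by their gamma counterparts.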

Since Equations (17) and (18) are non-linear, iterative procedures can be used to solve them. Alternatively, one can use the cmp() function of the cmpreg package and the built-in glm(., family = Gamma(link = "log")) function in R to obtain

\hat{α}

and

\hat{β}

. Using

\hat{α}

and

\hat{β}

, the ML estimate of the expected value of S, namely,

\hat{μ} = {\hat{μ}}_{1} {\hat{μ}}_{2}

, can be computed. By the asymptotic property of the ML estimators, we have

Σ_{α}^{- 1 / 2} (\hat{α} - α) \sim N_{p + 1} (0, I)

and

Σ_{β}^{- 1 / 2} (\hat{β} - β) \sim N_{p + 1} (0, I) .

Here,

α

and

Σ_{α}

denote the mean vector and covariance matrix of

\hat{α}

, respectively. Similarly,

β

and

Σ_{β}

denote the mean vector and covariance matrix of

\hat{β}

, respectively. The standard errors of

\hat{α}

and

\hat{β}

are the square roots of the diagonal elements of the corresponding covariance matrices. Since

\hat{α}

and

\hat{β}

do not have closed-form expressions, their standard errors can be obtained using the sample Hessian matrix. The sample Hessian matrices of

\hat{α}

and

\hat{β}

, namely,

H_{\hat{α}}

and

H_{\hat{β}}

, are given by

H_{\hat{α}} = e^{\hat{ϕ}} e^{X \hat{α}} X X^{'}

and

H_{\hat{β}} = \hat{ψ} X X^{'}

, respectively. Since the expressions of the standard errors of the parameters

α

and

β

contain the dispersion parameters

ϕ

and

ψ

, respectively, they may be estimated using the following formulas:

\hat{ϕ} = ln \{(m - (p + 1)) \sum_{k = 1}^{m} \frac{{\hat{μ}}_{1 k}}{{(n_{k} - {\hat{μ}}_{1 k})}^{2}}\}

(19)

and

\hat{ψ} = \frac{1}{(t - (p + 1))} \sum_{k = 1}^{t} {(\frac{{\bar{y}}_{k} - {\hat{μ}}_{2 k}}{{\hat{μ}}_{2 k}})}^{2},

(20)

where

{\hat{μ}}_{1 k}

and

{\hat{μ}}_{2 k}

are the estimated values of

μ_{1}

and

μ_{2}

, respectively, corresponding to the

k^{\mathrm{th}}

observation.
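The two dispersion estimators can be coded directly from Equations (19) and (20). The Python sketch below follows the formulas as printed; the function names and the toy inputs are assumptions made here. Note that Equation (19) is undefined when some n_k equals its fitted value exactly.

```python
import math

def phi_hat(counts, mu1_hat, p):
    """Eq. (19): estimate of the CMP dispersion parameter phi."""
    m = len(counts)
    total = sum(mu / (n - mu) ** 2 for n, mu in zip(counts, mu1_hat))
    return math.log((m - (p + 1)) * total)

def psi_hat(ybar, mu2_hat, p):
    """Eq. (20): estimate of the gamma dispersion parameter psi."""
    t = len(ybar)
    total = sum(((y - mu) / mu) ** 2 for y, mu in zip(ybar, mu2_hat))
    return total / (t - (p + 1))

# Toy inputs with an intercept-only model (p = 0).
print(phi_hat([1, 3, 5], [2.0, 2.0, 2.0], p=0))
print(psi_hat([2.0, 4.0, 6.0], [4.0, 4.0, 4.0], p=0))
```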

3.2. Dependent Compound Regression Model

Although independence between N and Y is commonly assumed in compound regression models, it is rarely observed in practice. For instance, in the framework of modeling the aggregate claim amounts, it is typical to observe that the claim amounts depend on the claim frequency as well. See, for example, the work of Garrido et al. [17]. As a result, N is included as a covariate in the regression model of

\bar{Y}

. Let

θ

represent the regression parameter associated with N. Since S denotes a random sum, it can be written as

S = N \bar{Y}

. The GLM of S through the log-link function is given by Garrido et al. [17] as

μ = e^{X β} M_{N}^{'} (θ),

where

M_{N}^{'} (θ)

represents the derivative of the moment generating function of N with respect to

θ

. Taking N as CMP,

M_{N}^{'} (θ)

is obtained as

M_{N}^{'} (θ) = \sum_{n = 0}^{\infty} n e^{θ n} {(μ_{1} + \frac{e^{ϕ} - 1}{2 e^{ϕ}})}^{n e^{ϕ}} \frac{{(n!)}^{- e^{ϕ}}}{Z (μ_{1}, ϕ)} .
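The series above can be evaluated numerically by truncation. The following Python sketch is an illustration added here (the function name and the truncation point `terms` are assumptions); for ϕ = 0 it can be checked against the Poisson closed form μ_1 e^θ exp(μ_1 (e^θ − 1)).

```python
import math

def cmp_mgf_derivative(theta, mu1, phi, terms=500):
    """Truncated series for M_N'(theta) under the CMP pmf of Eq. (3)."""
    nu = math.exp(phi)
    log_base = math.log(mu1 + (nu - 1.0) / (2.0 * nu))
    log_w = [nu * (n * log_base - math.lgamma(n + 1)) for n in range(terms)]
    top = max(log_w)
    z = sum(math.exp(v - top) for v in log_w)  # Z(mu1, phi) up to exp(top)
    numerator = sum(n * math.exp(theta * n + log_w[n] - top)
                    for n in range(terms))
    return numerator / z

print(cmp_mgf_derivative(0.0, 2.0, 0.0))   # equals E(N) when theta = 0
print(cmp_mgf_derivative(0.5, 2.0, 0.0))
```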

Note that if

θ = 0

, i.e., when N is independent of

\bar{Y}

,

M_{N}^{'} (θ) = E (N)

, and thus the dependent compound regression model coincides with the independent compound regression model. The pdf of S under the dependent case is given by

f_{S} (s) = f_{\bar{Y} | N} (\bar{y} | n) f_{N} (n),

where

f_{\bar{Y} | N} (\bar{y} | n)

is indicated in Equation (15) with

μ_{2} = μ_{θ}

and

ψ = ψ_{θ}

. The corresponding log-likelihood function is

l (α, β, ϕ, ψ, θ) = l (α, ϕ; \vec{n}) + l (β, ψ, θ; \vec{\bar{y}} | \vec{n}),

where

l (α, ϕ; \vec{n})

corresponds to Equation (14). Let the ML estimates of

α, β

and

θ

be denoted as

\tilde{α}, \tilde{β}

and

\tilde{θ}

, where

\tilde{α}

is obtained using Equation (17). The function

l (β, ψ, θ; \vec{\bar{y}} | \vec{n})

corresponds to Equation (16) with

μ_{2}

replaced with

μ_{θ}

. To obtain the estimates of

β

and

θ

, the GLM of

E (\bar{Y} | N, X)

is used with the log-link function and is defined by

μ_{θ} = e^{X β + θ N}

. The corresponding likelihood equations of the regression parameters are

\sum_{k = 1}^{t} \frac{n_{k} x_{k l}}{e^{\sum_{l = 0}^{p} x_{k l} β_{l} + θ n_{k}}} (\bar{y_{k}} - e^{\sum_{l = 0}^{p} x_{k l} β_{l} + θ n_{k}}) = 0

(21)

and

\sum_{k = 1}^{t} \frac{n_{k}^{2}}{e^{\sum_{l = 0}^{p} x_{k l} β_{l} + θ n_{k}}} (\bar{y_{k}} - e^{\sum_{l = 0}^{p} x_{k l} β_{l} + θ n_{k}}) = 0, l = 0, 1, \dots, p .

(22)

The dispersion parameter

ψ_{θ}

can be estimated using

{\hat{ψ}}_{θ} = \frac{1}{(t - (p + 1))} \sum_{k = 1}^{t} {(\frac{{\bar{y}}_{k} - {\hat{μ}}_{θ k}}{{\hat{μ}}_{θ k}})}^{2},

where

{\hat{μ}}_{θ k}

is the estimated value of

μ_{θ}

corresponding to the

k^{\mathrm{th}}

observation. In addition,

\tilde{β}

and

\tilde{θ}

can be obtained by solving Equations (21) and (22) through iterative algorithms. Thus, the estimate of

μ

is given by

\tilde{μ} = e^{X \tilde{β}} M_{N}^{'} (\tilde{θ})

. Denote

β_{θ} = {[\begin{matrix} β \\ θ \end{matrix}]}_{(p + 2) \times 1}

and its ML estimate as

{\tilde{β}}_{θ} = {[\begin{matrix} \tilde{β} \\ \tilde{θ} \end{matrix}]}_{(p + 2) \times 1} .

By the asymptotic property of the ML estimators, we have

Σ_{β_{θ}}^{- 1 / 2} (\tilde{β_{θ}} - β_{θ}) \sim N_{p + 2} (0, I) .

Here,

β_{θ}

and

Σ_{β_{θ}}

denote the mean vector and covariance matrix of

\tilde{β_{θ}}

, respectively. The standard error of

\tilde{β_{θ}}

corresponds to the square root of the diagonal elements of the sample Hessian matrix, which is given by

H_{{\tilde{β}}_{θ}} = {\hat{ψ}}_{θ} X^{*'} A X^{*},

where

X^{*}

is a matrix of order

t \times (p + 2)

that denotes the design matrix which includes

\vec{n} .

A is a

t \times t

diagonal matrix with positive elements of

\vec{n}

. Note that

H_{\tilde{α}} = H_{\hat{α}} .

4. Prediction Intervals

From the estimates of the regression parameters, we can obtain an estimate of the expected value of S for some fixed values of the covariates. Given the covariates, it is frequently useful to predict the actual value of S. In a regression setup, the actual value of S is related to its expected value as

S = \hat{E} (S | X) + ϵ,

where

ϵ

is the error term. Since

ϵ

is unobserved, the actual value of S cannot be predicted exactly. Instead, a prediction interval is constructed, that is, an interval that contains the actual value of S with a prescribed probability. In this section, a method for calculating the PI for S is proposed. Let

S_{0}

denote the response given the covariate

x_{0} = (1, x_{01}, \dots, x_{0 p})

. Thus, we have

S_{0} = \hat{E} (S_{0} | x_{0}) + ϵ

, where

\hat{E} (S_{0} | x_{0}) = exp (x_{0} \hat{δ}) = {\hat{μ}}_{0}

(say). Assuming

E (ϵ) = 0,

we get,

E (S_{0}) = {\hat{μ}}_{0}

. Additionally, we have

V (S_{0}) = V ({\hat{μ}}_{0}) + V (ϵ)

. Hence, the

100 (1 - α) %

PI for

S_{0}

is given by

[k_{1}, k_{2}]

, such that

P [k_{1} \leq S_{0} \leq k_{2}] = 1 - α,

(23)

where

α \in (0, 1)

. Here,

k_{1}

and

k_{2}

correspond, respectively, to the lower

{(\frac{α}{2})}^{\mathrm{th}}

and upper

{(\frac{α}{2})}^{\mathrm{th}}

percentiles of the distribution of

S_{0}

, which is the compound CMP gamma distribution with mean

E (S_{0})

and variance

V (S_{0})

. Since

V (S_{0})

depends on

V ({\hat{μ}}_{0}),

we proceed as below to obtain an expression for

V ({\hat{μ}}_{0})

. To begin, consider

\begin{matrix} {\hat{μ}}_{0} = exp (x_{0} \hat{δ}) \Rightarrow & ln ({\hat{μ}}_{0}) = x_{0} \hat{δ} . \end{matrix}

(24)

Using the Taylor series expansion of

ln (A)

at

E (A)

, we have

\begin{matrix} ln (A) & \approx ln (E (A)) + (A - E (A)) \frac{1}{E (A)} . \end{matrix}

Thus, we have

E (ln (A)) \approx ln (E (A))

(25)

and

V (ln (A)) \approx \frac{V (A)}{E {(A)}^{2}} .

(26)

Taking A to be

{\hat{μ}}_{0}

in Equations (25) and (26), we obtain

E (ln ({\hat{μ}}_{0})) \approx ln E ({\hat{μ}}_{0})

and

V (ln ({\hat{μ}}_{0})) \approx \frac{V ({\hat{μ}}_{0})}{E {({\hat{μ}}_{0})}^{2}}

. From Equation (24), we establish that

\begin{matrix} E (ln ({\hat{μ}}_{0})) & \approx E (x_{0} \hat{δ}) = x_{0} E (\hat{δ}) \\ \Rightarrow E ({\hat{μ}}_{0}) & \approx exp (x_{0} E (\hat{δ})) = exp (x_{0} δ) = μ_{0} . \end{matrix}

In a similar manner, we obtain

\begin{matrix} V ({\hat{μ}}_{0}) & \approx V (ln ({\hat{μ}}_{0})) E {({\hat{μ}}_{0})}^{2} = V (x_{0} \hat{δ}) μ_{0}^{2} = x_{0} V (\hat{δ}) x_{0}^{'} μ_{0}^{2} \\ = x_{0} diag (Σ_{δ}) x_{0}^{'} μ_{0}^{2} . \end{matrix}
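The last expression is simple to compute once the diagonal of Σ_δ is available. The Python sketch below follows the formula as printed, using only the diagonal entries; the function name and the numerical inputs are assumptions made for illustration.

```python
import math

def variance_of_fitted_mean(x0, delta_hat, cov_diag):
    """Delta-method approximation: V(mu0_hat) ~ x0 diag(Sigma) x0' * mu0^2."""
    mu0 = math.exp(sum(x * d for x, d in zip(x0, delta_hat)))
    # x0 diag(Sigma) x0' reduces to a weighted sum of squared covariates.
    var_linear = sum(x * x * v for x, v in zip(x0, cov_diag))
    return var_linear * mu0 ** 2

# Hypothetical covariate vector, estimates and variances.
print(variance_of_fitted_mean([1.0, 2.0], [0.0, 0.0], [0.01, 0.04]))
```

If the off-diagonal covariances of the estimates are also available, the quadratic form can be computed with the full matrix instead.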

An estimate of

V (ϵ),

namely,

\hat{V} (ϵ),

can be obtained by dividing the residual sum of squares (RSS) of the compound CMP regression model by

m - (p + 1)

. Using

V ({\hat{μ}}_{0})

and

\hat{V} (ϵ),

we obtain

V (S_{0})

. However, obtaining the values of

k_{1}

and

k_{2}

from Equation (23) is not easy since the cumulative distribution function of the compound CMP gamma distribution is not invertible. One may use bootstrap procedures to identify

k_{1}

and

k_{2}

. We propose below a heuristic method to obtain the PI using the two-part GLM methodology given in the previous section.

The PI for

S_{0}

is obtained using the PIs of

N_{0}

and

\bar{Y_{0}}

, where

N_{0} = \hat{E} (N_{0} | x_{0}) + ϵ

and

\bar{Y_{0}} = \hat{E} (\bar{Y_{0}} | x_{0}) + ϵ

. Note that

\hat{E} (N_{0} | x_{0})

is obtained from the GLM of N on X and

\hat{E} (\bar{Y_{0}} | x_{0})

is obtained using the GLM of

\bar{Y}

on X. Denoting

\hat{E} (N_{0} | x_{0}) = {\hat{μ}}_{01}

and

\hat{E} (\bar{Y_{0}} | x_{0}) = {\hat{μ}}_{02},

we have,

{\hat{μ}}_{01} = exp (x_{0} \hat{α})

and

{\hat{μ}}_{02} = exp (x_{0} \hat{β})

. Proceeding along similar lines for obtaining the PI for

S_{0}

, the PIs for

N_{0}

and

\bar{Y_{0}}

can be obtained, respectively, as

[a_{1}, a_{2}]

and

[b_{1}, b_{2}]

, such that

P [a_{1} \leq N_{0} \leq a_{2}] = 1 - α

and

P [b_{1} \leq \bar{Y_{0}} \leq b_{2}] = 1 - α,

where

α \in (0, 1) .

Since

N_{0}

has a mean reparameterized CMP distribution given in Equation (3),

a_{1}

and

a_{2}

are, respectively, the lower

{(\frac{α}{2})}^{\mathrm{th}}

and upper

{(\frac{α}{2})}^{\mathrm{th}}

percentiles of the CMP distribution with mean

{\hat{μ}}_{01}

and dispersion parameter

ϕ = \frac{{\hat{μ}}_{01}}{V ({\hat{μ}}_{01}) + \hat{V} (ϵ)}

, where

V ({\hat{μ}}_{01}) = x_{0} diag (Σ_{α}) x_{0}^{'} μ_{01}^{2}

. Likewise,

b_{1}

and

b_{2}

correspond, respectively, to the lower

{(\frac{α}{2})}^{\mathrm{th}}

and upper

{(\frac{α}{2})}^{\mathrm{th}}

percentiles of the mean reparameterized gamma distribution given in Equation (15) with mean

{\hat{μ}}_{02}

and dispersion parameter

ψ = \frac{V ({\hat{μ}}_{02}) + \hat{V} (ϵ)}{{\hat{μ}}_{02}^{2}},

where

V ({\hat{μ}}_{02}) = x_{0} diag (Σ_{β}) x_{0}^{'} μ_{02}^{2}

. Supposing

Σ_{α}

and

Σ_{β}

are not known, the corresponding sample Hessian matrices can be used to compute

V ({\hat{μ}}_{01})

and

V ({\hat{μ}}_{02})

. The values of

\hat{V} (ϵ)

of the CMP and gamma regression models can be obtained by dividing the RSS of the corresponding regression models by

m - h

and

t - h

, where h denotes the number of regression parameters in the model.

The PI for

S_{0}

given

x_{0}

can be constructed using the PIs of

N_{0}

and

\bar{Y_{0}}

. By virtue of the equality

S = N \bar{Y},

a trivial PI for

S_{0}

given

x_{0}

can be taken to be

[k_{1}, k_{2}] = [a_{1} b_{1}, a_{2} b_{2}]

. When N is large, it may be useful to know the PI for

S_{0}

. For example, in modeling aggregate claim amounts from insurance data, the company may want to know the PI for the aggregate claim amount for high claim frequencies so that enough funds can be maintained. In this case, the PI for

S_{0}

given

x_{0}

can be defined as

[a_{2} b_{1}, a_{2} b_{2}]

. This definition of the PI is used in the remainder of the paper.
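The construction can be sketched in Python as follows. The CMP quantile is taken here as the smallest n whose cumulative probability reaches the requested level, which is an assumption about the convention (the R function used for the paper may differ). The gamma limits b_1 and b_2 are passed in as plain numbers, since the Python standard library has no gamma quantile function. All names are hypothetical.

```python
import math

def cmp_quantile(p, mu1, phi, terms=500):
    """Smallest n with P(N <= n) >= p under the CMP pmf of Eq. (3)."""
    nu = math.exp(phi)
    log_base = math.log(mu1 + (nu - 1.0) / (2.0 * nu))
    logs = [nu * (n * log_base - math.lgamma(n + 1)) for n in range(terms)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    cumulative = 0.0
    for n, w in enumerate(weights):
        cumulative += w / total
        if cumulative >= p:
            return n
    return terms - 1

def prediction_intervals(a1, a2, b1, b2):
    """Return the trivial PI [a1*b1, a2*b2] and the PI [a2*b1, a2*b2]."""
    return (a1 * b1, a2 * b2), (a2 * b1, a2 * b2)

a1 = cmp_quantile(0.025, mu1=2.0, phi=0.0)
a2 = cmp_quantile(0.975, mu1=2.0, phi=0.0)
# b1 and b2 are placeholder gamma limits chosen for illustration.
print(prediction_intervals(a1, a2, b1=1.0, b2=3.0))
```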

5. Numerical Illustration

5.1. Simulation Study

This section provides a numerical illustration of how to compute the PI for S using simulated data for the independent and dependent compound regression models. To generate random samples from the CMP and gamma regression models with a single covariate ${\vec{X}}_{1} = {(x_{11}, x_{21}, \dots, x_{m1})}^{'}$, generated from a standard normal distribution, the following steps are implemented:

Generate $n_{k}$, $k = 1, 2, \dots, m$, from the CMP distribution given in Equation (3) with mean $μ_{1k} = \exp(α_{0} + α_{1} x_{k1})$ by fixing $α_{0}$, $α_{1}$ and $ϕ$. Obtain $\vec{n} = {(n_{1}, n_{2}, \dots, n_{m})}^{'}$.
For each $n_{k} > 0$, generate $y_{kj}$, $j = 1, 2, \dots, n_{k}$, from the gamma distribution given in Equation (5) with mean $μ_{2k}$ by fixing $ψ$, $β_{0}$, $β_{1}$ and $θ$, where $μ_{2k} = \exp(β_{0} + β_{1} x_{k1})$ for the independent compound regression model and $μ_{2k} = \exp(β_{0} + β_{1} x_{k1} + θ n_{k})$ for the dependent compound regression model. Compute ${\bar{y}}_{k}$ and obtain $\vec{\bar{y}} = {({\bar{y}}_{1}, {\bar{y}}_{2}, \dots, {\bar{y}}_{m})}^{'}$.
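The two generation steps can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' R implementation. Equations (3) and (5) are not reproduced here, so the sketch assumes a CMP probability function proportional to $λ^{n} / (n!)^{ν}$ with $ν = \exp(ϕ)$, truncated at a hypothetical bound N_MAX, and finds $λ$ numerically so that the mean equals $μ_{1k}$; cmpreg may use a different (approximate) mean parametrization. The gamma distribution is assumed to have shape $1/ψ$ and scale $μ_{2k} ψ$, so that its mean is $μ_{2k}$. All function names are hypothetical.

```python
import math
import random

N_MAX = 400  # truncation point of the CMP support (assumption of this sketch)
LOG_FACT = [math.lgamma(n + 1) for n in range(N_MAX + 1)]


def cmp_pmf(lam, nu):
    """Truncated CMP probabilities, P(N = n) proportional to lam**n / (n!)**nu."""
    log_w = [n * math.log(lam) - nu * LOG_FACT[n] for n in range(N_MAX + 1)]
    top = max(log_w)  # subtract the maximum to avoid overflow
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def cmp_mean(lam, nu):
    return sum(n * p for n, p in enumerate(cmp_pmf(lam, nu)))


def solve_rate(mu, nu, lo=1e-8, hi=1e12):
    """Bisection on the log scale for lam such that the CMP mean equals mu."""
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if cmp_mean(mid, nu) < mu:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def sample_cmp(mu, phi, rng):
    """Draw one count by inverting the truncated CMP distribution function."""
    nu = math.exp(phi)
    probs = cmp_pmf(solve_rate(mu, nu), nu)
    u, cum = rng.random(), 0.0
    for n, p in enumerate(probs):
        cum += p
        if u <= cum:
            return n
    return N_MAX


def simulate(m, alpha, beta, phi, psi, theta=0.0, seed=1):
    """Return rows (x, n, y_bar, s); theta = 0 gives the independent model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        x = rng.gauss(0.0, 1.0)
        n = sample_cmp(math.exp(alpha[0] + alpha[1] * x), phi, rng)  # step 1
        if n == 0:
            rows.append((x, 0, None, 0.0))  # no claims, so no average amount
            continue
        mu2 = math.exp(beta[0] + beta[1] * x + theta * n)            # step 2
        ys = [rng.gammavariate(1.0 / psi, mu2 * psi) for _ in range(n)]
        y_bar = sum(ys) / n
        rows.append((x, n, y_bar, n * y_bar))
    return rows


rows = simulate(25, alpha=(0.5, 0.3), beta=(1.0, 0.5), phi=0.0, psi=1.5, theta=0.5)
print(rows[:3])
```

Solving for $λ$ by bisection keeps the sketch valid for any combination of mean and dispersion, at the cost of recomputing the normalizing sum at each iteration.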

For simulation, the values of the regression parameters are taken as $α_{0} = 0.5$, $α_{1} = 0.3$, $β_{0} = 1$, $β_{1} = 0.5$ and $θ = 0.5$. The dispersion parameter $ψ$ of the gamma distribution is set to 1.5. To accommodate over-, equi- and under-dispersion in $N$, three choices of the dispersion parameter $ϕ$, namely, $ϕ = -1.6$, $0$ and $1.6$, are considered. The CMP and gamma GLMs are fitted to the generated $\vec{n}$ and $\vec{\bar{y}}$ values, using their respective log-link functions, for both the independent and dependent compound regression models. All the computations are carried out in R (version 4.1.1). The cmp() function in the cmpreg package (Ribeiro Jr [18]) and the glm() function are used to carry out the CMP and gamma regression, respectively. To compute the value of $M_{N}^{'}(\hat{θ})$ in the dependent compound regression model, the com.expectation() function in the compoisson package is employed. The qcom() function in the compoisson package is used to determine quantile values from the CMP distribution, and the qgammaAlt() function in the EnvStats package is used to determine quantile values from the gamma distribution. For the above choices of the parameters, the $95\%$ PI for S is obtained for the independent and dependent compound regression models under three choices of sample size (m), namely, $m = 25$, $50$ and $100$. The actual S observations, denoted by $\vec{s} = {(s_{1}, s_{2}, \dots, s_{m})}^{'}$, are computed as $s_{k} = n_{k} {\bar{y}}_{k}$, $k = 1, 2, \dots, m$.

The proportion of the elements of $\vec{s}$ lying within their respective PIs is presented in Table 1 for the various choices of m and $ϕ$. Additionally, the plots of the corresponding prediction bands are displayed in Table 2 and Table 3. From Table 1, it can be observed that, for the choices of the covariate and coefficients considered, the proportion in the independent compound regression model is largest for $ϕ = 1.6$ at every sample size, whereas in the dependent compound regression model it is largest for $ϕ = -1.6$ when $m = 25$ and $m = 50$.
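The proportion reported in Table 1 is an empirical coverage rate, which can be sketched in Python as follows. The function name and the example values are hypothetical, and treating the interval limits as inclusive is an assumption of this sketch.

```python
def coverage_proportion(observed, intervals):
    """Fraction of observed values s_k that fall inside their own PI.

    observed  : list of observed aggregate amounts s_k
    intervals : list of (lower, upper) PI limits, one pair per observation
    """
    inside = sum(1 for s, (lo, hi) in zip(observed, intervals) if lo <= s <= hi)
    return inside / len(observed)


# Hypothetical observations and intervals: two of the four values are covered.
observed = [5.0, 12.0, 30.0, 7.5]
intervals = [(0.0, 10.0), (0.0, 10.0), (20.0, 40.0), (8.0, 9.0)]
print(coverage_proportion(observed, intervals))
```

For a nominal $95\%$ PI, a proportion close to 0.95 would indicate that the interval is well calibrated for the simulated setting.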

5.2. Real-Life Application

In this section, the proposed two-part methodology to obtain the PI for the compound CMP gamma regression is applied to real-life vehicle insurance claims data. The dataset pertains to the average damage claims for privately owned and insured vehicles in Britain in the year 1975; see Dutang and Charpentier [19]. It consists of 128 observations on five variables, namely, the owner’s age ($X_{1}$), car age ($X_{2}$), model ($X_{3}$), number of claims (N) and average claim amount ($\bar{Y}$) in pounds. The variable $X_{1}$ consists of eight categories of age group; the variable $X_{2}$, four categories of car age; and the variable $X_{3}$, four categories of model. The aggregate claim amount (S) for each observation is obtained by multiplying the average claim amount by the number of claims.

A dispersion test on N, performed using the dispersiontest() function in the AER package in R, resulted in a dispersion index of $119.8246$ and a p-value of $2.091 \times 10^{-6}$, indicating that N is over-dispersed. Similarly, a Kolmogorov–Smirnov test of the goodness-of-fit of the gamma distribution to $\bar{Y}$ yielded a p-value of $0.7191$, so the gamma distribution is not rejected. As a result, the CMP distribution can be used to model N, whereas the gamma distribution can be used to model $\bar{Y}$. To implement the proposed estimation methodology and validate its performance, $80\%$ of the observations are randomly chosen as training data and the remaining $20\%$ as test data. The observations in the training data are used to fit the independent and dependent compound regression models. The owner’s age, car age and car model are the covariates considered in the model. The cmp() function in the cmpreg package and the glm() function are used to obtain the estimates of the CMP and gamma regression models, respectively. The estimates of the regression parameters, their corresponding p-values (in parentheses) and the AIC values are given in Table 4. Using the AIC values for the CMP and gamma regression models, the combined AIC values for the independent and dependent compound regression models are obtained as $2110.31$ and $2108.31$, respectively.

For each observation in the test data, the PI for S is computed using the estimates of the fitted model. The corresponding prediction bands of the independent and dependent compound regression models are displayed in Figure 1. From this figure, it can be noted that some observations do not fall within the prediction band. One reason for this is that these observations have large claim frequencies when compared with the other observations, and the corresponding limits of the PI based on the CMP regression are also large. As a result, the limits of the PI of such observations deviate from their observed values. The proportion of observed S in the test data lying within its PI is found to be $0.4782$ and $0.6956$ for the independent and dependent compound regression models, respectively. Based on the combined AIC values and the proportions, it can be inferred that the dependent compound regression model provides a relatively better fit for modeling the aggregate claim amount.
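The combined AIC values can be checked with a few lines of Python. The sketch assumes that the combined AIC is the sum of the AIC values of the two fitted parts, which holds when the log-likelihoods and parameter counts of the count and gamma parts simply add; the reported values agree with the sums of the Table 4 entries. The function name is hypothetical.

```python
def combined_aic(aic_count, aic_severity):
    """AIC of the two-part model as the sum of the AICs of its parts."""
    return aic_count + aic_severity


# AIC values taken from Table 4.
aic_cmp = 984.7148
aic_gamma_independent = 1125.6
aic_gamma_dependent = 1123.6

print(round(combined_aic(aic_cmp, aic_gamma_independent), 2))
print(round(combined_aic(aic_cmp, aic_gamma_dependent), 2))
```

The CMP part is common to both models, so the difference of 2 between the combined values comes entirely from the gamma part, where the dependent model adds the claim count as a covariate.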

6. Conclusions

The Poisson distribution is generally used in compound regression models as the counting distribution. In practice, the Poisson distribution’s equi-dispersion assumption is frequently violated. The methodology presented in this paper provides a way to handle non-equi-dispersed count data in the context of compound regression models by using the CMP distribution. The proposed compound regression model can be used when the count data are over- or under-dispersed. The estimation of the parameters was carried out using a two-part GLM approach for the independent and dependent compound regression models. This approach is less complex and provides separate estimates for the count and the continuous distributions involved in the model. Since, in practice, a range likely to contain the actual value of the response variable is more useful than its predicted value alone, a methodology to obtain the prediction interval of the response variable was proposed. An application of the two-part GLM method to real-life data revealed that the dependent compound regression model performs relatively better than the independent compound regression model. Thus, in practice, one can start with the dependent compound regression model and test the significance of the count variable in the model. If the count variable is found not to be significant, then the independent compound regression model can be used. To conclude, the proposed compound CMP regression model could be an alternative for modeling a compound random variable when the count data are not equi-dispersed.

Author Contributions

J.M. has contributed to the conceptualization, methodology, mathematical derivation and simulation. V.S.V. and C.C. have contributed equally to mathematical derivation and original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gómez-Déniz, E.; Pérez-Rodríguez, J.V. Modelling distribution of aggregate expenditure on tourism. Econ. Model. 2019, 78, 293–308.
2. Klugman, S.A.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions; John Wiley & Sons: New York, NY, USA, 2012; Volume 715.
3. Bahnemann, D. Distributions for Actuaries; Casualty Actuarial Society: Arlington, VA, USA, 2015; Volume 2.
4. Jørgensen, B.; Paes De Souza, M.C. Fitting Tweedie’s compound Poisson model to insurance claims data. Scand. Actuar. J. 1994, 1994, 69–93.
5. Consul, P.C.; Jain, G.C. A generalization of the Poisson distribution. Technometrics 1973, 15, 791–799.
6. Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. J. R. Stat. Soc. Ser. C (Appl. Stat.) 2005, 54, 127–142.
7. Conway, R.W.; Maxwell, W.L. A queuing model with state dependent service rates. J. Ind. Eng. 1962, 12, 132–136.
8. Sellers, K.F.; Borle, S.; Shmueli, G. The COM-Poisson model for count data: A survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 2012, 28, 104–116.
9. Sellers, K.F.; Premeaux, B. Conway-Maxwell-Poisson regression models for dispersed count data. Wiley Interdiscip. Rev. Comput. Stat. 2021, 13, e1533.
10. Saavithri, V.; Priyadharshini, J.; Banu, Z.P. Compound COM-Poisson Distribution with Binomial Compounding Distribution. Available online: https://www.internationaljournalssrg.org/uploads/specialissuepdf/ICRMIT/2018/MTT/ICRMIT-P122.pdf (accessed on 15 January 2023).
11. Frees, E.W.; Gao, J.; Rosenberg, M.A. Predicting the frequency and amount of health care expenditures. N. Am. Actuar. J. 2011, 15, 377–392.
12. Andersen, D.A.; Bonat, W.H. Double generalized linear compound Poisson models to insurance claims data. Electron. J. Appl. Stat. Anal. 2017, 10, 384–407.
13. Delong, Ł.; Lindholm, M.; Wüthrich, M.V. Making Tweedie’s compound Poisson model more accessible. Eur. Actuar. J. 2021, 11, 185–226.
14. Ribeiro, E.E., Jr.; Zeviani, W.M.; Bonat, W.H.; Demétrio, C.G.; Hinde, J. Reparametrization of COM-Poisson regression models with applications in the analysis of experimental data. Stat. Model. 2020, 20, 443–466.
15. Jorgensen, B. The Theory of Dispersion Models; CRC Press: Boca Raton, FL, USA, 1997.
16. De Jong, P.; Heller, G.Z. Generalized Linear Models for Insurance Data; Cambridge University Press: Cambridge, UK, 2008.
17. Garrido, J.; Genest, C.; Schulz, J. Generalized linear models for dependent frequency and severity of insurance claims. Insur. Math. Econ. 2016, 70, 205–215.
18. Ribeiro, E.E., Jr. Cmpreg: Reparametrized COM-Poisson Regression Models, R Package Version 0.0.1. Available online: https://rdrr.io/github/JrEduardo/cmpreg/ (accessed on 15 January 2023).
19. Dutang, C.; Charpentier, A. CASdatasets: Insurance Datasets. 2019. R Package Version 1.0-11. Available online: http://cas.uqam.ca/ (accessed on 15 January 2023).

Figure 1. Prediction band for the test data under independent model and dependent model.

Table 1. Proportion of S lying in its respective PIs.

m	$ϕ$	Independent Model	Dependent Model
25	−1.6	0.6667	0.9444
	0	0.7777	0.8333
	1.6	0.8400	0.8400
50	−1.6	0.7353	0.8529
	0	0.6500	0.7000
	1.6	0.7656	0.8297
100	−1.6	0.6615	0.9077
	0	0.7088	0.9493
	1.6	0.7777	0.9393

Table 2. Prediction bands of independent compound regression model for over-, equi- and under-dispersed data.

[Plots not reproduced: 3 × 3 grid of prediction bands, with rows m = 25, 50, 100 and columns $ϕ = -1.6$, $ϕ = 0$, $ϕ = 1.6$.]

Table 3. Prediction bands of dependent compound regression model for over-, equi- and under-dispersed data.

[Plots not reproduced: 3 × 3 grid of prediction bands, with rows m = 25, 50, 100 and columns $ϕ = -1.6$, $ϕ = 0$, $ϕ = 1.6$.]

Table 4. Parameter estimates, p-values and AIC for the CMP and gamma regression models for the real-life data.

Covariates	CMP Regression Model	Gamma Regression Model (Independent Case)	Gamma Regression Model (Dependent Case)
(Intercept)	1.5007 ($< 2 \times 10^{-16}$)	5.7421 ($< 2 \times 10^{-16}$)	5.7754 ($< 2 \times 10^{-16}$)
OwnerAge21–24	1.5885 ($< 2 \times 10^{-16}$)	−0.2010 (0.0670)	−0.1800 (0.0964)
OwnerAge25–29	2.6237 ($< 2 \times 10^{-16}$)	−0.1129 (0.2705)	−0.0497 (0.6357)
OwnerAge30–34	2.7585 ($< 2 \times 10^{-16}$)	−0.3276 (0.0034)	−0.2542 (0.0262)
OwnerAge35–39	2.8854 ($< 2 \times 10^{-16}$)	−0.3150 (0.0047)	−0.2271 (0.0496)
OwnerAge40–49	3.5362 ($< 2 \times 10^{-16}$)	−0.2722 (0.0081)	−0.1140 (0.3528)
OwnerAge50–59	3.3678 ($< 2 \times 10^{-16}$)	−0.1854 (0.0843)	−0.0590 (0.6219)
OwnerAge60+	3.0280 ($< 2 \times 10^{-16}$)	−0.3054 (0.0036)	−0.2120 (0.0553)
ModelB	1.0255 ($< 2 \times 10^{-16}$)	0.0584 (0.4260)	0.1414 (0.0877)
ModelC	0.6930 ($< 2 \times 10^{-16}$)	0.1083 (0.1387)	0.1500 (0.0450)
ModelD	−0.1889 (0.00485)	0.4041 ($6.01 \times 10^{-7}$)	0.3762 ($2.40 \times 10^{-6}$)
CarAge10+	−1.9174 ($< 2 \times 10^{-16}$)	−0.8138 ($< 2 \times 10^{-16}$)	−0.9494 ($5.87 \times 10^{-16}$)
CarAge4–7	−0.1558 ($6.65 \times 10^{-5}$)	−0.0615 (0.3959)	−0.0727 (0.3089)
CarAge8–9	−1.4876 ($< 2 \times 10^{-16}$)	−0.4188 ($8.64 \times 10^{-8}$)	−0.5323 ($2.02 \times 10^{-8}$)
NClaims	-	-	−0.0010 (0.0301)
$\hat{ϕ}$	−0.8374 ($< 2 \times 10^{-16}$)	-	-
$\hat{ψ}$	-	0.0667	0.0644
AIC	984.7148	1125.6	1123.6

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
