# Modelling Unobserved Heterogeneity in Claim Counts Using Finite Mixture Models







## Abstract



## 1. Introduction and Aims

## 2. Finite Mixture of Regression Models

#### 2.1. Finite Mixture of Poisson Regressions

#### 2.2. Finite Mixture of Negative Binomial Regressions

#### 2.3. Other Models

#### 2.4. Estimation via EM Algorithm

- M1: Update the mixing proportions using $${\widehat{\pi}}_{j}=\frac{1}{n}\sum_{i=1}^{n}{w}_{ij},\quad j=1,\dots ,k$$
- M2: Update the regression coefficients and the component-specific parameters by fitting a single regression model for the $j$-th component with response ${y}_{i}$ and covariates ${\mathbf{x}}_{i}$, using a weighted likelihood approach with weights ${w}_{ij}$.
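The two M-steps can be sketched in Python for the simplest case, a finite mixture of Poisson regressions with a log link. This is a minimal illustration under stated assumptions, not the authors' code (which was written in R): the function names `weighted_poisson_fit` and `em_poisson_mixture` are hypothetical, the E-step uses the standard posterior-probability formula for the weights ${w}_{ij}$, the first column of `X` is assumed to be an intercept, and the starting values are a crude guess.

```python
import numpy as np
from math import lgamma


def weighted_poisson_fit(X, y, w, beta, n_iter=50, tol=1e-10):
    """Step M2 for one component: weighted Poisson regression (log link),
    fitted by Newton-Raphson with step halving."""
    def objective(b):
        eta = X @ b
        return np.sum(w * (y * eta - np.exp(eta)))

    current = objective(beta)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (w * (y - mu))
        hess = X.T @ (X * (w * mu)[:, None])
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the weighted log-likelihood does not decrease.
        while t > 1e-8 and not (objective(beta + t * step) >= current):
            t *= 0.5
        beta = beta + t * step
        new = objective(beta)
        if new - current < tol:
            break
        current = new
    return beta


def em_poisson_mixture(X, y, k=2, n_iter=300, tol=1e-8):
    """EM for a k-component mixture of Poisson regressions (illustrative)."""
    n, p = X.shape
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    pi = np.full(k, 1.0 / k)
    # Assumed starting values: intercepts spread around log(mean(y)).
    betas = np.zeros((k, p))
    betas[:, 0] = np.log(y.mean()) + np.linspace(-1.0, 1.0, k)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior membership weights w_ij, computed on the log scale.
        log_dens = np.column_stack([
            np.log(pi[j]) + y * (X @ betas[j]) - np.exp(X @ betas[j]) - log_fact
            for j in range(k)
        ])
        row_max = log_dens.max(axis=1, keepdims=True)
        log_mix = row_max[:, 0] + np.log(np.exp(log_dens - row_max).sum(axis=1))
        w = np.exp(log_dens - log_mix[:, None])
        trace.append(log_mix.sum())  # observed-data log-likelihood
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # M1: mixing proportions are the average weights.
        pi = w.mean(axis=0)
        # M2: one weighted regression per component.
        for j in range(k):
            betas[j] = weighted_poisson_fit(X, y, w[:, j], betas[j])
    return pi, betas, trace
```

The negative binomial case would follow the same pattern, with the weighted fit in step M2 also updating the component-specific dispersion parameter.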

#### 2.5. Computational Details

We used our own code. Some of the models can also be fitted using the `gamlss`, `VGAM` and `flexmix` packages in `R`; however, we encountered convergence problems and found these standard packages less flexible.

## 3. Data and Results

#### 3.1. Data Description

#### 3.2. Fitted Models

Table 2 compares the fitted Poisson and negative binomial models. The best fit is obtained with a two-component finite mixture of negative binomial regression models (2FMNB), which attains the lowest AIC and BIC. Finite mixture models with $k>2$ were also fitted, but they brought no improvement in terms of AIC or BIC. This suggests that the portfolio is composed of two groups of policyholders.
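The information criteria used for this comparison can be recomputed from the reported log-likelihoods and parameter counts with the usual definitions $\mathrm{AIC}=-2\ell +2p$ and $\mathrm{BIC}=-2\ell +p\log n$. The sketch below does so in Python. The sample size $n=80{,}994$ is an assumption: it is not stated in this section and is obtained here by summing the claim frequencies listed for the variable N in the data description table. The names `FITS`, `aic` and `bic` are illustrative only.

```python
from math import log

# Assumed sample size: sum of the claim-count frequencies reported for N.
N_POLICIES = 71087 + 6744 + 2067 + 690 + 248 + 95 + 34 + 29

# (log-likelihood, number of parameters) as reported in the model comparison table.
FITS = {
    "Poisson": (-42585.08, 8),
    "Negative binomial": (-38453.13, 9),
    "Zero-inflated Poisson": (-38836.59, 9),
    "Zero-inflated negative binomial": (-38453.13, 10),
    "2-Finite Poisson mixture": (-38449.61, 17),
    "2-Finite negative binomial mixture": (-38347.81, 19),
}


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * loglik + n_params * log(n_obs)


for name, (loglik, n_params) in FITS.items():
    print(f"{name:36s} AIC={aic(loglik, n_params):10.2f} "
          f"BIC={bic(loglik, n_params, N_POLICIES):10.2f}")

best_aic = min(FITS, key=lambda m: aic(*FITS[m]))
best_bic = min(FITS, key=lambda m: bic(*FITS[m], N_POLICIES))
```

Under this assumed $n$, the recomputed values agree with the tabulated AIC and BIC up to rounding of the log-likelihoods.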

#### 3.3. Usage of FM Models for Actuarial Purposes

## 4. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

Variable | Definition | Mean | St. dev. |
---|---|---|---|
N | total number of claims reported by policyholders (0: 71,087; 1: 6,744; 2: 2,067; 3: 690; 4: 248; 5: 95; 6: 34; >6: 29) | 0.1833 | 0.5873 |
GEN | equals 1 for women and 0 for men | 0.1600 | 0.3666 |
URB | equals 1 when driving in urban area, 0 otherwise | 0.6690 | 0.4706 |
ZON | equals 1 when driving in Madrid, Catalonia or northern Spain, 0 otherwise | 0.4326 | 0.4954 |
LIC | equals 1 if the driving license is 4 or more years old, 0 otherwise | 0.9766 | 0.1511 |
LOY | equals 1 if the client is in the company for more than 5 years, 0 otherwise | 0.1441 | 0.3512 |
COV | equals 1 if includes comprehensive and collision coverage, 0 otherwise | 0.5087 | 0.4999 |
POW | equals 1 if horsepower is greater than or equal to 5500cc, 0 otherwise | 0.8058 | 0.3955 |

Model | Log-Likelihood | Parameters | AIC | BIC |
---|---|---|---|---|
Poisson | −42,585.08 | 8 | 85,186.15 | 85,260.57 |
Negative binomial | −38,453.13 | 9 | 76,924.27 | 77,007.98 |
Zero-inflated Poisson | −38,836.59 | 9 | 77,691.19 | 77,774.91 |
Zero-inflated negative binomial | −38,453.13 | 10 | 76,926.27 | 77,019.28 |
2-Finite Poisson mixture | −38,449.61 | 17 | 76,933.21 | 77,091.36 |
2-Finite negative binomial mixture | −38,347.81 | 19 | 76,733.62 | 76,910.36 |

**Table 3.** The fitted models for both the negative binomial and the 2FMNB. The p-value for the 2FMNB is that of the likelihood ratio test (LRT) when the variable is removed from both components, whereas for the simple negative binomial it is that of the Wald test.

Variable | 2FMNB: 1st comp. | 2FMNB: 2nd comp. | 2FMNB: p-Value | Negative Binomial: Estimate | Negative Binomial: p-Value |
---|---|---|---|---|---|
Intercept | −6.1420 | −1.1364 | <0.0001 | −2.4144 | <0.0001 |
GEN | 0.2633 | 0.0086 | 0.0124 | 0.0774 | 0.0103 |
URB | 0.3407 | −0.0762 | 0.0017 | 0.0165 | 0.4870 |
ZON | 0.3745 | 0.0564 | <0.0001 | 0.1324 | <0.0001 |
LIC | 0.2413 | −0.2423 | 0.0448 | −0.1610 | 0.0230 |
LOY | 0.3707 | 0.1289 | <0.0001 | 0.2019 | <0.0001 |
COV | 3.1438 | 0.6373 | <0.0001 | 1.0024 | <0.0001 |
POW | 0.2502 | 0.1148 | <0.0001 | 0.1440 | <0.0001 |
$\varphi$ | 0.2321 | 0.6051 | | 0.2527 | |
$\pi$ | 0.6686 | 0.3314 | | | |

Profile Name | GEN | URB | ZON | LIC | LOY | COV | POW |
---|---|---|---|---|---|---|---|
Best | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
Good | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
Average | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
Bad | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
Worst | 1 | 1 | 1 | 0 | 1 | 1 | 1 |

**Table 5.** The mean and the variance derived from the simple negative binomial model (NB) and the 2FMNB.

Profile | Mean: NB | Mean: 2FMNB | Mean: 2FMNB-1 | Mean: 2FMNB-2 | Variance: NB | Variance: 2FMNB | Variance: 2FMNB-1 | Variance: 2FMNB-2 |
---|---|---|---|---|---|---|---|---|
Best | 0.077 | 0.080 | 0.004 | 0.233 | 0.101 | 0.183 | 0.004 | 0.323 |
Good | 0.113 | 0.115 | 0.005 | 0.336 | 0.164 | 0.279 | 0.005 | 0.524 |
Average | 0.207 | 0.200 | 0.063 | 0.476 | 0.378 | 0.496 | 0.081 | 0.852 |
Bad | 0.309 | 0.289 | 0.117 | 0.636 | 0.688 | 0.756 | 0.176 | 1.306 |
Worst | 0.432 | 0.419 | 0.247 | 0.766 | 1.170 | 1.159 | 0.509 | 1.735 |
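The means and the component-level variances in Table 5 can be approximately reproduced from the Table 3 estimates and the risk profiles, as the Python sketch below illustrates. It rests on two assumptions that are not stated in this section but are consistent with the tabulated values: a log link, so that $\mu =\exp ({\mathbf{x}}^{\top }\mathit{\beta })$, and a negative binomial variance of the form $\mu +{\mu }^{2}/\varphi$. The overall 2FMNB mean is taken as the $\pi$-weighted average of the component means; the overall 2FMNB variance column is not recomputed here. The names `MODELS`, `MIXING`, `PROFILES`, `mean_and_variance` and `mixture_mean` are illustrative only.

```python
from math import exp

# Covariate order: GEN, URB, ZON, LIC, LOY, COV, POW.
# (intercept, coefficients, phi) copied from Table 3.
MODELS = {
    "NB": (-2.4144, [0.0774, 0.0165, 0.1324, -0.1610, 0.2019, 1.0024, 0.1440], 0.2527),
    "2FMNB-1": (-6.1420, [0.2633, 0.3407, 0.3745, 0.2413, 0.3707, 3.1438, 0.2502], 0.2321),
    "2FMNB-2": (-1.1364, [0.0086, -0.0762, 0.0564, -0.2423, 0.1289, 0.6373, 0.1148], 0.6051),
}

# Mixing proportions of the two 2FMNB components (Table 3).
MIXING = {"2FMNB-1": 0.6686, "2FMNB-2": 0.3314}

# Risk profiles as listed in the profile table.
PROFILES = {
    "Best": [0, 1, 0, 1, 0, 0, 0],
    "Good": [1, 1, 0, 0, 0, 0, 1],
    "Average": [0, 0, 0, 1, 0, 1, 0],
    "Bad": [1, 1, 0, 0, 0, 1, 1],
    "Worst": [1, 1, 1, 0, 1, 1, 1],
}


def mean_and_variance(model, profile):
    """Mean and variance of one (component) model for a given profile,
    assuming a log link and variance mu + mu^2 / phi."""
    intercept, coefs, phi = MODELS[model]
    mu = exp(intercept + sum(b * x for b, x in zip(coefs, profile)))
    return mu, mu + mu ** 2 / phi


def mixture_mean(profile):
    """Overall 2FMNB mean: pi-weighted average of the component means."""
    return sum(MIXING[m] * mean_and_variance(m, profile)[0] for m in MIXING)


for name, profile in PROFILES.items():
    nb_mu, nb_var = mean_and_variance("NB", profile)
    print(f"{name:8s} NB mean={nb_mu:.3f} NB var={nb_var:.3f} "
          f"2FMNB mean={mixture_mean(profile):.3f}")
```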

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

