# Machine Learning in P&C Insurance: A Review for Pricing and Reserving

## Abstract

## 1. Introduction

#### 1.1. Research Methodology

- Query research databases (Google Scholar, ProQuest, SSRN, arXiv, ResearchGate) for a combination of machine learning keywords (machine learning, data science, decision tree (DT), classification and regression trees (CART), neural network (NN), convolutional neural networks (CNN), recurrent neural networks (RNN), random forest (RF), gradient boosting (GBM/GBT/XGBoost), generalized additive model (GAM, GAMLSS), support vector machine (SVM, SVR, SVC), principal component analysis (PCA), autoencoders (AE), computer science) AND the subjects of interest in our review (actuarial science, general insurance, home insurance, auto insurance, P&C insurance, ratemaking, reserving).
- Query actuarial journals (in no particular order: Risks, ASTIN Bulletin, Insurance: Mathematics and Economics (IME), Scandinavian Actuarial Journal (SAJ), Variance, North American Actuarial Journal (NAAJ), European Actuarial Journal (EAJ)).
- Search the references of each pertinent article for similar contributions.

#### 1.2. Scope of This Review and Similar Work

#### 1.3. Generalized Data on This Review

## 2. Neural Networks

#### 2.1. Basics and Notation

#### 2.2. Estimating Probability Distribution Parameters with Neural Networks

## 3. Pricing with Machine Learning

#### 3.1. Conventional Pricing

#### 3.2. Neural Pricing

#### 3.3. Telematics Pricing

##### 3.3.1. Pay-as-You-Drive

- be proportional to the expected loss;
- be practical (objective and inexpensive to obtain and verify);
- consider the preexisting exposure bases established within the industry.

##### 3.3.2. Pay-How-You-Drive

#### 3.4. Outlook on Pricing

## 4. Reserving with Machine Learning

#### 4.1. Aggregate Reserving

- aggregation of multiple claims at the portfolio level or other grouping types if the actuary believes that development patterns are heterogeneous within the portfolio;
- aggregation of continuous time into discrete intervals, usually yearly, quarterly or monthly.
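The two aggregation steps listed above can be illustrated with a minimal sketch. It assumes hypothetical individual payment records of the form (accident date, payment date, amount) and groups them into a yearly incremental run-off triangle. The record layout, the sample values and the function name `build_triangle` are illustrative assumptions and are not taken from the reviewed papers.

```python
from collections import defaultdict
from datetime import date

# Hypothetical individual payments: (accident date, payment date, amount).
payments = [
    (date(2018, 3, 14), date(2018, 6, 1), 1000.0),
    (date(2018, 3, 14), date(2019, 2, 10), 500.0),
    (date(2018, 11, 2), date(2020, 1, 15), 250.0),
    (date(2019, 7, 21), date(2019, 9, 30), 800.0),
    (date(2019, 7, 21), date(2020, 5, 5), 300.0),
]


def build_triangle(records):
    """Sum payments by (accident year, development year)."""
    cells = defaultdict(float)
    for accident, paid, amount in records:
        # Continuous time is collapsed into yearly intervals (assumed here
        # to be the difference between calendar years).
        dev = paid.year - accident.year
        # Individual claims are pooled at the portfolio level.
        cells[(accident.year, dev)] += amount
    return dict(cells)


triangle = build_triangle(payments)
for (accident_year, dev), total in sorted(triangle.items()):
    print(accident_year, dev, total)
```

Splitting the portfolio into groups with more homogeneous development patterns would amount to adding a grouping key to the dictionary key in this sketch.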

#### 4.2. Neural Aggregate Reserving

#### 4.3. Individual Reserving

- claim status (open, closed, reopened), a classification task;
- activity status (presence of claim or change in case reserve indicator during the period), a classification task;
- individual payment value or change in case reserve value conditional on the presence of claim during the period, a regression task;
- involvement of lawyers or doctors, a classification task.
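To make the prediction tasks listed above concrete, the following sketch derives one target per task from a single claim observed over one development period. The `ClaimPeriod` layout, its field names and the rule used to flag activity (a nonzero payment or a nonzero change in case reserve) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ClaimPeriod:
    """One claim observed over one development period (hypothetical layout)."""
    status: str            # "open", "closed" or "reopened"
    payment: float         # amount paid during the period
    reserve_change: float  # change in case reserve during the period
    lawyer_involved: bool


def targets(rec):
    """Return one target per prediction task for a single record."""
    # Assumed activity rule: something happened on the claim in the period.
    active = rec.payment != 0.0 or rec.reserve_change != 0.0
    return {
        "status": rec.status,                         # classification
        "activity": active,                           # classification
        # Regression targets are only defined conditional on activity.
        "payment": rec.payment if active else None,
        "reserve_change": rec.reserve_change if active else None,
        "lawyer": rec.lawyer_involved,                # classification
    }


quiet = targets(ClaimPeriod("open", 0.0, 0.0, False))
busy = targets(ClaimPeriod("reopened", 1200.0, -500.0, True))
```

Each classification target could feed a classifier and each conditional amount a regression model. How the fitted models are combined into a reserve estimate is left out of this sketch.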

#### 4.4. Neural Individual Reserving

#### 4.5. Outlook on Reserving

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
AVB | adversarial variational Bayes |
CART | classification and regression trees |
DT | decision tree |
EF | exponential family |
GAM | generalized additive model |
GBM | gradient boosting machine |
GBT | gradient boosted trees |
GLM | generalized linear model |
kNN | k-nearest neighbour |
LDA | linear discriminant analysis |
LR | logistic regression |
NB | naïve Bayes |
NLL | negative log-likelihood |
NN | neural network |
P&C | property and casualty |
RF | random forest |
RNN | recurrent neural network |
SVM | support vector machine |
SVR | support vector regression |
SVC | support vector classifier |

1. Overviews consist of white papers, case studies, reviews, surveys and reports if published in research journals or conference proceedings, sponsored by professional actuarial organizations or large insurance companies.
2.
3.
4. Claim Models Taylor (2020), Machine Learning Asimit et al. (2020) and Finance, insurance and risk management (https://www.mdpi.com/journal/risks/special_issues/Machine_Learning_Finance_Insurance_Risk_Management).
5. Embeddings are vectorial representations of data created with deep neural networks to compress high dimensional data, categorical data or unstructured data.
6. See also https://github.com/kasaai/simulationmachine for a user-friendly package.

**Figure 5.** Visualizing a GLM in a neural network graph diagram. (**a**) Graph in GLM notation. (**b**) Graph in NN notation.
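The correspondence shown in Figure 5 can be checked numerically with a small sketch. It assumes a Poisson GLM with a log link, which is only one possible choice, and shows that a network with no hidden layer, a single output unit and an exponential activation returns the same mean. The function names and parameter values are hypothetical.

```python
import math


def dense(x, weights, bias):
    """One fully connected layer: a weighted sum per output unit."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def glm_poisson_mean(x, beta0, beta):
    """GLM notation: mu = exp(beta0 + <beta, x>), i.e. a log link."""
    return math.exp(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def nn_poisson_mean(x, weights, bias):
    """NN notation: no hidden layer, one output unit, exp activation."""
    (z,) = dense(x, weights, bias)
    return math.exp(z)


x = [1.0, 0.5]                   # hypothetical covariates
beta0, beta = -2.0, [0.3, 0.8]   # hypothetical GLM coefficients

mu_glm = glm_poisson_mean(x, beta0, beta)
# The GLM coefficients become the weight row, the intercept the bias.
mu_nn = nn_poisson_mean(x, [beta], [beta0])
print(mu_glm, mu_nn)
```

Under these assumptions the regression coefficients play the role of the weights and the intercept plays the role of the bias, so the two notations describe the same function.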

**Figure 6.** Examples of neural network architectures for exponential family (EF) distributions. (**a**) An approach for nonlinear Poisson regression. (**b**) The approach proposed by Denuit et al. (2019a).
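As a rough illustration of the idea behind panel (a) of Figure 6, the sketch below passes covariates through one hidden layer and scores the resulting Poisson mean with the negative log-likelihood (NLL). The architecture (a single tanh hidden layer), the weights and the function names are assumptions chosen for brevity. The sketch does not attempt to reproduce panel (b).

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer followed by an exponential output unit."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return math.exp(z)  # the exponential keeps the Poisson mean positive


def poisson_nll(y, mu):
    """Negative log-likelihood of a count y under Poisson(mu)."""
    return mu - y * math.log(mu) + math.lgamma(y + 1)


# Hypothetical weights for a network with two inputs and two hidden units.
w_hidden = [[0.1, 0.2], [-0.3, 0.4]]
b_hidden = [0.0, 0.1]
w_out = [0.5, -0.5]
b_out = -1.0

mu = forward([0.5, -1.0], w_hidden, b_hidden, w_out, b_out)
loss = poisson_nll(1, mu)
print(mu, loss)
```

Training would minimize the average of `poisson_nll` over the portfolio with respect to the weights. The optimization step is omitted here.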

**Figure 7.** Examples of alternate distributions in neural networks with the NLL approach. (**a**) Negative binomial neural network. (**b**) Tweedie neural network.
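The NLL approach in Figure 7 can be sketched for the negative binomial case only. The example assumes the mean and size parameterization and a network head with two outputs, each passed through an exponential to stay positive. The parameterization, the activation and the function names are assumptions, and the Tweedie case is not covered.

```python
import math


def nb_nll(y, mu, r):
    """NLL of a count y under a negative binomial distribution
    with mean mu > 0 and size (inverse dispersion) r > 0."""
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return -log_pmf


def nb_head(z_mu, z_r):
    """Map two unconstrained network outputs to valid parameters.
    The exponential activation is an assumption that keeps both positive."""
    return math.exp(z_mu), math.exp(z_r)


# Hypothetical pre-activations giving mu close to 2 and r close to 3.
mu, r = nb_head(z_mu=math.log(2.0), z_r=math.log(3.0))
loss = nb_nll(y=1, mu=mu, r=r)

# Sanity checks: the implied probabilities sum to one and have mean mu.
pmf = [math.exp(-nb_nll(k, mu, r)) for k in range(200)]
total_prob = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
print(loss, total_prob, mean)
```

In this parameterization the variance is mu + mu²/r, so the second output lets the network model overdispersion relative to the Poisson case.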

Description | Reference | Methodologies/Approaches |
---|---|---|
Book | Frees et al. (2014a, 2014b) | GLM, GAM |
Comparative study | Dugas et al. (2003) | GLM, DT, NN, SVM |
Comparative study | Noll et al. (2018) | GLM, DT, GBT, NN |
Comparative study | Diana et al. (2019) | GLM, RF, GBT, NN |
Comparative study | Lee and Antonio (2015) | GLM, GAM, NN, GBT, CART |
Comparative study | Kašćelan et al. (2016) | SVR, Kernel LR |
Comparative study | Fauzan and Murfi (2018) | GBT, AdaBoost, RF, NN |
Comparative study | Maynard et al. (2019) | XGBoost, RF, LR, NN |
Lecture notes | Wüthrich and Buser (2019) | GLM, GAM, NN, RF, GBM, SVM |
Lecture notes | Denuit et al. (2019a, 2019b, 2019c) | GLM, GAM, GBM, NN |
Report | Bothwell et al. (2016) | – |
Report | Harej et al. (2017) | NN |
Report | Jamal et al. (2018) | RF, NN, GBM |
Review | Corlosquet-Habart and Janssen (2018) | NN, RF, GBM, SVM |
Review | Albrecher et al. (2019) | – |
Review | Grize et al. (2020) | CART, NN, XGBoost |
Review | Śmietanka et al. (2020) | – |
Review | Richman (2020a, 2020b) | NN |
Survey | Rioux et al. (2019) | – |
White paper | Bruer et al. (2015) | – |
White paper | Panlilio et al. (2018) | GLM, GBT, NN |
White paper | Richman et al. (2019) | NN |

Reference | Models |
---|---|
Christmann (2004) | LR, SVR |
Denuit and Lang (2004) | GAM |
Paglia and Phelippe-Guinvarc’h (2011) | CART |
Guelman (2012) | GBT |
Liu et al. (2014) | SVC |
Klein et al. (2014) | GAMLSS |
Sakthivel and Rajitha (2017) | NN |
Henckaerts et al. (2018) | GAMLSS |
Quan and Valdez (2018) | DT |
Yang et al. (2018) | GBT |
Lee and Lin (2018) | Boosting |
Yang et al. (2019) | NN |
Wüthrich and Merz (2019) | GLM, NN |
Fontaine et al. (2019) | GLM |
Diao and Weng (2019) | CART |
Wüthrich (2019) | NN |
So et al. (2020) | AdaBoost |
Zhou et al. (2020) | GBT |
Henckaerts et al. (2020) | GBT |

Reference | Models |
---|---|
Boucher et al. (2017) | GAM |
Wüthrich (2017) | k-means |
Gao and Wüthrich (2018) | PCA, AE |
Gao et al. (2018) | GAM |
Gao et al. (2019) | GAM |
Pesantez-Narvaez et al. (2019) | LR, GBT |
Gao and Wüthrich (2019) | CNN |
Narwani et al. (2020) | LR, GBT, k-means |
Gao et al. (2020) | CNN |

Description | Reference | Approaches | Type (Agg = aggregate, Ind = individual) |
---|---|---|---|
ODP | England and Verrall (2001) | GAM | Agg |
ODP | Spedicato et al. (2014) | GAMLSS | Agg |
IBNR | Lopes et al. (2012) | SVR | Agg |
CL | Wüthrich (2018b) | NN | Agg |
Tot Res | Kuo (2019a) | RNN | Agg |
ODP | Gabrielli et al. (2019) | NN | Agg |
ODP | Gabrielli (2020b) | NN | Agg |
– | Mulquiney (2006) | NN | Ind |
Tot Res | Wüthrich (2018a) | CART | Ind |
RBNS | Llaguno et al. (2017) | – | Ind |
RBNS | Lopez et al. (2016) | CART | Ind |
Tot Res | Baudry and Robert (2019) | CART | Ind |
Simulation | Gabrielli and Wüthrich (2018) | NN | Ind |
Tot Res | Pigeon and Duval (2019) | GBT | Ind |
RBNS | Lopez et al. (2019) | CART | Ind |
RBNS | Kuo (2020) | RNN | Ind |
RBNS | Lopez and Milhaud (2020) | CART | Ind |
RBNS | Gabrielli (2020a) | NN | Ind |
RBNS | De Felice and Moriconi (2019) | CART | Ind |
CL | Carrato and Visintin (2019) | k-means | Ind |
Tot Res | Delong et al. (2020) | NN | Ind |
RBNS | Crevecoeur and Antonio (2020) | GBM | Ind |

