# Penalising Unexplainability in Neural Networks for Predicting Payments per Claim Incurred

## Abstract

**:**

## 1. Introduction

#### 1.1. Rationale

#### 1.2. Granular Claims Models

#### 1.3. Neural Networks

#### 1.4. Residual Neural Networks

_{1}, B

_{2}, B

_{3}and B

_{4}be weight matrices and a(x) be the activation function. Examples of activation functions include the Rectified Linear Unit (ReLU):

_{1}, b

_{2}, b

_{3}and b

_{4}to approximately minimise actual loss versus predicted Loss($\widehat{y}$, y) where:

**x**

_{1}= a(B

_{1}

**x**

_{input}),

**x**

_{2}= a(B

_{2}

**x**

_{1}),

#### 1.5. Embeddings

#### 1.6. Hyperparameter Selection

_{1}, B

_{2}, B

_{3}and B

_{4}weights individually as above. For a given loss function Loss($\widehat{y}$, y), a fully parametrised regularised loss function would be as below (the model in the next section uses mean-squared error loss):

## 2. Model

#### 2.1. Definitions

- The sum of ‘claim counts reported by delay’ by origin period is the incremental claims reported triangle,
- The sum of ‘claim payments by delay’ by origin period is the incremental payments triangle,
- If the claim counts and payments were the ultimates, it would be a typical GLM risk pricing dataset.
- The dataset remains reasonably compact as claims are in cross-tab format, which is beneficial from a memory usage perspective.

**x**_{i}be features, for policy exposures i, including the accident or origin period,**w**_{i,t}be exposure weights for exposure i in delay period t, with missing data after the balance date being weighted as 0,- C
_{i,t}be claim count reported for the exposure in delay period t, with C_{i}= ΣC_{i,t}and - P
_{i,t}be claim payments for the exposure in delay period t, with P_{i}= ΣP_{i,t}.

_{i,t}) and E(P

_{i,t}) based on x and w.

#### 2.2. From Payments Per Claim Incurrred to Granular Model

_{i,t}) = c

_{i}q

_{t},

_{i}is an estimate of ultimate claim count, and q

_{t}is the payments per claim incurred.

_{i,t}) = c

_{i}q b

_{t}, Σ

_{t}b

_{t}= 1.

#### 2.3. Network

**c**

_{linear}= B

_{c}

**x**

_{linear}= B

_{p}

**x**or a frequency-severity model by adding the frequency component at each step, such as with p

_{linear}=

**c**

_{linear}+ B

_{p}

**x**.

_{c}and d

_{p}were fitted with a log-softmax transform, let:

**c**

_{interpretable,t}=

**w**

_{c,t}× exp(

**c**+ log(softmax(

**d**

_{c,t}))).

**n**

_{1}= SELU(B

_{1}

**x**)

**n**

_{2}= SELU(B

_{2}

**n**

_{1})

**n**

_{3}= SELU(B

_{3}

**n**

_{2})

**c**

_{residual}= B

_{4}

**n**

_{3}

**d**

_{c-residual}= B

_{5}

**n**

_{3}

**c**

_{neural, t}= w

_{c,t}× exp(

**c**

_{linear}+

**c**

_{residual}+ log_softmax(

**d**

_{c,t}))

_{c}such that:

_{c}= exp(w

_{c})

**/**(1 + exp(w

_{c})),

**c**

_{blended,t}= Weight

_{c}

**c**

_{interpretable,t}+ (1 − Weight

_{c}

**c**

_{neural,t})

#### 2.4. Loss

#### 2.5. Training: Dropout, Optimiser, Shuffling, Batch Size, and Learn Rate Scheduler

#### 2.6. Language and Package

## 3. Dataset Used

#### 3.1. Dataset Details

#### 3.2. Cleaning and Sampling

#### 3.3. Comparison with Manual Selections for Chain Ladder and PPCI

## 4. Results

#### 4.1. Sampling Sizes

#### 4.2. Bayesian Optimisation of Hyperparameters

#### 4.3. Regularisation

## 5. Extending with Freeform Data—Claim Description

## 6. Conclusions

## Funding

## Conflicts of Interest

## References

- Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Bidirectional Transformers for Language Understanding. arXiv, arXiv:1810.04805. [Google Scholar]
- Fotso, Stephane. 2018. Deep Neural Networks for Survival Analysis Based on a Multi-Task Framework. arXiv, arXiv:1801.05512v1. [Google Scholar]
- He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. arXiv, arXiv:1512.03385. [Google Scholar]
- Kingma, Diederik, and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. arXiv, arXiv:1412.6980. [Google Scholar]
- Kuo, Kevin. 2018. DeepTriangle: A Deep Learning Approach to Loss Reserving. arXiv, arXiv:1804.09253v3. [Google Scholar]
- Miller, Hugh, Gráinne McGuire, and Greg Taylor. 2016. Self-Assembling Insurance Claim models. Available online: https://actuaries.asn.au/Library/Events/GIS/2016/4dHughMillerSelfassemblingclaimmodels.pdf (accessed on 24 August 2019).
- Poon, Jacky. 2018. Analytics Snippet: Multitasking Risk Pricing Using Deep Learning. Available online: https://www.actuaries.digital/2018/08/23/analytics-snippet-multitasking-risk-pricing-using-deep-learning (accessed on 13 June 2019).
- Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv, arXiv:1602.04938v3. [Google Scholar]
- Schelldorfer, Jürg, and Mario V. Wuthrich. 2019. Nesting Classical Actuarial Models into Neural Networks. Available online: https://ssrn.com/abstract=3320525 (accessed on 22 January 2019).
- Semenovich, Dimitri, Ian Heppell, and Yang Cai. 2010. Convex Models: A New Approach to Common Problems in Premium Rating. Sydney: The Institute of Actuaries of Australia. [Google Scholar]
- Smith, Leslie N. 2018. A disciplined approach to neural network hyper-parameters: Part 1—Learning rate, batch size, momentum, and weight decay. arXiv, arXiv:1803.09820v2. [Google Scholar]
- Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. The Journal of Machine Learning Research 15: 1929–58. [Google Scholar]
- Taylor, Greg, Gráinne McGuire, and James Sullivan. 2008. Individual claim loss reserving conditioned by case estimates. Annals of Actuarial Science 3: 215–56. [Google Scholar] [CrossRef]
- Vaughan, Joel, Agus Sudjianto, Erind Brahimi, Jie Chen, and Vijayan N. 2018. Nair. Explainable Neural Networks based on Additive Index Models. arXiv, arXiv:1806.01933v1. [Google Scholar]

**Figure 1.**(

**a**) A residual block comprising of a group of three layers, showing the skip connection, which can skip one or multiple layers; (

**b**) The same block as a model, showing interpretable linear or GLM model and ‘unexplainable’ neural network.

**Figure 2.**Simplified model network graph. Each block represents a group of layers, and the arrows represent the data flow. Grey arrows represent the “interpretable” network flow, whilst blue lines represent the “neural” network flow.

Origin Period | Feature 1 | … | Feature m | Claim Reported by Delay | Claim Paid Reported by Delay | Data Exists Flag by Delay ^{1} | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

x | c | p | w | |||||||||

1 | … | n | 1 | … | n | 1 | … | n | ||||

1 January 2015 | 1 | 0 | 0 | $2000 | 1 | 1 | ||||||

1 December 2016 | - | 0 | NaN | 0 | NaN | 1 | 0 | |||||

1 December 2015 | 0 | 1 | 0 | $3000 | 1 | 1 |

^{1}Exposure weights if needed would be multiplied to the data existence flag.

Test | Sample Size | Seed | Chain Ladder SE Count | PPCI SE Paid | Model SE Count | Model SE Paid | Better Paid SE |
---|---|---|---|---|---|---|---|

0 | 5000 | 1 | 1453.14 | 3436.19 | NaN | NaN | N |

1 | 5000 | 2 | 472.17 | 2447.26 | 256.78 | 2183.37 | Y |

2 | 5000 | 3 | 431.78 | 5638.29 | NaN | NaN | N |

3 | 5000 | 4 | 339.55 | 1652.14 | 224.21 | 3401.45 | N |

4 | 5000 | 5 | 1590.66 | 58,388.27 | NaN | NaN | N |

5 | 10,000 | 1 | 540.19 | 4070.24 | 823.89 | 5181.73 | N |

6 | 10,000 | 2 | 2533.18 | 51,956.42 | 389.81 | 44,884.23 | Y |

7 | 10,000 | 3 | 606.79 | 11,418.01 | 272.21 | 7511.17 | Y |

8 | 10,000 | 4 | 1810.21 | 209,956.66 | 976.48 | 202,351.99 | Y |

9 | 10,000 | 5 | 1057.81 | 52,508.37 | 270.61 | 42,912.06 | Y |

10 | 25,000 | 1 | 5575.31 | 35,060.89 | 2583.43 | 23,732.57 | Y |

11 | 25,000 | 2 | 4644.56 | 61,120.81 | 750.18 | 51,177.28 | Y |

12 | 25,000 | 3 | 1969.43 | 295,767.11 | NaN | NaN | N |

13 | 25,000 | 4 | 4054.21 | 309,086.25 | 1985.41 | 299,889.39 | Y |

14 | 25,000 | 5 | 1703.19 | 57,953.14 | 1362.94 | 67,186.30 | N |

15 | 50,000 | 1 | 25,487.11 | 133,176.94 | 2407.64 | 78,762.94 | Y |

16 | 50,000 | 2 | 6313.54 | 125,112.45 | 9515.98 | 116,294.24 | Y |

17 | 50,000 | 3 | 3744.60 | 196,970.08 | 6727.98 | 220,488.81 | N |

18 | 50,000 | 4 | 13,261.61 | 334,846.96 | 19,096.74 | 284,140.74 | Y |

19 | 50,000 | 5 | 4430.90 | 67,546.64 | 1606.70 | 57,256.59 | Y |

20 | 100,000 | 1 | 23,441.53 | 312,086.87 | 29,796.32 | 441,193.89 | N |

21 | 200,000 | 1 | 64,274.85 | 5,147,858.37 | 28,841.52 | 5,076,289.65 | Y |

250,000 with Bayes Opt | 151,959.56 | 1,260,099.15 | 201,002.33 | 961,150.56 | Y |

Percentile | Model Paid | Actual Paid |
---|---|---|

5% | 0.25 | 0.23 |

10.0% | 0.33 | 0.32 |

15.0% | 0.47 | 0.47 |

20.0% | 0.6 | 0.38 |

25.0% | 0.66 | 0.5 |

30.0% | 0.71 | 0.54 |

35.0% | 0.76 | 0.54 |

40.0% | 0.8 | 0.51 |

45.0% | 0.85 | 0.6 |

50.0% | 0.9 | 0.67 |

55.0% | 0.96 | 1.16 |

60.0% | 1.02 | 1.24 |

65.0% | 1.1 | 1.58 |

70.0% | 1.18 | 1.18 |

75.0% | 1.28 | 1.17 |

80.0% | 1.39 | 0.96 |

85.0% | 1.49 | 1.03 |

90.0% | 1.62 | 1.8 |

95.0% | 1.77 | 2.01 |

100.0% | 2.24 | 3.06 |

Weight Decay (l2) | Paid w * | Dense Layer 1 | Dense Layer 2 | Dense Layer 3 |
---|---|---|---|---|

0.0001 | 0.97 | 0.043 | 0.026 | 0.031 |

0.001 | 2.51 | 0.016 | 0.009 | 0.007 |

0.01 | 2.39 | 8.26 × 10^{−5} | 1.57 × 10^{−5} | 2.71 × 10^{−4} |

0.1 | 0.64 | 0.004 | 2.36 × 10^{−5} | 4.02 × 10^{−5} |

^{w}/(1 + e

^{w}).

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

H. L. Poon, J.
Penalising Unexplainability in Neural Networks for Predicting Payments per Claim Incurred. *Risks* **2019**, *7*, 95.
https://doi.org/10.3390/risks7030095

**AMA Style**

H. L. Poon J.
Penalising Unexplainability in Neural Networks for Predicting Payments per Claim Incurred. *Risks*. 2019; 7(3):95.
https://doi.org/10.3390/risks7030095

**Chicago/Turabian Style**

H. L. Poon, Jacky.
2019. "Penalising Unexplainability in Neural Networks for Predicting Payments per Claim Incurred" *Risks* 7, no. 3: 95.
https://doi.org/10.3390/risks7030095