# Concordance Probability for Insurance Pricing Models


## Abstract


## 1. Introduction

## 2. Datasets and Models

#### 2.1. Datasets

`R`-package `CASdatasets` (see Note 1) and contain data on which both a frequency and a severity model can be applied.

#### 2.1.1. 2015 Pricing Game

**claimNumb** and **claimCharge**, which will be the dependent variables of the frequency and severity analysis respectively. The variable **claimNumb** shows the number of third-party bodily injury claims. For policies for which more than two claims were filed during the considered exposure, the value was set to 2. This adaptation is needed for the measures that are presented in Section 3. The variable **claimCharge** represents the total cost of third-party bodily injury claims, in euro. Finally, **exposure** will be used as an offset variable during the analysis of the frequency data. It is the percentage of a full policy year, corresponding to the run time of the respective policy. Note that $72.58\%$ of the observations have an exposure equal to one.

#### 2.1.2. 2016 Pricing Game

`R`-package `CASdatasets`. The first dataset contains 87,226 policies for private motor insurance and can be used for the frequency model. The **pg16trainclaim** dataset contains 4568 claims of those 87,226 TPL policies; combined with the **pg16trainpol** dataset, it can be used to construct the severity model. The policies cover all kinds of material damage, but not bodily injury.

**claimNumb** and **claimCharge**, which will be the dependent variables of the frequency and severity analysis respectively. The variable **claimNumb** shows the number of claims. For policies for which more than two claims were filed during the considered exposure, the value was once again set to 2. This adaptation is needed for the measures that are presented in Section 3. The variable **claimCharge** represents the claim size. Moreover, **exposure** will be used as an offset variable during the analysis of the frequency data. It is the percentage of a full policy year, corresponding to the run time of the respective policy. In this dataset, $14.16\%$ of the observations have an exposure equal to one.

#### 2.2. Models

#### 2.2.1. Frequency

**claimNumb** is the response variable. The exposure is used as an offset variable, and all other variables of the training set, apart from **claimCharge**, are considered as predictor variables. Applying the frequency model to the test set of the 2015 (2016) pricing game, we obtain 40,008 (34,890) pairs of observations and their corresponding predictions. However, the goal of this paper is to calculate the concordance probability of these frequency models for big datasets. Therefore, we will also consider a bootstrap of these pairs of observations and predictions, resulting in 1,000,000 pairs for each dataset.
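The bootstrap of observation and prediction pairs can be illustrated with a minimal sketch. It assumes plain resampling with replacement of whole pairs; the function name `bootstrap_pairs`, its parameters, and the fixed seed are hypothetical choices for illustration, not taken from the paper.

```python
import random


def bootstrap_pairs(observed, predicted, n_draws=1_000_000, seed=0):
    """Resample (observation, prediction) pairs with replacement.

    Assumption: each pair is drawn as a unit, so an observation always
    stays attached to its own prediction.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    rng = random.Random(seed)
    n = len(observed)
    # Draw row indices uniformly at random, with replacement.
    indices = [rng.randrange(n) for _ in range(n_draws)]
    return [observed[i] for i in indices], [predicted[i] for i in indices]
```

Drawing indices rather than values keeps the link between an observed claim count and its prediction intact, which is what any concordance calculation on the resampled data relies on.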

#### 2.2.2. Severity

**claimCharge** over **claimNumb** is the response variable, and the weights are equal to the variable **claimNumb**. This is a popular approach for severity models, as explained in Appendix B, based on the book of Denuit et al. (2007). All other variables of the training set, apart from **exposure** and **claimNumb**, are considered as predictors. Applying the severity model to the test set of the 2015 (2016) pricing game, we obtain 1837 (1588) pairs of observations and their corresponding predictions. However, the goal of this paper is to calculate the concordance probability of these severity models for big datasets. Therefore, we will also consider a bootstrap of these pairs of observations and predictions, resulting in 1,000,000 pairs for each dataset.
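As a small illustration of the response and weights described above, the sketch below computes the average claim size (**claimCharge** over **claimNumb**) together with its weight (**claimNumb**). Restricting the data to policies with at least one claim is an assumption made here, and the function name `severity_response` is hypothetical.

```python
def severity_response(claim_charge, claim_numb):
    """Return (average claim size, weight) per policy with at least one claim.

    Assumption: policies without claims carry no severity information and
    are dropped before the severity model is fitted.
    """
    rows = []
    for charge, numb in zip(claim_charge, claim_numb):
        if numb > 0:
            # Response: total claim cost divided by the number of claims.
            # Weight: the number of claims behind that average.
            rows.append((charge / numb, numb))
    return rows
```

For example, a policy with two claims totalling 1200 euro contributes an average claim size of 600 with weight 2.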

## 3. Concordance Probability in an Insurance Setting

#### 3.1. Frequency Models

- $\mathbb{O}$-group: group with the largest number of elements, hence the group with the smallest number of events,
- $\mathbb{1}$-group: group with the smallest number of elements, hence the group containing the largest number of events.

- Determine the pairs of observations and predictions belonging to the $\mathbb{O}$-group and the ones belonging to the $\mathbb{1}$-group.
- Determine the unique exposures $\lambda $ within $\mathbb{1}$ and apply a for-loop on them:
  - Select the elements in $\mathbb{1}$ with exposure ${\lambda}_{i}$.
  - Select the elements in $\mathbb{O}$ with exposure in $[\max(0,{\lambda}_{i}-\gamma ),\min(1,{\lambda}_{i}+\gamma )]$.
  - Determine $\mathrm{C}({\lambda}_{i},\gamma )$, the concordance probability on these two subsets.
  - Define ${m}_{i}$, the number of comparable pairs used to calculate $\mathrm{C}({\lambda}_{i},\gamma )$.
- The global concordance probability $\mathrm{C}\left(\gamma \right)$ can be rewritten as: $$\begin{aligned}\mathrm{C}(\gamma) &= \frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} I\left(\widehat{\pi}(\mathit{x}_{i}) > \widehat{\pi}(\mathit{x}_{j}),\; y_{i}\in \mathbb{1},\; y_{j}\in \mathbb{O},\; |\lambda_{j}-\lambda_{i}| < \gamma\right)}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} I\left(\widehat{\pi}(\mathit{x}_{i}) \ne \widehat{\pi}(\mathit{x}_{j}),\; y_{i}\in \mathbb{1},\; y_{j}\in \mathbb{O},\; |\lambda_{j}-\lambda_{i}| < \gamma\right)} \\ &= \frac{\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{0}} I\left(\widehat{\pi}(\mathit{x}_{i}) > \widehat{\pi}(\mathit{x}_{j}),\; |\lambda_{j}-\lambda_{i}| < \gamma\right)}{\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{0}} I\left(\widehat{\pi}(\mathit{x}_{i}) \ne \widehat{\pi}(\mathit{x}_{j}),\; |\lambda_{j}-\lambda_{i}| < \gamma\right)} \\ &= \frac{\sum_{i=1}^{n_{1}^{\prime}} m_{i}\,\mathrm{C}(\lambda_{i},\gamma)}{\sum_{i=1}^{n_{1}^{\prime}} m_{i}} \\ &= \sum_{i} w_{i}\,\mathrm{C}(\lambda_{i},\gamma),\end{aligned}$$ where ${n}_{1}^{\prime}$ is the number of unique exposures within $\mathbb{1}$ and ${w}_{i}={m}_{i}/{\sum}_{k}{m}_{k}$.
- Construct the plot of $\mathrm{C}({\lambda}_{i},\gamma )$ as a function of ${\lambda}_{i}$.
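The steps above can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the authors' implementation: the group labels (0 for the $\mathbb{O}$-group, 1 for the $\mathbb{1}$-group) are taken as given, the exposure window is closed as in the selection step of the list, and the function names `concordance_by_exposure` and `global_concordance` are hypothetical.

```python
from collections import defaultdict


def concordance_by_exposure(pred, group, exposure, gamma):
    """Return {lambda_i: (C(lambda_i, gamma), m_i)} per unique exposure in the 1-group."""
    zeros = [(p, lam) for p, g, lam in zip(pred, group, exposure) if g == 0]
    ones_by_exposure = defaultdict(list)
    for p, g, lam in zip(pred, group, exposure):
        if g == 1:
            ones_by_exposure[lam].append(p)

    results = {}
    for lam_i, preds_1 in sorted(ones_by_exposure.items()):
        # Closed exposure window around lambda_i, clipped to [0, 1].
        low, high = max(0.0, lam_i - gamma), min(1.0, lam_i + gamma)
        preds_0 = [p for p, lam in zeros if low <= lam <= high]
        # Concordant: the 1-group element has the strictly larger prediction.
        concordant = sum(p1 > p0 for p1 in preds_1 for p0 in preds_0)
        # Comparable: pairs whose predictions are not tied.
        comparable = sum(p1 != p0 for p1 in preds_1 for p0 in preds_0)
        if comparable > 0:
            results[lam_i] = (concordant / comparable, comparable)
    return results


def global_concordance(results):
    """Weighted mean of C(lambda_i, gamma) with weights m_i / sum(m)."""
    total_pairs = sum(m for _, m in results.values())
    if total_pairs == 0:
        raise ValueError("no comparable pairs")
    return sum(m * c for c, m in results.values()) / total_pairs
```

The double loop over both groups is quadratic in the number of observations, so this sketch is only practical for small samples. The dictionary returned by `concordance_by_exposure` holds exactly the points needed to plot $\mathrm{C}({\lambda}_{i},\gamma )$ against ${\lambda}_{i}$.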

- For every observation i, construct $C({\lambda}_{i},\gamma )$, with ${\lambda}_{i}$ the exposure of the considered element.
- For every considered exposure ${\lambda}_{i}$, determine the weighted mean of $C({\lambda}_{i},\gamma )$, where the weights are based on the total number of comparable pairs.

#### 3.2. Severity Models

## 4. Time-Efficient Computation

#### 4.1. Frequency

#### 4.1.1. Marginal Approximation

#### 4.1.2. k-Means Approximation

#### 4.2. Severity

#### 4.2.1. Marginal Approximation

#### 4.2.2. k-Means Approximation

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

`R`-package `CASdatasets`.

## Conflicts of Interest

## Appendix A. Description of the Datasets

**pg15training** dataset, we selected and renamed the following variables:

- **CalYear** renamed as **uwYear**: The underwriting year or the year in which the run time of the policy started. Categorical variable with 2 levels (2009, 2010).
- **Gender** renamed as **gender**: The gender of the car driver. Categorical variable with 2 levels (Male, Female).
- **Type** renamed as **carType**: The car type. Categorical variable with 6 levels (A, B, C, D, E, F).
- **Category** renamed as **carCat**: The car category. Categorical variable with 3 levels (Small, Medium, Large).
- **Occupation** renamed as **job**: The occupation of the driver. Categorical variable with 5 levels (Employed, Housewife, Retired, Self-employed and Unemployed).
- **Age** renamed as **age**: The driver's age, expressed in years. Categorized variable with 6 levels (1, 2, …, 6).
- **Group1** renamed as **group1**: The group of the car. Categorical variable with 20 levels (integer value ranging from 1 to 20, with jumps of 1).
- **Bonus** renamed as **bm**: The bonus-malus or French no-claim discount: −30 means a 30 percent bonus while +20 means a 20 percent malus. Categorical variable with 21 levels (integer value ranging from −50 to 150, with jumps of 10).
- **Poldur** renamed as **nYears**: The number of years that the policy already exists at the beginning of the exposure. Categorical variable with 16 levels (integer value ranging from 0 to 15, with jumps of 1).
- **Value** renamed as **carVal**: The car value in euro. Categorized variable with 6 levels (1, 2, …, 6).
- **Adind** renamed as **cover**: A dummy variable indicating the material cover. Categorical variable with 2 levels (0, 1).
- **Density** renamed as **density**: The population density (number of inhabitants per square km) in the city that the driver of the car lives in. Categorized variable with 6 levels (1, 2, …, 6).
- **Exppdays** renamed as **exposure**: Percentage of a full policy year, corresponding to the run time of the respective policy.
- **Numtpbi** renamed as **claimNumb**: The number of third-party bodily injury claims. For policies for which more than two claims were filed during the considered exposure, the value was set to 2. This adaptation is needed for the measures that are presented in Section 3.
- **Indtpbi** renamed as **claimCharge**: The total cost of third-party bodily injury claims, in euro.

**age**, **carVal** and **density** were originally continuous variables that were transformed into categorical variables, as explained by Van Oirbeek et al. (2021).

**pg16trainpol** dataset, we selected and renamed the following variables:

- **Year** renamed as **covYear**: The covering year. Categorical variable with 3 levels (2011, 2012 and 2013).
- **VehiclPower** renamed as **vehPower**: The vehicle power. Categorical variable with 11 levels (P1, P2, …, P11).
- **Deduc** renamed as **deduc**: The deductible category. Categorical variable with 6 levels (0 euro, 1–200 euro, 201–300 euro, 301–400 euro, 401–600 euro, >600 euro).
- **BusinessType** renamed as **businessType**: The business type. Categorical variable with 8 levels (B1, B2, …, B8).
- **ChannelDist** renamed as **channelDist**: The distribution channel. Categorical variable with 3 levels (D1, D2, D3).
- **ClaimNb** renamed as **claimNumb**: The number of claims. For policies for which more than two claims were filed during the considered exposure, the value was set to 2. This adaptation is needed for the measures that are presented in Section 3.
- **Exposure** renamed as **exposure**: Percentage of a full policy year, corresponding to the run time of the respective policy.
- **PolicyAgeCateg** renamed as **age**: The category of the policy age. Categorical variable with 6 levels (0–1 year, 1–2 years, 2–3 years, 3–4 years, 4–5 years, >5 years).
- **PolicyCateg** renamed as **polCat**: The category of the policy. Categorical variable with 4 levels (C2, C3, C4, C5).
- **CompanyCreation** renamed as **compCrea**: A dummy indicating if the company has been created.
- **FleetMgt** renamed as **fleet**: The fleet management category. Categorical variable with 2 levels (N, P).
- **FleetSizeCateg** renamed as **fleetSize**: The fleet size category. Categorical variable with 2 levels (S1, S2).
- **Area** renamed as **area**: The geographical area. Categorical variable with 6 levels (A1, A2, …, A6).
- **PayFreq** renamed as **payFreq**: The payment frequency. Categorical variable with 3 levels (quarter, semester, year).

**pg16trainclaim** dataset, we selected and renamed the following variables:

- **DirectComp** renamed as **matDam**: As claims correspond only to material damage, the French claim convention (IDA) was applied. The insurer may therefore refund the insured directly (**matDam** = TRUE), even if the insurer afterwards sues the third-party insurer to recover the indemnity.
- **ClaimCharge** renamed as **claimCharge**: The claim charge.

## Appendix B. The Gamma Distribution

## Appendix C. Extended and Original Marginal Approximation

**Table A1.** Bias and run time (s), the latter between brackets, for the extended marginal approximation of ${\mathrm{C}}_{0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the fine grid version and for several different numbers of boundary values for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 | 1000 | 5000 | 10,000 |
---|---|---|---|---|---|---|
50 | 0.0067 (2.47) | 0.0045 (2.39) | 0.0037 (2.90) | 0.0036 (3.75) | 0.0034 (10.55) | 0.0033 (20.04) |
100 | 0.0049 (2.65) | 0.0033 (2.66) | 0.0020 (3.28) | 0.0019 (4.14) | 0.0017 (12.09) | 0.0017 (21.82) |
500 | 0.0037 (4.36) | 0.0020 (4.15) | 0.0007 (4.56) | 0.0005 (5.74) | 0.0004 (17.11) | 0.0004 (32.09) |
1000 | 0.0035 (6.24) | 0.0019 (5.70) | 0.0005 (7.05) | 0.0003 (7.91) | 0.0002 (23.57) | 0.0002 (42.70) |
5000 | 0.0033 (26.00) | 0.0017 (23.61) | 0.0004 (23.24) | 0.0002 (25.33) | 0.0001 (69.28) | 0.0001 (129.04) |
10,000 | 0.0033 (49.43) | 0.0017 (43.43) | 0.0004 (44.55) | 0.0002 (46.43) | 0.0001 (102.01) | 0.0000 (216.58) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 | 1000 | 5000 | 10,000 |
---|---|---|---|---|---|---|
50 | 0.0035 (1.13) | 0.0028 (1.20) | 0.0020 (1.50) | 0.0019 (1.60) | 0.0018 (3.66) | 0.0018 (6.47) |
100 | 0.0029 (1.19) | 0.0019 (1.17) | 0.0011 (1.33) | 0.0010 (1.67) | 0.0009 (3.80) | 0.0009 (7.02) |
500 | 0.0019 (1.61) | 0.0010 (1.78) | 0.0004 (1.95) | 0.0003 (2.21) | 0.0002 (5.89) | 0.0002 (10.14) |
1000 | 0.0018 (2.44) | 0.0010 (2.20) | 0.0003 (2.44) | 0.0002 (3.08) | 0.0001 (8.06) | 0.0001 (13.81) |
5000 | 0.0018 (7.86) | 0.0009 (7.06) | 0.0002 (7.97) | 0.0001 (8.15) | 0.0000 (21.27) | 0.0000 (41.11) |
10,000 | 0.0018 (15.06) | 0.0009 (13.83) | 0.0002 (14.35) | 0.0001 (14.52) | 0.0001 (32.09) | 0.0000 (69.49) |

**Table A2.** Bias and run time (s), the latter between brackets, for the extended marginal approximation of ${\mathrm{C}}_{0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the rough grid version and for several different numbers of boundary values for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 | 1000 | 5000 | 10,000 |
---|---|---|---|---|---|---|
50 | 0.0068 (5.62) | 0.0046 (5.62) | 0.0038 (6.47) | 0.0035 (7.50) | 0.0034 (15.30) | 0.0033 (27.06) |
100 | 0.0047 (5.95) | 0.0032 (6.21) | 0.0021 (6.72) | 0.0018 (7.90) | 0.0017 (16.28) | 0.0017 (29.97) |
500 | 0.0036 (7.59) | 0.0021 (7.37) | 0.0007 (8.40) | 0.0005 (9.71) | 0.0004 (22.90) | 0.0003 (38.17) |
1000 | 0.0034 (8.92) | 0.0018 (8.43) | 0.0005 (9.31) | 0.0003 (11.61) | 0.0002 (29.47) | 0.0002 (46.94) |
5000 | 0.0033 (22.17) | 0.0017 (19.92) | 0.0004 (20.60) | 0.0002 (22.67) | 0.0001 (70.84) | 0.0000 (126.75) |
10,000 | 0.0033 (38.00) | 0.0017 (34.24) | 0.0003 (35.89) | 0.0002 (37.14) | 0.0000 (78.37) | 0.0000 (226.36) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 | 1000 | 5000 | 10,000 |
---|---|---|---|---|---|---|
50 | 0.0028 (3.20) | 0.0024 (3.12) | 0.0020 (3.34) | 0.0019 (3.91) | 0.0018 (6.95) | 0.0018 (11.12) |
100 | 0.0025 (3.26) | 0.0016 (3.47) | 0.0011 (3.43) | 0.0010 (3.83) | 0.0009 (7.09) | 0.0009 (11.59) |
500 | 0.0019 (3.95) | 0.0011 (3.90) | 0.0004 (4.40) | 0.0003 (5.08) | 0.0002 (9.25) | 0.0002 (15.86) |
1000 | 0.0018 (4.76) | 0.0010 (4.79) | 0.0003 (5.05) | 0.0002 (5.95) | 0.0001 (11.99) | 0.0001 (19.85) |
5000 | 0.0018 (10.48) | 0.0009 (9.93) | 0.0002 (10.34) | 0.0001 (10.65) | 0.0000 (27.11) | 0.0000 (50.09) |
10,000 | 0.0018 (17.40) | 0.0009 (15.68) | 0.0002 (16.31) | 0.0001 (17.21) | 0.0001 (36.67) | 0.0000 (83.13) |

**Figure A1.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{M,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by the original marginal approximation, using the number of boundary values that resulted in the lowest bias.

**Figure A2.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{M,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by the extended marginal approximation, using the number of boundary values that resulted in the lowest bias.

## Appendix D. k-Means Approximation

**Table A3.** Bias and run time (s), the latter between brackets, for the approximation ${\widehat{\mathrm{C}}}_{kM,0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the fine grid approach and for several different numbers of clusters for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0024 (11.20) | 0.0003 (20.28) | −0.0002 (88.08) |
100 | −0.0002 (20.37) | −0.0001 (37.68) | 0.0001 (169.08) |
500 | 0.0003 (89.36) | 0.0001 (174.90) | 0.0000 (810.69) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | 0.0000 (5.17) | −0.0001 (7.35) | −0.0001 (28.30) |
100 | 0.0000 (7.98) | −0.0001 (13.14) | 0.0000 (55.22) |
500 | 0.0000 (31.69) | 0.0000 (56.58) | 0.0000 (261.58) |

**Table A4.** Bias and run time (s), the latter between brackets, for the approximation ${\widehat{\mathrm{C}}}_{kM,0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the rough grid approach and for several different numbers of clusters for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0004 (25.62) | 0.0000 (27.02) | −0.0002 (28.78) |
100 | 0.0000 (35.09) | 0.0001 (39.66) | −0.0001 (38.08) |
500 | 0.0000 (107.41) | −0.0001 (118.08) | 0.0000 (121.53) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | 0.0002 (13.19) | 0.0000 (14.89) | −0.0001 (26.54) |
100 | −0.0004 (18.17) | 0.0001 (23.12) | 0.0000 (45.57) |
500 | 0.0000 (68.81) | 0.0000 (95.58) | 0.0000 (206.14) |

**Table A5.** Bias and run time (s), the latter between brackets, for the approximation ${\widehat{\mathrm{C}}}_{ep,kM,0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the fine grid approach and for several different numbers of clusters for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0009 (3.10) | −0.0002 (5.63) | −0.0001 (9.37) |
100 | 0.0011 (6.71) | −0.0005 (11.60) | −0.0002 (13.84) |
500 | −0.0001 (53.88) | 0.0003 (104.72) | 0.0000 (446.11) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0026 (2.19) | 0.0001 (5.35) | −0.0018 (10.68) |
100 | 0.0001 (4.46) | −0.0003 (7.47) | −0.0007 (29.90) |
500 | 0.0005 (17.66) | −0.0008 (32.20) | −0.0007 (143.61) |

**Table A6.** Bias and run time (s), the latter between brackets, for the approximation ${\widehat{\mathrm{C}}}_{ep,kM,0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the rough grid approach and for several different numbers of clusters for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0009 (3.33) | −0.0002 (4.30) | −0.0001 (12.27) |
100 | 0.0011 (4.22) | −0.0005 (7.74) | −0.0002 (42.37) |
500 | −0.0001 (13.38) | 0.0003 (27.05) | 0.0000 (54.50) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0026 (2.97) | 0.0001 (3.11) | −0.0012 (9.97) |
100 | 0.0004 (3.70) | 0.0001 (7.74) | −0.0002 (18.08) |
500 | 0.0000 (11.11) | −0.0003 (30.22) | −0.0002 (82.44) |

**Figure A3.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{kM,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by using 100 clusters for each group.

**Figure A4.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{ep,kM,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by using the number of clusters that resulted in the lowest bias.

## Note

1. http://cas.uqam.ca, accessed on 24 September 2021.

## References

- Bamber, Donald. 1975. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology 12: 387–415.
- Denuit, Michel, Xavier Maréchal, Sandra Pitrebois, and Jean-François Walhin. 2007. Actuarial Modelling of Claim Counts: Risk Classification, Credibility and Bonus-Malus Systems. England: John Wiley & Sons.
- Denuit, Michel, Dominik Sznajder, and Julien Trufin. 2019. Model selection based on Lorenz and concentration curves, Gini indices and convex order. Insurance: Mathematics and Economics 89: 128–39.
- Frees, Edward W. 2009. Regression Modeling with Actuarial and Financial Applications. Cambridge: Cambridge University Press.
- Frees, Edward W., Richard A. Derrig, and Glenn Meyers. 2014. Predictive Modeling Applications in Actuarial Science. Cambridge: Cambridge University Press, vol. 1.
- Frees, Edward W., Glenn Meyers, and Richard A. Derrig. 2016. Predictive Modeling Applications in Actuarial Science: Volume 2, Case Studies in Insurance. Cambridge: Cambridge University Press.
- Legrand, Catherine. 2021. Advanced Survival Models. Boca Raton: CRC Press.
- Liu, Xu-Ying, Jianxin Wu, and Zhi-Hua Zhou. 2008. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39: 539–50.
- Ohlsson, Esbjörn, and Björn Johansson. 2010. Non-Life Insurance Pricing with Generalized Linear Models. Berlin and Heidelberg: Springer, vols. 74.
- Pencina, Michael J., and Ralph B. D’Agostino. 2004. Overall c as a measure of discrimination in survival analysis: Model specific population value and confidence interval estimation. Statistics in Medicine 23: 2109–23.
- Reddy, Chandan K., and Charu C. Aggarwal. 2015. Healthcare Data Analytics. Boca Raton: CRC Press, vols. 36.
- Shi, Peng, Xiaoping Feng, and Anastasia Ivantsova. 2015. Dependent frequency–severity modeling of insurance claims. Insurance: Mathematics and Economics 64: 417–28.
- Steyerberg, Ewout W., Andrew J. Vickers, Nancy R. Cook, Thomas Gerds, Mithat Gonen, Nancy Obuchowski, Michael J. Pencina, and Michael W. Kattan. 2010. Assessing the performance of prediction models: A framework for some traditional and novel measures. Epidemiology 21: 128.
- Van Oirbeek, Robin, Emmanuel Jordy Menvouta, Jolien Ponnet, and Tim Verdonck. 2021. Mcube: Multinomial multi-state micro-level reserving model. Submitted.
- Van Oirbeek, Robin, Jolien Ponnet, and Tim Verdonck. 2021. Computational efficient approximations of the concordance probability in a big data setting. Under Review.
- Wuthrich, Mario V., and Christoph Buser. 2020. Data analytics for non-life insurance pricing. In Swiss Finance Institute Research Paper. Zurich: Swiss Finance Institute, pp. 16–68.
- Yan, Guofen, and Tom Greene. 2008. Investigating the effects of ties on measures of concordance. Statistics in Medicine 27: 4190–206.

**Figure 1.** Plot of the concordance probability ${\mathrm{C}}_{0,1+}^{\approx}(\lambda ,0.05)$ as a function of the exposure $\lambda $, for the frequency model based on the dataset of the 2015 pricing game.

**Figure 2.** Plot of the concordance probability ${\mathrm{C}}_{0,1+}^{\approx}(\lambda ,0.05)$ as a function of the exposure $\lambda $, for the frequency model based on the dataset of the 2016 pricing game.

**Figure 3.** Weighted-mean-plot for ${\mathrm{C}}_{0,1+}^{\approx}(\lambda ,0.05)$, constructed as the weighted mean of the fine grid and the rough grid plot.

**Figure 4.** The different regions of the grid in which the concordant pairs (downward dashed region, in green), the discordant pairs (upward dashed region, in red) and incomparable pairs (upward and downward dashed region, in grey) are highlighted. This is done for the original and the extended marginal approximation.

**Figure 5.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{M,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by the original marginal approximation, using the same 50 boundary values for the $\mathbb{O}$- and $\mathbb{1}$-group.

**Figure 6.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{M,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by the extended marginal approximation, using 50 boundary values that can differ for the $\mathbb{O}$- and $\mathbb{1}$-group.

**Figure 7.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{kM,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by using the number of clusters that resulted in the highest bias.

**Figure 8.** Weighted-mean-plot for ${\widehat{\mathrm{C}}}_{ep,kM,0,1+}^{\approx}(\lambda ,0.05)$ based on the dataset of the 2015 and 2016 pricing game. It is obtained by using the number of clusters that resulted in the highest bias.

**Table 1.** The values for $\nu $ such that x% of the absolute differences between the observed values are smaller than $\nu $. This is done for the original test set and the bootstrap version, for the datasets of both the 2015 and 2016 pricing game.

Pricing Game | Set | $\mathit{x}$ = 0% | $\mathit{x}$ = 20% | $\mathit{x}$ = 40% |
---|---|---|---|---|
(a) 2015 | test | 0.0000 | 844.11 | 2395.93 |
(a) 2015 | bootstrap | 0.0000 | 841.44 | 2391.00 |
(b) 2016 | test | 0.0000 | 377.83 | 825.09 |
(b) 2016 | bootstrap | 0.0000 | 376.63 | 823.88 |
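The thresholds in Table 1 are empirical quantiles of the pairwise absolute differences between observed values. A minimal sketch of that calculation follows; the function name `nu_for_fraction` and the order-statistic convention used for the quantile are assumptions made for illustration.

```python
from itertools import combinations


def nu_for_fraction(observed, fraction):
    """Return nu such that (about) `fraction` of all pairwise absolute
    differences between observed values are strictly smaller than nu.

    Assumption: nu is taken as an order statistic of the sorted
    differences; other quantile conventions would give slightly
    different values.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    diffs = sorted(abs(a - b) for a, b in combinations(observed, 2))
    if not diffs:
        raise ValueError("at least two observations are needed")
    # Exactly k differences precede position k in the sorted list.
    k = int(fraction * len(diffs))
    return diffs[k]
```

All pairwise differences are materialised here, which is fine for a test set of a few thousand claims but not for a bootstrap sample of 1,000,000 pairs, where a subsample of pairs would be a more realistic starting point.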

**Table 2.** Computing time (s) to calculate the exact concordance probability ${\mathrm{C}}_{0,1+}^{\approx}\left(\gamma \right)$ for the frequency model on the 2015 and 2016 pricing game dataset. This is done for the fine grid, rough grid and weighted-mean-plot approach.

$\mathit{\gamma}$ | Pricing Game | Fine Grid | Rough Grid | Weighted-Mean-Plot |
---|---|---|---|---|
0.05 | 2015 | 264.58 | 320.46 | 585.04 |
0.05 | 2016 | 73.42 | 80.12 | 153.54 |
0.10 | 2015 | 286.73 | 331.85 | 618.58 |
0.10 | 2016 | 115.86 | 132.15 | 248.00 |

**Table 3.** The number of comparable pairs that are used to exactly calculate ${\mathrm{C}}_{0,1+}^{\approx}\left(\gamma \right)$ for the frequency model on the 2015 and 2016 pricing game dataset. This is done for the fine grid, rough grid and weighted-mean-plot approach.

$\mathit{\gamma}$ | Pricing Game | Fine Grid | Rough Grid | Weighted-Mean-Plot |
---|---|---|---|---|
0.05 | 2015 | 26,539,269,735 | 26,539,269,735 | 53,078,539,470 |
0.05 | 2016 | 5,631,834,056 | 5,631,834,056 | 11,263,668,112 |
0.10 | 2015 | 28,067,838,660 | 28,067,838,660 | 56,135,677,320 |
0.10 | 2016 | 9,023,978,424 | 9,023,978,424 | 18,047,956,848 |

**Table 4.** Bias and run time (s), the latter between brackets, for the original marginal approximation of ${\mathrm{C}}_{0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the fine grid, rough grid and weighted-mean-plot approach, all for several different numbers of boundary values.

(a) 2015 Pricing Game

Boundary values | Fine Grid | Rough Grid | Weighted Mean |
---|---|---|---|
50 | 0.0032 (2.61) | 0.0033 (5.82) | 0.0033 (8.43) |
100 | 0.0017 (2.83) | 0.0017 (5.90) | 0.0017 (8.73) |
500 | 0.0003 (6.11) | 0.0004 (7.42) | 0.0004 (13.53) |
1000 | 0.0002 (10.43) | 0.0002 (9.00) | 0.0002 (19.43) |
5000 | 0.0000 (49.08) | 0.0001 (25.58) | 0.0001 (74.66) |
10,000 | 0.0000 (100.64) | 0.0001 (47.00) | 0.0001 (147.64) |

(b) 2016 Pricing Game

Boundary values | Fine Grid | Rough Grid | Weighted Mean |
---|---|---|---|
50 | 0.0018 (1.38) | 0.0019 (3.18) | 0.0018 (4.56) |
100 | 0.0009 (1.28) | 0.0009 (3.17) | 0.0009 (4.45) |
500 | 0.0002 (2.30) | 0.0002 (4.29) | 0.0002 (6.59) |
1000 | 0.0001 (3.89) | 0.0001 (5.61) | 0.0001 (9.50) |
5000 | 0.0001 (18.01) | 0.0001 (17.27) | 0.0000 (35.28) |
10,000 | 0.0000 (34.46) | 0.0000 (32.63) | 0.0000 (67.09) |

**Table 5.** Bias and run time (s), the latter between brackets, for the extended marginal approximation of ${\mathrm{C}}_{0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the weighted-mean-plot approach and for several different numbers of boundary values for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 | 1000 | 5000 | 10,000 |
---|---|---|---|---|---|---|
50 | 0.0068 (8.09) | 0.0045 (8.01) | 0.0037 (9.37) | 0.0035 (11.25) | 0.0034 (25.85) | 0.0033 (47.10) |
100 | 0.0048 (8.60) | 0.0032 (8.87) | 0.0020 (10.00) | 0.0018 (12.04) | 0.0017 (28.37) | 0.0017 (51.79) |
500 | 0.0036 (11.95) | 0.0021 (11.52) | 0.0007 (12.96) | 0.0005 (15.45) | 0.0004 (40.01) | 0.0003 (70.26) |
1000 | 0.0035 (15.16) | 0.0019 (14.13) | 0.0005 (16.36) | 0.0003 (19.52) | 0.0002 (53.04) | 0.0002 (89.64) |
5000 | 0.0033 (48.17) | 0.0017 (43.53) | 0.0004 (43.84) | 0.0002 (48.00) | 0.0001 (140.12) | 0.0000 (255.79) |
10,000 | 0.0033 (87.43) | 0.0017 (77.67) | 0.0003 (80.44) | 0.0002 (83.57) | 0.0001 (180.38) | 0.0000 (442.94) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 | 1000 | 5000 | 10,000 |
---|---|---|---|---|---|---|
50 | 0.0031 (4.33) | 0.0026 (4.32) | 0.0020 (4.84) | 0.0019 (5.51) | 0.0018 (10.61) | 0.0018 (17.59) |
100 | 0.0027 (4.45) | 0.0018 (4.64) | 0.0011 (4.76) | 0.0010 (5.50) | 0.0009 (10.89) | 0.0009 (18.61) |
500 | 0.0019 (5.56) | 0.0011 (5.68) | 0.0004 (6.35) | 0.0003 (7.29) | 0.0002 (15.14) | 0.0002 (26.00) |
1000 | 0.0018 (7.20) | 0.0010 (6.99) | 0.0003 (7.49) | 0.0002 (9.03) | 0.0001 (20.05) | 0.0001 (33.66) |
5000 | 0.0018 (18.34) | 0.0009 (16.99) | 0.0002 (18.31) | 0.0001 (18.80) | 0.0000 (48.38) | 0.0000 (91.20) |
10,000 | 0.0018 (32.46) | 0.0009 (29.51) | 0.0002 (30.66) | 0.0001 (31.73) | 0.0001 (68.76) | 0.0000 (152.62) |

**Table 6.** Number of comparable pairs used in the original and extended marginal approximation of ${\mathrm{C}}_{0,1+}^{\approx}\left(0.05\right)$ on the bootstrap of the predictions and observations of the 2015 and 2016 pricing game dataset. This is done for several different numbers of boundary values.

Boundary values | (a) 2015: Original | (a) 2015: Extended | (b) 2016: Original | (b) 2016: Extended |
---|---|---|---|---|
50 | 26,370,518,133 | 25,831,089,271 | 5,282,878,933 | 5,175,036,361 |
100 | 26,633,484,294 | 26,360,949,926 | 5,335,780,675 | 5,281,182,645 |
500 | 26,843,801,543 | 26,788,712,306 | 5,378,070,475 | 5,366,876,818 |
1000 | 26,870,083,565 | 26,842,431,537 | 5,383,349,280 | 5,377,637,107 |
5000 | 26,891,057,651 | 26,885,420,717 | 5,387,563,752 | 5,386,254,164 |
10,000 | 26,893,659,347 | 26,890,793,882 | 5,388,075,313 | 5,387,331,03 |

**Table 7.** Bias and run time (s), the latter between brackets, for the approximation ${\widehat{\mathrm{C}}}_{kM,0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game dataset. This is given for the weighted-mean-plot approach and for several different numbers of clusters for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | −0.0014 (36.82) | 0.0002 (47.30) | −0.0002 (116.86) |
100 | −0.0001 (55.46) | 0.0000 (77.34) | 0.0000 (207.16) |
500 | 0.0001 (196.77) | 0.0000 (292.98) | 0.0000 (932.22) |

(b) 2016 Pricing Game

$\mathbb{O}$-Group (rows) / $\mathbb{1}$-Group (columns) | 50 | 100 | 500 |
---|---|---|---|
50 | 0.0001 (18.36) | −0.0001 (22.24) | −0.0001 (54.84) |
100 | −0.0002 (26.15) | 0.0000 (36.26) | 0.0000 (100.79) |
500 | 0.0000 (100.50) | 0.0000 (152.16) | 0.0000 (467.72) |

**Table 8.** Bias and run time (s), the latter in parentheses, for the approximation ${\widehat{\mathrm{C}}}_{ep,kM,0,1+}^{\approx}\left(0.05\right)$ on the 2015 and 2016 pricing game datasets. This is given for the weighted-mean-plot approach and for several different numbers of clusters for the $\mathbb{O}$- and $\mathbb{1}$-group.

(a) 2015 Pricing Game

Clusters in $\mathbb{O}$-group | $\mathbb{1}$-group: 50 | $\mathbb{1}$-group: 100 | $\mathbb{1}$-group: 500 |
---|---|---|---|
50 | −0.0009 (6.43) | −0.0002 (9.93) | −0.0001 (21.64) |
100 | 0.0011 (10.93) | −0.0005 (19.34) | −0.0002 (56.21) |
500 | −0.0001 (67.26) | 0.0003 (131.77) | 0.0000 (500.61) |

(b) 2016 Pricing Game

Clusters in $\mathbb{O}$-group | $\mathbb{1}$-group: 50 | $\mathbb{1}$-group: 100 | $\mathbb{1}$-group: 500 |
---|---|---|---|
50 | −0.0026 (5.16) | 0.0001 (8.46) | −0.0015 (20.65) |
100 | 0.0003 (8.16) | −0.0001 (15.21) | −0.0005 (47.98) |
500 | 0.0003 (28.77) | −0.0005 (62.42) | −0.0004 (226.05) |

**Table 9.** The exact concordance probabilities together with the computing times (s) for different values of $\nu$. Part (a) focuses on the bootstrap version of the test set of the 2015 pricing game, part (b) on that of the 2016 pricing game.

(a) 2015 Pricing Game

$\nu$ | 0 | 841.44 | 2391.00 |
---|---|---|---|
$C$ | 0.5175 | 0.5202 | 0.5242 |
Run time (s) | 18,420.86 | 16,403.20 | 13,190.45 |

(b) 2016 Pricing Game

$\nu$ | 0 | 376.63 | 823.88 |
---|---|---|---|
$C$ | 0.5165 | 0.5214 | 0.5291 |
Run time (s) | 17,998.00 | 16,091.08 | 14,088.95 |
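The run times in Table 9 are consistent with an exact computation that visits every pair of observations. A minimal sketch of such a pairwise computation is given here, assuming that the severity concordance probability is the share of pairs whose observed claim sizes differ by more than $\nu$ in which the larger claim also receives the larger prediction. The function name `exact_concordance`, the treatment of tied predictions (counted as non-concordant), and the toy data are assumptions for illustration, not the authors' implementation.

```python
def exact_concordance(observed, predicted, nu=0.0):
    """Exact pairwise concordance for a continuous outcome.

    Assumption: an ordered pair (i, j) is comparable when
    observed[i] - observed[j] > nu, and concordant when, in addition,
    predicted[i] > predicted[j]. Tied predictions are not concordant.
    """
    comparable = 0
    concordant = 0
    n = len(observed)
    for i in range(n):
        for j in range(n):
            if observed[i] - observed[j] > nu:
                comparable += 1
                if predicted[i] > predicted[j]:
                    concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs for this threshold")
    return concordant / comparable


# Hypothetical claim sizes and predictions, in euro.
observed = [100.0, 500.0, 1500.0, 3000.0]
predicted = [200.0, 150.0, 900.0, 1000.0]

c_all = exact_concordance(observed, predicted)               # nu = 0
c_large = exact_concordance(observed, predicted, nu=1000.0)  # only large gaps
```

Raising $\nu$ removes pairs whose claim sizes are close together, so fewer pairs remain comparable; on the toy data the six comparable pairs at $\nu = 0$ shrink to four at $\nu = 1000$.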

**Table 10.** Bias and run time (s), the latter in parentheses, for the marginal approximation and the $k$-means approximation of the concordance probability, for the severity model of the 2015 (a) and 2016 (b) pricing game datasets.

(a) 2015 Pricing Game

Marginal: boundary values | $\nu = 0.00$ | $\nu = 841.44$ | $\nu = 2391.00$ |
---|---|---|---|
50 | 0.0014 (18.19) | 0.0023 (18.48) | 0.0032 (19.17) |
100 | 0.0008 (36.16) | 0.0012 (37.24) | 0.0014 (36.88) |
500 | 0.0001 (186.86) | 0.0001 (182.93) | 0.0001 (183.34) |
1000 | 0.0001 (367.72) | 0.0001 (370.40) | 0.0000 (363.45) |

$k$-means: clusters | $\nu = 0.00$ | $\nu = 841.44$ | $\nu = 2391.00$ |
---|---|---|---|
50 | 0.0078 (1.85) | 0.0045 (1.44) | 0.0135 (1.41) |
100 | 0.0087 (1.59) | 0.0091 (1.59) | 0.0150 (1.64) |
500 | 0.0008 (4.75) | 0.0017 (4.52) | 0.0012 (4.31) |
1000 | 0.0003 (11.34) | 0.0005 (10.69) | 0.0003 (9.64) |

(b) 2016 Pricing Game

Marginal: boundary values | $\nu = 0.00$ | $\nu = 376.63$ | $\nu = 823.88$ |
---|---|---|---|
50 | 0.0010 (16.91) | 0.0023 (16.22) | 0.0024 (16.38) |
100 | 0.0010 (32.83) | 0.0017 (33.01) | 0.0020 (32.06) |
500 | 0.0003 (163.61) | 0.0005 (154.98) | 0.0006 (156.66) |
1000 | 0.0001 (313.95) | 0.0002 (316.61) | 0.0003 (329.31) |

$k$-means: clusters | $\nu = 0.00$ | $\nu = 376.63$ | $\nu = 823.88$ |
---|---|---|---|
50 | 0.0140 (1.70) | 0.0071 (1.04) | 0.0096 (0.79) |
100 | 0.0036 (1.04) | 0.0029 (1.30) | 0.0030 (1.14) |
500 | −0.0003 (4.25) | 0.0009 (4.28) | −0.0007 (4.09) |
1000 | 0.0003 (10.28) | −0.0002 (10.00) | 0.0006 (9.11) |
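The gap in run time between an exact computation and the approximations in Table 10 can be illustrated with a weighted pairwise computation on a small set of representatives. The sketch below assumes that the data have already been summarised, for example by cluster centroids or grid cells, into triples of a representative observation, a representative prediction, and the number of observations they stand for. The function name `weighted_concordance`, the weighting by the product of group sizes, and the omission of pairs within the same group are assumptions for illustration and may differ from the approximations evaluated above.

```python
def weighted_concordance(representatives, nu=0.0):
    """Approximate concordance from (observation, prediction, count) triples.

    Assumption: each pair of representatives stands for count_i * count_j
    pairs of original observations; pairs inside one group are ignored.
    """
    comparable = 0
    concordant = 0
    for obs_i, pred_i, count_i in representatives:
        for obs_j, pred_j, count_j in representatives:
            if obs_i - obs_j > nu:
                weight = count_i * count_j
                comparable += weight
                if pred_i > pred_j:
                    concordant += weight
    if comparable == 0:
        raise ValueError("no comparable pairs for this threshold")
    return concordant / comparable


# Hypothetical summary of six claims by three representatives.
representatives = [
    (100.0, 200.0, 2),    # two small claims
    (500.0, 150.0, 1),    # one medium claim, under-predicted
    (3000.0, 1000.0, 3),  # three large claims
]

approximation = weighted_concordance(representatives)
```

With $k$ representatives the loop performs $k^2$ comparisons instead of one per pair of observations, so the cost depends on the number of groups rather than on the size of the dataset.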

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ponnet, J.; Van Oirbeek, R.; Verdonck, T.
Concordance Probability for Insurance Pricing Models. *Risks* **2021**, *9*, 178.
https://doi.org/10.3390/risks9100178
