# Estimating Territory Risk Relativity Using Generalized Linear Mixed Models and Fuzzy C-Means Clustering


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Materials and Methods

#### 3.1. Data

#### 3.2. Spatially Constrained K-Means Clustering

- A standard K-Means clustering was first performed to obtain an initial set of clusters.
- Based on this initial clustering, we searched for all points that were entirely surrounded by points from other clusters. These points are referred to as non-contiguous points.
- For each non-contiguous point, that is, each point with no neighbors in its own cluster, a search was performed to find the neighboring point at minimal distance.
- Each non-contiguous point was then reallocated to the cluster of its nearest neighbor, and this process was repeated until all clusters formed Delaunay triangulations.
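The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: neighbors are approximated by the `n_neighbors` closest points rather than by the edges of a Delaunay triangulation, and the function names (`kmeans_labels`, `nearest_neighbors`, `reassign_non_contiguous`) as well as the toy coordinates are hypothetical.

```python
import numpy as np


def kmeans_labels(points, k, n_iter=50, seed=0):
    """Step 1: plain K-Means (Lloyd's algorithm), returning one label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def nearest_neighbors(points, n_neighbors=3):
    """Assumed neighbor definition: indices of the closest points, nearest first.
    (The text uses Delaunay triangulation; k-nearest neighbors is a stand-in.)"""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :n_neighbors]


def reassign_non_contiguous(points, labels, n_neighbors=3, max_passes=100):
    """Steps 2-4: find points with no same-cluster neighbor and move each one
    to the cluster of its nearest neighbor, repeating until none remain."""
    labels = np.asarray(labels).copy()
    nbrs = nearest_neighbors(points, n_neighbors)
    for _ in range(max_passes):
        isolated = [i for i in range(len(points))
                    if not np.any(labels[nbrs[i]] == labels[i])]
        if not isolated:
            break
        for i in isolated:
            labels[i] = labels[nbrs[i][0]]
    return labels


# Toy data: two well-separated groups; the point (0.5, 0.5) starts in the wrong cluster.
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
                   [10, 0], [10, 1], [11, 0], [11, 1]], dtype=float)
initial = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
fixed = reassign_non_contiguous(points, initial)
print(fixed)
```

In this toy example only the mislabeled point at (0.5, 0.5) changes cluster. In practice the initial labels would come from the K-Means step, for instance `kmeans_labels(points, k)`, and the neighbor structure from the triangulation of the FSA locations.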

#### 3.3. Generalized Linear and Generalized Linear Mixed Models

#### 3.4. Estimating Risk Relativity via Fuzzy C-Means Clustering

#### 3.5. Discussion

## 4. Results

## 5. Concluding Remarks

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest


**Figure 1.** The empirical estimate of the risk relativity for the obtained five clusters. The triangle indicates the center of the fuzzy cluster.

**Figure 2.** The GLM estimate of the risk relativity for the obtained five clusters. The triangle in orange indicates the center of the fuzzy cluster.

**Figure 3.** The GLMM estimate of the risk relativity for the obtained five clusters. The cluster is the fixed effect and the city is considered as a random effect. The triangle in orange indicates the center of the fuzzy cluster.

**Figure 4.** The risk relativities of the FSAs that are included in one of the clusters shown in Figure 1 (i.e., the black cluster), obtained from the fuzzy C-Means clustering approach.

**Figure 5.** The risk relativities of the FSAs that are included in one of the clusters shown in Figure 1 (i.e., the red cluster), obtained from the fuzzy C-Means clustering approach.

**Figure 6.** The risk relativities of the FSAs that are included in one of the clusters shown in Figure 1 (i.e., the green cluster), obtained from the fuzzy C-Means clustering approach.

**Figure 7.** The risk relativities of the FSAs that are included in one of the clusters shown in Figure 1 (i.e., the blue cluster), obtained from the fuzzy C-Means clustering approach.

**Figure 8.** The risk relativities of the FSAs that are included in one of the clusters shown in Figure 1 (i.e., the light blue cluster), obtained from the fuzzy C-Means clustering approach.

**Figure 9.** The plots show the FSA risk relativity for different numbers of clusters in fuzzy C-Means clustering. The black color represents the empirical estimates of FSA risk relativity, while the colored plots indicate the use of different numbers of clusters. (**a**) 5 clusters. (**b**) 10 clusters. (**c**) 15 clusters. (**d**) 20 clusters.

**Figure 10.** The smoothing errors are plotted in terms of MAD and RMSE by different clusters and different models and approaches. The results from (**a**–**d**) correspond to the spatially constrained K-Means clustering with GLM and GLMM, while the results from (**e**,**f**) are from fuzzy C-Means clustering. (**a**) MAD; GLM. (**b**) RMSE; GLM. (**c**) MAD; GLMM. (**d**) RMSE; GLMM. (**e**) MAD; Fuzzy C-Means. (**f**) RMSE; Fuzzy C-Means.

**Table 1.** The GLM estimates of risk relativities for the obtained five clusters, using Gaussian, Poisson, Gamma and Inverse Gaussian error distributions, along with the AIC and BIC of each model.

| Relativity | Gaussian | Poisson | Gamma | Inverse Gaussian |
|---|---|---|---|---|
| Cluster 1 | 0.87 | 0.87 | 0.87 | 0.87 |
| Cluster 2 | 0.56 | 0.56 | 0.56 | 0.56 |
| Cluster 3 | 0.76 | 0.76 | 0.76 | 0.76 |
| Cluster 4 | 1.25 | 1.25 | 1.25 | 1.25 |
| Cluster 5 | 1.55 | 1.55 | 1.55 | 1.55 |
| AIC | 2403.75 | 324,546,794.5 | 30,078,415.55 | 31,491,160.07 |
| BIC | 2421.82 | 324,546,809.5 | 30,078,433.62 | 31,491,178.14 |
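For reference, the two information criteria reported in Table 1 have the standard definitions below. The symbols are introduced here for illustration and do not come from the original text: $\hat{L}$ is the maximized likelihood of the fitted model, $k$ is the number of estimated parameters, and $n$ is the number of observations.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Under both criteria, smaller values are preferred. In Table 1 the Gaussian model attains by far the smallest AIC and BIC, while the estimated relativities are identical across the four error distributions.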

**Table 2.** RMSE and MAD of the risk relativity estimates for 5, 10, 15 and 20 clusters, using GLM and GLMM.

| Model | Measure | 5 Clusters | 10 Clusters | 15 Clusters | 20 Clusters |
|---|---|---|---|---|---|
| GLM | RMSE | 0.0405 | 0.0464 | 0.0717 | 0.0731 |
| GLM | MAD | 0.0360 | 0.0383 | 0.0443 | 0.0494 |
| GLMM | RMSE | 0.1254 | 0.1886 | 0.0729 | 0.0862 |
| GLMM | MAD | 0.1120 | 0.1620 | 0.0443 | 0.0576 |
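As a rough illustration of the two error measures in Table 2, the sketch below computes RMSE and MAD between a vector of estimated relativities and a vector of reference values. Two assumptions are made here that the surviving text does not confirm: MAD is taken to mean the mean absolute deviation, and the errors are taken between model-based and empirical relativities. The function names and input values are hypothetical.

```python
import numpy as np


def rmse(estimated, reference):
    """Root mean squared error between two equally long vectors."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mad(estimated, reference):
    """Mean absolute deviation (assumed meaning of MAD in this sketch)."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(diff)))


# Made-up relativities, for illustration only (not data from the paper).
model_based = [0.90, 1.10, 1.30]
empirical = [1.00, 1.10, 1.20]
print(rmse(model_based, empirical), mad(model_based, empirical))
```

Because RMSE squares the differences before averaging, it weights large deviations more heavily than MAD and is never smaller than MAD for the same inputs.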

**Table 3.** The membership coefficients from five-cluster fuzzy C-Means clustering. The bold value indicates the dominant cluster for the selected FSA that we used for illustration purposes.

| FSA-ID | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| 1 | 0.0039 | 0.0003 | 0.0007 | 0.0001 | **0.9950** |
| 2 | **0.9835** | 0.0013 | 0.0047 | 0.0003 | 0.0102 |
| 3 | 0.0245 | **0.8135** | 0.1405 | 0.0102 | 0.0113 |
| 4 | **0.9994** | 0.0001 | 0.0002 | 0.0000 | 0.0003 |
| 5 | **0.9910** | 0.0007 | 0.0027 | 0.0002 | 0.0054 |
| 6 | **0.9315** | 0.0070 | 0.0344 | 0.0014 | 0.0257 |
| 7 | **0.9989** | 0.0001 | 0.0004 | 0.0000 | 0.0006 |
| 8 | 0.0083 | 0.0058 | **0.9829** | 0.0006 | 0.0024 |
| 9 | 0.4262 | 0.0484 | **0.4545** | 0.0074 | 0.0635 |
| 10 | **0.7713** | 0.0239 | 0.1437 | 0.0042 | 0.0569 |
| 11 | **0.9985** | 0.0001 | 0.0006 | 0.0000 | 0.0008 |
| 12 | **0.9987** | 0.0001 | 0.0005 | 0.0000 | 0.0007 |
| 13 | 0.0247 | **0.8120** | 0.1417 | 0.0103 | 0.0114 |
| 14 | 0.0868 | 0.0118 | 0.0254 | 0.0039 | **0.8720** |
| 15 | **0.9912** | 0.0008 | 0.0036 | 0.0002 | 0.0043 |
| 16 | 0.0351 | 0.0954 | **0.8509** | 0.0057 | 0.0130 |
| 17 | **0.9984** | 0.0001 | 0.0006 | 0.0000 | 0.0008 |
| 18 | **0.5525** | 0.0422 | 0.3307 | 0.0068 | 0.0679 |
| 19 | 0.0430 | 0.0205 | **0.9232** | 0.0023 | 0.0110 |
| 20 | 0.1546 | 0.0419 | **0.7653** | 0.0054 | 0.0328 |
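Membership coefficients such as those in Table 3 can be computed as in the following sketch of the standard fuzzy C-Means updates with Euclidean distance. It is a minimal illustration, not the authors' code: the fuzzifier `m = 2`, the random initialization and all function names are assumptions. The helper `weighted_relativity`, which averages cluster-level relativities with the memberships as weights, is likewise only one hypothetical way to turn memberships into a relativity for each FSA.

```python
import numpy as np


def fcm_memberships(points, centers, m=2.0):
    """Membership of point i in cluster j: d_ij^(-2/(m-1)), normalized over j."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at a center
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fcm_centers(points, u, m=2.0):
    """Cluster centers as means of the points weighted by u_ij^m."""
    w = u ** m
    return (w.T @ points) / w.sum(axis=0)[:, None]


def fuzzy_c_means(points, c, m=2.0, n_iter=100, seed=0):
    """Alternate the two updates for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=c, replace=False)].astype(float)
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return fcm_memberships(points, centers, m), centers


def weighted_relativity(u, cluster_relativities):
    """Hypothetical helper: membership-weighted average of cluster relativities."""
    return u @ np.asarray(cluster_relativities, dtype=float)


# Row for FSA 9 in Table 3: the dominant cluster is the one with the largest membership.
row_fsa9 = np.array([0.4262, 0.0484, 0.4545, 0.0074, 0.0635])
dominant = int(row_fsa9.argmax()) + 1  # clusters are numbered from 1
print(dominant, round(float(row_fsa9.sum()), 4))
```

Each row of the membership matrix sums to one. For FSA 9 the largest membership (0.4545, Cluster 3) is only slightly above the second largest (0.4262, Cluster 1), which shows the kind of borderline case that a hard assignment would hide.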

**Table 4.** The membership coefficients from six-cluster fuzzy C-Means clustering. The bold value indicates the dominant cluster for the selected FSA that we used for illustration purposes.

| FSA-ID | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0676 | 0.0045 | 0.0117 | 0.0006 | **0.9136** | 0.0020 |
| 2 | **0.8473** | 0.0162 | 0.0912 | 0.0012 | 0.0390 | 0.0050 |
| 3 | 0.0292 | 0.4298 | 0.0721 | 0.0142 | 0.0165 | **0.4382** |
| 4 | **0.6641** | 0.0334 | 0.2340 | 0.0023 | 0.0565 | 0.0097 |
| 5 | **0.8134** | 0.0197 | 0.1155 | 0.0015 | 0.0439 | 0.0060 |
| 6 | 0.3118 | 0.0488 | **0.5769** | 0.0028 | 0.0472 | 0.0125 |
| 7 | **0.6536** | 0.0343 | 0.2430 | 0.0023 | 0.0570 | 0.0099 |
| 8 | 0.0394 | **0.6450** | 0.2639 | 0.0041 | 0.0163 | 0.0313 |
| 9 | 0.0033 | 0.0039 | **0.9912** | 0.0001 | 0.0009 | 0.0006 |
| 10 | 0.0758 | 0.0284 | **0.8721** | 0.0013 | 0.0161 | 0.0063 |
| 11 | **0.6457** | 0.0349 | 0.2498 | 0.0024 | 0.0572 | 0.0100 |
| 12 | **0.6483** | 0.0347 | 0.2475 | 0.0024 | 0.0572 | 0.0100 |
| 13 | 0.0292 | 0.4317 | 0.0722 | 0.0142 | 0.0165 | **0.4362** |
| 14 | 0.0802 | 0.0122 | 0.0256 | 0.0021 | **0.8740** | 0.0060 |
| 15 | **0.5642** | 0.0407 | 0.3224 | 0.0027 | 0.0587 | 0.0114 |
| 16 | 0.0011 | **0.9914** | 0.0041 | 0.0002 | 0.0005 | 0.0027 |
| 17 | **0.6430** | 0.0351 | 0.2521 | 0.0024 | 0.0573 | 0.0101 |
| 18 | 0.0017 | 0.0014 | **0.9962** | 0.0000 | 0.0004 | 0.0002 |
| 19 | 0.0527 | 0.4171 | **0.4744** | 0.0044 | 0.0205 | 0.0309 |
| 20 | 0.0443 | 0.1485 | **0.7724** | 0.0026 | 0.0155 | 0.0166 |

**Table 5.** The membership coefficients from ten-cluster fuzzy C-Means clustering. The bold value indicates the dominant cluster for the selected FSA that we used for illustration purposes.

| FSA-ID | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 | Cluster 9 | Cluster 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0120 | 0.0022 | 0.0049 | 0.0002 | **0.7771** | 0.0006 | 0.1585 | 0.0323 | 0.0012 | 0.0109 |
| 2 | **0.6325** | 0.0069 | 0.0292 | 0.0004 | 0.0158 | 0.0011 | 0.0454 | 0.2618 | 0.0028 | 0.0041 |
| 3 | 0.0026 | 0.0197 | 0.0049 | 0.0007 | 0.0011 | 0.0054 | 0.0014 | 0.0018 | **0.9616** | 0.0007 |
| 4 | **0.9737** | 0.0010 | 0.0048 | 0.0000 | 0.0015 | 0.0001 | 0.0037 | 0.0143 | 0.0004 | 0.0004 |
| 5 | **0.7338** | 0.0060 | 0.0260 | 0.0003 | 0.0125 | 0.0009 | 0.0348 | 0.1801 | 0.0023 | 0.0034 |
| 6 | **0.8086** | 0.0112 | 0.0864 | 0.0005 | 0.0096 | 0.0014 | 0.0206 | 0.0549 | 0.0037 | 0.0032 |
| 7 | **0.9803** | 0.0007 | 0.0037 | 0.0000 | 0.0011 | 0.0001 | 0.0028 | 0.0106 | 0.0003 | 0.0003 |
| 8 | 0.0363 | **0.7155** | 0.1597 | 0.0017 | 0.0086 | 0.0071 | 0.0129 | 0.0194 | 0.0345 | 0.0043 |
| 9 | 0.0001 | 0.0001 | **0.9997** | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 10 | 0.3273 | 0.0317 | **0.5129** | 0.0010 | 0.0154 | 0.0031 | 0.0295 | 0.0642 | 0.0092 | 0.0057 |
| 11 | **0.9846** | 0.0006 | 0.0030 | 0.0000 | 0.0009 | 0.0001 | 0.0022 | 0.0082 | 0.0002 | 0.0003 |
| 12 | **0.9832** | 0.0006 | 0.0032 | 0.0000 | 0.0010 | 0.0001 | 0.0024 | 0.0089 | 0.0002 | 0.0003 |
| 13 | 0.0027 | 0.0202 | 0.0050 | 0.0007 | 0.0011 | 0.0056 | 0.0015 | 0.0019 | **0.9606** | 0.0007 |
| 14 | 0.0326 | 0.0094 | 0.0174 | 0.0013 | **0.4828** | 0.0030 | 0.1193 | 0.0591 | 0.0055 | 0.2696 |
| 15 | **0.9989** | 0.0000 | 0.0003 | 0.0000 | 0.0001 | 0.0000 | 0.0001 | 0.0005 | 0.0000 | 0.0000 |
| 16 | 0.0124 | **0.8682** | 0.0319 | 0.0014 | 0.0040 | 0.0069 | 0.0056 | 0.0078 | 0.0597 | 0.0022 |
| 17 | **0.9859** | 0.0005 | 0.0028 | 0.0000 | 0.0008 | 0.0001 | 0.0020 | 0.0074 | 0.0002 | 0.0002 |
| 18 | 0.0280 | 0.0092 | **0.9434** | 0.0002 | 0.0025 | 0.0007 | 0.0044 | 0.0084 | 0.0022 | 0.0010 |
| 19 | 0.0626 | **0.4502** | 0.3660 | 0.0023 | 0.0132 | 0.0089 | 0.0202 | 0.0315 | 0.0388 | 0.0063 |
| 20 | 0.0608 | 0.1410 | **0.7125** | 0.0014 | 0.0104 | 0.0052 | 0.0166 | 0.0272 | 0.0201 | 0.0047 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Xie, S.; Gan, C.
Estimating Territory Risk Relativity Using Generalized Linear Mixed Models and Fuzzy *C*-Means Clustering. *Risks* **2023**, *11*, 99.
https://doi.org/10.3390/risks11060099
