Machine Learning and Artificial Intelligence in Non-life Insurance: Theory, Methods and Applications

A special issue of Risks (ISSN 2227-9091).

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 16090

Special Issue Editor


Guest Editor
Global Management Studies, Ted Rogers School of Management, Toronto Metropolitan University, Toronto, ON M5G 2C3, Canada
Interests: statistical machine learning; explainable data analytics; risk modeling; rate making; multivariate statistical methods; time series analysis; predictive analytics; health informatics; biosignal analysis

Special Issue Information

Dear Colleagues,

Data from complex non-life insurance systems share many characteristics, such as high volume, complex structure, high dimensionality, and multi-scale structure. Traditional predictive modelling approaches, including general linear models, generalized linear models (GLMs), and generalized additive models (GAMs), have been widely used for predicting insurance losses and future claims. Machine learning (ML), meanwhile, is emerging as a powerful tool for predicting insurance claim frequency, for insurance pricing, risk analysis and risk management, and for insurance fraud detection and prevention. Actuarial practice in insurance pricing and rate regulation has shown that the interpretability of results obtained from ML techniques is crucial for the broader application of ML and AI technologies. Much effort has therefore been made to improve ML explainability and interpretability. However, applications of interpretable ML techniques and explainable artificial intelligence (XAI) are still in their infancy and require further development, particularly in non-life insurance, where a variety of advanced statistical and computational methods are applied to solve problems from an actuarial perspective. This Special Issue aims to collect outstanding research papers on building statistical or computational machine learning models that offer good interpretability for insurance pricing, risk analysis, risk management and predictive risk modelling, particularly in non-life insurance.

The methodology topics include, but are not limited to, the following:

  • Sparse statistical methods;
  • Interpretable statistical models;
  • High-dimensional insurance data and their dimension reduction;
  • Explainable artificial neural networks;
  • Model-agnostic methods;
  • Variable importance measures;
  • Machine learning algorithms for ratemaking.

Both theoretical development and applied work addressing the interpretability of models for insurance pricing, modelling and risk analysis are welcome.

Dr. Shengkun Xie
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Risks is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • explainable machine learning
  • interpretable machine learning
  • insurance pricing
  • ratemaking
  • risk analysis
  • risk management
  • non-life insurance

Published Papers (6 papers)


Research

33 pages, 2978 KiB  
Article
A Generalized Linear Model and Machine Learning Approach for Predicting the Frequency and Severity of Cargo Insurance in Thailand’s Border Trade Context
by Praiya Panjee and Sataporn Amornsawadwatana
Risks 2024, 12(2), 25; https://doi.org/10.3390/risks12020025 - 30 Jan 2024
Cited by 1 | Viewed by 1412
Abstract
The study compares model approaches in predictive modeling for claim frequency and severity within the cross-border cargo insurance domain. The aim is to identify the optimal model approach between generalized linear models (GLMs) and advanced machine learning techniques. Evaluations focus on mean absolute error (MAE) and root mean squared error (RMSE) metrics to comprehensively assess predictive performance. For frequency prediction, extreme gradient boosting (XGBoost) demonstrates the lowest MAE, indicating higher accuracy compared to gradient boosting machines (GBMs) and a generalized linear model (Poisson). Despite XGBoost’s lower MAE, it shows higher RMSE values, suggesting a broader error spread and larger magnitudes compared to gradient boosting machines (GBMs) and a generalized linear model (Poisson). Conversely, the generalized linear model (Poisson) showcases the best RMSE values, indicating tighter clustering and smaller error magnitudes, despite a slightly higher MAE. For severity prediction, extreme gradient boosting (XGBoost) displays the lowest MAE, implying better accuracy. However, it exhibits a higher RMSE, indicating wider error dispersion compared to a generalized linear model (Gamma). In contrast, a generalized linear model (Gamma) demonstrates the lowest RMSE, portraying tighter clustering and smaller error magnitudes despite a higher MAE. In conclusion, extreme gradient boosting (XGBoost) stands out in mean absolute error (MAE) for both frequency and severity prediction, showcasing superior accuracy. However, a generalized linear model (Gamma) offers a balance between accuracy and error magnitude, and its performance outperforms extreme gradient boosting (XGBoost) and gradient boosting machines (GBMs) in terms of RMSE metrics, with a slightly higher MAE. 
These findings empower insurance companies to enhance risk assessment processes, set suitable premiums, manage reserves, and accurately forecast claim occurrences, contributing to competitive pricing for clients while ensuring profitability. For cross-border trade entities, such as trucking companies and cargo owners, these insights aid in improved risk management and potential cost savings by enabling more reasonable insurance premiums based on accurate predictive claims from insurance companies.
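The abstract above reports a model that wins on MAE while losing on RMSE. A minimal sketch of how that can happen is given here, using made-up claim counts and two hypothetical sets of predictions (none of these numbers come from the paper). RMSE squares each error, so a single large miss weighs far more heavily than many small ones.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squaring penalises large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical claim counts for five policies (illustrative only).
actual = [0, 0, 1, 0, 5]
# Model A is exact on four policies but misses the large one by 4.
model_a = [0, 0, 1, 0, 1]
# Model B is off by 1 on every policy.
model_b = [1, 1, 2, 1, 4]

mae_a, rmse_a = mae(actual, model_a), rmse(actual, model_a)
mae_b, rmse_b = mae(actual, model_b), rmse(actual, model_b)

print(f"Model A: MAE={mae_a:.3f}, RMSE={rmse_a:.3f}")
print(f"Model B: MAE={mae_b:.3f}, RMSE={rmse_b:.3f}")
```

In this toy setting, Model A has the lower MAE and Model B the lower RMSE, which is the same pattern the abstract describes for XGBoost versus the GLMs.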

14 pages, 458 KiB  
Article
Advancing the Use of Deep Learning in Loss Reserving: A Generalized DeepTriangle Approach
by Yining Feng and Shuanming Li
Risks 2024, 12(1), 4; https://doi.org/10.3390/risks12010004 - 26 Dec 2023
Viewed by 1506
Abstract
This paper proposes a generalized deep learning approach for predicting claims developments for non-life insurance reserving. The generalized approach offers more flexibility and accuracy in solving actuarial reserving problems. It predicts claims outstanding weighted by exposure instead of loss ratio to remove subjectivity associated with premium weighting. Chain-ladder predicted outstanding claims are used as part of the multi-task learning to remove the dependence on case estimates. Grid-search is introduced for hyperparameter tuning to improve model performance. Performance-wise, the Generalized DeepTriangle outperforms both traditional chain-ladder methodology, the automated machine learning approaches (AutoML), and the original DeepTriangle model.
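For readers unfamiliar with the chain-ladder baseline mentioned in the abstract above, the following is a minimal sketch of the classical volume-weighted chain-ladder method. It is not the paper's Generalized DeepTriangle model; the triangle values and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical cumulative claims triangle (illustrative values only).
# Rows are accident years, columns are development periods;
# None marks cells that have not been observed yet.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0, None],
    [120.0, None, None],
]


def development_factors(tri):
    """Volume-weighted age-to-age factors from the observed cells."""
    factors = []
    for j in range(len(tri[0]) - 1):
        # Use only rows where both adjacent periods are observed.
        rows = [r for r in tri if r[j] is not None and r[j + 1] is not None]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def complete_triangle(tri, factors):
    """Fill unobserved cells by rolling the last known value forward."""
    completed = []
    for row in tri:
        filled = list(row)
        for j in range(1, len(filled)):
            if filled[j] is None:
                filled[j] = filled[j - 1] * factors[j - 1]
        completed.append(filled)
    return completed


def outstanding_claims(tri, completed):
    """Projected ultimate minus the latest observed cumulative amount."""
    result = []
    for row, full in zip(tri, completed):
        latest = [v for v in row if v is not None][-1]
        result.append(full[-1] - latest)
    return result


factors = development_factors(triangle)
completed = complete_triangle(triangle, factors)
outstanding = outstanding_claims(triangle, completed)
print("factors:", factors)
print("outstanding by accident year:", outstanding)
```

The outstanding amounts produced this way are the kind of chain-ladder prediction the abstract says is fed into the multi-task learning in place of case estimates.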

19 pages, 2795 KiB  
Article
Machine Learning in Forecasting Motor Insurance Claims
by Thomas Poufinas, Periklis Gogas, Theophilos Papadimitriou and Emmanouil Zaganidis
Risks 2023, 11(9), 164; https://doi.org/10.3390/risks11090164 - 18 Sep 2023
Cited by 3 | Viewed by 4723
Abstract
Accurate forecasting of insurance claims is of the utmost importance for insurance activity as the evolution of claims determines cash outflows and the pricing, and thus the profitability, of the underlying insurance coverage. These are used as inputs when the insurance company drafts its business plan and determines its risk appetite, and the respective solvency capital required (by the regulators) to absorb the assumed risks. The conventional claim forecasting methods attempt to fit (each of) the claims frequency and severity with a known probability distribution function and use it to project future claims. This study offers a fresh approach in insurance claims forecasting. First, we introduce two novel sets of variables, i.e., weather conditions and car sales, and second, we employ a battery of Machine Learning (ML) algorithms (Support Vector Machines—SVM, Decision Trees, Random Forests, and Boosting) to forecast the average (mean) insurance claim per insured car per quarter. Finally, we identify the variables that are the most influential in forecasting insurance claims. Our dataset comes from the motor portfolio of an insurance company operating in Athens, Greece and spans a period from 2008 to 2020. We found evidence that the three most informative variables pertain to the new car sales with a 3-quarter and 1-quarter lag and the minimum temperature of Elefsina (one of the weather stations in Athens) with a 3-quarter lag. Among the models tested, Random Forest with limited depth and XGboost run on the 15 most informative variables, and these exhibited the best performance. These findings can be useful in the hands of insurers as they can consider the weather conditions and the new car sales among the parameters that are considered to perform claims forecasting.

20 pages, 1128 KiB  
Article
Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting
by Carina Clemente, Gracinda R. Guerreiro and Jorge M. Bravo
Risks 2023, 11(9), 163; https://doi.org/10.3390/risks11090163 - 12 Sep 2023
Cited by 2 | Viewed by 3931
Abstract
Modelling claim frequency and claim severity are topics of great interest in property-casualty insurance for supporting underwriting, ratemaking, and reserving actuarial decisions. Standard Generalized Linear Models (GLM) frequency–severity models assume a linear relationship between a function of the response variable and the predictors, independence between the claim frequency and severity, and assign full credibility to the data. To overcome some of these restrictions, this paper investigates the predictive performance of Gradient Boosting with decision trees as base learners to model the claim frequency and the claim severity distributions of an auto insurance big dataset and compare it with that obtained using a standard GLM model. The out-of-sample performance measure results show that the predictive performance of the Gradient Boosting Model (GBM) is superior to the standard GLM model in the Poisson claim frequency model. Differently, in the claim severity model, the classical GLM outperformed the Gradient Boosting Model. The findings suggest that gradient boost models can capture the non-linear relation between the response variable and feature variables and their complex interactions and thus are a valuable tool for the insurer in feature engineering and the development of a data-driven approach to risk management and insurance.

20 pages, 886 KiB  
Article
Estimating Territory Risk Relativity Using Generalized Linear Mixed Models and Fuzzy C-Means Clustering
by Shengkun Xie and Chong Gan
Risks 2023, 11(6), 99; https://doi.org/10.3390/risks11060099 - 24 May 2023
Viewed by 1219
Abstract
Territory risk analysis has played an important role in auto insurance rate regulation. It aims to design rating territories from a set of basic rating units so that their respective risk relativities can be estimated to reflect the regional risk of insurance. In this work, spatially constrained clustering is first applied to insurance loss data to form such regions, using the forward sortation area (FSA) as a basic rating unit. The groupings of FSA by spatially constrained clustering reduce the insurance rate heterogeneity caused by smaller risk exposures. Furthermore, the generalized linear mixed model (GLMM) is proposed to derive the risk relativities of clusters and each FSA. In addition, as an alternative approach, fuzzy C-Means clustering is proposed to derive the risk relativity of FSA, and the obtained results are compared to the ones from GLMM. The spatially constrained clustering and risk relativity estimation help to retrieve a set of territory risk benchmarks used in rate filings within the regulation process. It also provides guidance for auto insurance companies on rate making.

21 pages, 7037 KiB  
Article
Optimizing Pension Participation in Kenya through Predictive Modeling: A Comparative Analysis of Tree-Based Machine Learning Algorithms and Logistic Regression Classifier
by Nelson Kemboi Yego, Juma Kasozi and Joseph Nkurunziza
Risks 2023, 11(4), 77; https://doi.org/10.3390/risks11040077 - 18 Apr 2023
Viewed by 2134
Abstract
Pension plans play a vital role in the economy by impacting savings, consumption, and investment allocation. Despite declining mortality rates and increasing life expectancy, pension enrollment remains low, affecting the long-term financial stability and well-being of populations. To address this issue, this study was conducted to explore the potential of predictive modeling techniques in improving pension participation. The study utilized three tree-based machine learning algorithms and a logistic regression classifier to analyze data from a nationally representative 2019 Kenya FinAccess Household Survey. The results indicated that ensemble tree-based models, particularly the random forest model, were the most effective in predicting pension enrollment. The study identified the key factors that influenced enrollment, such as National Health Insurance Fund (NHIF) usage, monthly income, and bank usage. The findings suggest that collaboration among the NHIF, banks, and pension providers is necessary to increase pension uptake, along with increased financial education for citizens. The study provides valuable insight for promoting and optimizing pension participation.
