# Explaining Deep Learning Models for Credit Scoring with SHAP: A Case Study Using Open Banking Data

## Abstract

## 1. Introduction

#### 1.1. Literature Review

## 2. Data

#### 2.1. Textual Descriptions of Transactions

#### 2.2. Preprocessing

## 3. Methods and Models

#### 3.1. Text Classification Models

#### 3.1.1. A Deep Text Classification Model Trained from Scratch

#### Text Vectorization Layer

#### Text Embedding Layer

#### Convolutional Layer

#### Global Max Pooling Layer

#### Dropout Layer

#### Output Layer

#### 3.2. A Deep Text Classification Model Based on Transfer Learning Using BERT

#### 3.3. SHAP

- Symmetry: If two tokens contribute equally to all possible coalitions, their contribution values are the same.
- Efficiency: The Shapley values of all tokens sum to the total gain or loss, i.e., the difference between the prediction and the baseline output.
- Dummy: A token that does not affect the result of the model has a contribution value of zero.
- Additivity: When the output of a model is the additive result of two intermediate outputs, a token's Shapley value is the sum of its Shapley values for the two intermediate outputs.
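The four properties can be checked numerically on a small example. The sketch below computes exact Shapley values by enumerating every coalition of tokens, which is only feasible for a handful of tokens. The tokens `a`, `b`, `c` and the value function `toy_value` are hypothetical and stand in for a model's output on a subset of tokens; the sketch does not reproduce the approximation used by any SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(tokens):
    """Hypothetical model output for a set of tokens (illustration only)."""
    score = 0.0
    if "a" in tokens:
        score += 1.0
    if "b" in tokens:
        score += 1.0
    if "a" in tokens and "b" in tokens:
        score += 0.5  # interaction shared by the two symmetric tokens
    return score  # token "c" never changes the output (dummy)


players = ["a", "b", "c"]
phi = shapley_values(players, toy_value)

# Efficiency: the values sum to value(all tokens) - value(no tokens).
total_gain = toy_value(frozenset(players)) - toy_value(frozenset())
```

In this toy game, `a` and `b` receive the same value (symmetry), `c` receives zero (dummy), and the three values sum to `total_gain` (efficiency).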

#### 3.4. Evaluation Metrics: AUC and Brier Score

## 4. Results

#### 4.1. Model Performance—Discriminatory Power (AUC) and Calibration (Brier Score)

#### 4.2. Model and Prediction Explainability (SHAP)

#### 4.2.1. Global Explainability

#### 4.2.2. Local Explainability

## 5. Discussion

## 6. Conclusions and Implications

#### Limitations and Further Research

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

Dataset | Years | No. of Observations |
---|---|---|
Training data | 2009–2017 | 124,142 |
Validation data | 2009–2017 | 20,686 |
Test data | 2020 | 31,030 |

Examples of Textual Descriptions |
---|
‘04.11 HOLE KJØTT AS K. WILHELMSG ÅLESUND’ |
‘07.11 LARSGÅRDEN POST LARSGÅRDSV19 ÅLESUND’ |
‘14.11 FAVORITTEN A/S LARSGÅRDSVN ÅLESUND’ |
‘20.11 VINMONOPOLET STORMOA ÅLESUND’ |

Preprocessed Textual Descriptions |
---|
‘hole kjoett as k wilhelmsg aalesund’ |
‘larsgaarden post aalesund’ |
‘favoritten as larsgaardsv aalesund’ |
‘vinmonopolet stormoa aalesund’ |

Concatenated Textual Descriptions |
---|
‘hole kjoett as k wilhelmsg aalesund larsgaarden post aalesund favoritten as larsgaardsv aalesund vinmonopolet stormoa aalesund’ |
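The transformation from raw to preprocessed to concatenated descriptions can be sketched as follows. The rules are inferred from the example rows only and are assumptions, not the authors' documented pipeline: lowercase the text, transliterate Norwegian letters (`ø` to `oe`, `å` to `aa`, and by analogy `æ` to `ae`), drop any token containing a digit, and strip punctuation. The function names `preprocess` and `concatenate` are hypothetical.

```python
import re

# Assumed transliteration of Norwegian letters, inferred from the examples.
TRANSLITERATION = {"æ": "ae", "ø": "oe", "å": "aa"}


def preprocess(description):
    """Clean one transaction description (rules inferred, not confirmed)."""
    text = description.lower()
    for letter, replacement in TRANSLITERATION.items():
        text = text.replace(letter, replacement)
    tokens = []
    for token in text.split():
        if any(ch.isdigit() for ch in token):
            continue  # drops dates such as "04.11" and tokens such as "larsgaardsv19"
        token = re.sub(r"[^a-z]", "", token)  # strips punctuation, e.g. "a/s" -> "as"
        if token:
            tokens.append(token)
    return " ".join(tokens)


def concatenate(descriptions):
    """Join all cleaned descriptions into one text per applicant."""
    cleaned = [preprocess(d) for d in descriptions]
    return " ".join(c for c in cleaned if c)


raw = [
    "04.11 HOLE KJØTT AS K. WILHELMSG ÅLESUND",
    "07.11 LARSGÅRDEN POST LARSGÅRDSV19 ÅLESUND",
    "14.11 FAVORITTEN A/S LARSGÅRDSVN ÅLESUND",
    "20.11 VINMONOPOLET STORMOA ÅLESUND",
]
cleaned = [preprocess(d) for d in raw]
document = concatenate(raw)
```

These rules reproduce the first, second, and fourth preprocessed rows exactly. For the third row the sketch yields `larsgaardsvn` where the table shows `larsgaardsv`, so the original pipeline evidently applies at least one further rule that cannot be recovered from the examples.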

Word | Embedding Vector |
---|---|
Sun | [1.2, −0.1, 4.3] |
Earth | [2.1, 0.3, 0.1] |
Water | [0.4, 2.5, −0.9] |
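An embedding layer can be read as a lookup table from tokens to vectors. The minimal sketch below uses the three illustrative vectors from the table; mapping unknown tokens to a zero vector is an assumption made for the example, and in a trained model the vectors are learned parameters rather than fixed constants.

```python
# Illustrative three-dimensional vectors taken from the table above.
EMBEDDINGS = {
    "sun": [1.2, -0.1, 4.3],
    "earth": [2.1, 0.3, 0.1],
    "water": [0.4, 2.5, -0.9],
}
EMBEDDING_DIM = 3


def embed(tokens):
    """Map each token to its vector; unknown tokens get zeros (assumption)."""
    unknown = [0.0] * EMBEDDING_DIM
    return [EMBEDDINGS.get(token.lower(), unknown) for token in tokens]


# One row per token, one column per embedding dimension.
matrix = embed(["Water", "Sun", "Moon"])
```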

Model | Training Data (AUC) | Validation Data (AUC) | Test Data (AUC) |
---|---|---|---|
Deep text classification model (trained from scratch) | 85.6% | 84.7% | 90.9% |
BERT transfer learning model | 81.5% | 79.2% | 82.5% |

Model | Training Data (Brier Score) | Validation Data (Brier Score) | Test Data (Brier Score) |
---|---|---|---|
Deep text classification model (trained from scratch) | 0.129 | 0.061 | 0.015 |
BERT transfer learning model | 0.154 | 0.067 | 0.023 |
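Section 3.4 names AUC and the Brier score as the evaluation metrics. As a minimal sketch of what the two metrics measure, the code below computes both on a toy sample: AUC as the share of (default, non-default) pairs that the model ranks correctly, with ties counted as one half, and the Brier score as the mean squared difference between predicted probability and observed outcome. The labels and probabilities are invented for illustration and are unrelated to the study's data.

```python
def auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def brier_score(y_true, probabilities):
    """Mean squared error between predicted probability and outcome."""
    squared_errors = [(p - y) ** 2 for y, p in zip(y_true, probabilities)]
    return sum(squared_errors) / len(squared_errors)


# Toy sample: 1 = default, 0 = no default (invented values).
y_true = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]

toy_auc = auc(y_true, predicted)
toy_brier = brier_score(y_true, predicted)
```

A higher AUC indicates better discrimination between defaulters and non-defaulters, while a lower Brier score indicates better calibrated probabilities. The pairwise AUC loop is quadratic in the sample size, so a rank-based implementation is preferable for data sets of the size used here.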

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hjelkrem, L.O.; Lange, P.E.d.
Explaining Deep Learning Models for Credit Scoring with SHAP: A Case Study Using Open Banking Data. *J. Risk Financial Manag.* **2023**, *16*, 221.
https://doi.org/10.3390/jrfm16040221
