Comparing the Performance of Corporate Bankruptcy Prediction Models Based on Imbalanced Financial Data
Abstract
1. Introduction
 ■ Research question 1: How can we derive data sampling methods that improve the performance of corporate bankruptcy prediction models on imbalanced corporate financial information?
 ■ Research question 2: How can we derive an optimal threshold technique that improves AUC performance even when the corporate financial information is imbalanced?
2. Data and Methods
2.1. Data and Sampling
2.1.1. Data
2.1.2. Sampling
2.2. Models and Performance Measures
LSTM model pseudocode 

2.2.1. LSTM Model
2.2.2. Logistic Regression Model
2.2.3. k-Nearest Neighbor (kNN) Model
2.2.4. Decision Tree Model
2.2.5. Random Forest Model
2.2.6. Performance Measure
3. Performance Analysis Results
4. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
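Category  Section  Feature
Financial Statements  Balance Sheet (1000 won)  Accumulations
Financial Statements  Balance Sheet (1000 won)  Retained Earnings
Financial Statements  Balance Sheet (1000 won)  Net assets of controlling shareholders (before capital stock reduction)
Financial Statements  Balance Sheet (1000 won)  Owners of Parent Equity
Financial Statements  Balance Sheet (1000 won)  Total Equity
Financial Statements  Comprehensive Income Statement (1000 won)  Earnings before tax
Financial Statements  Comprehensive Income Statement (1000 won)  (Total Comprehensive Income Attributable to) Owners of Parent Equity
Financial Statements  Comprehensive Income Statement (1000 won)  Total Comprehensive Income
Financial Statements  Cash Flow Statement (1000 won)  Cash Flow
Financial Ratio  Stability (%)  Intangible Asset Ratio
Financial Ratio  Stability (%)  Equity Capital Ratio
Financial Ratio  Stability (%)  Borrowings and Bonds Payable Ratio
Financial Ratio  Stability (%)  Borrowed Capital Ratio
Financial Ratio  Stability (%)  Cash Flow/Total Debt
Financial Ratio  Stability (%)  Cash Flow/Total Equity
Financial Ratio  Stability (%)  Cash Flow/Total Asset
Financial Ratio  Growth (yearly) (%)  Total Asset Growth Rate
Financial Ratio  Profitability (%)  Operating Revenue/Operating Expense
Financial Ratio  Profitability (%)  Profit Margin Ratio
Financial Ratio  Profitability (%)  ROA (Current Net Income)
Financial Ratio  Profitability (%)  ROA (Earnings before tax)
Financial Ratio  Profitability (%)  ROA (Operating Profit)
Financial Ratio  Profitability (%)  ROA (Total Comprehensive Income)
Financial Ratio  Profitability (%)  ROE (Current Net Income)
Financial Ratio  Profitability (%)  ROE (Earnings before tax)
Financial Ratio  Profitability (%)  ROE (Operating Profit)
Financial Ratio  Profitability (%)  ROE (Net profit of controlling shareholders)
Financial Ratio  Activity (times)  Total Debt Turnover
Financial Ratio  Activity (times)  Total Asset Turnover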
References
1. Oh, W.S.; Kim, J.H. Forecasting corporate bankruptcy with artificial intelligence. J. Ind. Converg. 2017, 15, 17–32.
2. Cha, S.; Kang, J. Corporate default prediction model using deep learning time series algorithm, RNN and LSTM. J. Intell. Inf. Syst. 2018, 24, 1–32.
3. Barboza, F.; Kimura, H.; Altman, E. Machine learning models and bankruptcy prediction. Expert Syst. Appl. 2017, 83, 405–417.
4. Falavigna, G. Financial ratings with scarce information: A neural network approach. Expert Syst. Appl. 2012, 39, 1784–1792.
5. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
6. Jang, Y.; Jeong, I.; Cho, Y.; Ahn, H. Business Failure Prediction with LSTM RNN in the Construction Industry. In Proceedings of the ASCE 2019 International Conference on Computing in Civil Engineering, Atlanta, GA, USA, 17–19 June 2019; pp. 1–8.
7. Kim, H.; Cho, H.; Ryu, D. Corporate bankruptcy prediction using machine learning methodologies with a focus on sequential data. Comput. Econ. 2022, 59, 1231–1249.
8. Odom, M.D.; Sharda, R. A neural network model for bankruptcy prediction. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 163–168.
9. Wilson, R.L.; Sharda, R. Bankruptcy prediction using neural networks. Decis. Support Syst. 1994, 11, 545–557.
10. Kim, H.; Cho, H.; Ryu, D. Corporate default predictions using machine learning: Literature review. Sustainability 2020, 12, 6325.
11. Brygała, M. Consumer Bankruptcy Prediction Using Balanced and Imbalanced Data. Risks 2022, 10, 24.
12. Zhou, L. Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods. Knowl.-Based Syst. 2013, 41, 16–25.
13. Garcia, V.; Jose, S.S.; Ramon, A.M. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl.-Based Syst. 2012, 25, 13–21.
14. Syed, N.; Sharifah, H.; Shafinar, I.; Bee, W.Y. Personal bankruptcy prediction using decision tree model. J. Econ. Financ. Adm. Sci. 2019, 24, 157–170.
15. Amidon, A. PyOD: A Unified Python Library for Anomaly Detection. 11 May 2021. Available online: https://towardsdatascience.com/pyodaunifiedpythonlibraryforanomalydetection3608ec1fe321 (accessed on 15 January 2023).
16. Mishra, S.; Kshirsagar, V.; Dwivedula, R.; Hota, C. Attention-Based Bi-LSTM for Anomaly Detection on Time-Series Data. In Proceedings of the 2021 ICANN International Conference on Artificial Neural Networks, Online, 14–17 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 129–140.
17. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the 2021 NeurIPS 35th Conference on Neural Information Processing Systems, Online, 6–14 December 2021; pp. 1–14.
18. Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049.
19. Noh, S.H. Analysis of gradient vanishing of RNNs and performance comparison. Information 2021, 12, 442.
20. Jagannath, V. Random Forest Template Tibco Spotfirer Wiki Page. 24 March 2017. Available online: https://community.tibco.com/wiki/randomforesttemplatetibcospotfirerwikipage (accessed on 15 January 2023).
Financial Information  t-Test p-Value

Total assets  2.33 × 10^{−5} 
Parent company equity holder  3.59 × 10^{−4} 
Intangible assets ratio  4.84 × 10^{−2} 
Equity capital ratio  1.48 × 10^{−20} 
Debt ratio  9.63 × 10^{−6} 
Cash flows/total liabilities  2.23 × 10^{−8} 
Total assets growth rate  2.88 × 10^{−14} 
Operating revenue/operating expense  6.71 × 10^{−13} 
Gross margin  2.45 × 10^{−4} 
ROA (income from continuing operations before tax)  1.79 × 10^{−9} 
ROA (operating profit)  2.06 × 10^{−10} 
ROE (income from continuing operations before tax)  2.89 × 10^{−5} 
ROE (operating profit)  2.52 × 10^{−2} 
Total debt turnover ratio  5.68 × 10^{−27} 
LSTM input data: Company (Financial information in 2012, …, 2020)
LSTM label data: Company (Bankruptcy or non-bankruptcy): (1)/(0)
LR, kNN, DT, RF input data: Company (Financial information in 2012) ⋮ (Financial information in 2020)
LR, kNN, DT, RF label data: Company (Bankruptcy or non-bankruptcy in 2013) ⋮ (Bankruptcy or non-bankruptcy in 2021)
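The two data layouts listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the nested-dictionary layout, the function names `build_lstm_samples` and `build_flat_samples`, and the toy values are hypothetical and do not come from the paper. The sketch assumes that the LSTM receives one 2012–2020 sequence per company with a single 1/0 label, while LR, kNN, DT, and RF receive one row per company-year, labelled with the status of the following year.

```python
from typing import Dict, List, Tuple

YEARS = list(range(2012, 2021))  # financial information for 2012, ..., 2020

Features = List[float]
# Hypothetical layout: company id -> year -> feature vector
Records = Dict[str, Dict[int, Features]]


def build_lstm_samples(
    records: Records, bankrupt: Dict[str, int]
) -> Tuple[List[List[Features]], List[int]]:
    """One sample per company: the whole yearly sequence and one 1/0 label."""
    companies = sorted(records)
    x = [[records[c][year] for year in YEARS] for c in companies]
    y = [bankrupt[c] for c in companies]
    return x, y


def build_flat_samples(
    records: Records, status_by_year: Dict[str, Dict[int, int]]
) -> Tuple[List[Features], List[int]]:
    """One sample per company-year: features of year t, status of year t + 1."""
    x, y = [], []
    for c in sorted(records):
        for year in YEARS:
            x.append(records[c][year])
            y.append(status_by_year[c][year + 1])
    return x, y


# Toy data (made up): two companies, two features per year.
records = {
    "A": {year: [0.10, 1.0] for year in YEARS},
    "B": {year: [-0.20, 0.5] for year in YEARS},
}
bankrupt = {"A": 0, "B": 1}
status_by_year = {
    "A": {year: 0 for year in range(2013, 2022)},
    "B": {year: int(year == 2021) for year in range(2013, 2022)},
}

x_lstm, y_lstm = build_lstm_samples(records, bankrupt)
x_flat, y_flat = build_flat_samples(records, status_by_year)
print(len(x_lstm), len(x_lstm[0]), len(x_flat))  # companies, years, company-years
```

With this layout the sequence model sees 2 samples of 9 time steps each, whereas the non-sequential models see 18 independent rows, which is why the class imbalance looks different in the two settings.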
Actual outcome  Predicted: Non-bankrupt company  Predicted: Bankrupt company
Non-bankrupt company  True positive (TP)  False negative (FN)
Bankrupt company  False positive (FP)  True negative (TN)
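As a worked check of the confusion-matrix convention above (the non-bankrupt class is treated as the positive class), the following sketch recomputes the per-class measures from the four cell counts. The function names are hypothetical, and the F1 score is computed as 2PR/(P + R), which is an assumption that is consistent with the "Not defined" entries reported when both precision and recall are zero. The input counts are the LR rows of tables (a) and (c).

```python
from typing import Dict, Optional


def safe_div(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def f1_score(precision: Optional[float], recall: Optional[float]) -> Optional[float]:
    """Harmonic mean of precision and recall; None when it is not defined."""
    if precision is None or recall is None or precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)


def class_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, Optional[float]]:
    """Metrics for both classes, with non-bankrupt as the positive class."""
    nb_precision = safe_div(tp, tp + fp)  # predicted non-bankrupt that are correct
    nb_recall = safe_div(tp, tp + fn)     # actual non-bankrupt that are found
    b_precision = safe_div(tn, tn + fn)   # predicted bankrupt that are correct
    b_recall = safe_div(tn, tn + fp)      # actual bankrupt that are found
    return {
        "accuracy": safe_div(tp + tn, tp + fn + fp + tn),
        "non_bankruptcy_precision": nb_precision,
        "non_bankruptcy_recall": nb_recall,
        "bankruptcy_precision": b_precision,
        "bankruptcy_recall": b_recall,
        "non_bankruptcy_f1": f1_score(nb_precision, nb_recall),
        "bankruptcy_f1": f1_score(b_precision, b_recall),
    }


# LR on the small random sample, table (a): [[183, 23], [49, 15]]
small = class_metrics(tp=183, fn=23, fp=49, tn=15)
# LR on the total data, table (c): [[4514, 2], [74, 0]]
total = class_metrics(tp=4514, fn=2, fp=74, tn=0)

for name, value in small.items():
    print(f"{name}: {value:.4f}")
print("bankruptcy_f1 on total data:", total["bankruptcy_f1"])
```

On the total data the accuracy is high only because almost every company is non-bankrupt; the bankruptcy recall of zero shows that no bankrupt company is detected.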
Model  Confusion Matrix  Accuracy  Non-Bankruptcy Precision  Non-Bankruptcy Recall  Bankruptcy Precision  Bankruptcy Recall  Non-Bankruptcy F1 Score  Bankruptcy F1 Score  AUC 
LR  $\left[\begin{array}{cc}183& 23\\ 49& 15\end{array}\right]$  0.7333  0.7888  0.8883  0.3947  0.2344  0.8356  0.2941  0.7457 
kNN  $\left[\begin{array}{cc}183& 23\\ 53& 11\end{array}\right]$  0.7185  0.7754  0.8883  0.3235  0.1719  0.8281  0.2245  0.8676 
DT  $\left[\begin{array}{cc}155& 51\\ 42& 22\end{array}\right]$  0.6556  0.7868  0.7524  0.3014  0.3438  0.7692  0.3212  0.6587 
RF  $\left[\begin{array}{cc}178& 28\\ 49& 15\end{array}\right]$  0.7148  0.7841  0.8641  0.3488  0.2344  0.8222  0.2804  0.8095 
(a) Performance results of the LR, kNN, DT, and RF for the small random sample  
Model  Confusion Matrix  Accuracy  Non-Bankruptcy Precision  Non-Bankruptcy Recall  Bankruptcy Precision  Bankruptcy Recall  Non-Bankruptcy F1 Score  Bankruptcy F1 Score  AUC 
LR  $\left[\begin{array}{cc}187& 9\\ 58& 16\end{array}\right]$  0.7519  0.7633  0.9541  0.64  0.2162  0.8481  0.3232  0.7929 
kNN  $\left[\begin{array}{cc}187& 9\\ 53& 21\end{array}\right]$  0.7704  0.7792  0.9541  0.7  0.2838  0.8578  0.4038  0.8076 
DT  $\left[\begin{array}{cc}168& 28\\ 41& 33\end{array}\right]$  0.7444  0.8038  0.8571  0.5410  0.4459  0.8296  0.4889  0.5979 
RF  $\left[\begin{array}{cc}182& 14\\ 43& 31\end{array}\right]$  0.7889  0.8089  0.9286  0.6889  0.4189  0.8646  0.5210  0.8326 
(b) Performance results: LR, kNN, DT, and RF for the small dataset sampled using approximate entropy  
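For readers unfamiliar with approximate entropy, the sketch below computes the standard ApEn(m, r) statistic of a single series as Φ^m(r) − Φ^(m+1)(r), using the Chebyshev distance between templates and counting self-matches. It only illustrates the measure itself. How the paper turns ApEn values into a sampling rule is not stated in this section, and the parameter defaults (m = 2, tolerance = 0.2 × standard deviation) are common conventions assumed here, not values taken from the paper.

```python
import math
import random
from typing import List, Sequence


def approximate_entropy(series: Sequence[float], m: int = 2, r_factor: float = 0.2) -> float:
    """Approximate entropy ApEn(m, r) with r = r_factor * std(series)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    tol = r_factor * std

    def phi(dim: int) -> float:
        # All overlapping templates of length `dim`.
        templates: List[Sequence[float]] = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within `tol` of `a` (Chebyshev distance).
            # The self-match is included, so the count is never zero.
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= tol
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Toy series (made up): a perfectly regular signal and Gaussian noise.
random.seed(0)
periodic = [0.0, 1.0] * 50
noisy = [random.gauss(0.0, 1.0) for _ in range(100)]
constant = [1.0] * 20

apen_periodic = approximate_entropy(periodic)
apen_noisy = approximate_entropy(noisy)
apen_constant = approximate_entropy(constant)
print(apen_periodic, apen_noisy, apen_constant)
```

A regular series yields a value near zero and an irregular one a larger value, so the statistic can rank series by how unpredictable they are.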
Model  Confusion Matrix  Accuracy  Non-Bankruptcy Precision  Non-Bankruptcy Recall  Bankruptcy Precision  Bankruptcy Recall  Non-Bankruptcy F1 Score  Bankruptcy F1 Score  AUC_1  AUC_2
LR  $\left[\begin{array}{cc}4514& 2\\ 74& 0\end{array}\right]$  0.9834  0.9839  0.9995  0  0  0.9916  Not defined  0.7906  0.9998
kNN  $\left[\begin{array}{cc}4503& 13\\ 74& 0\end{array}\right]$  0.9810  0.9839  0.9971  0  0  0.9904  Not defined  0.6500  0.9998
DT  $\left[\begin{array}{cc}4398& 118\\ 68& 6\end{array}\right]$  0.9595  0.9848  0.9739  0.0484  0.0811  0.9793  0.0606  0.5477  0.9998
RF  $\left[\begin{array}{cc}4514& 2\\ 74& 0\end{array}\right]$  0.9834  0.9839  0.9996  0  0  0.9917  Not defined  0.7389  0.9998
(c) Performance results: LR, kNN, DT, and RF for the total data, where AUC_1 = AUC at the threshold of 0.5 and AUC_2 = AUC using the optimal threshold.  
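The effect of moving away from the default threshold of 0.5 on heavily imbalanced data can be illustrated with a small sketch. The procedure used in the paper to obtain its optimal threshold is not given in this section, so the example substitutes Youden's J statistic (maximizing TPR − FPR), a common choice that is an assumption here. The function names, the scores, and the labels are made up for illustration, with label 1 denoting bankruptcy and the score read as a predicted bankruptcy probability.

```python
from typing import List, Sequence, Tuple


def recall_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Share of actual positives (label 1) whose score reaches the threshold."""
    positives = sum(labels)
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return hits / positives


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return the candidate threshold maximizing J = TPR - FPR, and that J."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (made up): 18 non-bankrupt companies and 2 bankrupt ones.
scores: List[float] = [
    0.02, 0.03, 0.05, 0.05, 0.06, 0.08, 0.10, 0.11, 0.12,
    0.15, 0.02, 0.04, 0.07, 0.09, 0.13, 0.18, 0.25, 0.01,
    0.30, 0.22,
]
labels: List[int] = [0] * 18 + [1, 1]

recall_default = recall_at(scores, labels, 0.5)
best_t, best_j = youden_threshold(scores, labels)
recall_tuned = recall_at(scores, labels, best_t)
print(recall_default, best_t, round(best_j, 4), recall_tuned)
```

Because the minority class rarely receives scores above 0.5, the default cut-off detects no bankruptcies in this toy example, while the lower data-driven cut-off recovers both at the cost of one false alarm.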
Model  Accuracy  Precision  Recall  F1 Score  AUC  
Logistic  0.7570  0.0001  0.2174  0.0002  0.4872  
SVM  0.7236  0.0002  0.3913  0.0003  0.5575  
RF  0.9899  0.0023  0.2174  0.0045  0.6037  
RNN  0.9789  0.0024  0.4783  0.0048  0.7286  
LSTM  0.9936  0.0058  0.3478  0.0114  0.6707  
Ensemble  0.9826  0.0029  0.4783  0.0058  0.7305  
(d) Performance results: Bankruptcy Forecasting Performance by Methodology [7] 
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Noh, S.H. Comparing the Performance of Corporate Bankruptcy Prediction Models Based on Imbalanced Financial Data. Sustainability 2023, 15, 4794. https://doi.org/10.3390/su15064794