Article

An Optimal Model of Financial Distress Prediction: A Comparative Study between Neural Networks and Logistic Regression

by Youssef Zizi 1,*, Amine Jamali-Alaoui 2, Badreddine El Goumi 3, Mohamed Oudgou 4 and Abdeslam El Moudden 1
1 Laboratory of Research in Organizational Management Sciences, ENCG Kenitra, Ibn Tofail University, Kenitra 14000, Morocco
2 Faculty of Science and Technology, Sidi Mohammed Ben Abdellah University, Fez 3000, Morocco
3 INSA EUROMED, University EUROMED of Fez, Fez 3000, Morocco
4 ENCG Béni Mellal, University Sultane Moulay Slimane, Béni Mellal 23000, Morocco
* Author to whom correspondence should be addressed.
Risks 2021, 9(11), 200; https://doi.org/10.3390/risks9110200
Submission received: 25 September 2021 / Revised: 23 October 2021 / Accepted: 1 November 2021 / Published: 8 November 2021

Abstract

In the face of rising defaults and limited studies on the prediction of financial distress in Morocco, this article aims to determine the most relevant predictors of financial distress and identify its optimal prediction models in a normal Moroccan economic context over two years. To achieve these objectives, logistic regression and neural networks are used based on financial ratios selected by lasso and stepwise techniques. Our empirical results highlight the significant role of predictors, namely interest to sales and return on assets in predicting financial distress. The results show that logistic regression models obtained by stepwise selection outperform the other models with an overall accuracy of 93.33% two years before financial distress and 95.00% one year prior to financial distress. Results also show that our models classify distressed SMEs better than healthy SMEs with type I errors lower than type II errors.

1. Introduction

Work on financial distress is a topical issue that has attracted the attention of researchers for several decades. Financial distress occurs when a company’s current assets can no longer meet its current liabilities (Malécot 1981). The process of financial distress is continuous and dynamic, lasting from a few months to several years, and can ultimately lead to bankruptcy (Sun et al. 2014).
Financial distress can have devastating effects on the company itself and all of its stakeholders (Hafiz et al. 2015). Financial distress prediction studies help companies detect financial difficulties earlier, understand the process of financial distress, and prevent the occurrence of bankruptcy (Crutzen and Van Caillie 2007).
Since the Z-score model proposed by Altman (1968), a great deal of research has focused on the prediction of corporate financial distress using different prediction models. However, most models used are either statistical or based on artificial intelligence (Balcaen and Ooghe 2006). In general, the objective of these predictive tools is to use financial ratios to differentiate between non-distressed and distressed firms and build an explanatory model of business failure (Refait-Alexandre 2004).
The ability of financial ratios to detect early warning signals of business failure has been highlighted by several empirical studies (Bellovary et al. 2007; Altman et al. 2017; Mselmi et al. 2017; Svabova et al. 2020; Kliestik et al. 2020). Predictors of business failure can be classified into two broad categories, namely ratios related to the firm’s ability to generate profits (profitability ratios) and those associated with the firm’s ability to meet its short-, medium-, or long-term obligations (liquidity and solvency ratios) (Back et al. 1996; Bunn and Redwood 2003; Sharifabadi et al. 2017; Lukason and Laitinen 2019; Valaskova et al. 2018; Kamaluddin et al. 2019).
Even though the study of SMEs is intriguing because their management style is generally focused on the short term and on reaction rather than forecasting, applying predictive techniques to SMEs is more difficult than for large firms because of the lack of available data (Van Caillie 1993; Psillaki 1995; Bellanca et al. 2015). In Morocco, business failure is a persistent phenomenon, with an increase of 244% between 2009 and 2019. Despite the preponderant weight of Very Small and Medium-sized Businesses (VSMB) in the Moroccan economic fabric, they are the firms most affected by business failure (99.7%). The leading cause of VSMB mortality is long payment delays (Inforisk 2020; Haut-Commissariat au Plan 2019).
Nevertheless, studies on predicting SMEs’ business failure in the Moroccan context are limited. There is a need to determine the relevant ratios of financial distress as well as the development of its prediction models in the Moroccan regions. The development of predictive models of financial distress under unique regional and national conditions allows for a better estimation of financial risks since the accuracy and reliability of these models can vary if they are used in a different context than the one in which they were originally developed. Indeed, empirical works conducted on a single region or country play a crucial role in predicting financial distress (Gregova et al. 2020).
This article aims to determine the most relevant predictors of financial distress and identify its optimal prediction models in Morocco. In particular, this study is conducted to answer the following questions: What are the most relevant ratios of financial distress? Consequently, what are the optimal prediction models of financial distress?
To do so, we use logistic regression and neural networks to develop financial distress prediction models based on financial ratios selected by LASSO (Least Absolute Shrinkage and Selection Operator) and stepwise techniques. The models are built on a sample of 180 SMEs during 2017–2018 including 123 healthy SMEs and 57 distressed SMEs. To address the problem of unbalanced data, we use SMOTE (Synthetic Minority Over-sampling Technique). Our study focuses on the Fez-Meknes region, one of the 12 Moroccan regions. This region is characterized by a high concentration of companies operating in the construction sector (11.2% of companies in the sector are located in this region). It contributes 8.4% of the national Gross Domestic Product (GDP) and is ranked second in terms of contribution to the primary sector by 14.5% (Haut-Commissariat au Plan 2018, 2019).
Our contributions to the literature can be listed as follows. First, the estimation results of our models identify predictors that have a significant impact on financial distress, namely interest to sales and return on assets. Second, to the best of our knowledge, no study has ever attempted to apply the lasso technique in the selection of financial distress discriminant variables in Morocco. Indeed, the findings reveal that the lasso technique performs better with neural networks than with logistic regression. Third, the results show that logistic regression is a powerful and robust tool for Moroccan SMEs’ financial distress prediction. Finally, our models classify distressed SMEs better than healthy SMEs, with type I errors lower than type II errors, and can be useful to Moroccan creditors.
The rest of the article is organized as follows. Section 2 reviews the literature on the prediction of business failure as well as the main works that have used neural networks and logistic regression to predict business failure. Section 3 presents the data collection, the variables considered, the methodology used for feature selection and model construction, and the performance metrics used to evaluate our models. Section 4 presents the empirical results. Finally, Section 5 and Section 6 present the discussion and conclusions, respectively.

2. Literature Review

Over the past five decades, numerous studies on the prediction of corporate financial distress have been developed. In the early research of business failure prediction, Beaver (1966) proposed a one-dimensional dichotomous classification based on a single ratio. This method was rarely exploited afterward because of the lack of robustness linked to the uniqueness of the ratio used (Deakin 1972; Gebhardt 1980).
Through multiple discriminant analysis, Altman (1968) was the first to use several ratios simultaneously to predict the failure of firms. The author developed a Z-score model, a linear combination of the selected ratios, which makes it possible to assign the firm to the group to which it is closest (failing firms or non-failing firms). From a sample of 66 firms, the author retained 5 ratios out of 22 potential ratios to construct the Z-score function, namely working capital to total assets, retained earnings to total assets, earnings before interest and taxes to total assets, market value equity to book value of total debt, and sales to total assets. However, multiple discriminant analysis requires statistical conditions that are generally not satisfied in financial data. The explanatory variables must follow a normal distribution and their variance–covariance matrices must be identical for the sample of non-failing firms as for the sample of failing firms. Furthermore, the Z-score model is suitable only for linear classification. Faced with the statistical conditions required by multiple discriminant analysis, which are rarely respected in the empirical part, several statistical models have been developed that assume a different distribution of the explanatory ratios, particularly the widely used logistic regression. Logistic regression is a probabilistic method used to treat two-class classification problems such as the prediction of business failure. In the United States, Ohlson (1980) was the first to use logistic regression to predict business failure. After that, logistic regression has gained popularity and it is considered one of the most used methods in predicting business failure worldwide (Shi and Li 2019). Amor et al. (2009) developed a logistic regression model to anticipate the financial difficulties of Quebec SMEs known for their particularities. 
Based on solvency, liquidity, and profitability ratios, the model achieved an accuracy of 63.63% two years prior to default and 72.84% one year prior to default. Charalambakis and Garrett (2019) employed a multi-period logit model on a sample of 31,000 Greek private firms between 2003 and 2011. The model classified 88% of firms that went bankrupt during the Greek debt crisis as likely to fail. The results showed that the model retains its predictive ability over different time horizons.
In Morocco, Kherrazi and Ahsina (2016) used a binomial logistic regression model to identify the determinants of SME failure in the Gharb-Chrarda-Beni-Hssen region. The results of the model showed that the failure of SMEs in the region is related to the lack of commercial profitability and the lack of permanent funds. On a sample of 2,032 borrowing SMEs and large firms, Khlifa (2017) built a logistic regression model to predict the risk of default of Moroccan firms. The model yielded a classification rate of 88.2% over two years.
Several studies have shown that logistic regression models provide better accuracy than multiple discriminant analysis. In a sample of U.S. banks, Iturriaga and Sanz (2015) obtained 81.73% accuracy by logistic regression one year prior to bankruptcy versus 77.88% for discriminant analysis. This finding is confirmed by Du Jardin (2015) and Affes and Hentati-Kaffel (2019), who showed that logistic regression outperforms multiple discriminant analysis in terms of prediction accuracy.
Given the advancement of computer technology and the dynamism and complexity of real-world financial problems, machine learning techniques have been used for the prediction of corporate failure, including Artificial Neural Network (ANN).
The principle of neural networks is to develop an algorithm that replicates the way the human brain processes information. The use of neural networks in the field of business failure prediction was introduced by Odom and Sharda (1990). Subsequently, neural network models have been used successfully by several authors to predict business failure since they are characterized by nonlinear and nonparametric adaptive learning properties. During the last three decades, neural networks have shown promising results in terms of predicting business failure and they can be considered as one of the machine learning techniques with the highest predictive capability (Jeong et al. 2012).
Based on a matched sample of 220 U.S. firms, Zhang et al. (1999) found that neural networks outperform logistic regression models in terms of classification rate estimation. Chen and Du (2009) used neural networks on 68 companies listed on the Taiwan Stock Exchange Corporation (TSEC) with 37 ratios. The results indicated that neural networks are a suitable technique for predicting corporate financial distress with an accuracy of 82.14% two seasons before financial distress. Paule-Vianez et al. (2020) used a hidden layer artificial neural networks model to predict financial distress in Spain. The authors obtained an accuracy of more than 97% on a sample of 148 Spanish credit institutions and demonstrated that neural networks have a better prognostic capacity than multivariate discriminant analysis. In a large-scale study, Altman et al. (2020) compared the performance of five failure prediction methods, namely logistic regression, neural networks with multi-layer perceptron, support vector machine, decision tree, and gradient boosting. The results showed that neural networks and logistic regression outperform other techniques in terms of efficiency and accuracy in an open European economic zone. In order to identify the best financial distress prediction model for Slovakian industrial firms, Gregova et al. (2020) confirmed the superiority of neural networks over other techniques, namely random forest and logistic regression. Despite the good performances of the last two techniques, neural networks yield better results for all metrics combined.
Machine learning techniques can give better performance in classifying companies as failing or non-failing compared to statistical methods. For this reason, new studies should be directed to apply these classification techniques in predicting financial distress (Jones et al. 2017). However, statistical techniques for predicting business failure are still used worldwide and are comparable to machine learning techniques in terms of accuracy and predictive performance. Indeed, each classification method has its advantages and disadvantages and the performance of the financial distress prediction models depends on the particularities of each country, the methodology, and the variables used to build these models (Kovacova et al. 2019). Given the reliability and predictive accuracy of logistic regression and neural networks in different contexts, we use these techniques to predict the financial distress of Moroccan SMEs.

3. Methodology

3.1. Data Collection

Before predicting corporate financial distress, we need first to define when financial distress occurs and which firms enter financial distress. A firm is considered to be in financial distress if it is unable to meet a credit deadline after 90 days from the due date (Circular n° 19/G/2002 of Bank Al-Maghrib 2002).
Using this definition, we contacted the major banks in the Fez-Meknes region to obtain the financial statements of SMEs.¹ Constrained by the availability of information, we selected an initial sample of 218 SMEs. A total of 38 SMEs were eliminated for the following reasons: young firms less than three years old, absence of financial statements for at least two consecutive years, lack of business continuity, and firms with specific characteristics such as financial and agricultural firms. Thus, the final sample includes 180 SMEs, of which 123 are non-distressed and 57 are distressed. The financial distress occurred in 2019 and the data used in the study correspond to the financial statements of the years 2017 and 2018. Our final sample covers the following sectors: trade (45.55%), construction (42.23%), and industry (12.22%).

3.2. Data Balancing

When collecting data, an unbalanced classification problem can be encountered. This can lead to inefficiency in the prediction models. To avoid this problem, we can use one of the methods to deal with unbalanced data such as the oversampling method or the undersampling method.
In this article, we use the oversampling method. This method is a resampling technique, which works by increasing the number of observations of minority class(es) in order to achieve a satisfactory ratio of minority class to majority class.
To generate synthetic samples automatically, we use the SMOTE (Synthetic Minority Over-sampling Technique) algorithm. This technique works by creating synthetic samples from the minority class instead of creating simple copies. For more details on the SMOTE algorithm, we refer the reader to Chawla et al. (2002).
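The interpolation idea behind SMOTE can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in the study: the function name `smote_sample`, the number of neighbours `k`, the seed, and the toy minority observations are all assumptions made for the example.

```python
import random

def smote_sample(minority, k=2, n_new=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class (Euclidean)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class observations (two made-up ratios per firm).
minority = [(0.10, 0.50), (0.20, 0.40), (0.15, 0.60), (0.30, 0.55)]
new_points = smote_sample(minority, k=2, n_new=3)
```

Each synthetic point lies on a segment joining two existing minority observations, which is why the method does not produce simple copies.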
Table 1 shows the results obtained by applying the SMOTE algorithm to the data.

3.3. Training-Test Set Split

We divide the sample into two sub-samples, the first called training sample (in this paper, we take 75% of the sample for training) and the second called validation or test sample (25% of the sample). The prediction models that we present next are built on the training sample and validated on the test sample.
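A 75/25 split of this kind can be sketched as follows. The sample size of 200, the random seed, and the helper name `train_test_split` are illustrative assumptions; the text does not state whether the split was random or stratified by class.

```python
import random

def train_test_split(rows, train_share=0.75, seed=42):
    """Shuffle the rows and cut them into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

firms = list(range(200))  # hypothetical firm identifiers
train, test = train_test_split(firms)
```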

3.4. Variable Analysis

Financial distress, as defined in Section 3.1, is the variable to be explained in the study. It is a qualitative, dichotomous variable. It takes the value of 1 when the SME is in arrears of more than 90 days and is therefore considered distressed, and the value of 0 when the SME is not in arrears or is in arrears for less than 90 days and is considered normal.
The selection of financial ratios as initial features for predicting financial distress is based on their predictive and discriminative ability between non-distressed and distressed firms in previous works (Jabeur 2017; Kliestik et al. 2020; Mselmi et al. 2017; Kovacova et al. 2019; Kisman and Krisand 2019; Valaskova et al. 2018; Zizi et al. 2020).
As shown in Table 2, the explanatory variables are divided into four categories: Liquidity, solvency and capital structure, profitability, and management. The management ratios are used to take into account the long customer and supplier payment delays that characterize the context of the study (Inforisk 2020).

3.5. Stepwise and Lasso Selection Techniques

In applied studies, a large number of variables can increase the variance of the predictive models' performance and decrease their accuracy. Eliminating redundant and insignificant variables prevents models from underfitting or overfitting. Therefore, it is necessary to look for the best nested model, composed only of the most pertinent variables that explain the endogenous variable (output variable) well.
In empirical studies, selection techniques based on Wald or likelihood ratio (LR) are tedious and sometimes impossible to apply. For this reason, it is better to use numerical selection techniques such as stepwise logistic regression selection, or regularization techniques based on cross-validation to obtain the most pertinent variables that well explain the endogenous variable.
In this paper, we use two selection techniques: Stepwise logistic regression selection and lasso logistic regression selection.

3.5.1. Stepwise Logistic Regression Selection

In step-by-step numerical selection techniques, we evaluate a succession of nested models, either by adding variables one at a time (FORWARD) or by removing them one at a time (BACKWARD).
The stepwise selection technique alternates between FORWARD and BACKWARD, i.e., after each addition of a variable it checks whether another variable should be removed. The principle of the stepwise method is to minimize one of the following criteria:
  • Akaike Information Criterion (AIC):
    $$\mathrm{AIC} = -2\ln(L) + 2(K+1)$$
  • Bayesian Information Criterion (BIC):
    $$\mathrm{BIC} = -2\ln(L) + (K+1)\ln(n)$$
where:
  • L is the likelihood of the logit model;
  • K is the number of variables in the model;
  • n is the number of observations.
The stopping criterion: The addition or removal of a variable does not improve the criterion used anymore.
In our article, we use the BIC criterion for selection, as it penalizes complexity more; therefore, this criterion selects fewer variables.
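To illustrate why BIC tends to select fewer variables, the two criteria can be computed for two candidate models. The log-likelihood values and the sample size below are hypothetical and only serve to compare the penalties.

```python
import math

def aic(log_lik, k):
    """AIC = -2 ln(L) + 2 (K + 1), with log_lik = ln(L)."""
    return -2.0 * log_lik + 2.0 * (k + 1)

def bic(log_lik, k, n):
    """BIC = -2 ln(L) + (K + 1) ln(n)."""
    return -2.0 * log_lik + (k + 1) * math.log(n)

n = 180  # hypothetical number of observations
# Hypothetical candidates: name -> (log-likelihood, number of variables K)
candidates = {"3 ratios": (-52.0, 3), "6 ratios": (-49.5, 6)}
for name, (log_lik, k) in candidates.items():
    print(name, round(aic(log_lik, k), 2), round(bic(log_lik, k, n), 2))
```

Each additional variable costs 2 under AIC but ln(n) under BIC, which exceeds 2 as soon as n is greater than 7, so the larger model must improve the likelihood more to be retained under BIC.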

3.5.2. Lasso Logistic Regression Selection

Least Absolute Shrinkage and Selection Operator (LASSO) is a method for shrinking regression coefficients. It has been extended to many statistical models such as generalized linear models, M-estimators, and proportional hazards models.
The lasso method has the advantage of a parsimonious and consistent selection. It selects a restricted subset of variables that allows a better interpretation of a model. Thus, the selected subset of variables is used for the prediction.
Formal presentation:
Let $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$ be a vector containing the explanatory variables associated with individual $i$, $y_i$ the associated response, and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ the coefficients to be estimated. We denote by $X$ the matrix containing the individuals in rows, $X_{i,\cdot} = x_i^T$, and $y = (y_1, y_2, \ldots, y_n)$.
The log-likelihood associated with the lasso logistic regression is defined as:
$$L_n(y, X, \beta_0, \beta) = \sum_{i=1}^{n} \left[ y_i \left(\beta_0 + X_{i,\cdot}\beta\right) - \ln\left(1 + e^{\beta_0 + X_{i,\cdot}\beta}\right) \right]$$
Considering centered variables, the lasso is generally written in vector form as the following minimization problem:
$$\underset{(\beta_0, \beta) \in \mathbb{R} \times \mathbb{R}^p}{\arg\min} \left\{ -L_n(y, X, \beta_0, \beta) + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$
where $\lambda$ is the penalty coefficient.
To select the best variables explaining the endogenous variable and to choose a minimum penalty coefficient $\lambda$, k-fold cross-validation is used.
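As a rough sketch, the penalized criterion can be evaluated directly for given coefficients. The code assumes that the quantity minimized is the negative log-likelihood plus the absolute-value penalty on the slope coefficients (the intercept is not penalized); the function names and the toy data are hypothetical.

```python
import math

def log_likelihood(y, X, b0, beta):
    """Logistic log-likelihood: sum of y_i * eta_i - ln(1 + exp(eta_i))."""
    total = 0.0
    for yi, xi in zip(y, X):
        eta = b0 + sum(b * x for b, x in zip(beta, xi))
        total += yi * eta - math.log(1.0 + math.exp(eta))
    return total

def lasso_objective(y, X, b0, beta, lam):
    """Negative log-likelihood plus lam times the sum of |beta_j|."""
    penalty = lam * sum(abs(b) for b in beta)
    return -log_likelihood(y, X, b0, beta) + penalty

# Hypothetical data: two firms, two centered ratios each.
y = [1, 0]
X = [[1.0, 2.0], [0.5, -1.0]]
value = lasso_objective(y, X, 0.0, [1.0, -1.0], lam=0.5)
```

Increasing `lam` raises the cost of every nonzero coefficient, which is what pushes some coefficients to exactly zero and yields a reduced set of ratios.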

3.6. Prediction Models

3.6.1. Logistic Regression Model

Logistic regression, or the logit model, is a binomial regression model from the family of generalized linear models. It is widely used in many fields. In banking, for example, it is used to detect risk groups when granting credit. In econometrics, the model is used to explain a discrete variable. In medicine, it is used to find the factors characterizing a group of sick subjects compared to healthy subjects.
Let $Y$ be the variable to be predicted (the variable to be explained) and $X = (X_1, X_2, \ldots, X_J)$ the predictors (explanatory variables).
In the framework of binary logistic regression, the variable $Y$ takes two possible modes $\{1, 0\}$. The variables $X_j$ are exclusively continuous or binary.
Let $\Omega$ be a set of $n$ samples, comprising $n_1$ (resp. $n_0$) observations corresponding to the 1 (resp. 0) mode of $Y$.
  • $P(Y = 1)$ (resp. $P(Y = 0)$) is the a priori probability that $Y = 1$ (resp. $Y = 0$). For simplicity, this is hereafter denoted as $p(1)$ (resp. $p(0)$).
  • $p(X \mid 1)$ (resp. $p(X \mid 0)$) is the conditional distribution of $X$ knowing the value taken by $Y$. The a posteriori probability of obtaining the modality 1 of $Y$ (resp. 0) knowing the value taken by $X$ is noted $p(1 \mid X)$ (resp. $p(0 \mid X)$).
The logit term for $p(1 \mid X)$ is given by the following expression:
$$\ln \frac{p(1 \mid X)}{1 - p(1 \mid X)} = \beta_0 + \sum_{i=1}^{J} \beta_i X_i$$
The equation above is a “regression”, as it reflects a dependency relationship between the variable to be explained and a set of explanatory variables.
This regression is “logistic” because the probability distribution is modeled from a logistic distribution. Indeed, after converting the above equation, we find:
$$p(1 \mid X) = \frac{e^{\beta_0 + \sum_{i=1}^{J} \beta_i X_i}}{1 + e^{\beta_0 + \sum_{i=1}^{J} \beta_i X_i}}$$
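The passage from the logit to the probability can be checked numerically. The coefficients and ratio values below are made up for illustration and are not estimates from the study.

```python
import math

def linear_predictor(x, b0, beta):
    """beta_0 + sum of beta_i * X_i."""
    return b0 + sum(b * xi for b, xi in zip(beta, x))

def probability_of_distress(x, b0, beta):
    """p(1|X) = exp(eta) / (1 + exp(eta))."""
    eta = linear_predictor(x, b0, beta)
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """ln(p / (1 - p)), the inverse of the logistic transformation."""
    return math.log(p / (1.0 - p))

b0, beta = -1.0, [8.0, -5.0]   # hypothetical coefficients
x = [0.20, 0.04]               # hypothetical values of two ratios
p = probability_of_distress(x, b0, beta)
```

Applying `logit` to `p` recovers the linear predictor, which confirms that the two expressions describe the same model.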

3.6.2. Neural Networks Model: Multi-Layer Perceptron

An artificial neural network is a system whose concept was originally schematically inspired by the functioning of biological neurons. It is a set of interconnected formal neurons allowing the solving of complex problems such as pattern recognition or natural language processing owing to the adjustment of weighting coefficients in a learning phase.
The formal neuron is a model characterized by an internal state $s \in S$, input signals $X = (X_1, X_2, \ldots, X_J)^T$, and an activation function $g$:
$$s = h(X_1, X_2, \ldots, X_J) = g\left(\alpha_0 + \sum_{i=1}^{J} \alpha_i X_i\right)$$
The activation function transforms an affine combination of the input signals, in which $\alpha_0$ is a constant term called the bias of the neuron. This affine combination is determined by a vector of weights $[\alpha_0, \alpha_1, \ldots, \alpha_J]$ associated with each neuron, whose values are estimated in the learning phase. These elements constitute the memory or distributed knowledge of the network.
The different types of neurons are distinguished by the nature of their activation function $g$. The main types are linear, threshold, sigmoid, ReLU, softmax, stochastic, and radial.
In this article, we use the sigmoid activation function, which is given by:
$$g(x) = \frac{1}{1 + e^{-x}}$$
The advantage of the sigmoid is that it works well with learning algorithms based on gradient back-propagation because it is differentiable.
For supervised learning, we focus in this paper on an elementary network structure, the so-called static one without feedback loops.
The multilayer perceptron (MLP) is a network composed of successive layers. A layer is a set of neurons with no connection between them. An input layer reads the incoming signals, one neuron per input X i . An output layer provides the system response.
One or more hidden layers participate in the transfer. In a perceptron, a neuron in a hidden layer is connected as an input to each neuron in the previous layer and as an output to each neuron in the next layer. Therefore, a multi-layer perceptron realizes a transformation of input variables:
$$Y = f(X_1, X_2, \ldots, X_J; \alpha)$$
where $\alpha$ is the vector containing each parameter $\alpha_{jkl}$ of the jth input of the kth neuron in the lth layer; the input layer ($l = 0$) is not parameterized and only distributes the inputs to all the neurons of the next layer.
In regression with a single-hidden-layer perceptron of $q$ neurons and one output neuron, this function is written:
$$Y = f(X_1, X_2, \ldots, X_J; \beta, \alpha) = \beta_0 + \beta^T z$$
where:
$$z_k = g(\alpha_{0k} + \alpha_k^T X); \quad k = 1, \ldots, q$$
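The forward computation of such a single-hidden-layer perceptron can be sketched as follows, assuming sigmoid hidden neurons and a linear output neuron as in the regression case described above. The weight values and the helper name `mlp_forward` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x, alpha0, alpha, beta0, beta):
    """Return (output, hidden activations) for one input vector x.

    alpha0[k] is the bias of hidden neuron k, alpha[k][j] its weight on
    input j, beta0 the output bias and beta[k] the output weight of neuron k.
    """
    z = [
        sigmoid(a0 + sum(a * xj for a, xj in zip(ak, x)))
        for a0, ak in zip(alpha0, alpha)
    ]
    y = beta0 + sum(b * zk for b, zk in zip(beta, z))
    return y, z

# Hypothetical network: J = 2 inputs, q = 2 hidden neurons.
y_hat, z = mlp_forward(
    x=[0.3, -1.2],
    alpha0=[0.0, 0.0],
    alpha=[[0.0, 0.0], [0.0, 0.0]],
    beta0=0.1,
    beta=[0.4, -0.2],
)
```

With all hidden weights set to zero, every hidden activation equals g(0) = 0.5, so the output reduces to 0.1 + 0.5 × (0.4 − 0.2) = 0.2.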
Let us assume that we have a database of $n$ observations $(X_{1i}, \ldots, X_{Ji}, Y_i)$, $i = 1, \ldots, n$, of the explanatory variables $X_1, \ldots, X_J$ and of the variable to be predicted $Y$, and let $X_i = (X_{1i}, \ldots, X_{Ji})^T$.
We consider the simplest case of regression, with a network consisting of a linear output neuron and a hidden layer of $q$ neurons, whose parameters are optimized by least squares.
Learning is the estimation of the parameters $\alpha_{jk}$ ($j = 0, \ldots, J$; $k = 1, \ldots, q$) and $\beta_k$ ($k = 0, \ldots, q$) by minimization of the quadratic loss function (or of an entropy function in classification):
$$Q(\alpha, \beta) = \sum_{i=1}^{n} Q_i = \sum_{i=1}^{n} \left[ Y_i - f(X_i; \alpha, \beta) \right]^2$$
Error back-propagation:
Back-propagation aims to evaluate the derivative of the cost function at an observation and with respect to the various parameters.
Let $z_{ki} = g(\alpha_{0k} + \alpha_k^T X_i)$ and $z_i = (z_{1i}, z_{2i}, \ldots, z_{qi})$, and let $\varphi(X_i)$ denote the output of the network for observation $i$. The partial derivatives of the quadratic loss function are written:
$$\frac{\partial Q_i}{\partial \beta_k} = -2\,\big(Y_i - \varphi(X_i)\big)\,(\beta^T z_i)\, z_{ki} = \delta_i z_{ki}$$
$$\frac{\partial Q_i}{\partial \alpha_{jk}} = -2\,\big(Y_i - \varphi(X_i)\big)\,(\beta^T z_i)\, \beta_k\, g'(\alpha_k^T X_i)\, X_{ji} = s_{ki} X_{ji}$$
The terms $\delta_i$ and $s_{ki}$ are the error terms of the current model at the output and on each hidden neuron, respectively. These error terms verify the so-called back-propagation equations:
$$s_{ki} = \beta_k\, g'(\alpha_k^T X_i)\, \delta_i$$
These terms are evaluated in two passes. In the forward pass, the inputs $X_i$ are applied to the network with the current values of the weights, which determines the fitted values $\hat{f}(X_i)$. The backward pass then determines the $\delta_i$, which are back-propagated in order to calculate the $s_{ki}$ and thus obtain the gradient evaluations.
Optimization algorithms:
Different algorithms use these gradients to minimize $Q$. The most elementary one is iterative gradient descent: at any point in the parameter space, the gradient vector of $Q$ points in the direction of increasing error, so to make $Q$ decrease it is sufficient to move in the opposite direction.
This iterative algorithm modifies the weights of each neuron according to:
$$\beta_k^{(r+1)} = \beta_k^{(r)} - \tau \sum_{i=1}^{n} \frac{\partial Q_i}{\partial \beta_k^{(r)}}$$
$$\alpha_{jk}^{(r+1)} = \alpha_{jk}^{(r)} - \tau \sum_{i=1}^{n} \frac{\partial Q_i}{\partial \alpha_{jk}^{(r)}}$$
The proportionality coefficient τ is called the learning rate. It can be fixed (determined by the user) or variable (according to certain heuristics). It seems intuitively reasonable that this rate, high at the beginning to go faster, decreases to achieve a finer adjustment as the system approaches a solution. For more details on machine learning techniques, we refer to Friedman et al. (2017).
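The update rule with a decreasing learning rate can be illustrated on a toy regression problem. To keep the sketch short, the gradient is approximated by central finite differences instead of back-propagation; the data, the starting weights, and the decay schedule tau0 / (1 + decay × r) are assumptions chosen for the example, not settings taken from the study.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(x, w, q):
    # w packs [alpha_0k, alpha_1k] for each of the q hidden neurons,
    # followed by beta_0 and beta_1..beta_q (single input variable).
    z = [sigmoid(w[2 * k] + w[2 * k + 1] * x) for k in range(q)]
    return w[2 * q] + sum(w[2 * q + 1 + k] * z[k] for k in range(q))

def loss(w, data, q):
    """Quadratic loss Q: sum of squared residuals."""
    return sum((y - predict(x, w, q)) ** 2 for x, y in data)

def numerical_gradient(w, data, q, h=1e-6):
    """Central-difference approximation of the gradient of Q."""
    grad = []
    for j in range(len(w)):
        up, down = w[:], w[:]
        up[j] += h
        down[j] -= h
        grad.append((loss(up, data, q) - loss(down, data, q)) / (2 * h))
    return grad

def gradient_descent(w, data, q, tau0=0.02, decay=0.01, steps=200):
    w = w[:]
    for r in range(steps):
        tau = tau0 / (1.0 + decay * r)  # learning rate shrinks over time
        grad = numerical_gradient(w, data, q)
        w = [wj - tau * gj for wj, gj in zip(w, grad)]
    return w

data = [(0.0, 0.1), (0.5, 0.4), (1.0, 0.8), (1.5, 0.9)]  # hypothetical
q = 2
w0 = [0.1, -0.2, -0.1, 0.3, 0.0, 0.0, 0.0]  # hypothetical starting weights
w_fit = gradient_descent(w0, data, q)
```

Comparing `loss(w0, data, q)` with `loss(w_fit, data, q)` shows the error decreasing as the weights move against the gradient.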

3.7. Metrics

In this paper, the performance of prediction models is measured by the common evaluation metrics of machine learning, namely confusion matrix, accuracy, precision, sensitivity, specificity, F1-score, and Area Under the Curve (AUC).
Confusion matrix: It represents the basis for calculating the performance of the prediction models. Each column of the table indicates the instances of the predicted class and each row indicates the instances of a real class, or vice versa.
Accuracy: It measures the percentage of cases correctly classified.
$$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}}$$
Precision (also known as Positive Predictive Value): It is the percentage of cases classified as positive that are truly positive.
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$
Sensitivity: It can also be referred to as Recall, True Positive Rate, or Hit Rate. It measures the ability of a model to identify true positives.
$$\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$
Specificity (also known as True Negative Rate): It is the proportion of true negative cases to the total number of negative cases.
$$\text{Specificity} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$$
F1-score: It is the harmonic mean of recall and precision. It is calculated as follows:
$$\text{F1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
Area Under the Curve (AUC): It is a measure introduced to characterize the ROC curve² numerically. The closer the area value is to 1, the better the discrimination quality of the model (Long and Freese 2006).
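The count-based metrics above can be computed from the four cells of a confusion matrix as in the following sketch. The counts are hypothetical and do not reproduce any result of the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the count-based metrics from confusion-matrix cells."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical confusion matrix for a test sample of 60 firms.
m = classification_metrics(tp=27, tn=28, fp=3, fn=2)
```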

4. Results

In this section, we present the main results, obtained with R 4.0.5.

4.1. Feature Selection Results

Table 3 shows the ratios selected by the stepwise and lasso techniques. The stepwise logistic technique selects the relevant variables by minimizing the BIC criterion, while the lasso logistic technique selects the relevant ratios through the optimal choice of the penalty coefficient. In our case, the optimal BIC value is 132.1 in 2017 and 123.67 in 2018, and the optimal penalty coefficient is 0.05867105 in 2017 and 0.0311904 in 2018.
We note that interest to sales (R14), return on assets (R15), and days in accounts receivable (R21) remain discriminant one and two years before financial distress for both techniques. These variables belong to the profitability and management categories. Interest to sales (R14) represents the weight of interest in relation to sales. A healthy financial situation is generally characterized by a level of interest not exceeding 2.5% or 3% of sales. Return on assets (R15) measures the net income earned for each amount invested in assets. This profitability ratio plays an important role in the early prediction of business failure and it can reduce its probability (Geng et al. 2014; Zizi et al. 2020). Days in accounts receivable (R21) relates accounts receivable (multiplied by 360) to sales and is expressed in the number of days of sales. Long payment terms can lead to business failure.

4.2. Descriptive Statistics

Table A1, Table A2, Table A3, and Table A4 (Appendix A) report, for the variables selected by the two techniques (stepwise and lasso), the descriptive statistics, normality tests, correlation matrices, and multicollinearity tests.
We note from the descriptive statistics that failing SMEs are more indebted than their non-failing peers. SMEs in financial distress are more dependent on external funds with high means of debt to equity ratio (R4) and autonomy ratio (R7). Thus, the use of debt favors the increase in interest (R14). In addition, distressed SMEs are less solvent and they find it difficult to repay their debts with low average interest coverage (R5) compared to healthy SMEs. The results of the descriptive statistics also show that distressed SMEs are less profitable with negative return on assets (R15) and retained earnings to total assets (R17) means. Concerning management ratios, days in accounts receivable (R21) and duration of trade payables (R22) are longer for defaulting SMEs. Contrary to what was expected, liquidity expressed by the quick ratio (R2) is higher for distressed SMEs.
Based on the p-values of the Shapiro–Wilk and Lilliefors (adapted Kolmogorov–Smirnov test) normality tests, we reject the hypothesis of normality of the explanatory variables (the p-values of both tests are below 0.05).
To ensure that significant correlations in absolute value close to 0.7 (such as the correlations between R6-R14 and R16-R21) do not give rise to a multicollinearity problem that can affect the results, we test the degree of multicollinearity using the Variance Inflation Factor (VIF) and calculate the tolerance coefficient (TOL). If the TOL is close to 0, there is significant collinearity for the variable. If it is close to 1, with a VIF value between 1 and 5, the collinearity generated by the variable is not important and does not affect reliability. Problematic multicollinearity exists if the VIF is greater than 10 or if the TOL is less than 0.1 (Zhang et al. 2010).
The VIF values of the selected ratios are all below 5 and their tolerances are close to 1. Therefore, we do not have a multicollinearity problem.
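The relation between the two diagnostics can be checked directly, since tolerance is conventionally the reciprocal of the VIF. The short Python sketch below applies the thresholds stated above to the VIF and TOL values printed in Table A1 (stepwise selection, 2017). The function names and the label for the zone between 5 and 10 are ours, as the text does not characterize that zone.

```python
def tolerance(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


def collinearity_flag(vif):
    """Classify a VIF value using the thresholds quoted in the text."""
    if vif > 10 or tolerance(vif) < 0.1:
        return "problematic"
    if vif <= 5:
        return "not important"
    return "intermediate"  # zone not characterized in the text; label is ours


# Values as printed in Table A1 (Appendix A).
vif_2017 = {"R5": 1.0485, "R7": 1.0567, "R14": 1.1314, "R15": 1.1050, "R21": 1.0602}
tol_2017 = {"R5": 0.9538, "R7": 0.9463, "R14": 0.8839, "R15": 0.9049, "R21": 0.9432}

# Largest gap between 1/VIF and the printed tolerance (rounding only).
max_gap = max(abs(tolerance(v) - tol_2017[name]) for name, v in vif_2017.items())
flags = {name: collinearity_flag(v) for name, v in vif_2017.items()}
```

The small residual gap comes from the four-decimal rounding of the printed values.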

4.3. Estimation Results of the Stepwise and Lasso Logistic Regression Models

Table 4 and Table 5 present the estimation results of the stepwise logistic regression models. One year before financial distress, all variables in the model are significant at the 1% threshold. Interest coverage (R5), autonomy ratio (R7), interest to sales (R14), and days in accounts receivable (R21) have a positive effect on financial distress, while return on assets (R15) has a negative effect. Interest to sales (R14) has the largest effect on the probability of financial distress: a one-unit increase in this ratio raises the probability by 79.59%. Two years prior to financial distress, all ratios are significant at the 5% threshold except for repayment capacity (R8). Variables already selected by the stepwise method in 2017 retain the same sign in 2018. Interest to sales (R14) keeps the largest marginal effect and may increase the probability of default by 66.91%, while a one-unit increase in return on assets (R15) may decrease the probability of financial distress by 35.78%.
Table 6 provides the estimation results of the lasso logistic regression models. In 2017, four out of seven variables have a positive effect on financial distress, while in 2018 four out of nine variables do. Regarding the marginal effects of the ratios, an increase in interest to sales (R14) raises the risk of financial distress by 10.09% in 2017 and 34.91% in 2018, while an increase in return on assets (R15) reduces the risk of default by 7.94% in 2017 and 6.50% in 2018.
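The percentages quoted for the stepwise models are marginal effects rather than raw coefficients. In a logit model these are usually derived as shown below. This section does not state whether the effects are evaluated at the sample means or averaged over observations, so the display is only the generic textbook form, in our own notation.

```latex
\begin{align}
P(y_i = 1 \mid x_i) &= \frac{1}{1 + e^{-(\beta_0 + x_i^{\top}\beta)}} \\
\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ij}}
  &= \beta_j \, P(y_i = 1 \mid x_i)\,\bigl(1 - P(y_i = 1 \mid x_i)\bigr)
\end{align}
```

Because the product P(1 − P) never exceeds 1/4, a marginal effect has the same sign as its coefficient and is at most a quarter of it in absolute value.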

4.4. Performance of Logit Models

The results obtained by the confusion matrices are based on the test sample.
As shown in Table 7, two years before the occurrence of financial distress, the stepwise logistic regression model correctly classifies 93.33% of the SMEs. One year before the occurrence of financial distress, the accuracy improves to 95.00% and the sensitivity is 96.67% (29/30 of the failing SMEs are correctly classified).
Regarding the lasso logistic regression models, the accuracy improves from 80.00% in 2017 to 86.67% in 2018. The type I error (when a model classifies a failing company as healthy) falls from 16.67% in 2017 to 13.33% in 2018, showing that the quality of the model improves as financial distress becomes imminent.
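These error rates can be tied back to the confusion matrix. The display below takes distressed SMEs as the positive class and assumes that the test sample contains 30 distressed SMEs for every model. The 29/30 figure reported for the stepwise model suggests this, but it is our reading rather than something stated in this section; the type II definition is the usual counterpart of the type I definition given in the text.

```latex
\begin{align}
\text{Type I error} &= \frac{FN}{TP + FN} = 1 - \text{sensitivity}, &
\text{Type II error} &= \frac{FP}{TN + FP} = 1 - \text{specificity} \\
\frac{5}{30} &\approx 16.67\%, &
\frac{4}{30} &\approx 13.33\%, \qquad 1 - \frac{29}{30} \approx 3.33\%
\end{align}
```

Under that assumption, the lasso logit models miss five distressed SMEs in 2017 and four in 2018, against one for the stepwise logit model one year before distress.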

4.5. Performance of Neural Networks Models

To find the best neural networks models for stepwise logistic selection and lasso logistic selection, we vary the network parameters, namely the number of hidden layers from 0 to 10 and the number of nodes per layer from 0 to 10. We find that the best neural networks models, for both stepwise and lasso logistic selection, are composed of a single hidden layer containing three nodes.
According to Table 8, in 2017 the lasso neural networks model performs better than the stepwise neural networks model, with an accuracy of 83.33%. In addition, the type I error of the lasso neural networks model is 6.67% against 13.33% for the stepwise neural networks model, a difference of 6.66 percentage points.
As for 2018, the stepwise neural networks model has a higher overall accuracy of 88.33% versus 86.67% for the lasso neural networks model.
In general, the performance of neural networks models improves one year before financial distress. Furthermore, these models achieve lower type I errors than type II errors.
As shown in Appendix B, the architecture of neural networks consists of three layers (input layer, output layer, and one hidden layer). The nodes of the input layer correspond to the ratios selected by the lasso and stepwise techniques. The solution to the dichotomous problem (distressed SME or healthy SME) is provided by the output layer.
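To make the three-layer architecture concrete, here is a minimal Python sketch of a forward pass through a network with one hidden layer of three nodes and a single output node. It assumes logistic activations and a 0.5 decision threshold, neither of which is confirmed by this section, and all weights and input values are hypothetical; the fitted networks are those shown in Appendix B.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed here for both layers)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> one hidden layer -> single output probability."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights: five inputs (e.g. standardized ratios), three hidden nodes.
hidden_weights = [
    [0.2, -0.1, 1.5, -1.2, 0.4],
    [-0.3, 0.5, 0.8, -0.6, 0.1],
    [0.1, 0.1, -0.4, 0.9, -0.2],
]
hidden_biases = [0.0, -0.1, 0.2]
output_weights = [1.8, 1.1, -1.4]
output_bias = -0.5

x = [0.0, 0.3, 1.2, -0.8, 0.5]  # hypothetical standardized ratios for one SME
p_distress = forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
label = "distressed" if p_distress >= 0.5 else "healthy"  # assumed threshold
```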

5. Discussion

The performance metrics of our prediction models are summarized in Table 9 and Table 10. In addition to those used in Table 7 and Table 8, we add precision, F1-score, and AUC. Like the other metrics, the precision and F1-score of our models improve one year before financial distress. The AUC values obtained vary between 0.833 and 0.959, showing an excellent discrimination capacity of the models (Long and Freese 2006). Furthermore, our models classify distressed SMEs better than healthy SMEs; that is, they have lower type I errors than type II errors. This matters because type I errors are considered by the literature to be the most costly for all stakeholders (Bellovary et al. 2007). These findings are in contrast with those of Shrivastav and Ramudu (2020) and Durica et al. (2021). On a sample of 59 Indian banks, Shrivastav and Ramudu (2020) obtained a type I error of 25% and a type II error of 0% using a support vector machine with a linear kernel. One year before default, Durica et al. (2021), using the CART algorithm, classified healthy Slovak companies better (94.93%) than Slovak companies in financial distress (81.48%).
Regarding the performance of the models based on lasso selection, neural networks perform at least as well as logistic regression, with an accuracy of 83.33% in 2017 and 86.67% in 2018 against 80.00% and 86.67% for logistic regression, respectively. However, our best results are obtained by stepwise selection, with an accuracy of 93.33% in 2017 and 95.00% in 2018 for logistic regression and an accuracy of 88.33% in 2018 for neural networks. In general, our results show the superior performance of logistic regression over neural networks. These findings are in line with the works of Du Jardin and Séverin (2012), Islek and Oguducu (2017), Kim et al. (2018), Lukason and Andresson (2019), and Malakauskas and Lakštutienė (2021). For example, Du Jardin and Séverin (2012) reached an accuracy of 81.6% with logistic regression against 81.3% with neural networks, using data collected over one year. Similarly, in Lukason and Andresson (2019), logistic regression scored first on the test sample with 90.2% accuracy, followed by the multilayer perceptron with 87.60%.
Our logistic regression results obtained by the stepwise selection technique are well above the average obtained by other studies on the prediction of financial distress (Bateni and Asghari 2020; Cohen et al. 2017; Vu et al. 2019; Guan et al. 2020; Ogachi et al. 2020; Tong and Serrasqueiro 2021; Rahman et al. 2021; Park et al. 2021). On a sample of 64 listed companies in the Nairobi Securities Exchange, Ogachi et al. (2020) correctly classified 83% of the companies through logistic regression with the following significant ratios: working capital ratio, current ratio, debt ratio, total asset, debtors turnover, debt–equity ratio, asset turnover, and inventory turnover. Tong and Serrasqueiro (2021) used logistic regression to predict the financial distress of Portuguese small and mid-sized enterprises operating in technology manufacturing sectors. Their logistic regression models correctly classified 79.60% of the financial distress group in 2013, 80.40% in 2014, and 79.20% in 2015. Based on a sample of U.S. publicly traded companies, Rahman et al. (2021) achieved an overall accuracy of 79.2% in the hold-out sample. By contrast, Shrivastava et al. (2018) achieved better performance with a Bayesian logit model, reaching an accuracy of 98.9% on a sample of Indian firms extracted from Capital IQ.
For neural networks, our best results outperform those of Kim et al. (2018), Lukason and Andresson (2019), Papana and Spyridou (2020), and Malakauskas and Lakštutienė (2021). For instance, using neural networks with 42 nodes in the hidden layer, Kim et al. (2018) found an accuracy of 71.9% through 41 financial ratios selected from 1548 Korean heavy industry companies. To predict bankruptcy in the Greek market, Papana and Spyridou (2020) achieved with neural networks a correct classification rate of 65.7% two years before bankruptcy and 70% one year before bankruptcy. However, our results are lower than those of Islek and Oguducu (2017) and Paule-Vianez et al. (2020). For example, the Paule-Vianez et al. (2020) model achieved an overall success rate of 97.3% in predicting the financial distress of Spanish credit institutions.
In the Moroccan context, our results are better than those of Azayite and Achchab (2017), Khlifa (2017), Idrissi and Moutahaddib (2020), and Zizi et al. (2020) for both logistic regression and neural networks. Using logistic regression, Khlifa (2017) correctly classified 88.2% of Moroccan firms and Zizi et al. (2020) achieved an overall accuracy of 84.44% two years and one year before default, whereas our best logistic regression models correctly classify 93.33% of firms two years before financial distress and 95.00% of firms one year before financial distress. The same holds for neural networks, where our best model achieves an accuracy of 88.33% against 80.76% for Idrissi and Moutahaddib (2020) and 85.6% for Azayite and Achchab (2017).

6. Conclusions

The lack of consensus on predictors of financial distress, the limited studies on the prediction of financial distress in Morocco, and the crucial role that the prediction of financial distress plays in a specific context led us to conduct this study. The objectives of this article were to determine the most relevant predictors of financial distress and identify its optimal prediction models.
To achieve these objectives, we have used logistic regression and neural networks on a sample of 180 SMEs in the Fez-Meknes region, including 123 healthy SMEs and 57 distressed SMEs. The SMOTE technique was used to solve the problem of unbalanced data. Focusing on Morocco, financial distress is defined according to Bank Al Maghrib’s circular n° 19/G/2002. Following the literature review on the topic and the context of the study, we have used a battery of 23 financial ratios as initial predictors. Our models were based on the discriminant ratios selected by the lasso and stepwise techniques.
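The core idea of SMOTE (Chawla et al. 2002) is to create synthetic minority observations by interpolating between a minority observation and one of its nearest minority neighbours. The Python sketch below illustrates that interpolation step only. It is not the authors' implementation (they list the R package smotefamily in Appendix C), and the function name `smote_sketch`, the two-ratio data, and the parameter values are hypothetical.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()                  # uniform draw in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority class described by two ratios (illustration only).
minority = [[0.02, -0.01], [0.03, -0.05], [0.05, 0.00], [0.04, -0.02]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

With 123 healthy and 57 distressed SMEs, fully balancing the classes would require 66 synthetic distressed observations, although this section does not say which target ratio was used.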
Our results highlighted the importance of variables such as interest to sales (R14) and return on assets (R15) in predicting financial distress. Interest to sales (R14) has a positive impact on financial distress and retains the largest marginal effect over two years for both selection techniques, while return on assets (R15) reduces the probability of financial distress.
Empirical results on test samples showed the superiority of logistic regression over neural networks with accuracies obtained by stepwise selection of 93.33% two years before financial distress and 95.00% one year before financial distress. In addition, our results showed that performance metrics improved one year before financial distress. As an example, the accuracies ranged from 80.00% (logistic regression with lasso selection) to 93.33% (logistic regression with stepwise selection) in 2017 while in 2018 they ranged from 86.67% (neural networks with lasso selection) to 95.00% (logistic regression with stepwise selection). Furthermore, our models classified distressed SMEs better than healthy SMEs with type I errors lower than type II errors.
The results have practical implications for creditors, academics, and managers. Our proposed models can be effective for creditors who should assess the financial condition of borrowing firms and make low-risk credit-granting decisions to avoid capital loss. From an academic point of view, this paper suggests that logistic regression is a robust and more accurate tool in predicting the financial distress of Moroccan SMEs. As far as managers are concerned, our results will allow them to take corrective actions upstream through the proposed variables representing early warning signals.
Our study is constrained by the availability of information; its results could be improved by increasing the sample size and introducing qualitative and macroeconomic variables into the models. Finally, future studies on business failure prediction in Morocco could compare the results of our models with other machine learning techniques such as random forests or decision trees.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Descriptive Statistics Tables

Table A1. Descriptive statistics for variables selected by stepwise technique, year 2017.

| Variables | R5 | R7 | R14 | R15 | R21 |
|---|---|---|---|---|---|
| Entire data | | | | | |
| Mean | 25.730 | 0.67051 | 0.011008 | 0.025758 | 175.19 |
| Std | 101.3963 | 4.933185 | 0.02312998 | 0.08351403 | 195.8285 |
| Lilliefors (Kolmogorov–Smirnov) normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ |
| Shapiro–Wilk normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | 4.554 × 10⁻¹⁶ | 4.554 × 10⁻¹⁶ |
| Distressed SMEs | | | | | |
| Mean | −1.037 | 1.0756 | 0.02248 | −0.0132 | 258.20 |
| Std | 12.82926 | 5.87987 | 0.03644343 | 0.08829215 | 276.327 |
| Non-distressed SMEs | | | | | |
| Mean | 38.135 | 0.4828 | 0.005691 | 0.04381 | 136.72 |
| Std | 120.496 | 4.441264 | 0.009236137 | 0.07494895 | 128.4722 |
| Correlation matrix | | | | | |
| R5 | 1.00 | | | | |
| R7 | 0.04 | 1.00 | | | |
| R14 | 0.08 | 0.38 | 1.00 | | |
| R15 | 0.29 | −0.12 | −0.38 | 1.00 | |
| R21 | −0.09 | −0.05 | 0.15 | −0.11 | 1.00 |
| Multicollinearity test | | | | | |
| VIF | 1.0485 | 1.0567 | 1.1314 | 1.1050 | 1.0602 |
| TOL | 0.9538 | 0.9463 | 0.8839 | 0.9049 | 0.9432 |

Notes: Std indicates standard deviation.
Table A2. Descriptive statistics for variables selected by lasso technique, year 2017.

| Variables | R2 | R5 | R14 | R15 | R17 | R21 | R22 |
|---|---|---|---|---|---|---|---|
| Entire data | | | | | | | |
| Mean | 1.1926 | 25.730 | 0.011008 | 0.025758 | 0.04643 | 175.19 | 126.67 |
| Std | 1.408321 | 101.3963 | 0.02312998 | 0.08351403 | 0.2095069 | 195.8285 | 179.956 |
| Lilliefors (Kolmogorov–Smirnov) normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ |
| Shapiro–Wilk normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | 6.591 × 10⁻¹⁵ | 4.554 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ |
| Distressed SMEs | | | | | | | |
| Mean | 1.5586 | −1.037 | 0.02248 | −0.0132 | −0.01017 | 258.20 | 183.65 |
| Std | 1.874535 | 12.82926 | 0.03644343 | 0.08829215 | 0.2715556 | 276.327 | 276.2145 |
| Non-distressed SMEs | | | | | | | |
| Mean | 1.0230 | 38.135 | 0.005691 | 0.04381 | 0.07267 | 136.72 | 100.27 |
| Std | 1.097959 | 120.496 | 0.009236137 | 0.07494895 | 0.168407 | 128.4722 | 101.3615 |
| Correlation matrix | | | | | | | |
| R2 | 1.00 | | | | | | |
| R5 | 0.03 | 1.00 | | | | | |
| R14 | 0.24 ** | 0.08 | 1.00 | | | | |
| R15 | 0.10 | 0.29 *** | 0.38 *** | 1.00 | | | |
| R17 | 0.11 | 0.18 * | −0.07 | 0.17 | 1.00 | | |
| R21 | 0.48 ** | −0.09 | 0.15 . | −0.11 | 0.09 | 1.00 | |
| R22 | −0.08 | −0.04 | 0.23 | −0.19 | −0.02 | 0.30 *** | 1.00 |
| Multicollinearity test | | | | | | | |
| VIF | 1.1413 | 1.0507 | 1.1751 | 1.0927 | 1.0497 | 1.1929 | 1.1988 |
| TOL | 0.8762 | 0.9517 | 0.8510 | 0.9152 | 0.9527 | 0.8383 | 0.8342 |

Notes: Std indicates standard deviation; *** significance level at 0.001; ** significance level at 0.01; * significance level at 0.05; . significance level at 0.1.
Table A3. Descriptive statistics for variables selected by stepwise technique, year 2018.

| Variables | R5 | R8 | R14 | R15 | R17 | R21 |
|---|---|---|---|---|---|---|
| Entire data | | | | | | |
| Mean | 41.53 | 4.777 | 0.0121523 | 0.008771 | 0.04539 | 230.05 |
| Std | 237.3972 | 21.65186 | 0.02328563 | 0.2144745 | 0.2189421 | 394.0288 |
| Lilliefors (Kolmogorov–Smirnov) normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ |
| Shapiro–Wilk normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | 4.545 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ |
| Distressed SMEs | | | | | | |
| Mean | 24.512 | 10.9267 | 0.025189 | −0.073553 | −0.02475 | 424.8 |
| Std | 318.6796 | 36.54885 | 0.03459138 | 0.3559114 | 0.290211 | 639.2844 |
| Non-distressed SMEs | | | | | | |
| Mean | 49.4224 | 1.92646 | 0.0061111 | 0.04692 | 0.07789 | 139.78 |
| Std | 189.4045 | 6.986372 | 0.0114067 | 0.06864435 | 0.1682521 | 119.4231 |
| Correlation matrix | | | | | | |
| R5 | 1.00 | | | | | |
| R8 | 0.21 * | 1.00 | | | | |
| R14 | −0.05 | 0.34 *** | 1.00 | | | |
| R15 | 0.47 *** | 0.00 | 0.42 *** | 1.00 | | |
| R17 | 0.04 | −0.08 | −0.12 | 0.22 | 1.00 | |
| R21 | −0.08 | 0.05 | 0.08 | −0.11 | 0.06 ** | 1.00 |
| Multicollinearity test | | | | | | |
| VIF | 1.0132 | 1.0137 | 1.1583 | 1.0973 | 1.0052 | 1.0538 |
| TOL | 0.9869 | 0.9865 | 0.8633 | 0.9114 | 0.9949 | 0.9489 |

Notes: Std indicates standard deviation; *** significance level at 0.001; ** significance level at 0.01; * significance level at 0.05; . significance level at 0.1.
Table A4. Descriptive statistics for variables selected by lasso technique, year 2018.

| Variables | R4 | R6 | R8 | R14 | R15 | R16 | R17 | R20 | R21 |
|---|---|---|---|---|---|---|---|---|---|
| Entire data | | | | | | | | | |
| Mean | 0.65687 | 0.09421 | 4.777 | 0.0121523 | 0.008771 | 1.15436 | 0.04539 | 0.4837 | 230.05 |
| Std | 1.601535 | 0.2050961 | 21.65186 | 0.02328563 | 0.2144745 | 1.20965 | 0.2189421 | 0.8906335 | 394.0288 |
| Lilliefors (Kolmogorov–Smirnov) normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | 2.2 × 10⁻¹⁶ |
| Shapiro–Wilk normality test (p-value) | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | 4.554 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ | <2.2 × 10⁻¹⁶ |
| Distressed SMEs | | | | | | | | | |
| Mean | 1.1036 | 0.07853 | 10.9267 | 0.025189 | −0.073553 | 0.70415 | −0.02475 | 0.59976 | 424.8 |
| Std | 2.492708 | 0.1227013 | 36.54885 | 0.03459138 | 0.3559114 | 0.4777546 | 0.290211 | 1.222349 | 639.2844 |
| Non-distressed SMEs | | | | | | | | | |
| Mean | 0.449827 | 0.10148 | 1.92646 | 0.0061111 | 0.04692 | 1.3630 | 0.07789 | 0.42988 | 139.78 |
| Std | 0.8801461 | 0.2337489 | 6.986372 | 0.0114067 | 0.06864435 | 1.379693 | 0.1682521 | 0.6846804 | 119.4231 |
| Correlation matrix | | | | | | | | | |
| R4 | 1.00 | | | | | | | | |
| R6 | 0.30 *** | 1.00 | | | | | | | |
| R8 | 0.50 *** | 0.26 ** | 1.00 | | | | | | |
| R14 | 0.56 *** | 0.69 *** | 0.34 *** | 1.00 | | | | | |
| R15 | 0.30 *** | 0.21 ** | 0.00 | 0.42 *** | 1.00 | | | | |
| R16 | 0.05 | −0.04 | 0.05 | 0.20 ** | 0.29 *** | 1.00 | | | |
| R17 | −0.09 | −0.04 | −0.08 | −0.12 | 0.22 ** | 0.05 | 1.00 | | |
| R20 | 0.03 | 0.14 . | 0.13 . | 0.15 * | 0.29 *** | 0.39 *** | 0.04 | 1.00 | |
| R21 | 0.00 | 0.00 | 0.05 | 0.08 | −0.11 | 0.58 *** | 0.06 | 0.05 | 1.00 |
| Multicollinearity test | | | | | | | | | |
| VIF | 1.2843 | 1.0750 | 1.2243 | 1.2359 | 1.1130 | 1.2904 | 1.0395 | 1.1583 | 1.1467 |
| TOL | 0.7786 | 0.9303 | 0.8168 | 0.8091 | 0.8985 | 0.7749 | 0.9620 | 0.8633 | 0.8721 |

Notes: Std indicates standard deviation; *** significance level at 0.001; ** significance level at 0.01; * significance level at 0.05; . significance level at 0.1.

Appendix B. Architectures of Neural Networks Models

Figure A1. Neural networks model for stepwise selection technique, year 2017.
Figure A2. Neural networks model for stepwise selection technique, year 2018.
Figure A3. Neural networks model for lasso selection technique, year 2017.
Figure A4. Neural networks model for lasso selection technique, year 2018.

Appendix C. Machine Learning Libraries

library(Matrix); library(glmnet); library(lasso2); library(MASS); library(caret); library(mlbench); library(neuralnet); library(e1071); library(ROSE); library(smotefamily); library(pROC)

Notes

  1. According to Maroc PME, SMEs are companies with a turnover of less than or equal to 200 million dirhams.
  2. A graph that relates true positive rates and false positive rates. By varying the threshold S (threshold used for the assignment rule) over the interval [0, 1], the ROC curve is constructed and the true positive and false positive rates are calculated.

References

  1. Affes, Zeineb, and Rania Hentati-Kaffel. 2019. Predicting US banks bankruptcy: Logit versus Canonical Discriminant analysis. Computational Economics 54: 199–244.
  2. Altman, Edward I. 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance 23: 589–609.
  3. Altman, Edward I., Małgorzata Iwanicz-Drozdowska, Erkki K. Laitinen, and Arto Suvas. 2017. Financial distress prediction in an international context: A review and empirical analysis of Altman’s Z score model. Journal of International Financial Management & Accounting 28: 131–71.
  4. Altman, Edward I., Małgorzata Iwanicz-Drozdowska, Erkki K. Laitinen, and Arto Suvas. 2020. A race for long horizon bankruptcy prediction. Applied Economics 52: 4092–111.
  5. Amor, S. Ben, Nabil Khoury, and Marko Savor. 2009. Modèle prévisionnel de la défaillance financière des PME québécoises emprunteuses. Journal of Small Business and Entrepreneurship 22: 517–34.
  6. Azayite, Fatima Zahra, and Said Achchab. 2017. The impact of payment delays on bankruptcy prediction: A comparative analysis of variables selection models and neural networks. Paper presented at 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech), Rabat, Morocco, October 24–26; pp. 1–7.
  7. Back, Barbro, Teija Laitinen, Kaisa Sere, and Michiel van Wezel. 1996. Choosing Bankruptcy Predictors Using Discriminant Analysis, Logit Analysis, And Genetic Algorithms. Turku Centre for Computer Science Technical Report 40: 1–18.
  8. Balcaen, Sofie, and Hubert Ooghe. 2006. 35 years of studies on business failure: An overview of the classic statistical methodologies and their related problems. The British Accounting Review 38: 63–93.
  9. Bank Al-Maghrib. 2002. Circulaire du Gouverneur de Bank Al-Maghrib n°19/G/2002 du 23 décembre 2002 (18 chaoual 1423) Relative à la Classification des Créances et à leur Couverture par les Provisions. Rabat: Bank Al-Maghrib.
  10. Bateni, Leila, and Farshid Asghari. 2020. Bankruptcy Prediction Using Logit and Genetic Algorithm Models: A Comparative Analysis. Computational Economics 55: 335–48.
  11. Beaver, William H. 1966. Financial ratios as predictors of failure. Journal of Accounting Research 4: 71–111.
  12. Bellanca, Sabrina, Loredana Cultrera, and Guillaume Vermeylen. 2015. «La faillite des PME belges». La Libre Belgique, March 28.
  13. Bellovary, Jodi L., Don E. Giacomino, and Michael D. Akers. 2007. A review of bankruptcy prediction studies: 1930 to present. Journal of Financial Education 1: 1–42.
  14. Bunn, Philip, and Victoria Redwood. 2003. Company Accounts-Based Modelling of Business Failures and the Implications for Financial Stability. Bank of England Working Paper No. 210. London: Bank of England.
  15. Charalambakis, Evangelos C., and Ian Garrett. 2019. On corporate financial distress prediction: What can we learn from private firms in a developing economy? Evidence from Greece. Review of Quantitative Finance and Accounting 52: 467–91.
  16. Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16: 321–57.
  17. Chen, Wei-Sen, and Yin-Kuan Du. 2009. Using neural networks and data mining techniques for the financial distress prediction model. Expert Systems with Applications 36: 4075–86.
  18. Cohen, Sandra, Antonella Costanzo, and Francesca Manes-Rossi. 2017. Auditors and early signals of financial distress in local governments. Managerial Auditing Journal 32: 234–50.
  19. Crutzen, Nathalie, and Didier Van Caillie. 2007. The Business Failure Process: Towards an Integrative Model of the Literature. Paper presented at International Workshop on Default Risk and Financial Distress, Rennes, France, September 13–14.
  20. Deakin, Edward B. 1972. A discriminant analysis of predictors of business failure. Journal of Accounting Research 10: 167–79.
  21. Du Jardin, Philippe. 2015. Bankruptcy prediction using terminal failure processes. European Journal of Operational Research 242: 286–303.
  22. Du Jardin, Philippe, and Éric Séverin. 2012. Forecasting financial failure using a Kohonen map: A comparative study to improve model stability over time. European Journal of Operational Research 221: 378–96.
  23. Durica, Marek, L. Svabova, and Jaroslav Frnda. 2021. Financial distress prediction in Slovakia: An application of the cart algorithm. Journal of International Studies 14: 201–15.
  24. Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2017. The Elements of Statistical Learning. New York: Springer Series in Statistics, vol. 1.
  25. Gebhardt, G. 1980. Insolvency prediction based on annual financial statements according to the company law. An assessment of the reform of annual statements by the law of 1965 from the view of external addresses. Bochumer Beitrage zur Untennehmungs und Unternehmens-Forschung 22.
  26. Geng, Ruibin, Indranil Bose, and Xi Chen. 2014. Prediction of financial distress: An empirical study of listed chinese companies using data mining. European Journal of Operational Research 241: 236–47.
  27. Gregova, Elena, Katarina Valaskova, Peter Adamko, Milos Tumpach, and Jaroslav Jaros. 2020. Predicting financial distress of slovak enterprises: Comparison of selected traditional and learning algorithms methods. Sustainability 12: 3954.
  28. Guan, Rong, Huiwen Wang, and Haitao Zheng. 2020. Improving accuracy of financial distress prediction by considering volatility: An interval-data-based discriminant model. Computational Statistics 35: 491–514.
  29. Hafiz, Alaka, Oyedele Lukumon, Bilal Muhammad, Akinade Olugbenga, Owolabi Hakeem, and Ajayi Saheed. 2015. Bankruptcy prediction of construction businesses: Towards a big data analytics approach. Paper presented at 2015 IEEE First International Conference on Big Data Computing Service and Applications, Redwood City, CA, USA, March 30–April 2; pp. 347–52.
  30. Haut-Commissariat au Plan. 2018. Note D’information Relative aux Comptes Régionaux de L’année 2018. Casablanca: Haut-Commissariat au Plan.
  31. Haut-Commissariat au Plan. 2019. Enquête Nationale Auprès des Entreprises, Premiers Résultats 2019. Casablanca: Haut-Commissariat au Plan.
  32. Idrissi, Khadir, and Aziz Moutahaddib. 2020. Prédiction de la défaillance financière des PME marocaine: Une étude comparative. Revue Africaine de Management 5: 18–36.
  33. Inforisk. 2020. Étude Inforisk, Défaillances Maroc 2019. Casablanca: Inforisk.
  34. Islek, Irem, Idris Murat Atakli, and Sule Gunduz Oguducu. 2017. A Framework for Business Failure Prediction. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10246 LNAI. Zakopane: Springer, pp. 74–83.
  35. Iturriaga, Félix J. López, and Iván Pastor Sanz. 2015. Bankruptcy visualization and prediction using neural networks: A study of US commercial banks. Expert Systems with Applications 42: 2857–69.
  36. Jabeur, Sami Ben. 2017. Bankruptcy prediction using Partial Least Squares Logistic Regression. Journal of Retailing and Consumer Services 36: 197–202.
  37. Jeong, Chulwoo, Jae H. Min, and Myung Suk Kim. 2012. A tuning method for the architecture of neural network models incorporating GAM and GA as applied to bankruptcy prediction. Expert Systems with Applications 39: 3650–58.
  38. Jones, Stewart, David Johnstone, and Roy Wilson. 2017. Predicting corporate bankruptcy: An evaluation of alternative statistical frameworks. Journal of Business Finance & Accounting 44: 3–34.
  39. Kamaluddin, Amrizah, Norhafizah Ishak, and Nor Farizal Mohammed. 2019. Financial distress prediction through cash flow ratios analysis. International Journal of Financial Research 10: 63–76.
  40. Kherrazi, Soufiane, and Khalifa Ahsina. 2016. Défaillance et politique d’entreprises: Modélisation financière déployée sous un modèle logistique appliqué aux PME marocaines. La Revue Gestion et Organisation 8: 53–64.
  41. Khlifa, Selma Haj. 2017. Predicting default risk of SMEs in developing economies: Evidence from Morocco. Journal of WEI Business and Economics 6: 3.
  42. Kim, Kyoung-jae, Kichun Lee, and Hyunchul Ahn. 2018. Predicting corporate financial sustainability using Novel Business Analytics. Sustainability 11: 64.
  43. Kisman, Zainul, and Dian Krisand. 2019. How to Predict Financial Distress in the Wholesale Sector: Lesson from Indonesian Stock Exchange. Journal of Economics and Business 2: 569–85.
  44. Kliestik, Tomas, Katarina Valaskova, George Lazaroiu, Maria Kovacova, and Jaromir Vrbka. 2020. Remaining financially healthy and competitive: The role of financial predictors. Journal of Competitiveness 12: 74–92.
  45. Kovacova, Maria, Tomas Kliestik, Katarina Valaskova, Pavol Durana, and Zuzana Juhaszova. 2019. Systematic review of variables applied in bankruptcy prediction models of Visegrad group countries. Oeconomia Copernicana 10: 743–72.
  46. Long, J. Scott, and Jeremy Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata. College Station: Stata Press.
  47. Lukason, Oliver, and Art Andresson. 2019. Tax arrears versus financial ratios in bankruptcy prediction. Journal of Risk and Financial Management 12: 187.
  48. Lukason, Oliver, and Erkki K. Laitinen. 2019. Firm failure processes and components of failure risk: An analysis of European bankrupt firms. Journal of Business Research 98: 380–90.
  49. Malakauskas, Aidas, and Aušrinė Lakštutienė. 2021. Financial distress prediction for small and medium enterprises using machine learning techniques. Engineering Economics 32: 4–14.
  50. Malécot, Jean-François. 1981. Les défaillances: Un essai d’explication.
  51. Mselmi, Nada, Amine Lahiani, and Taher Hamza. 2017. Financial distress prediction: The case of French small and medium-sized firms. International Review of Financial Analysis 50: 67–80.
  52. Odom, Marcus D., and Ramesh Sharda. 1990. A neural network model for bankruptcy prediction. Paper presented at 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, June 17–21; pp. 163–68.
  53. Ogachi, Daniel, Richard Ndege, Peter Gaturu, and Zeman Zoltan. 2020. Corporate Bankruptcy Prediction Model, a Special Focus on Listed Companies in Kenya. Journal of Risk and Financial Management 13: 47.
  54. Ohlson, James A. 1980. Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research 18: 109–31.
  55. Papana, Angeliki, and Anastasia Spyridou. 2020. Bankruptcy Prediction: The Case of the Greek Market. Forecasting 2: 505–25.
  56. Park, Sunghwa, Hyunsok Kim, Janghan Kwon, and Taeil Kim. 2021. Empirics of Korean Shipping Companies’ Default Predictions. Risks 9: 159.
  57. Paule-Vianez, Jessica, Milagros Gutiérrez-Fernández, and José Luis Coca-Pérez. 2020. Prediction of financial distress in the Spanish banking system: An application using artificial neural networks. Applied Economic Analysis 28: 69–87.
  58. Psillaki, Maria. 1995. Rationnement du crédit et PME: Une tentative de mise en relation. Revue Internationale PME économie et Gestion de La Petite et Moyenne Entreprise 8: 67–90.
  59. Rahman, Mahfuzur, Cheong Li Sa, and Md KaiumMasud. 2021. Predicting Firms’ Financial Distress: An Empirical Analysis Using the F-Score Model. Journal of Risk and Financial Management 14: 199. [Google Scholar] [CrossRef]
  60. Refait-Alexandre, Catherine. 2004. La prévision de la faillite fondée sur l’analyse financière de l’entreprise: Un état des lieux. Economie Prevision 1: 129–47. [Google Scholar] [CrossRef]
  61. Sharifabadi, M. Ramezani, M. Mirhaj, and Naser Izadinia. 2017. The impact of financial ratios on the prediction of bankruptcy of small and medium companies. QUID: Investigación, Ciencia y Tecnología 1: 164–73. [Google Scholar]
  62. Shi, Yin, and Xiaoni Li. 2019. An overview of bankruptcy prediction models for corporate firms: A systematic literature review. Intangible Capital 15: 114–27. [Google Scholar] [CrossRef] [Green Version]
  63. Shrivastav, Santosh Kumar, and P. Janaki Ramudu. 2020. Bankruptcy prediction and stress quantification using support vector machine: Evidence from Indian banks. Risks 8: 52. [Google Scholar] [CrossRef]
  64. Shrivastava, Arvind, Kuldeep Kumar, and Nitin Kumar. 2018. Business distress prediction using bayesian logistic model for Indian firms. Risks 6: 113. [Google Scholar] [CrossRef] [Green Version]
  65. Sun, Jie, Hui Li, Qing-Hua Huang, and Kai-Yu He. 2014. Predicting financial distress and corporate failure: A review from the state-of-the-art definitions, modeling, sampling, and featuring approaches. Knowledge-Based Systems 57: 41–56. [Google Scholar] [CrossRef]
  66. Svabova, Lucia, Lucia Michalkova, Marek Durica, and Elvira Nica. 2020. Business failure prediction for Slovak small and medium-sized companies. Sustainability 12: 4572. [Google Scholar] [CrossRef]
  67. Tong, Yehui, and Zélia Serrasqueiro. 2021. Predictions of failure and financial distress: A study on portuguese high and medium-high technology small and midsized enterprises. Journal of International Studies 14: 9–25. [Google Scholar] [CrossRef] [PubMed]
  68. Valaskova, Katarina, Tomas Kliestik, Lucia Svabova, and Peter Adamko. 2018. Financial risk measurement and prediction modelling for sustainable development of business entities using regression analysis. Sustainability 10: 2144. [Google Scholar] [CrossRef] [Green Version]
  69. Van Caillie, Didier. 1993. Apports de l’analyse factorielle des correspondances multiples à l’étude de la détection des signaux annonciateurs de faillite parmi les PME. Cahier de Recherche Du Service d’Informatique de Gestion (Reprint: Cahier de Recherche Du CEPE). Ph.D. thesis, Université de Liège, Liège, Belgiquepp; pp. 1–36. [Google Scholar]
  70. Vu, Loan Thi, Nga Thu Nguyen, Lien Thi Vu, Phuong Thi Thuy Do, and Dong Phuong Dao. 2019. Feature selection methods and sampling techniques to financial distress prediction for Vietnamese listed companies. Investment Management and Financial Innovations 16: 276–90. [Google Scholar] [CrossRef] [Green Version]
  71. Zhang, Guoqiang, Michael Y. Hu, B. Eddy Patuwo, and Daniel C. Indro. 1999. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research 116: 16–32. [Google Scholar] [CrossRef] [Green Version]
  72. Zhang, Ling, Edward I. Altman, and Jerome Yen. 2010. Corporate financial distress diagnosis model and application in credit rating for listing firms in China. Frontiers of Computer Science in China 4: 220–36. [Google Scholar] [CrossRef]
  73. Zizi, Youssef, Mohamed Oudgou, and Abdeslam El Moudden. 2020. Determinants and predictors of smes’ financial failure: A logistic regression approach. Risks 8: 107. [Google Scholar] [CrossRef]
Table 1. Class distribution before and after resampling.

| Class | Before Resampling | After Resampling |
| --- | --- | --- |
| 0 | 0.6833 | 0.5 |
| 1 | 0.3166 | 0.5 |

Notes: 0 indicates the class of healthy SMEs and 1 indicates the class of SMEs in financial distress.
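
As a rough illustration of how class shares such as those in Table 1 move to 0.5/0.5, the sketch below balances a labelled sample by random oversampling. This is an assumed stand-in: the table does not state which resampling method was used, and the firm identifiers, the 41/19 split and the helper names `class_shares` and `random_oversample` are hypothetical, chosen only to give shares close to those reported.

```python
import random
from collections import Counter


def class_shares(labels):
    """Return the share of each class label, keyed by label."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in sorted(counts.items())}


def random_oversample(rows, seed=0):
    """Balance classes by duplicating randomly drawn minority-class rows.

    Assumption: random oversampling is used here only for illustration;
    it is not claimed to be the method behind Table 1.
    Each row is a (firm_id, label) tuple.
    """
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw with replacement until this class reaches the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical sample: 41 healthy (0) and 19 distressed (1) firms.
rows = [(f"firm_{i}", 0) for i in range(41)]
rows += [(f"firm_{i}", 1) for i in range(41, 60)]

before = class_shares([label for _, label in rows])
after = class_shares([label for _, label in random_oversample(rows)])
print("before:", before)
print("after: ", after)
```

Oversampling duplicates existing distressed firms rather than creating new information, so any evaluation should keep resampled copies out of the test split.
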
Table 2. Financial ratios used as initial features.

| Category | Code | Ratio | Formula |
| --- | --- | --- | --- |
| Liquidity | R1 | Current Ratio | Current Assets / Current Liabilities |
| Liquidity | R2 | Quick Ratio | Liquid Assets / Current Liabilities |
| Liquidity | R3 | Working Capital to Total Assets | Working Capital / Total Assets |
| Solvency and Capital Structure | R4 | Debt to Equity Ratio | Total Debt / Shareholders' Equity |
| Solvency and Capital Structure | R5 | Interest Coverage | EBIT / Interest Expense |
| Solvency and Capital Structure | R6 | Cost of Debt | Interest Expense / Total Debt |
| Solvency and Capital Structure | R7 | Autonomy Ratio | Medium- and Long-Term Financial Debt / Shareholders' Equity |
| Solvency and Capital Structure | R8 | Repayment Capacity | Financial Debt / Self-Financing Capacity |
| Solvency and Capital Structure | R9 | Bank Loans | Short-Term Financial Debt / Total Debt |
| Solvency and Capital Structure | R10 | Financial Equilibrium | Working Capital / Working Capital Requirement |
| Solvency and Capital Structure | R11 | Trade Payables to Total Liabilities | Trade Payables / Total Liabilities |
| Profitability | R12 | Operating Income to Sales | EBIT / Sales |
| Profitability | R13 | Value Added to Sales | Value Added / Sales |
| Profitability | R14 | Interest to Sales | Interest / Sales |
| Profitability | R15 | Return On Assets | Net Income / Total Assets |
| Profitability | R16 | Asset Turnover | Sales / Total Assets |
| Profitability | R17 | Retained Earnings to Total Assets | Retained Earnings / Total Assets |
| Profitability | R18 | Return On Equity | Net Income / Shareholders' Equity |
| Profitability | R19 | Profit Margin | Net Income / Sales |
| Management | R20 | Inventory to Sales | Inventory / Sales |
| Management | R21 | Days in Accounts Receivable | (Accounts Receivable / Sales) × 360 |
| Management | R22 | Duration of Trade Payables | (Trade Payables / (Purchases + Other External Charges Including Tax)) × 360 |
| Management | R23 | Working Capital Requirement Management | Working Capital Requirement / Sales |

Notes: EBIT indicates Earnings Before Interest and Taxes.
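
To make the definitions in Table 2 concrete, the following minimal sketch computes four of the ratios (R5, R14, R15 and R21) for a single firm. The `Statement` class, its field names and the figures are hypothetical, and "Interest" in R14 is assumed here to mean interest expense.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical annual figures for one firm, all in the same currency unit."""
    sales: float
    ebit: float
    net_income: float
    interest_expense: float
    total_assets: float
    accounts_receivable: float


def selected_ratios(s: Statement) -> dict:
    """Compute R5, R14, R15 and R21 following the formulas in Table 2.

    Denominators are assumed to be non-zero; real data would need a guard
    for firms with no interest expense or no sales.
    """
    return {
        "R5": s.ebit / s.interest_expense,              # Interest Coverage
        "R14": s.interest_expense / s.sales,            # Interest to Sales
        "R15": s.net_income / s.total_assets,           # Return On Assets
        "R21": s.accounts_receivable / s.sales * 360,   # Days in Accounts Receivable
    }


# Illustrative figures only, not taken from the study's sample.
firm = Statement(
    sales=1200.0,
    ebit=150.0,
    net_income=60.0,
    interest_expense=30.0,
    total_assets=1000.0,
    accounts_receivable=200.0,
)
ratios = selected_ratios(firm)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative figures, interest absorbs 2.5% of sales (R14) and receivables take about 60 days to collect (R21).
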
Table 3. Discriminant ratios selected by lasso and stepwise techniques.

| | Stepwise Logistic Technique, 2017 | Stepwise Logistic Technique, 2018 | Lasso Logistic Technique, 2017 | Lasso Logistic Technique, 2018 |
| --- | --- | --- | --- | --- |
| Selected variables | R5, R7, R14, R15, R21 | R5, R8, R14, R15, R17, R21 | R2, R5, R14, R15, R17, R21, R22 | R4, R6, R8, R14, R15, R16, R17, R20, R21 |
| BIC | 132.1 | 123.67 | | |
| Penalty coefficient | | | 0.05867105 | 0.0311904 |
Table 4. Stepwise logistic regression results in 2017 (two years prior to financial distress).

| | Estimate | Std. Error | Z Value | Pr(>\|z\|) |
| --- | --- | --- | --- | --- |
| (Intercept) | −2.158 | 4.451 × 10⁻¹ | −4.847 | 1.25 × 10⁻⁶ *** |
| R5 | 1.752 × 10⁻³ | 6.205 × 10⁻⁴ | 2.823 | 0.004754 ** |
| R7 | 1.015 | 3.083 × 10⁻¹ | 3.291 | 0.000998 *** |
| R14 | 7.959 × 10¹ | 2.155 × 10¹ | 3.693 | 0.000221 *** |
| R15 | −2.774 × 10¹ | 6.603 | −4.202 | 2.65 × 10⁻⁵ *** |
| R21 | 4.957 × 10⁻³ | 1.305 × 10⁻³ | 3.798 | 0.000146 *** |

Notes: *** significance level at 0.001; ** significance level at 0.01; * significance level at 0.05; . significance level at 0.1.
Table 5. Stepwise logistic regression results in 2018 (one year prior to financial distress).

| | Estimate | Std. Error | Z Value | Pr(>\|z\|) |
| --- | --- | --- | --- | --- |
| (Intercept) | −2.393 | 5.218 × 10⁻¹ | −4.587 | 4.50 × 10⁻⁶ *** |
| R5 | 1.907 × 10⁻³ | 6.895 × 10⁻⁴ | 2.766 | 0.00568 ** |
| R8 | 3.291 × 10⁻² | 1.783 × 10⁻² | 1.846 | 0.06491 . |
| R14 | 6.691 × 10¹ | 2.114 × 10¹ | 3.165 | 0.00155 ** |
| R15 | −3.578 × 10¹ | 8.646 | −4.138 | 3.50 × 10⁻⁵ *** |
| R17 | −3.296 × 10³ | 1.281 | −2.572 | 0.01010 * |
| R21 | 7.490 × 10⁻³ | 1.792 × 10⁻³ | 4.179 | 2.92 × 10⁻⁵ *** |

Notes: *** significance level at 0.001; ** significance level at 0.01; * significance level at 0.05; . significance level at 0.1.
Table 6. Lasso logistic regression results.

| 2017 (two years prior to financial distress): Ratios | Coefficients | 2018 (one year prior to financial distress): Ratios | Coefficients |
| --- | --- | --- | --- |
| R2 | 0.0574 | R4 | 0.0937 |
| R5 | −0.0010 | R6 | −0.9277 |
| R14 | 10.0928 | R8 | 0.0029 |
| R15 | −7.9388 | R14 | 34.9176 |
| R17 | −0.4502 | R15 | −6.5013 |
| R21 | 0.0010 | R16 | −0.0700 |
| R22 | 0.0003 | R17 | −1.2586 |
| | | R20 | −0.1070 |
| | | R21 | 0.0016 |
Table 7. Confusion matrices for logit models, years: 2017–2018.

| Year | Actual class | Stepwise Logistic Regression: predicted 0 | Stepwise Logistic Regression: predicted 1 | Lasso Logistic Regression: predicted 0 | Lasso Logistic Regression: predicted 1 |
| --- | --- | --- | --- | --- | --- |
| 2017 (two years prior to financial distress) | 0 | 28 (93.33%) a | 2 (6.67%) b | 23 (76.67%) | 7 (23.33%) |
| | 1 | 2 (6.67%) c | 28 (93.33%) d | 5 (16.67%) | 25 (83.33%) |
| | Overall accuracy | 93.33% | | 80.00% | |
| 2018 (one year prior to financial distress) | 0 | 28 (93.33%) | 2 (6.67%) | 26 (86.67%) | 4 (13.33%) |
| | 1 | 1 (3.33%) | 29 (96.67%) | 4 (13.33%) | 26 (86.67%) |
| | Overall accuracy | 95.00% | | 86.67% | |

Notes: a indicates the specificity; b indicates the type II error; c indicates the type I error; d indicates the sensitivity. The rates of the metrics are shown in parentheses. 0 and 1 indicate healthy SMEs and financially distressed SMEs, respectively.
Table 8. Confusion matrices for neural network models, years: 2017–2018.

| Year | Actual class | Neural Networks after Stepwise Selection: predicted 0 | Neural Networks after Stepwise Selection: predicted 1 | Neural Networks after Lasso Selection: predicted 0 | Neural Networks after Lasso Selection: predicted 1 |
| --- | --- | --- | --- | --- | --- |
| 2017 (two years prior to financial distress) | 0 | 23 (76.67%) a | 7 (23.33%) b | 22 (73.33%) | 8 (26.67%) |
| | 1 | 4 (13.33%) c | 26 (86.67%) d | 2 (6.67%) | 28 (93.33%) |
| | Overall accuracy | 81.67% | | 83.33% | |
| 2018 (one year prior to financial distress) | 0 | 26 (86.67%) | 4 (13.33%) | 26 (86.67%) | 4 (13.33%) |
| | 1 | 3 (10.00%) | 27 (90.00%) | 4 (13.33%) | 26 (86.67%) |
| | Overall accuracy | 88.33% | | 86.67% | |

Notes: a indicates the specificity; b indicates the type II error; c indicates the type I error; d indicates the sensitivity. The rates of the metrics are shown in parentheses. 0 and 1 indicate healthy SMEs and financially distressed SMEs, respectively.
Table 9. Model performance metrics for stepwise selection technique.

| Metric | LRSt 2017 | LRSt 2018 | NNSt 2017 | NNSt 2018 |
| --- | --- | --- | --- | --- |
| Accuracy | 93.33% | 95.00% | 81.67% | 88.33% |
| Sensitivity | 93.33% | 96.67% | 86.67% | 90.00% |
| Specificity | 93.33% | 93.33% | 76.67% | 86.67% |
| Precision | 93.33% | 93.50% | 78.80% | 87.10% |
| F1-score | 93.33% | 95.10% | 82.50% | 88.50% |
| Type I error | 6.67% | 3.33% | 13.33% | 10.00% |
| Type II error | 6.67% | 6.67% | 23.33% | 13.33% |
| AUC | 0.936 | 0.959 | 0.833 | 0.880 |

Notes: LRSt: Logistic Regression after Stepwise selection; NNSt: Neural Networks after Stepwise selection.
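
The metrics in Table 9 can be derived from the confusion-matrix counts in Table 7. The sketch below does this for the stepwise logistic regression in 2018, using the convention from the table notes: type I error is the share of distressed SMEs classified as healthy, and type II error is the share of healthy SMEs classified as distressed. The function name `classification_metrics` is hypothetical, and the formulas are the standard textbook definitions, assumed to match those used in the tables.

```python
def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive performance metrics from confusion-matrix counts.

    Class 1 (positive) = financially distressed, class 0 (negative) = healthy.
    """
    sensitivity = tp / (tp + fn)   # distressed SMEs correctly identified
    specificity = tn / (tn + fp)   # healthy SMEs correctly identified
    precision = tp / (tp + fp)     # predicted-distressed SMEs that are distressed
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "type_I_error": fn / (tp + fn),    # distressed classified as healthy
        "type_II_error": fp / (tn + fp),   # healthy classified as distressed
    }


# Counts read from Table 7: stepwise logistic regression, 2018.
lrst_2018 = classification_metrics(tn=28, fp=2, fn=1, tp=29)
for name, value in lrst_2018.items():
    print(f"{name}: {value:.2%}")
```

For these counts the accuracy is 57/60 = 95.00% and the sensitivity 29/30 = 96.67%, matching Table 9. Precision (29/31, about 93.55%) and F1-score (58/61, about 95.08%) agree with the reported 93.50% and 95.10% up to rounding.
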
Table 10. Model performance metrics for lasso selection technique.

| Metric | LRL 2017 | LRL 2018 | NNL 2017 | NNL 2018 |
| --- | --- | --- | --- | --- |
| Accuracy | 80.00% | 86.67% | 83.33% | 86.67% |
| Sensitivity | 83.33% | 86.67% | 93.33% | 86.67% |
| Specificity | 76.67% | 86.67% | 73.33% | 86.67% |
| Precision | 78.10% | 86.67% | 77.80% | 86.67% |
| F1-score | 80.60% | 86.67% | 84.80% | 86.67% |
| Type I error | 16.67% | 13.33% | 6.67% | 13.33% |
| Type II error | 23.33% | 13.33% | 26.67% | 13.33% |
| AUC | 0.848 | 0.849 | 0.944 | 0.833 |

Notes: LRL: Logistic Regression after Lasso selection; NNL: Neural Networks after Lasso selection.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
