Article

A Method for Visualizing Posterior Probit Model Uncertainty in the Early Prediction of Fraud for Sustainability Development

1 Department of Industrial Management, National Taiwan University of Science and Technology, 43 Sec. 4 Keelung Road, Daan District, Taipei 106335, Taiwan
2 Institute of Industrial Management, National Central University, 300 Zhongda Road, Zhongli District, Taoyuan City 32001, Taiwan
* Author to whom correspondence should be addressed.
Axioms 2021, 10(3), 178; https://doi.org/10.3390/axioms10030178
Submission received: 20 June 2021 / Revised: 30 July 2021 / Accepted: 1 August 2021 / Published: 4 August 2021

Abstract

Corporate fraud not only curtails investors' rights and privileges but also disrupts the overall market economy. A model that could help detect unusual market fluctuations would therefore be valuable to investors. We propose an early warning system for predicting financial statement fraud based on the Bayesian probit model, using historical data on 327 businesses in Taiwan from 1999 to 2017, together with a visual method to aid decision making. The model parameters are estimated via Markov Chain Monte Carlo (MCMC). The results show that the approach can reduce the over- or under-confidence that arises in decision making when standard logistic regression is used. In addition, the Bayesian probit model in this study is found to give more accurate predictions and to represent, in a simple plot, not only the predicted value of each response but also its plausible range.

1. Introduction

In the last few decades, many senior managers have been caught using phony financial statements to cheat stakeholders or manipulate stock prices in an attempt to funnel profits. Corporate fraud has therefore long been a serious problem, particularly when it involves financial statements. Ironically, the information contained in these documents has remained one of the key indicators that fraud has taken place [1,2]. Fraudulent activities have not only caused significant losses for stakeholders and severe punishments for the accounting institutions involved, but have also significantly altered trading practices in the financial market. According to the Association of Certified Fraud Examiners (ACFE), whose 2020 report covered 2504 cases from around the world, organizations lose an average of 5% of revenue to corporate fraud, which is equivalent to a loss of about USD 3.6 trillion of Gross World Product (GWP) [3]. Although corporate fraud can be detected, the ACFE still holds that it is ubiquitous and that no organization is completely immune from this threat. The complex causes of fraud have been explained by agency theory, earnings management, the fraud triangle, and the GONE theory [4]. According to the fiduciary norm, managers must act solely in the interests of the principal, to the exclusion of all others [5]. When the interests of the principal and the agent diverge, however, the agent tends to pursue his or her own interests, a problem that has attracted much attention over the years. Moreover, Song et al. [6] have voiced concern over the privatization of many state-owned businesses in China. Previously, the interests of state-owned companies were aligned with those of the nation; privatization, however, brings market-oriented goals, so that performance and profit become the corporation's primary objectives.
Although earnings management is in itself a legitimate practice, corporate managers often manipulate it for their own benefit. For example, they interfere with financial reporting when preparing statements in order to mislead readers [7] and the public about the performance of specific enterprises [8], which may cause investors to make misinformed decisions. In this way, unethical earnings management may lead to fraudulent financial reporting. The fraud triangle theory, first proposed by Cressey [9], explains various aspects of this crime and became the foundation of SAS No. 99. Many scholars have applied this theory in a variety of ways [10]. For instance, Brennan and McGrath [11] argue that the most common form of this crime is the creation of fraudulent transaction records to meet expected profit levels. According to Skousen et al. [12], rapid growth of company assets, the need for cash, and an increased need for external capital are frequent indicators that fraud is taking place. In a study of 64 British firms accused of fraud, Hollow [13] found that financial pressure plays a critical role in this crime. Bologna, Lindquist, and Wells [4] proposed the GONE theory, which posits that greed, opportunity, need, and exposure are closely associated with fraud. From this perspective, greed and need are personal factors, while opportunity and exposure are environmental or systemic. Although all four features must be present for fraud to occur, they do not need to exist simultaneously.
Financial distress is another source of such pressure. Altman [14] proposed the following five financial ratios to assess it: (1) working capital/total assets, (2) retained earnings/total assets, (3) earnings before interest and taxes (EBIT)/total assets, (4) equity value/total liabilities, and (5) sales/total assets. This model has also been used to predict the bankruptcy risk of various companies: the lower a company's score, the higher its probability of bankruptcy. Likewise, if a company is in grave danger, its managers will be under extreme financial pressure, and according to Cressey's fraud triangle theory, stress of this kind tends to promote corporate fraud. Persons [1] also found that financial leverage, asset turnover, asset composition, and company size are closely associated with fraudulent financial reporting, so managers of smaller companies with high financial leverage and low asset turnover are the most likely to commit financial reporting crimes. Many empirical studies indicate that the type of corporate governance greatly affects managers' behavior and the company's overall performance [15,16,17]. Xie et al. [18] hold that large boards are more likely to include experts with a variety of backgrounds and areas of specialization, who can contribute to the effective supervision of managers and thus mitigate the agency problem. According to Beasley [15], if the board has a high percentage of external or independent directors with long tenure, and the company has a significant number of external shareholders, the possibility of fraud is greatly reduced.
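For illustration, the five ratios can be combined into Altman's original Z-score. The following Python sketch is not part of this study's model; it assumes the coefficients and cut-offs of Altman's 1968 model for publicly traded manufacturers (weights 1.2, 1.4, 3.3, 0.6 and 1.0; distress below 1.81, safe above 2.99), in which the first ratio is working capital/total assets and the fourth uses the market value of equity. The `Financials` container, the function names, and the figures in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical statement items for one company, all in the same currency unit."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    total_liabilities: float
    sales: float
    total_assets: float


def altman_z(f: Financials) -> float:
    """Altman (1968) Z-score: weighted sum of the five ratios."""
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.market_value_equity / f.total_liabilities
    x5 = f.sales / f.total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z: float) -> str:
    """Classify a Z-score with the commonly cited 1.81 / 2.99 cut-offs."""
    if z < 1.81:
        return "distress"
    if z <= 2.99:
        return "grey"
    return "safe"
```

For example, a hypothetical firm with working capital 200, retained earnings 300, EBIT 150, equity value 800, total liabilities 400, sales 1,200 and total assets 1,000 scores 3.555, which falls in the safe zone.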
The logistic regression model has long been used in academic studies of fraud, and it remains the prevailing technique for studying this devastating crime. Many scholars have attempted to add various perspectives to improve its flexibility. Specifically, they have included related variables such as conventional financial indicators [1], audit quality [19,20], corporate governance [15,16,17], and accounting conservatism [21,22] in the fitted models. Lin [23] integrated conventional financial indicators, corporate governance, and conservatism into one fitted model and obtained better performance than models that considered only conventional financial indicators and corporate governance factors. Ensemble modeling techniques have become increasingly popular for enhancing classification accuracy [24,25,26]. Recently, Tseng et al. [27] employed these methods to investigate the impact of bias, multicollinearity, and erroneous input patterns on model analysis. If parameter uncertainty in the fitted model is ignored, erroneous inferences and flawed estimates can easily result. This study takes this uncertainty into account by generating multiple models from the posterior distribution through Bayesian probit modeling via Markov Chain Monte Carlo (MCMC). MCMC allows us to draw realistic predictive models from the posterior distribution even when that distribution has no closed form. We also visualize the distribution of the predicted values so that the comparison between the two models is easier to interpret. In addition, we construct 13 indicators covering financial ratios, corporate governance, and accounting conservatism to improve fraud detection. This study is motivated by the following questions: (1) How can the effects of uncertainty be visualized to support better decisions? (2) How can the overestimation or underestimation of statistical models be handled? (3) Which method can enhance predictive power and ameliorate the effects of model uncertainty? To answer these questions, we aim to reduce the bias of parameter estimation using the Bayesian probit model and compare it with the standard logistic model through visualization. The rest of this paper is organized as follows. Section 2 reviews related studies, analyzes the causes of fraud, and describes the predictive models. Section 3 presents the structure of the model and defines the variables. Section 4 discusses the data gathering process, parameter estimation, and the results, and Section 5 gives the conclusion and recommendations for future research.

2. Literature Review

2.1. Related Studies of Fraud Detection

Fraud detection has been studied for a long time with many techniques and models, such as logistic models, decision trees, artificial neural networks [28,29,30], support vector machines [31], random forests [32], and data engineering methods [33], many of which have proven quite accurate. The best-known model is the Z-score, which is still commonly used for predicting financial distress and fraud [34]. Summers and Sweeney [35] used a logit model to study 51 companies that, as reported in The Wall Street Journal, were under investigation for financial statement fraud from 1980 to 1987. The researchers matched them with the same number of no-fault companies using the Standard Industrial Classification (SIC) code. They found that insiders at fraudulent companies tend to sell significant numbers of shares, which reduces the percentage of shares held by insiders. Imhoff [36] suggests that substantive change is necessary to improve corporate governance: problems with accounting or auditing procedures will not be solved until boards are given sufficient information to operate independently and are allowed to act on behalf of the shareholders.
In practice, fraudulent financial reporting is associated with managers who can easily override or change internal control procedures while appearing to be loyal to the company [15]. Under these circumstances, managers can easily manipulate earnings and present falsified financial reports. Desai [37] suggests that many corporate scandals have been caused by the exaggeration of profits: managers tend to report high profits to the capital market and low taxable profits to governmental agencies to avoid paying taxes, which leads to fraudulent financial statements. Davidson et al. [38] studied the effect of corporate governance on earnings management by analyzing 434 listed companies. They found that earnings management is less likely when the majority of the board and the audit committee are non-executive directors, that is, when the board is independent. Perols and Lougee [39] argue that managers who commit fraud begin to manipulate earnings a few years before the crime is detected; the level of adjustment may even exceed that of predicted growth, or they may exaggerate their revenues to commit financial statement fraud.
Many researchers also suggest that audit quality is higher [19] and fraud much less likely [40] when financial statements are audited by large accounting firms. Although audit quality is not directly observable, Hribar, Kravet, and Wilson [20], who used accounting fees as a surrogate variable, found that the size of the fees may reflect the reliability of the statements. Kamarudin, Ismail, and Mustapha [22] take a different perspective on this controversial issue. After analyzing data from 184 companies from 2003 to 2010, they found that most companies guilty of fraud tended to practice "aggressive accounting", including premature revenue recognition, over-optimism, and delayed recognition of losses. Although these practices are not illegal, they are considered negligent because they indicate that the financial statements may have to be restated, which calls into question the reliability and quality of financial reporting. In recent years, data mining and machine learning techniques have shown many advantages over traditional statistical tools in fraud detection, but their "black box" nature is still hard to explain, which makes it difficult to reduce model bias [26]. Perols [28] found that logistic regression outperforms neural networks and decision trees. Furthermore, the Bayesian Belief Network model has been found to outperform decision trees and neural network models for identifying fraudulent financial statements when evaluated with ten-fold stratified cross-validation [30]. Many scholars and practitioners also prefer Bayesian methods to machine learning or deep learning models because of limited data and the lack of interpretability of the latter [41]. This review of fraud detection models makes clear that their accuracy depends heavily on the choice of financial indicators.

2.2. Comparing the Bayesian Probit Model to the Standard Logistic Model

Over the years, many scholars have performed logistic analysis with Bayesian models [42,43,44,45] to correct parameter estimation errors and establish more realistic models, and this approach has been applied extensively in various research domains. Gerlach et al. [46] applied the Bayesian probit model to 63 financial statement items and used stepwise regression to select appropriate variables, creating a model specifically for forecasting changes in corporate earnings. Lately, the Bayesian probit model, which is widely used in statistics, has attracted much attention in the social sciences [47]. In a similar vein, Rossi et al. [48] adopted this model to analyze many marketing problems and help managers make more informed decisions. The difference between the Bayesian probit model and the standard logistic model lies in how the parameters are estimated: the latter relies on Maximum Likelihood Estimation (MLE). Because the likelihood equations are non-linear in the parameters, the estimates have no closed-form expression and must be computed iteratively. After the coefficients are estimated, a chi-square test can be used to test their significance. Another common method is the Wald test, whose statistic asymptotically follows the standard normal distribution under the null hypothesis [49].
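The iterative MLE described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a Newton-Raphson iteration (equivalent to Fisher scoring for the logit link) and standard errors taken from the inverse Fisher information; it is the textbook procedure for the standard logistic baseline, not the software used in this study, and `fit_logistic_newton` is a hypothetical name. The last returned value is the Wald statistic, the estimate divided by its standard error, which is compared with the standard normal distribution.

```python
import numpy as np


def fit_logistic_newton(X, y, max_iter=50, tol=1e-10):
    """Logistic-regression MLE by Newton-Raphson (same as Fisher scoring for the logit link).

    X: (n, k) design matrix whose first column is all ones (intercept).
    y: (n,) array of 0/1 responses.
    Returns the estimates, their asymptotic standard errors and the Wald z-statistics.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        score = X.T @ (y - p)                           # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information (= -Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))         # asymptotic standard errors
    return beta, se, beta / se                          # Wald z = estimate / SE
```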
Although some researchers argue that a sample of only ten or more observations is sufficient for the standard logistic model [50], its inference relies on a sample large enough for the test statistics to approximate the chi-square or normal distribution, and this requirement cannot be ignored. Corporate fraud studies often have fewer than 100 fraud cases because verdicts take a long time to reach, so there are often not enough samples to conduct a valid study. These limited sample sizes are one inherent shortcoming of the standard logistic model. In addition, such studies rely on paired samples: companies accused of fraud are commonly paired with no-fault companies at a ratio of 1:1 or 1:2. In reality, it is difficult to find two companies of similar size in the same industry; in an oligopolistic market, for example, company sizes vary significantly. Because suitable companies in good standing are so hard to find, the results of such studies may be biased, which is a second shortcoming. Third, although the independent variables need not be normally distributed, the confidence intervals and significance tests for the fitted coefficients rely on the normal approximation of the Wald test, so the model may not be stable enough to detect fraud. Whether the results of this model can effectively map the relationships between the variables is an issue to be explored in the future.
Because of these shortcomings, we adopted the Bayesian probit model in conjunction with MCMC to overcome the constraints described above [51]. After using simulation to draw the parameters from their posterior distribution, we compared the posterior with the prior via the Bayesian probit model to create a realistic scenario. This model is also more effective and stable than others for detecting early signs of fraud. In summary, it can help reduce the bias of parameter estimation. However, it has rarely been applied to financial statement fraud in particular. This study therefore uses the Bayesian probit model to analyze financial statement fraud and compares the results with those of the standard logistic model to provide a more accurate reference and decision-making guide.

3. Methods

3.1. Notations

We code the response y as 1 for a company that committed fraud and 0 otherwise. The probability of fraud is then $F(x_i;\beta) = P(y_i = 1 \mid x_i;\beta)$, and the probability of non-fraud is $P(y_i = 0 \mid x_i;\beta) = 1 - F(x_i;\beta)$. For the standard logistic model, the logit $g(x_i)$, i.e., the logarithm of the odds of fraud, is expressed as:
$$g(x_i) = \ln\frac{F(x_i;\beta)}{1 - F(x_i;\beta)} = \beta_0 + \sum_{j=1}^{p}\beta_j x_{ij} + \varepsilon_i \tag{1}$$
where $i = 1, 2, \ldots, n$ indexes the companies in a sample of size n, $j = 1, 2, \ldots, p$ indexes the predictor variables, $F(x_i;\beta)$ is the probability of fraud, and $\varepsilon_i$ represents the residual effects.
For the logistic model, the parameter vector β is estimated via MLE. The likelihood contribution of the i-th observation is $l_i(\beta) = F(x_i;\beta)^{y_i}\,[1 - F(x_i;\beta)]^{1-y_i}$, which can be expanded into Equation (2). Assuming that the observations are independent, the likelihood function of the model is the product of these terms, as shown in Equation (3). According to Bayes' theorem, the posterior probability is proportional to the product of the likelihood function and the prior probability, as shown in Equation (4).
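$$l_i(\beta) = \left(\frac{e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}{1+e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}\right)^{y_i}\left(1-\frac{e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}{1+e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}\right)^{1-y_i} \tag{2}$$
$$l(\beta) = \prod_{i=1}^{n}\left[\left(\frac{e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}{1+e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}\right)^{y_i}\left(1-\frac{e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}{1+e^{\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}}}\right)^{1-y_i}\right] \tag{3}$$
$$P(\beta \mid Y, X) = \frac{P(Y, X \mid \beta)\,P(\beta)}{P(Y, X)} \propto \text{likelihood} \times \text{prior} \tag{4}$$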
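Equations (3) and (4) are usually evaluated on the log scale to avoid numerical underflow when many probabilities are multiplied. The sketch below is a minimal illustration that uses the identity $\log l(\beta) = \sum_i [y_i\eta_i - \log(1 + e^{\eta_i})]$ with $\eta_i = x_i\beta$. It assumes an independent normal prior with standard deviation `prior_sd` on each coefficient; that prior is a hypothetical placeholder (the prior actually used is specified in Section 4.2), and the function names are our own.

```python
import numpy as np


def log_likelihood(beta, X, y):
    """Logarithm of Equation (3) for the logistic model.

    Computes sum(y * eta - log(1 + exp(eta))) with eta = X @ beta, which equals
    sum(y * log(p) + (1 - y) * log(1 - p)); np.logaddexp keeps it numerically stable.
    """
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def log_posterior_unnormalized(beta, X, y, prior_sd=10.0):
    """log P(beta | Y, X) up to an additive constant, following Equation (4).

    Hypothetical prior: independent N(0, prior_sd**2) on every coefficient.
    """
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_likelihood(beta, X, y) + log_prior
```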
Furthermore, we summarize the sequence of the proposed method as a flowchart in Figure 1.

3.2. MCMC Parameter Estimation

There has recently been a resurgence in the use of Bayesian regression methods, in part due to the popularity of the MCMC approach [48]. Our estimation combines the Markov chain and Monte Carlo methodologies: draws generated by a Markov chain are used in Monte Carlo estimates of integrals that have no analytical solution and of complicated probability distributions that are difficult to analyze directly.
When employing the Markov chain, we assume that $\beta^{(0)}, \beta^{(1)}, \beta^{(2)}, \ldots$ is a sequence of random variables in which $\beta^{(t+1)}$ is generated from the conditional probability $P(\beta^{(t+1)} \mid \beta^{(t)})$, so that its value depends only on $\beta^{(t)}$ and not on $\{\beta^{(0)}, \beta^{(1)}, \ldots, \beta^{(t-1)}\}$. As t increases, the distribution of $\beta^{(t)}$ becomes stationary, independent of both t and $\beta^{(0)}$. If this distribution does not take a standard form, the Monte Carlo method is needed to estimate its properties. For instance, if β is the random vector of model parameters with posterior probability distribution $\pi(\beta)$, the expected value of a function $f(\beta)$ under that distribution is given by Equation (5). When this integral is too difficult or impossible to calculate, we use Monte Carlo integration: we draw samples $\{\beta^{(1)}, \beta^{(2)}, \ldots, \beta^{(m)}\}$ from $\pi(\beta)$ and approximate the expected value by the sample mean of $f(\beta^{(t)})$. The process is shown in Equations (5) and (6):
$$E[f(\beta)] = \int f(\beta)\,\pi(\beta)\,d\beta \tag{5}$$
$$E[f(\beta)] \approx \frac{1}{m}\sum_{t=1}^{m} f\left(\beta^{(t)}\right) \tag{6}$$
where $\beta^{(t)}$ is the t-th sampled value, $t = 1, 2, \ldots, m$.
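Equation (6) can be checked on a case where the exact answer is known. In this minimal sketch, π(β) is assumed to be a normal distribution with mean 1 and standard deviation 0.5 that can be sampled directly, and f(β) = β², so E[f(β)] = 1² + 0.5² = 1.25; the helper name and the chosen distribution are illustrative only.

```python
import numpy as np


def monte_carlo_expectation(f, sampler, m, rng):
    """Approximate E[f(beta)] by the mean of f over m draws from pi (Equation (6))."""
    draws = sampler(rng, m)
    return float(np.mean(f(draws)))


# Illustrative target: pi = N(1, 0.5^2) and f(beta) = beta^2, whose exact mean is 1.25.
rng = np.random.default_rng(1)
estimate = monte_carlo_expectation(
    lambda b: b**2,
    lambda r, m: r.normal(1.0, 0.5, size=m),
    200_000,
    rng,
)
```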
Because the chain starts from an arbitrary initial value, the average in Equation (6) also depends on that value when m is finite. Once the stationary distribution of the chain, $\phi(\cdot)$, equals the target $\pi(\cdot)$, we can discard the first r draws as burn-in, keep the sampling results at an interval of k, and solve the above problem via Equation (7).
$$E[f(\beta)] = \lim_{m\to\infty}\frac{1}{m-r}\sum_{t=r+1}^{m} f\left(\beta^{(t)}\right) \tag{7}$$
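The role of burn-in and of the interval k in Equation (7) can be illustrated with a simple Markov chain whose stationary distribution is known. This sketch assumes a first-order autoregressive chain with stationary distribution N(0, 1), started deliberately far away at $\beta^{(0)} = 10$; it is a toy chain chosen for illustration, not the sampler used in this study, and the function names are hypothetical. Discarding the first r draws removes the influence of the starting value, and keeping every k-th draw reduces autocorrelation.

```python
import numpy as np


def ar1_chain(m, rho, beta0, rng):
    """Toy Markov chain beta(t+1) = rho * beta(t) + sqrt(1 - rho**2) * e(t), e(t) ~ N(0, 1).

    For |rho| < 1 its stationary distribution is N(0, 1) regardless of beta0.
    Returns m + 1 values: the starting value followed by m draws.
    """
    scale = np.sqrt(1.0 - rho**2)
    noise = rng.normal(size=m)
    chain = np.empty(m + 1)
    chain[0] = beta0
    for t in range(m):
        chain[t + 1] = rho * chain[t] + scale * noise[t]
    return chain


def burn_in_estimate(f, chain, r, k=1):
    """Equation (7) in practice: skip beta(1)..beta(r), keep every k-th later draw, average f."""
    kept = chain[r + 1::k]
    return float(np.mean(f(kept)))
```

For this chain, `burn_in_estimate(np.square, chain, r=1000, k=10)` should be close to the stationary value E[β²] = 1.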
In this study, we applied Gibbs sampling, an MCMC algorithm that is a special case of the Metropolis-Hastings algorithm [52]. Denoting the i-th draw of $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ by $\beta^{(i)} = (\beta_0^{(i)}, \beta_1^{(i)}, \ldots, \beta_p^{(i)})$, we proceeded in the three steps below.
Step 1: We set the initial value $\beta^{(0)} = (\beta_0^{(0)}, \beta_1^{(0)}, \ldots, \beta_p^{(0)})$ of the parameters and set the number of samples to m.
Step 2: In the (i + 1)-th iteration, we drew $\beta^{(i+1)} = (\beta_0^{(i+1)}, \beta_1^{(i+1)}, \ldots, \beta_p^{(i+1)})$ one component at a time from its full conditional distribution, always conditioning on the most recently updated values, as shown in Equation (8).
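$$\begin{aligned}
\beta_0^{(i+1)} &\sim \phi_0\left(\beta_0 \mid \beta_1^{(i)}, \beta_2^{(i)}, \ldots, \beta_p^{(i)}, Y, X\right)\\
\beta_1^{(i+1)} &\sim \phi_1\left(\beta_1 \mid \beta_0^{(i+1)}, \beta_2^{(i)}, \ldots, \beta_p^{(i)}, Y, X\right)\\
&\;\;\vdots\\
\beta_{p-1}^{(i+1)} &\sim \phi_{p-1}\left(\beta_{p-1} \mid \beta_0^{(i+1)}, \beta_1^{(i+1)}, \ldots, \beta_{p-2}^{(i+1)}, \beta_p^{(i)}, Y, X\right)\\
\beta_p^{(i+1)} &\sim \phi_p\left(\beta_p \mid \beta_0^{(i+1)}, \beta_1^{(i+1)}, \ldots, \beta_{p-1}^{(i+1)}, Y, X\right)
\end{aligned} \tag{8}$$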
Step 3: We repeated Step 2 with the newly sampled values until the m-th sample was reached.
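Steps 1 to 3 can be illustrated on a two-parameter target whose full conditionals are known exactly. In this minimal sketch, the target is assumed to be a standard bivariate normal distribution with correlation ρ, for which each full conditional is N(ρ × other, 1 − ρ²). This toy target and the function name are hypothetical, chosen only to show the update order of Equation (8), in which β₁ is drawn conditional on the already-updated β₀.

```python
import numpy as np


def gibbs_bivariate_normal(m, rho, init=(5.0, -5.0), seed=0):
    """Two-component Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is N(rho * other, 1 - rho**2). beta0 is updated first and
    its new value is used immediately when beta1 is updated, as in Equation (8).
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)
    draws = np.empty((m + 1, 2))
    draws[0] = init                              # Step 1: initial values
    for i in range(m):                           # Step 3: repeat Step 2 m times
        b0 = rng.normal(rho * draws[i, 1], sd)   # Step 2: beta0 | beta1 (old value)
        b1 = rng.normal(rho * b0, sd)            #         beta1 | beta0 (new value)
        draws[i + 1] = (b0, b1)
    return draws
```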
After Gibbs sampling, we used the autocorrelation function (ACF) to monitor the convergence of the chain and verify that it had reached stationarity [48,53]. We then took the sequence $\{\beta^{(m)}: m = 0, 1, 2, \ldots\}$ generated by the Markov chain. As m approaches infinity, the distribution of $\beta^{(m)}$ converges to that of β, so the retained draws behave as random variables from the joint probability distribution $f(\beta)$, which accomplishes our estimation goal.
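A sample ACF can be computed directly from a stored trace. The sketch below is a minimal NumPy version; the function name is hypothetical, and the rule of thumb that an ACF decaying quickly toward zero indicates good mixing (while a slowly decaying one suggests more iterations or a larger sampling interval) is a common heuristic rather than a formal convergence test.

```python
import numpy as np


def acf(chain, max_lag):
    """Sample autocorrelation of a one-dimensional MCMC trace for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])
```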

3.3. Creating the Fraud Detection Model

During the data-gathering phase, n denotes the total number of companies and $X_i$ denotes the predictive variables of the i-th company. These may be continuous or discrete variables, such as financial indicators, corporate governance variables, accounting conservatism, and company size, which are described in detail in Section 3.4. In the model, $y_i = 1$ indicates that fraud took place at the i-th company, and $y_i = 0$ indicates that the i-th company did not commit fraud. For sampling and estimation, we used the binary probit model in the R statistical software, as shown in Equation (9).
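$$\begin{cases} y_i = 1 & \text{if } z_i \ge 0\\ y_i = 0 & \text{if } z_i < 0 \end{cases}\qquad z_i = X_i\beta_t + \varepsilon_i,\quad \varepsilon_i \sim N(0, 1),\quad i = 1, 2, \ldots, n,\; t = 0, 1, 2, \ldots, p \tag{9}$$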
where $Y_i = (y_1, y_2, \ldots, y_n)^\top$ is an $n \times 1$ vector indicating whether each company engaged in fraud, and $Z_i = (z_1, z_2, \ldots, z_n)^\top$ is an $n \times 1$ vector of the continuous latent variables corresponding to $Y_i$. The design matrix, coefficient vector, and error vector of the model are shown in Equation (10).
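$$X_{i,t} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p}\\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix}_{n\times(p+1)},\quad \beta_t = \begin{bmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_p\end{bmatrix}_{(p+1)\times 1},\quad \varepsilon_i = \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix}_{n\times 1} \tag{10}$$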
Because the cutoff used to classify $\{Y_i\}$ in this model differs from that in the logistic model, before any analysis we rescaled each variable in $\{X_i\}$ to the closed interval [−1, 1] [48], as shown in Equation (11).
$$\{X_i\} = \frac{\text{original}\{X_i\} - \dfrac{\max(\text{original}\{X_i\}) + \min(\text{original}\{X_i\})}{2}}{\dfrac{\max(\text{original}\{X_i\}) - \min(\text{original}\{X_i\})}{2}} \tag{11}$$
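Equation (11) is a column-wise linear map. The following minimal sketch applies it to every column of a predictor matrix; the function name is hypothetical, and mapping constant columns to 0 is an added assumption to avoid division by zero. The text does not say whether the maximum and minimum were computed on all companies or on each training fold, so this sketch simply uses the matrix it is given.

```python
import numpy as np


def rescale_to_unit_interval(X):
    """Map each column of X linearly onto [-1, 1] as in Equation (11).

    x' = (x - (max + min) / 2) / ((max - min) / 2); constant columns become 0.
    """
    X = np.asarray(X, dtype=float)
    hi, lo = X.max(axis=0), X.min(axis=0)
    mid = (hi + lo) / 2.0
    half_range = (hi - lo) / 2.0
    safe = np.where(half_range > 0, half_range, 1.0)   # avoid division by zero
    return (X - mid) / safe
```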
In the fraudulent financial statement prediction model proposed in this study, the only observed values are $\{X_i\}$ and $\{Y_i\}$. The parameters to be estimated are the p + 1 coefficients $\{\beta_t\} = (\beta_0, \beta_1, \ldots, \beta_p)$, whose posterior distribution has no closed form. We therefore used Gibbs sampling from the posterior distribution to estimate the joint probability distribution $f(\beta)$ of $\{\beta_t\}$.
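Gibbs sampling for the binary probit model of Equation (9) is commonly carried out with the Albert and Chib (1993) data-augmentation scheme: the latent $z_i$ are drawn from truncated normal distributions, and β is then drawn from its normal full conditional. We assume, but the text does not state, that the R routine used in this study follows the same scheme. The sketch below is a minimal Python version under a normal prior $N(\bar{\beta}, A^{-1})$; `probit_gibbs` is a hypothetical name, and the starting value β = 0 is an arbitrary choice.

```python
import numpy as np
from scipy.stats import truncnorm


def probit_gibbs(X, y, betabar, A, R, keep=1, seed=0):
    """Albert-Chib data-augmentation Gibbs sampler for y_i = 1{z_i >= 0}, z_i = x_i beta + e_i.

    Prior: beta ~ N(betabar, A^{-1}), with A the prior precision matrix.
    Returns every `keep`-th draw of beta from R iterations.
    """
    rng = np.random.default_rng(seed)
    n_coef = X.shape[1]
    V = np.linalg.inv(X.T @ X + A)          # covariance of beta | z (same in every iteration)
    L = np.linalg.cholesky(V)
    prior_term = A @ betabar
    beta = np.zeros(n_coef)                 # arbitrary starting value
    draws = []
    for it in range(1, R + 1):
        mu = X @ beta
        # z_i | beta, y_i ~ N(mu_i, 1) truncated to [0, inf) if y_i = 1, (-inf, 0) if y_i = 0.
        # truncnorm expects the bounds in standard units, i.e. (bound - loc) / scale.
        a = np.where(y == 1, -mu, -np.inf)
        b = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(a, b, loc=mu, scale=1.0, random_state=rng)
        # beta | z ~ N(V (X'z + A betabar), V)
        beta = V @ (X.T @ z + prior_term) + L @ rng.standard_normal(n_coef)
        if it % keep == 0:
            draws.append(beta.copy())
    return np.array(draws)
```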

3.4. Description of Variables

The fitted model uses 14 variables, similar to those of Lin [23]. Their operational definitions are given below:
  • Dependent Variables:
The dependent variable is binary: a fraudulent company is coded 1 and a no-fault company 0.
  • Independent Variables:
There are 13 independent variables in the following categories: the "five financial ratios" proposed by Bernstein [54], namely profitability, liquidity, growth, utility, and financial structure (Table 1); corporate governance variables (Table 2); and conservative accounting variables.
We adopted Givoly and Hayn's [21] measure of accounting conservatism, under which a greater Conservative Accounting (CONACC) value indicates a more conservative accounting policy.
$$\mathrm{CONACC}\,(\beta_{12}) = \frac{1}{3}\sum_{t=-2}^{0}\frac{\text{Earnings before extraordinary items} + \text{depreciation} - \text{cash flow from operations}}{\text{Total assets at the beginning of the study timeframe}}$$
  • Control Variable:
Size of the company (β13) = ln(Asset Size).

4. Bayesian Modeling

4.1. Sample Data

In this section, we follow the data organization of Lin [23], adapting that framework for a thorough analysis with the MCMC method. The original measure uses income before extraordinary gains (losses). However, since Taiwanese enterprises have adopted the IFRS accounting standards, income (loss) from continuing operations is now the more appropriate measure, and total accruals (TA) are defined as follows:
$$TA = \frac{\text{income (loss) from continuing operations} + \text{depreciation} - \text{cash flow from operations}}{\text{average total assets}}$$
$$\mathrm{CONACC}\,(\beta_{12}) = \frac{1}{3}\sum_{t=-2}^{0} TA_t$$
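The two formulas amount to simple arithmetic over three fiscal years. The following sketch follows them as written (TA divided by average total assets, then averaged over t = −2, −1, 0); the function names and the figures in the usage check are hypothetical.

```python
def total_accruals(income_continuing, depreciation, cfo, avg_total_assets):
    """TA = (income (loss) from continuing operations + depreciation - CFO) / average total assets."""
    return (income_continuing + depreciation - cfo) / avg_total_assets


def conacc(yearly):
    """CONACC = (1/3) * (TA_{-2} + TA_{-1} + TA_0).

    `yearly` holds three (income_continuing, depreciation, cfo, avg_total_assets)
    tuples, ordered t = -2, -1, 0.
    """
    if len(yearly) != 3:
        raise ValueError("CONACC is defined over exactly three years")
    return sum(total_accruals(*row) for row in yearly) / 3
```

For hypothetical yearly figures giving TA values of −0.02, 0.01 and −0.03, CONACC is −0.04/3 ≈ −0.0133.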
We analyzed companies convicted of fraud in a court of law, for crimes such as insider trading, stock price manipulation, and fraudulent financial statements, between 1999 and 2017. The dataset ends in 2017 because most more recent investigations had not yet been completed. The sample comprises 327 companies: 109 companies found guilty of fraud and 218 companies that had not engaged in fraud, matched to them at a ratio of 1:2 (see Table 3).
The 109 fraudulent companies span 35 different industries. Although the crimes covered a wide range of industries, none of the companies came from the financial, securities, or insurance industries, whose financial statements have special layouts, so their statements share a similar structure. Each fraudulent company was paired with two no-fault companies from the same industry whose total assets in the same year differed from its own by no more than 40%. Corporate data published by the Taiwan Economic Journal (TEJ) were used in the study. We collected data for the year in which the fraud took place (T) and for 1 year (T-1), 2 years (T-2), and 3 years (T-3) before it. Data from the 327 enterprises, a total of 981 data items, were used to establish the analysis model. The distribution of fraud cases by industry is shown in Table 4. Among the 35 industries, the semiconductor industry accounts for the largest share of fraud cases (10.1%), followed by motherboards (7.3%). In addition, more frauds were detected in the 2005–2009 period than in any other period. Moreover, in around 30% of the industries, such as glass and ceramics, communication equipment, and foods and animal feed, only one company was found to have committed fraud between 1999 and 2017.
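The pairing rule can be written as a filter over candidate companies. This is a minimal sketch under stated assumptions: the 40% asset gap is measured relative to the fraudulent company's total assets, and when more than two candidates qualify the two closest in size are chosen. Neither detail is specified in the text, and the `Company` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Company:
    """Hypothetical record for one company-year."""
    name: str
    industry: str
    year: int
    total_assets: float


def eligible_matches(fraud, candidates, max_gap=0.40):
    """No-fault candidates in the same industry and year whose total assets differ from
    the fraudulent company's by at most max_gap (relative gap), closest first."""
    ok = [
        c for c in candidates
        if c.industry == fraud.industry
        and c.year == fraud.year
        and abs(c.total_assets - fraud.total_assets) / fraud.total_assets <= max_gap
    ]
    return sorted(ok, key=lambda c: abs(c.total_assets - fraud.total_assets))


def pick_two_matches(fraud, candidates):
    """Return the two closest eligible companies (1:2 design), or None if fewer exist."""
    matches = eligible_matches(fraud, candidates)
    return matches[:2] if len(matches) >= 2 else None
```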

4.2. Prior Distributions

Prior distributions were assigned to all unknown parameters in the model, namely the 14 coefficients including the constant term. The prior for β is normal with mean $\bar{\beta}$ and covariance $A^{-1}$, where $\bar{\beta}$ and A are given by Equation (13) with $v_0 = 3$ [48].
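$$\bar{\beta} = \begin{bmatrix}0\\ 0\\ \vdots\\ 0\end{bmatrix}_{14\times 1},\qquad A = v_0 S_X = v_0\begin{bmatrix} s_1^2 & 0 & \cdots & 0\\ 0 & s_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & s_{14}^2 \end{bmatrix}_{14\times 14} \tag{13}$$
where $S_X = \mathrm{diag}(s_1^2, s_2^2, \ldots, s_{14}^2)$ and $s_j^2 = \dfrac{\sum_i (x_{ij} - \bar{x}_j)^2}{n-1}$.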
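Equation (13) can be computed directly from the predictor matrix. This minimal sketch follows the formula as written, with $\bar{\beta} = 0$ and $A = v_0\,\mathrm{diag}(s_1^2, \ldots, s_k^2)$ using the n − 1 denominator, and treats A as the prior precision so that $\beta \sim N(\bar{\beta}, A^{-1})$; the function name is hypothetical. The text does not say how the intercept column (whose sample variance is zero) enters $S_X$, so the sketch simply applies the formula to whatever columns it receives.

```python
import numpy as np


def prior_from_data(X, v0=3.0):
    """Prior mean and precision following Equation (13).

    betabar = 0 and A = v0 * diag(s_1^2, ..., s_k^2), where s_j^2 is the sample
    variance of column j of X with denominator n - 1.
    """
    X = np.asarray(X, dtype=float)
    s2 = X.var(axis=0, ddof=1)
    return np.zeros(X.shape[1]), v0 * np.diag(s2)
```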

4.3. Sampling and Modeling

The parameters of the Bayesian probit model were estimated with the MCMC procedure described in Section 3. The number of Gibbs iterations was set to 1 million (R = 1 million) and the sampling interval to 10 (keep = 10), so 100 thousand draws were retained. The first 20 thousand retained draws were then discarded (burn-in = 20 thousand), and the remaining 80 thousand were used as the joint probability distribution of the parameters to calculate the detection capacity and range of the fraud warning model.
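The bookkeeping behind these numbers is easy to check: keeping every 10th of 1,000,000 iterations leaves 100,000 draws, and discarding the first 20,000 of those (that is, the first 200,000 raw iterations) leaves 80,000. The sketch below reproduces this arithmetic on a dummy chain of iteration indices; storing the 10th, 20th, ... iterations is an assumption about how thinning is indexed, and the function name is hypothetical.

```python
import numpy as np

R, KEEP, BURN_IN = 1_000_000, 10, 20_000   # settings reported in the text


def retained_draws(chain, keep=KEEP, burn_in=BURN_IN):
    """Keep iterations keep, 2*keep, ... (assumed indexing), then drop the first burn_in kept draws."""
    thinned = chain[keep - 1::keep]     # chain[i] holds iteration i + 1
    return thinned[burn_in:]


# Dummy chain whose entries are just their positions, to check the counts.
chain = np.arange(R)
kept = retained_draws(chain)
```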
Ten-fold cross-validation was used to establish and analyze the model. For each of the three periods, the 327 companies were divided into 10 groups that preserve the 1:2 ratio of fraudulent to non-fraudulent companies: the first nine groups each contained 33 companies, and the last group contained 30 (10 of them fraudulent). One group was used as the test set and the remaining nine as the training set; this was repeated 10 times, with a different group serving as the test set each time, to assess the predictive ability of the model. In addition to the first-order terms, a model with interaction terms (a full second-order model) was also fitted, following Allen and Tseng [55].
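One way to build groups that preserve the 1:2 ratio is to split the fraudulent and non-fraudulent companies separately. With 109 fraudulent and 218 non-fraudulent companies, chunks of 11 and 22 give nine groups of 33 and a final group of 30 (10 fraudulent and 20 non-fraudulent). The random permutation, the chunking rule, and the function names in this sketch are assumptions; the text does not describe how companies were assigned to groups.

```python
import numpy as np


def stratified_folds(n_fraud=109, n_clean=218, n_folds=10, seed=0):
    """Assign company indices to folds, keeping the 1:2 fraud/non-fraud ratio.

    Indices below n_fraud are fraudulent companies. Fraudulent companies are dealt
    out in chunks of ceil(n_fraud / n_folds) and the others in chunks of
    ceil(n_clean / n_folds); the last fold receives whatever remains.
    """
    rng = np.random.default_rng(seed)
    fraud = rng.permutation(n_fraud)
    clean = rng.permutation(n_clean) + n_fraud
    f_size = -(-n_fraud // n_folds)          # ceiling division: 11
    c_size = -(-n_clean // n_folds)          # ceiling division: 22
    return [np.concatenate([fraud[i * f_size:(i + 1) * f_size],
                            clean[i * c_size:(i + 1) * c_size]])
            for i in range(n_folds)]


def cv_splits(folds):
    """Yield (train, test) index arrays; each fold is the test set exactly once."""
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train, test
```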

4.4. Prediction Results from the Standard Logistic and Bayesian Probit Models

The results of the first-order model are shown in Figure 2, Figure 3 and Figure 4. Each box-and-whisker plot summarizes the prediction results estimated from the 80,000 retained MCMC iterations. The red dot marks the prediction result of the standard logistic model. The cross-validation results in Figure 2 show that, in the T-1 period, only Set 4 and Set 8 are stable under the standard logistic model, while the other sets are uncertain. In the T-2 period, more of the logistic predictions are stable than in the T-1 period, but the uncertainty appears to increase again in the T-3 period. Overall, each logistic result is only a single point within the distribution of the 80,000 MCMC iterations, and its position within that distribution varies from set to set. This indicates that the logistic results are quite unstable. The MCMC, in contrast, estimates the entire distribution and therefore provides richer information.
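Each box plot in Figures 2–4 summarizes one correct rate per retained posterior draw. The following is a minimal sketch of that computation. It assumes a probit link, a 0.5 classification threshold on Φ(x′β), and a matrix of posterior draws such as a Gibbs sampler returns. The function name and the threshold are our assumptions and are not details given in the text.

```python
import numpy as np
from scipy.stats import norm


def correct_rate_per_draw(draws, X_test, y_test, threshold=0.5):
    """Return one test-set correct rate per posterior draw beta^(r)."""
    probs = norm.cdf(X_test @ draws.T)            # (n_test, n_draws): Phi(x_i' beta^(r))
    preds = (probs > threshold).astype(int)       # 1 = predicted fraud
    return (preds == y_test[:, None]).mean(axis=0)
```

The returned array (80,000 values per set in the study's setting) can be drawn with matplotlib's `plt.boxplot`. The logistic model's single correct rate can then be overlaid as a red dot, for example with `plt.plot(1, logistic_rate, "ro")`, where `logistic_rate` is a hypothetical variable.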

4.5. Comparison of the First-Order Model and the Interaction Term Model

Figure 5, Figure 6 and Figure 7 show the corresponding results for the model with both first-order and interaction terms. As before, each box-and-whisker plot is based on the prediction results from the 80,000 MCMC iterations, and the red dot marks the prediction result of the standard logistic model. These figures also indicate that the logistic model's results are quite unstable and often over- or underestimate the correct rate. Furthermore, the predictive accuracy of the interaction term model was generally higher than that of the first-order model.
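The results of the comparison are shown in Table 5. The t-tests indicate a significant difference between the two models (p < 0.001) in every test set and period. According to the overall averages at the bottom of Table 5, the interaction term model is more accurate than the first-order model in the T-1 period (53.9% vs. 49.8%) and the T-2 period (54.3% vs. 46.5%). The first-order model is more accurate in the T-3 period (59.1% vs. 54.9%). For the interaction term model, the T-2 and T-3 periods have higher average accuracy than the T-1 period.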
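The text does not say which t-test was used. The sketch below assumes a Welch two-sample t-test on the per-draw correct rates of the two models, ordered first-order minus interaction. That ordering reproduces the sign pattern of Table 5: for example, Set 1, T-1 has 45.5% < 53.3% and a negative statistic. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def compare_correct_rates(rates_first_order, rates_interaction):
    """Welch two-sample t-test on per-draw correct rates of two models.
    A negative statistic means the first-order model has the lower mean."""
    result = ttest_ind(rates_first_order, rates_interaction, equal_var=False)
    return float(result.statistic), float(result.pvalue)
```

With 80,000 draws per model, the statistic is large even for modest differences in mean correct rate, which is consistent with the magnitudes reported in Table 5.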
Table 6 compares the predictions of the traditional logistic model and the MCMC model for the 109 fraudulent companies. A logistic prediction of “1” indicates fraud, and “0” indicates no fraud. For the MCMC method, each sample has 80,000 iterations, and each percentage is the share of those 80,000 iterations that predict fraud. Table 6 shows that the MCMC provides considerably more information than the standard logistic model. For example, in the T-1 period the logistic model predicted the 7th, 58th, 139th, 169th, and 322nd samples to be normal. However, more than 76% of the MCMC iterations predicted fraud for each of these samples, as highlighted in grey. Similar discrepancies between the MCMC and the logistic model occur in the T-2 period for the 64th, 238th, 250th, and 256th samples. In the T-3 period, the logistic model predicts eight samples as normal that the MCMC indicates as fraudulent, for example, the 202nd sample (82.9%) and the 322nd sample (88.0%).
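The percentages in Table 6 are per-sample shares of posterior draws that predict fraud. The following sketch assumes a probit link and a 0.5 threshold on Φ(x′β), which for the probit is equivalent to x′β > 0. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def fraud_share_per_sample(draws, X, threshold=0.5):
    """Share of posterior draws that classify each sample as fraudulent."""
    probs = norm.cdf(X @ draws.T)            # (n_samples, n_draws)
    return (probs > threshold).mean(axis=1)  # one share per sample
```

Comparing these shares with the logistic 0/1 predictions flags the disagreements discussed above, for example a logistic “0” paired with a share above 0.76.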

4.6. Model Error Analysis

To examine the limitations of the models, the prediction errors are divided into false negatives and false positives. The error analysis results for the interaction term models are shown in Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13. The black box plots show the errors from the 80,000 MCMC iterations, and the red solid dots show the errors of the logistic model. In this study, a false positive error occurs when an innocent company is falsely accused of fraud. A false negative error occurs when a guilty company is judged to be innocent of fraud.
Figure 8, Figure 9 and Figure 10 show the false positive errors in the T-1 to T-3 periods. Of the 30 results from the logistic model (10 sets × 3 periods), 26 deviated from the overall distribution of the 80,000 iterations. Figure 11, Figure 12 and Figure 13 show the false negative errors in the T-1, T-2, and T-3 periods. Here, 28 of the 30 logistic results deviated from the overall distribution.
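Per-draw error rates for box plots like those in Figures 8–13 can be computed in the same way as the correct rates. This sketch assumes a 0.5 threshold on Φ(x′β). It also assumes that each rate is measured relative to its own class size: false positives over non-fraudulent firms, and false negatives over fraudulent firms. The text does not state the denominators, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def error_rates_per_draw(draws, X_test, y_test, threshold=0.5):
    """Per-draw false-positive rate (innocent firms flagged as fraud) and
    false-negative rate (fraudulent firms judged innocent)."""
    preds = norm.cdf(X_test @ draws.T) > threshold  # (n_test, n_draws), True = fraud
    neg = y_test == 0
    pos = y_test == 1
    fp_rate = preds[neg].mean(axis=0)
    fn_rate = (~preds[pos]).mean(axis=0)
    return fp_rate, fn_rate
```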
The figures above show that, in the error analysis as well, the results of the standard logistic model are unstable (i.e., they overestimate or underestimate the errors). If only the logistic model's errors were analyzed, the error rate would most likely be misstated. Both the standard logistic model and the Bayesian probit model have strengths and weaknesses. The former may be too simplistic to handle such complicated data. The latter is more complex, but it can correct parameter estimation errors and reduce the problem of over- or under-confidence. As always, the best analysis method depends on the problem to be solved. Although the Bayesian probit model often yields clearer results, it is complex and computationally expensive. We therefore recommend using the two methods together, with the standard logistic model used for a preliminary analysis of the sample.
The Bayesian probit model is then used for more precise calculations. Because over-fitting reduces the accuracy of the standard logistic model's predictions, that model alone is less useful in real-world scenarios. As noted above, the Bayesian probit model yields more accurate predictions when the model is correctly specified for the data. Passing the preliminary results through the Bayesian probit model takes advantage of its more realistic predictive power. It also adds a visual component that helps users understand the distribution of the predicted values. Above all, a logistic prediction is only a single point within the distribution space, whereas the MCMC uses many iterations to account for the uncertainty of the parameters (the dispersion of the predicted result). The MCMC approach may therefore be more appropriate for helping researchers understand the complexity of corporate fraud.
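The contrast between a single logistic prediction and the MCMC's dispersion can be made concrete for one firm. The sketch below assumes a probit link and a matrix of posterior draws. It summarizes the posterior distribution of the fraud probability Φ(x′β) by its 2.5%, 50%, and 97.5% quantiles, which is one common choice of interval and not one stated in the text. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def predictive_summary(draws, x, quantiles=(0.025, 0.5, 0.975)):
    """Quantiles of the posterior distribution of Phi(x' beta) for one firm."""
    p = norm.cdf(draws @ x)  # one fraud probability per posterior draw
    return np.quantile(p, quantiles)
```

A logistic model returns one probability for the same firm, whereas this summary returns a range whose width reflects parameter uncertainty.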

5. Conclusions and Recommendations

5.1. Theoretical Implications

In this study, we primarily employed the standard logistic model, supplemented by Bayesian inference to address the uncertainty of the model parameters. This study may be the first to use box plots to visualize the effects of model uncertainty and to help users make decisions based on simulation results for the model coefficients. The proposed method can also reduce the bias of parameter estimation in regular statistical models. The standard logistic model revealed the analytical results more readily, while the Bayesian probit model, with parameters estimated via MCMC, showed stable convergence. We also found that, unlike in the standard logistic model, the distribution of the unknown parameters cannot be expressed in closed form. It must instead be represented by simulated samples in order to interpret the parameter values accurately.
Combining the two models to analyze the data yielded ideal predictive results. We found that the predictive power of the standard logistic model was stronger than that of the Bayesian probit model, which was more appropriate for approximating the maximum value. However, the predictive power of the standard logistic model is unstable because parameter estimation bias is inherent in the model. In this study, we used the MCMC to obtain less biased estimates, which enhance the predictive power and mitigate the effects of model uncertainty.

5.2. Implications for Managers

In the investigation of fraud, the predictive results from the standard logistic model tended to be overly optimistic, whereas the Bayesian probit model significantly increases the cost of analysis. Thus, although applying the Bayesian probit model across the board would be ideal, it is not practical in the real world. For this reason, we suggest using both models together to detect fraud, so that the weakness of each (over-fitting in the logistic model, high cost in the Bayesian probit model) is offset by the other. After a preliminary sorting of the data, the Bayesian probit model can be used for more precise calculations. It provides not only the predicted value of each response but also the possible range of that response via a simple plot, which can help users make informed decisions. In this way, the strengths of both models are retained and used to their best advantage. Such a combined system would be more accurate than the logistic model alone in predicting corporate fraud.
In this study, both models were run independently on the same data set, and both consistently indicated that data from two years before the fraud occurred predicted it most effectively. This suggests that indirect signs of fraud begin to surface about two years before the fraud becomes obvious. Addressing corporate fraud, particularly fraudulent financial statements, therefore requires not only impeccable professional ethics and patience but also a viable model that allows systematic analysis and reduces false accusations of fraud. Companies that have been wrongly accused could then be spared unnecessary legal trouble, and the resources saved could be used more efficiently elsewhere. Most importantly, such a model could accurately detect companies engaged in fraud, which would help protect the rights and privileges of stakeholders and maintain market stability. The proposed method also has limitations, notably the cost of analysis arising from computationally expensive posterior distributions in the MCMC. In future studies, the proposed approach could be extended to the multinomial probit model, and other techniques for increasing the efficiency of the MCMC algorithm, such as those in [56,57], could be explored.

Author Contributions

Conceptualization, S.-H.T. and T.S.N.; formal analysis, S.-H.T. and T.S.N.; investigation, S.-H.T. and T.S.N.; methodology, S.-H.T. and T.S.N.; supervision, S.-H.T.; validation, S.-H.T. and T.S.N.; visualization, S.-H.T. and T.S.N.; writing—original draft, S.-H.T.; writing—review and editing, S.-H.T. and T.S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Corporate information data are available from the Taiwan Economic Journal (TEJ).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Persons, O.S. Using financial statement data to identify factors associated with fraudulent financial reporting. J. Appl. Bus. Res. 1995, 11, 38–46.
2. Kaminski, K.A.; Wetzel, T.S.; Guan, L. Can financial ratios detect fraudulent financial reporting? Manag. Audit. J. 2004, 19, 15–28.
3. ACFE. Report to the Nations on Occupational Fraud and Abuse. Available online: https://www.acfe.com/report-to-the-nations/2020/ (accessed on 20 June 2021).
4. Bologna, J.; Lindquist, R.J.; Wells, J.T. The Accountant’s Handbook of Fraud and Commercial Crime; Wiley: New York, NY, USA, 1993.
5. Mitnick, B.M. The theory of agency. Public Choice 1975, 24, 27–42.
6. Song, J.; Wang, R.; Cavusgil, S.T. State ownership and market orientation in China’s public firms: An agency theory perspective. Int. Bus. Rev. 2015, 24, 690–699.
7. Schipper, K. Earnings management. Account. Horiz. 1989, 3, 91.
8. Healy, P.M.; Wahlen, J.M. A review of the earnings management literature and its implications for standard setting. Account. Horiz. 1999, 13, 365–383.
9. Cressey, D.R. Other People’s Money: A Study of the Social Psychology of Embezzlement; Free Press: Glencoe, IL, USA, 1953.
10. Vousinas, G.L. Advancing theory of fraud: The SCORE model. J. Financ. Crime 2019, 26, 372–381.
11. Brennan, N.M.; McGrath, M. Financial statement fraud: Some lessons from US and European case studies. Aust. Account. Rev. 2007, 17, 49–61.
12. Skousen, C.J.; Smith, K.R.; Wright, C.J. Detecting and predicting financial statement fraud: The effectiveness of the fraud triangle and SAS No. 99. In Corporate Governance and Firm Performance; Hirschey, M., John, K., Makhija, A.K., Eds.; Emerald Group Publishing Limited: Bingley, UK, 2009; pp. 53–81.
13. Hollow, M. Money, morals and motives: An exploratory study into why bank managers and employees commit fraud at work. J. Financ. Crime 2014, 21, 174–190.
14. Altman, E.I. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Financ. 1968, 23, 589–609.
15. Beasley, M.S. An empirical analysis of the relation between the board of director composition and financial statement fraud. Account. Rev. 1996, 71, 443–465.
16. Persons, O.S. The relation between the new corporate governance rules and the likelihood of financial statement fraud. Rev. Account. Financ. 2005, 4, 125–148.
17. Tan, D.T.; Chapple, L.; Walsh, K.D. Corporate fraud culture: Re-examining the corporate governance and performance relation. Account. Financ. 2017, 57, 597–620.
18. Xie, B.; Davidson, W.N., III; DaDalt, P.J. Earnings management and corporate governance: The role of the board and the audit committee. J. Corp. Financ. 2003, 9, 295–316.
19. Francis, J.R. What do we know about audit quality? Br. Account. Rev. 2004, 36, 345–368.
20. Hribar, P.; Kravet, T.; Wilson, R. A new measure of accounting quality. Rev. Account. Stud. 2014, 19, 506–538.
21. Givoly, D.; Hayn, C. The changing time-series properties of earnings, cash flows and accruals: Has financial reporting become more conservative? J. Account. Econ. 2000, 29, 287–320.
22. Kamarudin, K.A.; Ismail, W.A.W.; Mustapha, W.A.H.W. Aggressive financial reporting and corporate fraud. Procedia Soc. Behav. Sci. 2012, 65, 638–643.
23. Lin, Y.-J. A Study of the Corporate Fraud Early Warning Models. Master’s Thesis, National Chung Hsing University, Taichung, Taiwan, 2014.
24. Leith, C. Theoretical skill of Monte Carlo forecasts. Mon. Weather Rev. 1974, 102, 409–418.
25. Amerstorfer, T.; Hinterreiter, J.; Reiss, M.A.; Möstl, C.; Davies, J.A.; Bailey, R.L.; Weiss, A.J.; Dumbović, M.; Bauer, M.; Amerstorfer, U.V.; et al. Evaluation of CME Arrival Prediction Using Ensemble Modeling Based on Heliospheric Imaging Observations. Space Weather 2021, 19, e2020SW002553.
26. Buonaguidi, B.; Mira, A.; Bucheli, H.; Vitanis, V. Bayesian Quickest Detection of Credit Card Fraud. Bayesian Anal. 2021, 1, 1–30.
27. Tseng, S.-H.; Kang, H.-Y.; Chen, H.-Y. A Test-Bed to Compare Alternative Bayesian Regression Formulations and an Application of CNC Milling Roughness Minimization. J. Qual. 2018, 25, 241–257.
28. Perols, J. Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Audit. J. Pract. Theory 2011, 30, 19–50.
29. Fanning, K.M.; Cogger, K.O. Neural network detection of management fraud using published financial data. Intell. Syst. Account. Financ. Manag. 1998, 7, 21–41.
30. Kirkos, E.; Spathis, C.; Manolopoulos, Y. Data mining techniques for the detection of fraudulent financial statements. Expert Syst. Appl. 2007, 32, 995–1003.
31. Dong, W.; Liao, S.; Zhang, Z. Leveraging financial social media data for corporate fraud detection. J. Manag. Inf. Syst. 2018, 35, 461–487.
32. Liu, C.; Chan, Y.; Kazmi, S.H.A.; Fu, H. Financial fraud detection model: Based on random forest. Int. J. Econ. Financ. 2015, 7.
33. Baesens, B.; Höppner, S.; Verdonck, T. Data engineering for fraud detection. Decis. Support Syst. 2021, 113492, in press.
34. Altman, E.I.; Iwanicz-Drozdowska, M.; Laitinen, E.K.; Suvas, A. Financial distress prediction in an international context: A review and empirical analysis of Altman’s Z-score model. J. Int. Financ. Manag. Account. 2017, 28, 131–171.
35. Summers, S.L.; Sweeney, J.T. Fraudulently misstated financial statements and insider trading: An empirical analysis. Account. Rev. 1998, 73, 131–146.
36. Imhoff, G. Accounting quality, auditing and corporate governance. Audit. Corp. Gov. 2003.
37. Desai, M.A. The degradation of reported corporate profits. J. Econ. Perspect. 2005, 19, 171–192.
38. Davidson, R.; Goodwin-Stewart, J.; Kent, P. Internal governance structures and earnings management. Account. Financ. 2005, 45, 241–267.
39. Perols, J.L.; Lougee, B.A. The relation between earnings management and financial statement fraud. Adv. Account. 2011, 27, 39–53.
40. Lennox, C.; Pittman, J.A. Big Five audits and accounting fraud. Contemp. Account. Res. 2010, 27, 209–247.
41. Nusinovici, S.; Tham, Y.C.; Yan, M.Y.C.; Ting, D.S.W.; Li, J.; Sabanayagam, C.; Wong, T.Y.; Cheng, C.-Y. Logistic regression was as good as machine learning for predicting major chronic diseases. J. Clin. Epidemiol. 2020, 122, 56–69.
42. O’Brien, S.M.; Dunson, D.B. Bayesian multivariate logistic regression. Biometrics 2004, 60, 739–746.
43. Polson, N.G.; Scott, J.G.; Windle, J. Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Stat. Assoc. 2013, 108, 1339–1349.
44. Sanchez-Lengeling, B.; Roch, L.M.; Perea, J.D.; Langner, S.; Brabec, C.J.; Aspuru-Guzik, A. A Bayesian approach to predict solubility parameters. Adv. Theory Simul. 2019, 2, 1800069.
45. Ghosh, J.; Li, Y.; Mitra, R. On the use of Cauchy prior distributions for Bayesian logistic regression. Bayesian Anal. 2018, 13, 359–383.
46. Gerlach, R.; Bird, R.; Hall, A.D. A Bayesian Approach to Variable Selection in Logistic Regression with Application to Predicting Earnings Direction from Accounting Information; School of Finance and Economics, University of Technology Sydney: Sydney, Australia, 2000.
47. Jackman, S. Bayesian Analysis for the Social Sciences; John Wiley & Sons: New York, NY, USA, 2009; Volume 846.
48. Rossi, P.E.; Allenby, G.M.; McCulloch, R. Bayesian Statistics and Marketing; John Wiley & Sons: New York, NY, USA, 2012.
49. Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: New York, NY, USA, 2013; Volume 398.
50. Peduzzi, P.; Concato, J.; Kemper, E.; Holford, T.R.; Feinstein, A.R. A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol. 1996, 49, 1373–1379.
51. Gilks, W.R.; Richardson, S.; Spiegelhalter, D. Markov Chain Monte Carlo in Practice; Chapman and Hall/CRC: Boca Raton, FL, USA, 1995.
52. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In Readings in Computer Vision; Elsevier: Amsterdam, The Netherlands, 1987; pp. 564–584.
53. Von Toussaint, U. Bayesian inference in physics. Rev. Mod. Phys. 2011, 83, 943.
54. Bernstein, L.A. Analysis of Financial Statements; Irwin Professional Publishing: Chicago, IL, USA, 1993.
55. Allen, T.T.; Tseng, S.H. Variance plus bias optimal response surface designs with qualitative factors applied to stem choice modeling. Qual. Reliab. Eng. Int. 2011, 27, 1199–1210.
56. Joseph, V.R.; Wang, D.; Gu, L.; Lyu, S.; Tuo, R. Deterministic sampling of expensive posteriors using minimum energy designs. Technometrics 2019, 61, 297–308.
57. Fielding, M.; Nott, D.J.; Liong, S.-Y. Efficient MCMC schemes for computationally expensive posterior distributions. Technometrics 2011, 53, 16–28.
Figure 1. The sequence of the proposed method.
Figure 2. Cross-Validation of Correct Rate for T-1 period (First Order).
Figure 3. Cross-Validation of Correct Rate for T-2 period (First Order).
Figure 4. Cross-Validation of Correct Rate for T-3 period (First Order).
Figure 5. Cross-Validation of Correct Rate for T-1 period (First + Interaction term).
Figure 6. Cross-Validation of Correct Rate for T-2 period (First + Interaction term).
Figure 7. Cross-Validation of Correct Rate for T-3 period (First + Interaction term).
Figure 8. Cross-Validation of False Positive (Type I) Errors in the T-1 Period.
Figure 9. Cross-Validation of False Positive Errors in the T-2 Period.
Figure 10. Cross-Validation of False Positive Errors in the T-3 Period.
Figure 11. Cross-Validation of the False Negative Errors in the T-1 Period.
Figure 12. Cross-Validation of the False Negative Errors in the T-2 Period.
Figure 13. Cross-Validation of the False Negative Errors in the T-3 Period.
Table 1. Independent Variables from Bernstein [54].
| Financial Ratio (5) | Equation | Index | Index Equation |
|---|---|---|---|
| Profitability (β1) | Revenue growth ratio | Revenue growth ratio | (Net income of T period − Net income of T-1 period)/(Net income of T-1 period) |
| Liquidity (β2) | (Current ratio + Working capital ratio)/2 | Current ratio | Current assets/Current liabilities |
| | | Working capital ratio | (Current assets − Current liabilities)/Total assets |
| Growth (β3) | (Return on assets ratio + Net profit ratio + Net operating profit ratio)/3 | Return on assets ratio | Income after taxes/Total assets |
| | | Net profit ratio | Income after taxes/Sales revenue |
| | | Net operating profit ratio | Net operating income/Sales revenue |
| Utility (β4) | (Accounts receivable to total assets ratio + Sales to total assets ratio)/2 | Accounts receivable to total assets ratio | Accounts receivable/Total assets |
| | | Sales to total assets ratio | Sales revenue/Total assets |
| Structure (β5) | (Debt ratio + Net liabilities ratio)/2 | Debt ratio | Total liabilities/Total assets |
| | | Equity ratio | Total liabilities/Shareholders’ equity |
Table 2. Independent Variables from Corporate Governance.
| Corporate Governance Variable | Equation/Explanation |
|---|---|
| Number of board members (β6) | Number of directors |
| Ratio of external directors (β7) | Number of external directors/Total director seats |
| Chairman also holds the position of general manager (β8) | Dummy variable: 1 if the chairman also holds the position of general manager, 0 otherwise |
| Percentage of shareholding by directors (β9) | Shares held by the directors/Total outstanding shares at the end of the period |
| Percentage of shareholding by institutional investors (β10) | Shareholding ratio of institutional investors in the company |
| Deviation between voting rights and earnings (β11) | Voting rights minus earnings distribution rights |
Table 3. Description of fraud samples.
| Item | Description | Number |
|---|---|---|
| Definition of fraud | According to Statements on Auditing Standards (SAS) No. 43: one or more managers, those charged with governance, or employee-level personnel have deliberately used deception to obtain improper or illegal gains. | |
| Fraud sample screening methods | Announcements by the Securities and Futures Investors Protection Center; court judgments | |
| Fraud sample years | 1999–2017 | |
| Fraud sample types | Type 1: Stock price manipulation | 42 |
| | Type 2: Falsifying financial statements | 32 |
| | Type 3: Insider trading | 35 |
| Total | | 109 |
Table 4. Distribution of companies engaged in fraud by industry and year.
| Industry | Total | Percentage |
|---|---|---|
| Hardware and Furniture | 2 | 1.8% |
| Motherboards | 8 | 7.3% |
| Semiconductors | 11 | 10.1% |
| Petrochemicals | 3 | 2.8% |
| Optoelectronics | 7 | 6.4% |
| Garments | 2 | 1.8% |
| Bicycles | 2 | 1.8% |
| Automotive Components | 3 | 2.8% |
| Textiles | 4 | 3.7% |
| Basic Metals | 4 | 3.7% |
| Metal Products | 2 | 1.8% |
| Construction | 6 | 5.5% |
| Glass Ceramics | 1 | 0.9% |
| Ocean Freight | 1 | 0.9% |
| Freight Warehousing | 1 | 0.9% |
| Software Services | 4 | 3.7% |
| Communication Equipment | 1 | 0.9% |
| Weaving | 1 | 0.9% |
| Dairy | 2 | 1.8% |
| Information Channels | 5 | 4.6% |
| Electronics Equipment | 4 | 3.7% |
| Electronic Components | 12 | 11% |
| Electrical Wires | 1 | 0.9% |
| Electrical Products | 2 | 1.8% |
| Network Equipment | 3 | 2.8% |
| Shoes and Suitcases | 1 | 0.9% |
| Resin | 2 | 1.8% |
| Machinery Industry | 4 | 3.7% |
| Medical Supplies | 2 | 1.8% |
| Medical Pharmaceuticals | 2 | 1.8% |
| Chemical Material Products | 2 | 1.8% |
| Other Electronics | 1 | 0.9% |
| Tourism and Dining | 1 | 0.9% |
| Foods and Animal Feed | 1 | 0.9% |
| Cement Products | 1 | 0.9% |
| Total | 109 | 100% |

| Year | 1999 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fraud cases | 1 | 4 | 3 | 2 | 2 | 9 | 13 | 13 | 10 | 9 | 4 | 5 | 4 | 7 | 9 | 5 | 5 | 4 | 109 |
Table 5. Model Comparison.
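| Test Set | Period | First-Order Correct Rate (Mean) | Interaction Term Correct Rate (Mean) | t-Test |
|---|---|---|---|---|
| Set 1 | T-1 | 45.5% | 53.3% | −146.1 *** |
| | T-2 | 48.6% | 54.9% | −121.4 *** |
| | T-3 | 66.8% | 63.6% | 84.1 *** |
| Set 2 | T-1 | 51.4% | 52.1% | −11.5 *** |
| | T-2 | 50.2% | 68.4% | −372.6 *** |
| | T-3 | 60.7% | 64.8% | −101.1 *** |
| Set 3 | T-1 | 54.8% | 64.1% | −173.4 *** |
| | T-2 | 47.7% | 51.5% | −89.0 *** |
| | T-3 | 59.8% | 58.0% | 55.3 *** |
| Set 4 | T-1 | 65.2% | 61.0% | 160.9 *** |
| | T-2 | 55.7% | 56.5% | −17.5 *** |
| | T-3 | 43.2% | 49.2% | −118.8 *** |
| Set 5 | T-1 | 40.1% | 46.1% | −101.3 *** |
| | T-2 | 34.6% | 62.4% | −1068.5 *** |
| | T-3 | 51.4% | 61.8% | −252.0 *** |
| Set 6 | T-1 | 60.4% | 48.4% | 236.2 *** |
| | T-2 | 66.6% | 55.0% | 235.8 *** |
| | T-3 | 43.7% | 50.6% | −113.5 *** |
| Set 7 | T-1 | 50.5% | 43.4% | 107.4 *** |
| | T-2 | 48.5% | 55.2% | −90.0 *** |
| | T-3 | 66.3% | 33.1% | 928.0 *** |
| Set 8 | T-1 | 34.1% | 54.6% | −361.2 *** |
| | T-2 | 37.9% | 53.4% | −246.4 *** |
| | T-3 | 69.4% | 59.2% | 253.0 *** |
| Set 9 | T-1 | 54.2% | 55.8% | −31.8 *** |
| | T-2 | 36.6% | 52.8% | −379.4 *** |
| | T-3 | 59.8% | 61.1% | −22.3 *** |
| Set 10 | T-1 | 42.2% | 60.6% | −269.5 *** |
| | T-2 | 38.4% | 33.2% | 103.9 *** |
| | T-3 | 69.7% | 47.7% | 329.4 *** |
| Total | T-1 | 49.8% | 53.9% | −191.4 *** |
| | T-2 | 46.5% | 54.3% | −369.4 *** |
| | T-3 | 59.1% | 54.9% | 199.1 *** |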
*** p < 0.001.
Table 6. Fraud sample prediction comparison results.
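| Test Set | Sample | T-1 Logistic | T-1 MCMC | T-2 Logistic | T-2 MCMC | T-3 Logistic | T-3 MCMC |
|---|---|---|---|---|---|---|---|
| Set 1 | 1 | 1 | 78.6% | 1 | 53.7% | 0 | 18.0% |
| | 4 | 1 | 42.1% | 1 | 4.9% | 0 | 5.8% |
| | 7 | 0 | 80.2% | 0 | 15.9% | 0 | 54.4% |
| | 10 | 0 | 29.8% | 1 | 99.2% | 1 | 7.0% |
| | 13 | 0 | 49.5% | 1 | 38.1% | 1 | 8.6% |
| | 16 | 0 | 30.4% | 1 | 60.1% | 1 | 14.7% |
| | 19 | 1 | 76.5% | 1 | 40.2% | 1 | 11.7% |
| | 22 | 0 | 55.7% | 1 | 43.1% | 1 | 23.4% |
| | 25 | 0 | 38.5% | 1 | 73.5% | 1 | 21.9% |
| | 28 | 0 | 24.3% | 1 | 71.5% | 1 | 32.8% |
| | 31 | 1 | 96.7% | 1 | 62.9% | 1 | 32.5% |
| Set 2 | 34 | 0 | 58.8% | 1 | 61.2% | 0 | 23.5% |
| | 37 | 0 | 37.0% | 1 | 87.2% | 0 | 77.1% |
| | 40 | 1 | 78.0% | 1 | 33.3% | 0 | 7.2% |
| | 43 | 1 | 56.0% | 1 | 24.3% | 1 | 5.7% |
| | 46 | 1 | 66.8% | 1 | 15.7% | 0 | 0.6% |
| | 49 | 1 | 43.0% | 1 | 23.3% | 0 | 16.6% |
| | 52 | 1 | 42.7% | 1 | 10.7% | 0 | 4.5% |
| | 55 | 1 | 39.4% | 1 | 35.0% | 0 | 11.5% |
| | 58 | 0 | 90.3% | 1 | 46.2% | 0 | 17.8% |
| | 61 | 0 | 59.4% | 1 | 9.0% | 0 | 82.3% |
| | 64 | 1 | 16.1% | 0 | 84.5% | 0 | 54.7% |
| Set 3 | 67 | 0 | 15.1% | 1 | 40.7% | 0 | 13.6% |
| | 70 | 0 | 22.8% | 1 | 82.2% | 0 | 10.3% |
| | 73 | 1 | 62.8% | 1 | 98.8% | 0 | 24.6% |
| | 76 | 0 | 4.6% | 1 | 16.2% | 0 | 25.9% |
| | 79 | 1 | 85.9% | 1 | 21.7% | 0 | 1.7% |
| | 82 | 0 | 25.4% | 1 | 74.7% | 0 | 28.7% |
| | 85 | 0 | 21.1% | 1 | 61.6% | 0 | 79.8% |
| | 88 | 1 | 68.8% | 1 | 79.3% | 0 | 26.0% |
| | 91 | 0 | 23.7% | 1 | 63.1% | 0 | 28.8% |
| | 94 | 0 | 62.1% | 1 | 79.3% | 0 | 13.0% |
| | 97 | 0 | 42.2% | 1 | 66.1% | 0 | 18.8% |
| Set 4 | 100 | 1 | 15.8% | 1 | 44.1% | 0 | 71.4% |
| | 103 | 1 | 5.3% | 1 | 44.1% | 0 | 70.5% |
| | 106 | 1 | 5.2% | 1 | 41.3% | 0 | 69.3% |
| | 109 | 1 | 1.6% | 1 | 42.4% | 0 | 62.9% |
| | 112 | 1 | 4.4% | 1 | 29.7% | 0 | 63.3% |
| | 115 | 1 | 21.9% | 1 | 43.5% | 1 | 72.8% |
| | 118 | 1 | 48.4% | 1 | 85.4% | 0 | 62.9% |
| | 121 | 1 | 6.6% | 1 | 26.5% | 0 | 51.5% |
| | 124 | 1 | 4.1% | 1 | 26.7% | 0 | 36.3% |
| | 127 | 1 | 12.7% | 1 | 33.1% | 0 | 59.9% |
| | 130 | 1 | 10.9% | 1 | 45.0% | 0 | 62.5% |
| Set 5 | 133 | 1 | 69.4% | 1 | 6.7% | 0 | 24.8% |
| | 136 | 1 | 38.1% | 1 | 4.0% | 0 | 11.4% |
| | 139 | 0 | 76.3% | 1 | 1.7% | 0 | 13.8% |
| | 142 | 1 | 19.9% | 1 | 0.8% | 0 | 6.8% |
| | 145 | 1 | 81.2% | 1 | 19.4% | 0 | 5.8% |
| | 148 | 0 | 9.0% | 1 | 6.6% | 0 | 17.7% |
| | 151 | 1 | 69.4% | 1 | 4.9% | 1 | 46.2% |
| | 154 | 1 | 47.0% | 1 | 22.3% | 0 | 6.0% |
| | 157 | 0 | 20.6% | 1 | 0.1% | 0 | 8.1% |
| | 160 | 1 | 48.5% | 1 | 2.1% | 0 | 11.2% |
| | 163 | 1 | 56.0% | 1 | 28.5% | 0 | 71.1% |
| Set 6 | 166 | 1 | 70.2% | 1 | 59.6% | 1 | 74.3% |
| | 169 | 0 | 78.2% | 1 | 41.0% | 0 | 42.0% |
| | 172 | 0 | 50.8% | 1 | 54.8% | 0 | 66.0% |
| | 175 | 0 | 40.1% | 1 | 17.9% | 0 | 48.2% |
| | 178 | 0 | 54.1% | 1 | 39.8% | 0 | 58.5% |
| | 181 | 0 | 16.3% | 1 | 75.4% | 1 | 34.3% |
| | 184 | 0 | 56.3% | 1 | 24.5% | 0 | 48.3% |
| | 187 | 0 | 17.0% | 1 | 89.3% | 0 | 58.1% |
| | 190 | 0 | 54.3% | 1 | 69.6% | 0 | 41.8% |
| | 193 | 0 | 38.9% | 1 | 38.1% | 1 | 39.2% |
| | 196 | 0 | 65.9% | 1 | 34.7% | 0 | 46.6% |
| Set 7 | 199 | 1 | 87.3% | 1 | 13.0% | 1 | 81.3% |
| | 202 | 0 | 29.7% | 0 | 13.5% | 0 | 82.9% |
| | 205 | 1 | 67.7% | 1 | 33.3% | 1 | 92.4% |
| | 208 | 1 | 43.7% | 1 | 15.6% | 1 | 75.5% |
| | 211 | 1 | 90.5% | 1 | 61.0% | 1 | 97.9% |
| | 214 | 1 | 60.9% | 1 | 4.1% | 0 | 2.7% |
| | 217 | 1 | 60.0% | 1 | 44.5% | 1 | 89.5% |
| | 220 | 1 | 75.4% | 1 | 62.3% | 1 | 95.5% |
| | 223 | 1 | 54.8% | 1 | 28.3% | 1 | 83.5% |
| | 226 | 1 | 92.3% | 1 | 43.2% | 1 | 91.9% |
| | 229 | 1 | 79.7% | 1 | 97.0% | 1 | 99.6% |
| Set 8 | 232 | 1 | 39.8% | 0 | 45.1% | 1 | 76.5% |
| | 235 | 1 | 51.9% | 0 | 16.3% | 0 | 38.3% |
| | 238 | 1 | 47.3% | 0 | 96.1% | 1 | 61.3% |
| | 241 | 1 | 50.6% | 0 | 19.4% | 0 | 32.0% |
| | 244 | 1 | 32.4% | 1 | 55.5% | 0 | 30.4% |
| | 247 | 1 | 72.7% | 1 | 68.5% | 0 | 22.2% |
| | 250 | 0 | 53.8% | 0 | 71.5% | 1 | 42.7% |
| | 253 | 1 | 32.1% | 0 | 30.2% | 0 | 16.5% |
| | 256 | 1 | 84.2% | 0 | 98.8% | 0 | 37.1% |
| | 259 | 1 | 33.9% | 0 | 19.0% | 0 | 16.3% |
| | 262 | 1 | 29.0% | 0 | 9.8% | 0 | 10.3% |
| Set 9 | 265 | 0 | 56.1% | 1 | 71.7% | 0 | 31.4% |
| | 268 | 0 | 19.9% | 1 | 32.4% | 0 | 46.0% |
| | 271 | 0 | 31.4% | 1 | 87.3% | 0 | 29.5% |
| | 274 | 0 | 19.4% | 1 | 8.7% | 0 | 59.1% |
| | 277 | 1 | 50.8% | 1 | 70.1% | 0 | 52.1% |
| | 280 | 1 | 32.8% | 1 | 33.7% | 0 | 33.4% |
| | 283 | 0 | 10.8% | 1 | 84.1% | 1 | 37.7% |
| | 286 | 1 | 51.5% | 1 | 61.4% | 0 | 33.8% |
| | 289 | 1 | 29.1% | 1 | 52.0% | 0 | 37.9% |
| | 292 | 0 | 20.5% | 1 | 35.0% | 0 | 21.4% |
| | 295 | 1 | 16.7% | 1 | 77.7% | 0 | 42.2% |
| Set 10 | 298 | 0 | 21.5% | 1 | 98.7% | 1 | 83.4% |
| | 301 | 0 | 37.7% | 1 | 96.9% | 0 | 67.4% |
| | 304 | 0 | 35.2% | 1 | 72.2% | 0 | 51.5% |
| | 307 | 1 | 51.6% | 1 | 38.0% | 0 | 45.8% |
| | 310 | 0 | 27.1% | 1 | 96.8% | 0 | 43.6% |
| | 313 | 0 | 22.1% | 1 | 97.4% | 0 | 66.1% |
| | 316 | 0 | 17.8% | 0 | 31.9% | 1 | 3.4% |
| | 319 | 0 | 25.7% | 1 | 97.7% | 0 | 48.1% |
| | 322 | 0 | 88.8% | 1 | 99.4% | 0 | 88.0% |
| | 325 | 0 | 35.7% | 1 | 95.3% | 0 | 56.2% |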
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Tseng, S.-H.; Nguyen, T.S. A Method for Visualizing Posterior Probit Model Uncertainty in the Early Prediction of Fraud for Sustainability Development. Axioms 2021, 10, 178. https://doi.org/10.3390/axioms10030178

AMA Style

Tseng S-H, Nguyen TS. A Method for Visualizing Posterior Probit Model Uncertainty in the Early Prediction of Fraud for Sustainability Development. Axioms. 2021; 10(3):178. https://doi.org/10.3390/axioms10030178

Chicago/Turabian Style

Tseng, Shih-Hsien, and Tien Son Nguyen. 2021. "A Method for Visualizing Posterior Probit Model Uncertainty in the Early Prediction of Fraud for Sustainability Development" Axioms 10, no. 3: 178. https://doi.org/10.3390/axioms10030178

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
