Article

Evaluation of Bank Innovation Efficiency with Data Envelopment Analysis: From the Perspective of Uncovering the Black Box between Input and Output

1
School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu 611130, China
2
Chongqing College of Electronic Engineering, Chongqing 401331, China
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(24), 3318; https://doi.org/10.3390/math9243318
Submission received: 29 November 2021 / Revised: 14 December 2021 / Accepted: 18 December 2021 / Published: 20 December 2021
(This article belongs to the Special Issue Quantitative Analysis and DEA Modeling in Applied Economics)

Abstract

The evaluation of corporate operating efficiency, and of innovation efficiency in particular, has always been a hot topic. The currently popular evaluation methods are data envelopment analysis (DEA) and its improved variants. However, these methods share a problem: the production process is regarded as a black box, and the actual production relationship between input and output is not analyzed. To address this problem: (1) the black box theory and production function theory are introduced to uncover the black box of input and output; (2) regression models are used to alleviate the multicollinearity problem of inputs, and the most appropriate model of the production relationship is selected; and (3) the results of the production function are compared with the results of the efficiency evaluation from multiple perspectives. Taking rural commercial banks in China as examples to evaluate their innovation efficiency, this article shows the following: (1) with the black box theory and production function theory, staff, equipment, and intermediate business cost are suitable as innovation input variables, and intermediate business income is suitable as an innovation output variable; (2) the main challenges faced by rural commercial banks are reducing the reliance on human capital investment, strengthening technological innovation, and improving the efficiency of intermediate business cost management, which are hard to reveal with traditional DEA. The method proposed in this article provides an applicable reference for improving DEA analysis.

1. Introduction

Enterprise innovation is related to the survival and sustainable development of enterprises, as well as to the quantity and quality of product supply, and has always been a hot topic. One key research question is how to assess the efficiency of corporate innovation, that is, production or operating efficiency measured by comparing input and output [1]. In other words, for innovation activities, how much output is obtained from a given input? Innovation efficiency is regarded as the core driver of enterprise development and sustainable profit growth [2]. Assessing the efficiency of an innovation system can also be an important tool for decision making, providing an important reference for developing strategies and improving actions, as well as helping us to better understand the nature and dynamics of innovation at different stages and levels [3]. Efficiency evaluation methods mainly include data envelopment analysis (DEA) and stochastic frontier analysis (SFA). DEA was first proposed by Charnes et al. (1978) as a tool for efficiency evaluation [4]. As a nonparametric method, DEA is more widely used than SFA because it does not need to specify the relationship between input and output. Some scholars have evaluated management or production efficiency by improving the DEA method [5,6,7,8,9,10], and others have used DEA to assess innovation efficiency [11,12,13,14,15]. DEA is also widely used to evaluate the operating efficiency of banks [16,17,18,19]. Shao et al. (2020) used a fixed correlation model (FCM) and a variable correlation model (VCM) based on the DEA method to evaluate commercial banks’ innovation efficiency [20].
However, for innovation efficiency, most existing studies ignore the correlation analysis of input and output. The DEA model places certain restrictions on the number of input-output variables. As Li et al. (2005) pointed out, multiple input variables often suffer from problems such as multicollinearity [21]. Therefore, in actual research on efficiency evaluation, scholars mostly use the DEA method and avoid estimating and inferring the specific form of the production function. The input-output process thus becomes a “black box”. This may lead to the following problems: first, input and output are not necessarily related, so the evaluation may not be valid; second, because the internal structure and interaction of the innovation system are not analyzed, it is difficult to find the deep-seated reasons for the inefficiency of the system. Some scholars have tried to open the black box of bank efficiency evaluation. For example, some scholars use CAMELS to establish input indicators [22,23]; after Tone (2009) proposed network DEA [24], some scholars tried to evaluate the complex internal system of a bank with that method [25,26]. However, the CAMELS indicators still cannot resolve whether input and output have a production relationship or how many indicators are appropriate. Network DEA only divides the interior into several black box subsystems, such as production and operation, or into two stages, and the input-output relationship within these substages has not yet been substantially resolved.
Therefore, the main contributions of this article are as follows: (1) we introduce “black box” theory and production function theory to uncover the “black box” of innovation input and innovation output; (2) to alleviate the multicollinearity problem of input indicators, we use partial least squares regression, lasso regression, and ridge regression for comparison, and we select the most appropriate model to explain the relationship between input and output, exploring the impact of various input variables on output; (3) we regress the impact of various input variables on efficiency, observe the impact of various input variables on returns to scale under the efficiency evaluation, and obtain the weight of each input variable under the optimization path to compare these results with the results obtained by the production function; (4) we explain the difference in economic terms. For regional small and medium-sized banks, because they are geographically similar, their business environments, organizational structures, and customers are similar, so they are a good sample for the comparison of innovation efficiency. Ninety-five rural commercial banks in Guangdong Province in China are considered as DMUs (decision-making units) in this article to evaluate their innovation efficiency, with a view to providing a reference for business decision-making for small and medium-sized banks to enhance their competitiveness and sustainable development. In the efficiency evaluation of innovation input and innovation output in this article, the super efficiency epsilon-based measure (super efficiency EBM) model, which is compatible with radial and non-radial models, will be used. This article provides economic significance to the relationship of innovation evaluation indicators to better understand the internal process of the innovation system.

2. Theoretical Analysis on the Relationship between Input and Output

2.1. The Black Box Theory of Input and Output

Most of the existing research literature avoids the input-output process of efficiency evaluation and directly compares input and output. However, some scholars noticed this problem and attribute it to a “black box” problem. The earliest prototype of the “black box” was the Closed Box, proposed by the cybernetic expert Norbert Wiener in 1945. W.R. Ashby provided a comprehensive explanation in his book on cybernetics in 1956. Cybernetics usually refers to unknown areas or systems as black boxes, and black box theory is used to study black box problems and their solutions, which became an important theory in cybernetics. The black box means that people only know the input variables and output variables and cannot directly observe and understand their internal structure, so they can only solve them in other ways.
The output process of commercial banks has many common influences, but what we generally see is the value of input and output, and we cannot see how they interact. In this way, the input can be controlled (that is, it can be selected), and the output is observable, but the internal condition becomes a black box. The black box theory is not used to study the internal conditions of the black box, but to study the input changes and output changes of the black box to infer its internal conditions. To uncover this black box, there are usually two methods: one is to infer its internal situation by studying its external input and output without opening it; the second is to open and influence it through certain tools and methods, and at the same time observe the inside of it.
In summary, we can deduce the internal situation of the input-output black box through scientific modeling. Traditionally, banks, as special enterprises operating deposits and loans, generally have three methods for defining input and output [27]. (1) In the production method, banks are regarded as producers, and deposits, loans, etc., are outputs, while labor, capital, etc., are inputs. (2) In the intermediary method, banks are regarded as intermediaries that convert deposits into investments. (3) In the asset method, banks are regarded as financial intermediaries, and the asset items in the balance sheet are outputs. Each of these three methods has advantages and disadvantages. The production method can be used more intuitively by establishing a production function. The production function is used in this article as a means to open the black box of the innovation system, which not only provides a theoretical basis for the work of selecting input variables and output variables, but also allows more effective indicators to be screened through empirical research.

2.2. The Production Function Theory and Its Application in Bank

The Cobb–Douglas production function was used. Its basic form is
$$Y = A K^{\alpha} L^{\beta}$$
where K is capital input, L is labor input, and α and β are the output elasticities of capital and labor, respectively. According to the basic form of the Cobb–Douglas function, the input of production mainly includes capital input (K) and labor input (L), as well as technical factors, management factors, etc. (A). The Douglas function attributes output to capital, labor, and technology, to a large extent and from a long-term perspective. In the long term, for tangible investment, except for human labor, the rest is classified into materialized capital. In the short term, capital also has long-term fixed asset investment as well as short-term variable material input and other business costs.
Rural commercial banks in Guangdong Province, China, are selected as examples in this study. Traditionally, bank innovation is described by three to four input variables, which fall into three categories [27,28,29,30,31]: the net value of fixed assets, the number of employees, and expenses (intermediate business costs, management expenses, operating expenses, etc.). There is generally one output variable, based on intermediate business income or on noninterest income, handling fees, and commission expenses. The selection of input-output indicators is discussed below based on the actual situation of the sample. (1) Net value of fixed assets. In the previous literature, this indicator included assets such as real estate and was not directly related to innovation. However, the data collected from Guangdong rural commercial banks include electronic equipment. Electronic equipment is the foundation for the development of information technology, online banking, and e-banking and is more directly related to innovation. Therefore, in this article, electronic equipment is taken as the capital input. (2) Number of employees. Compared with large and medium-sized banks, which usually rely on a group of high-quality talents to innovate, rural commercial banks, as small and medium-sized banks, have fewer people with a high-level education and high-ranking professional titles. Their innovation needs the support of the vast majority of employees rather than a small group. Therefore, the overall number of employees is used as the labor input. (3) Cost. In practice, costs (equivalent to variable costs such as materials) should also be considered as an input indicator. (4) Output. Rural commercial banks have a relatively simple business structure and relatively low levels of innovation. Therefore, intermediate business income is used to measure innovation output.
Therefore, the final form of the rural commercial bank’s innovation output function can be written as
$$Y = A X_1^{\alpha} X_2^{\beta} X_3^{\gamma}$$
Taking the natural logarithm of both sides gives the logarithmic form:
$$\ln Y = \ln A + \alpha \ln X_1 + \beta \ln X_2 + \gamma \ln X_3$$
where $Y$ is the intermediate business income, $X_1$ is the number of employees, $X_2$ is the electronic equipment, and $X_3$ is the intermediate business cost.
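As a small numerical check of the log-linearization above, the following Python sketch evaluates the production function in levels and in logarithms. The input values, the constant A, and the elasticities are hypothetical and chosen only for illustration; they are not taken from the sample.

```python
import math

def cobb_douglas(A, inputs, elasticities):
    """Level form: Y = A * X_1^e_1 * X_2^e_2 * ..."""
    y = A
    for x, e in zip(inputs, elasticities):
        y *= x ** e
    return y

def log_cobb_douglas(A, inputs, elasticities):
    """Log form: ln Y = ln A + sum_i e_i * ln X_i."""
    return math.log(A) + sum(e * math.log(x) for x, e in zip(inputs, elasticities))

# Hypothetical values (not from the paper): staff, equipment, cost.
inputs = [500.0, 1200.0, 300.0]
elasticities = [0.5, 0.2, 0.3]
A = 2.0

level = cobb_douglas(A, inputs, elasticities)
log_level = log_cobb_douglas(A, inputs, elasticities)

# Both forms describe the same relationship.
print(math.log(level), log_level)
```

Because the logarithmic form is linear in the parameters, the elasticities can be estimated with linear regression methods.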

3. Data Analysis

3.1. Data Description

In this article, Income represents intermediate business income, Equipment represents electronic equipment, Staff represents the number of employees, and Cost represents the intermediate business cost. The data come from the annual reports of Guangdong rural commercial banks. These banks are owned by private capital and are all independent enterprises. The sample covers 95 banks over the period from 2011 to 2018 (in 2018, some rural commercial banks were merged, so eight years were selected for effective comparison). Thus, the total sample is 95 × 8 = 760 observations. Rural commercial banks within one province are geographically similar, and the policy environment, customer groups, and cultures they face are relatively similar. Beyond a province, many conditions change, and the resulting efficiency scores may not be in line with reality. It is important to select similar objects to be evaluated, which was often overlooked in past studies. Therefore, the 95 banks we selected comprise an adequate sample.
Statistical descriptive analysis was carried out on the sample data. Table 1 shows that the standard deviations of the four variables are relatively large, especially the difference between equipment investment (Equipment) and intermediate business income (Income).

3.2. Correlation Analysis

The Pearson correlation coefficient was used to analyze the correlation between the input variables and the output variable. The results are shown in Table 2. The results are basically the same regardless of whether the logarithm is taken. The correlation coefficients between Income and each of Staff, Equipment, and Cost are all positive and lie between 0.8179 and 0.8906. There is a significant positive correlation between the three input variables and the output variable.
Figure 1 shows scatter diagrams of the fitted relationship between each input variable and the output. Each input variable shows the same positive trend with respect to output, and the fit is good, which also indicates the positive effect of each input variable on output.
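The Pearson coefficient used in this section can be sketched in a few lines of Python. The `staff` and `income` values below are made up for illustration and are not the sample data; the same function could equally be applied to pairs of input variables to inspect their mutual correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical observations (not from the paper).
staff = [120, 340, 560, 910, 1500]
income = [1.1, 2.9, 5.2, 8.8, 15.3]

r_levels = pearson(staff, income)
r_logs = pearson([math.log(v) for v in staff], [math.log(v) for v in income])
print(r_levels, r_logs)
```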

3.3. Multicollinearity Problem

Since the input variables share a common trend, they may be multicollinear. This is relatively common because, for banking institutions and the general public, the idea of “too big to fail” is deeply ingrained, and financial regulatory authorities also restrict loans to a single account or group in relation to net capital. Therefore, large organizations are more powerful in both investment and output, and the scale of the customers they serve is also larger. Small organizations have fewer advantages in investment, technical level, staff quality, and so on. Therefore, the “strong–strong effect” of large institutions’ investments is likely to exist, leading to multicollinearity among the input variables. The correlation test results are shown in Table 3. In Table 3, the correlation coefficients among Staff, Equipment, and Cost are above 0.85. Even after taking logarithms, the correlation coefficients are between 0.74 and 0.82. If the panel regression model is used directly, it is likely to be affected by multicollinearity, which may distort the results.

4. Alleviating Multicollinearity and Obtaining the Production Function between Input and Output

If the explanatory variables have multicollinearity, the model obtained by ordinary regression is biased. The main purpose of this section is to alleviate multicollinearity and obtain a more reasonable model and production function parameters. To alleviate the impact of multicollinearity, partial least squares regression (PLS), lasso regression, and ridge regression models are used for regression analysis, and their results are then compared. Finally, the model most appropriate for determining the production function relationship is chosen.

4.1. Partial Least Squares Regression (PLS) Analysis

4.1.1. Model Introduction

Partial least squares regression (PLS) combines the advantages of principal component analysis, canonical correlation analysis, and multiple linear regression. It is often used when there are more than three input variables, the sample is small, and the variables may be multicollinear. If the original data set has a collinearity problem, multiple linear regression easily over-fits because of the correlation between the independent variables. During the regression, the PLS algorithm replaces the original variables with linearly independent components, so as to mitigate the multicollinearity problem as far as possible. The specific implementation steps of PLS are as follows:
Arrange the original data set into the matrices $X$ and $Y$, and let $E_0$ and $F_0$ be the standardized data of $X$ and $Y$. The first pair of components, $t_1 = E_0 w_1$ and $u_1 = F_0 c_1$, is required to have the largest covariance:
$$\max \{\mathrm{Cov}(t_1, u_1)\} = \max \langle E_0 w_1, F_0 c_1 \rangle$$
$$\text{s.t.} \quad \begin{cases} w_1^{T} w_1 = 1 \\ c_1^{T} c_1 = 1 \end{cases}$$
Solving this problem with the Lagrange multiplier method and regressing $E_0$ and $F_0$ on $t_1$, we obtain
$$E_0 = t_1 p_1^{T} + E_1$$
$$F_0 = t_1 r_1^{T} + F_1$$
The regression coefficient vectors are
$$p_1 = \frac{E_0^{T} t_1}{\|t_1\|^{2}}$$
$$r_1 = \frac{F_0^{T} t_1}{\|t_1\|^{2}}$$
Replacing the original data matrices $E_0$ and $F_0$ with the residual matrices $E_1$ and $F_1$, extracting the second component, and establishing the regression equations, we obtain
$$t_2 = E_1 w_2, \quad u_2 = F_1 c_2$$
$$E_1 = t_2 p_2^{T} + E_2$$
$$F_1 = t_2 r_2^{T} + F_2$$
Iterating this calculation, and letting $n$ be the rank of $X$, we finally have
$$F_0 = t_1 r_1^{T} + \cdots + t_n r_n^{T} + F_n = E_0 \left[ \sum_{i=1}^{n} w_i r_i^{T} \right] + F_n$$
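The first extraction step above can be sketched in plain Python for the special case of a single response variable (often called PLS1), which matches the setting here with Income as the only output. In that case $c_1 = 1$ and the unit-norm maximizer is $w_1 \propto E_0^{T} F_0$. The data and the function name `pls1_first_component` are hypothetical; this is a sketch of one component under these assumptions, not the implementation used in the article.

```python
import math

def standardize(values):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_first_component(x_columns, y):
    """Extract the first PLS component for a single response y."""
    E0 = [standardize(col) for col in x_columns]   # standardized input columns
    F0 = standardize(y)                            # standardized response
    # Weight vector: w1 proportional to E0^T F0, normalized so that w1^T w1 = 1.
    w = [dot(col, F0) for col in E0]
    norm = math.sqrt(dot(w, w))
    w = [v / norm for v in w]
    n = len(F0)
    # Score vector t1 = E0 w1.
    t = [sum(w[j] * E0[j][i] for j in range(len(E0))) for i in range(n)]
    tt = dot(t, t)
    # Loadings p1 = E0^T t1 / ||t1||^2 and r1 = F0^T t1 / ||t1||^2.
    p = [dot(col, t) / tt for col in E0]
    r = dot(F0, t) / tt
    # Residual matrices E1 = E0 - t1 p1^T and F1 = F0 - t1 r1.
    E1 = [[col[i] - t[i] * p[j] for i in range(n)] for j, col in enumerate(E0)]
    F1 = [F0[i] - t[i] * r for i in range(n)]
    return w, t, p, r, E1, F1

# Hypothetical log-transformed observations (not from the paper).
ln_staff = [4.8, 5.3, 5.9, 6.4, 6.9, 7.3]
ln_equipment = [6.1, 6.9, 7.2, 8.0, 8.3, 9.1]
ln_cost = [5.0, 5.2, 6.1, 6.3, 7.0, 7.4]
ln_income = [6.0, 6.6, 7.3, 7.9, 8.6, 9.1]

w, t, p, r, E1, F1 = pls1_first_component([ln_staff, ln_equipment, ln_cost], ln_income)
print(w, r)
```

By construction, the residuals are orthogonal to the extracted score vector, which is what allows the next component to be extracted from $E_1$ and $F_1$.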

4.1.2. Results

After taking the logarithm of each variable, we classified the three items of Staff, Equipment, and Cost as the main component U1 and classified the single item of Income as the main component V1. The results are shown in Table 4.
Table 4 shows the mathematical relationship expressions between the principal components and the research items, including the relationship expressions between the principal components and the independent variables, and the relationship expressions between the principal components and the dependent variables, as shown below:
$$U_1 = 0.602 \ln \mathrm{Staff} + 0.552 \ln \mathrm{Equipment} + 0.577 \ln \mathrm{Cost}$$
$$V_1 = 1 \cdot \ln \mathrm{Income}$$
We can obtain the regression coefficient between the explanatory variables and the explained variable, as shown in Table 5.
Finally, we can obtain the relational expression of the production function:
$$\ln \mathrm{Income} = 0.348 \ln \mathrm{Staff} + 0.320 \ln \mathrm{Equipment} + 0.334 \ln \mathrm{Cost}$$

4.2. Lasso Regression Analysis

4.2.1. Model Introduction

Lasso regression is a regularized linear regression that is widely used in statistics and machine learning, and it is also an effective method for alleviating multicollinearity. Lasso regression can reduce the deviation and enhance the accuracy of the model by adding a penalty on the absolute values of the regression coefficients. This regularization term, added to the linear regression, reduces overfitting and alleviates multicollinearity, and the model can be fitted with gradient descent. The key implementation steps of the gradient descent algorithm for lasso regression are as follows:
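$$\min_{w} \frac{1}{2 n_{\mathrm{samples}}} \|Xw - y\|_2^{2} + \alpha \|w\|_1$$
Rescaling the objective (the constant factor can be absorbed into $\alpha$), we have
$$J(w) = \|Xw - y\|_2^{2} + \alpha \|w\|_1$$
Differentiating with respect to $w$, and using $\mathrm{sgn}(w)$ as the (sub)gradient of the $\ell_1$ norm, we obtain
$$\nabla_w J(w) = 2X^{T}Xw - 2X^{T}y + \alpha \, \mathrm{sgn}(w)$$
Finally, in gradient descent with step size $\varepsilon$, the coefficients are updated according to the following rule:
$$w \leftarrow w - \varepsilon \nabla_w J(w) = w - 2\varepsilon (X^{T}Xw - X^{T}y) - \varepsilon \alpha \, \mathrm{sgn}(w) = \left[ w - 2\varepsilon (X^{T}Xw - X^{T}y) \right] - \varepsilon \alpha \, \mathrm{sgn}(w)$$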
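The update rule above can be sketched as a plain (sub)gradient descent loop in Python. The data, step size, and number of iterations are hypothetical choices for illustration. Note that a fixed-step subgradient method of this kind generally does not drive coefficients exactly to zero; dedicated solvers (for example coordinate descent) are normally used in practice.

```python
def sign(v):
    """Sign function with sign(0) = 0."""
    return (v > 0) - (v < 0)

def lasso_objective(X, y, w, alpha):
    """J(w) = ||Xw - y||_2^2 + alpha * ||w||_1."""
    resid = [sum(xij * wj for xij, wj in zip(row, w)) - yi for row, yi in zip(X, y)]
    return sum(r * r for r in resid) + alpha * sum(abs(wj) for wj in w)

def lasso_subgradient_descent(X, y, alpha, step, n_iter):
    """Iterate w <- w - step * (2 X^T (Xw - y) + alpha * sgn(w))."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        resid = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [2.0 * sum(X[i][j] * resid[i] for i in range(n)) + alpha * sign(w[j])
                for j in range(d)]
        w = [w[j] - step * grad[j] for j in range(d)]
    return w

# Hypothetical data: y depends only on the first column (y = 3 * x1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 0.0, 3.0, 6.0]

w_ols = lasso_subgradient_descent(X, y, alpha=0.0, step=0.05, n_iter=2000)
w_lasso = lasso_subgradient_descent(X, y, alpha=2.0, step=0.05, n_iter=2000)
print(w_ols, w_lasso)
```

With $\alpha = 0$ the loop reduces to ordinary least squares fitted by gradient descent, which gives a simple way to check the implementation.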

4.2.2. Results

We used the logarithm of the variables and performed lasso regression. The results are shown in Table 6, and the relational expression of the production function was obtained:
$$\ln \mathrm{Income} = 0.854 + 0.695 \ln \mathrm{Staff} + 0.071 \ln \mathrm{Equipment} + 0.245 \ln \mathrm{Cost}$$

4.3. Ridge Regression Analysis

4.3.1. Model Introduction

Ridge regression is also a method for alleviating multicollinearity. By adding a regularization term and solving the regularized normal equation, ridge regression can effectively reduce overfitting. The key steps of ridge regression are as follows:
For the loss function, we have
$$\min_{w} J(w), \quad J(w) = \|Xw - y\|_2^{2} + \alpha \|w\|_2^{2}$$
$$J(w) = (Xw - y)^{T}(Xw - y) + \alpha w^{T} w$$
Taking the derivative, we have
$$\nabla_w J(w) = 2X^{T}Xw - 2X^{T}y + 2\alpha w$$
Let the derivative be zero. Thus,
$$X^{T}Xw - X^{T}y + \alpha I w = 0$$
$$(X^{T}X + \alpha I) w = X^{T}y$$
Thus,
$$(X^{T}X + \alpha I)^{-1}(X^{T}X + \alpha I) w = (X^{T}X + \alpha I)^{-1} X^{T}y$$
The final coefficient $w$ is
$$w = (X^{T}X + \alpha I)^{-1} X^{T}y$$
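The closed-form solution above can be sketched in plain Python by solving the linear system $(X^{T}X + \alpha I) w = X^{T}y$ directly rather than forming the inverse. The helper `solve` (Gaussian elimination with partial pivoting) and the data are hypothetical and serve only as an illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge(X, y, alpha):
    """Return w solving (X^T X + alpha I) w = X^T y."""
    n, d = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) + (alpha if a == b else 0.0)
            for b in range(d)] for a in range(d)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(d)]
    return solve(XtX, Xty)

# Hypothetical data: y = 3 * x1 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 0.0, 3.0, 6.0]

w_ols = ridge(X, y, alpha=0.0)     # no penalty: ordinary least squares
w_ridge = ridge(X, y, alpha=1.0)   # penalized: coefficients shrink
print(w_ols, w_ridge)
```

In this example the penalty reduces the squared norm of the coefficient vector relative to the unpenalized solution, which is the shrinkage effect that stabilizes estimates when inputs are collinear.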

4.3.2. Results

We used the logarithm of the variables and performed ridge regression. The results are shown in Table 7, and the relational expression of the production function was obtained:
$$\ln \mathrm{Income} = -0.185 + 0.643 \ln \mathrm{Staff} + 0.227 \ln \mathrm{Equipment} + 0.299 \ln \mathrm{Cost}$$

4.4. Model Comparison

The above PLS, lasso, and ridge regression models all have the following model formulas:
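$$y = \ln \mathrm{Income} = \varepsilon + \alpha \ln \mathrm{Staff} + \beta \ln \mathrm{Equipment} + \gamma \ln \mathrm{Cost}$$
where $\varepsilon$ is the constant term. From this formula, we can find the predicted value $\hat{y}$ of Income under each regression model. For the forecast of the Income data set, we used
$$\hat{D}_i = \{\hat{y}_{i1}, \hat{y}_{i2}, \ldots, \hat{y}_{i760}\}$$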
The predicted values were calculated and are plotted as a line chart (Figure 2). The predictions of Income from the three regression models follow essentially the same trend as the original data, but their errors differ.
The mean square error (MSE) and root mean square error (RMSE) under each model were calculated, and the results were compared.
The MSE was calculated as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} (y_{in} - \hat{y}_{in})^{2}, \quad N = 760$$
The RMSE was calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (y_{in} - \hat{y}_{in})^{2}}$$
Through calculation, the MSE and RMSE values of the partial least-squares regression, lasso regression, and ridge regression models were obtained, as shown in Table 8 and Figure 3.
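The two error measures used for the comparison can be sketched as follows. The observed and predicted values in the example are made up and serve only to show the calculation.

```python
import math

def mse(y_true, y_pred):
    """Mean square error: average of squared prediction errors."""
    n = len(y_true)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n

def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical observed and predicted values of ln Income.
observed = [6.0, 6.6, 7.3, 7.9]
predicted = [6.1, 6.5, 7.3, 8.1]
print(mse(observed, predicted), rmse(observed, predicted))
```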
Comparing the test values of each model, the MSE and RMSE of the lasso regression are the smallest, which indicates that it is the most suitable of the three models for alleviating multicollinearity. Therefore, for the production function,
$$\mathrm{Income} = A \cdot \mathrm{Staff}^{\alpha} \cdot \mathrm{Equipment}^{\beta} \cdot \mathrm{Cost}^{\gamma},$$
we have $\alpha = 0.695$, $\beta = 0.071$, $\gamma = 0.245$, and $\alpha + \beta + \gamma = 1.011 > 1$.
For every 1% increase in Staff, Income increases by 0.695% on average; for every 1% increase in Equipment, Income increases by 0.071% on average; and for every 1% increase in Cost, Income increases by 0.245% on average. If Staff, Equipment, and Cost all increase by 1%, Income increases by about 1.011%, indicating increasing returns to scale.
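The returns-to-scale arithmetic can be checked with a short Python sketch that uses the lasso elasticities reported above. The function name `output_growth` is introduced here only for illustration. Under the Cobb–Douglas form, scaling all inputs by a factor $(1 + g)$ scales output by $(1 + g)^{\alpha + \beta + \gamma}$, so the 1.011% figure is the first-order approximation of the exact effect.

```python
# Elasticities from the lasso regression reported in the text.
alpha, beta, gamma = 0.695, 0.071, 0.245
scale_elasticity = alpha + beta + gamma

def output_growth(input_growth):
    """Exact output growth when all inputs grow at the same rate."""
    return (1.0 + input_growth) ** scale_elasticity - 1.0

growth = output_growth(0.01)   # all three inputs increase by 1%
print(scale_elasticity, growth)
```

Because the sum of the elasticities exceeds 1, output grows slightly faster than the inputs.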

5. Efficiency Evaluation with the Super Efficiency EBM Model

Since it was determined that the input variables and output variables do have a production relationship, in this section, the innovation efficiency of the rural commercial banks in the sample will be evaluated.

5.1. Introduction to the Super Efficiency EBM Model

The traditional radial model (CCR, BCC, etc.) solves all indicators in a proportional form, which may lead to errors in the results because it ignores the problem of slack improvement. The proposal of non-radial models such as SBM effectively solves this problem. The SBM model starts with slack variables to solve the efficiency problem of decision-making units, which effectively improves the accuracy of efficiency calculation. However, the SBM model ignores the original proportion information of the decision-making unit and performs non-radial processing on the parts that should be radially optimized. There may be a problem of excessive consideration of slack variables, so efficiency calculation errors may still be produced. Tone et al. (2010) proposed an EBM (epsilon-based measure) model that is compatible with radial (CCR) and nonradial (SBM) mixed distance functions [32].
In our sample, input and output are not always increasing. For example, for one of the rural commercial banks, from 2017 to 2018, Staff dropped by 63, Equipment increased by about 20 million yuan, Cost increased by about 10 million yuan, and Income dropped by about 60 million yuan. Therefore, input and output are not in the same proportion and are much more complicated. While considering the radial ratio between the input frontier value and the actual value, EBM can reflect the differentiated nonradial slack variables between the inputs, so it can eliminate the error in the calculation results caused by considering a single distance function. Thus, we will use EBM for analysis. The EBM model is constructed as follows:
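$$\gamma^{*} = \min \; \theta - \varepsilon_x \sum_{i=1}^{m} \frac{\omega_i s_i^{-}}{x_{ik}}$$
$$\text{s.t.} \quad \sum_{j=1}^{n} x_{ij} \lambda_j + s_i^{-} \le \theta x_{ik}, \quad i = 1, \ldots, m$$
$$\sum_{j=1}^{n} y_{rj} \lambda_j \ge y_{rk}, \quad r = 1, \ldots, s$$
$$\sum_{j=1}^{n} \lambda_j = 1$$
$$\lambda_j \ge 0, \quad s_i^{-} \ge 0$$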
In the formula, $\gamma^{*}$ is the efficiency value of the decision-making unit: when it equals 1, the unit is technically efficient, and when it is less than 1, the unit is inefficient. $x_{ik}$ and $y_{rk}$ are the input and output variables of decision-making unit $k$, respectively; $\lambda_j$ are the linear combination coefficients; $\theta$ is the radial efficiency parameter, which captures the radial part of the optimization; $s_i^{-}$ is the slack of input $i$; and $\varepsilon_x$ is the core parameter representing the importance of the non-radial part in the model, with a value range of [0, 1]. When $\varepsilon_x = 0$, the EBM model is equivalent to the CCR model; when $\theta = \varepsilon_x = 1$, the EBM model is transformed into an SBM model. $\omega_i$ is the relative importance of each input variable. The basic EBM model cannot discriminate among efficient DMUs (those with an efficiency value of 1). Borrowing the idea of super-efficiency DEA makes it possible to compare these efficient units. The basic idea of the super-efficiency model is that, when measuring the efficiency of the kth DMU, its input-output is replaced with a combination of the input-output of all the remaining DMUs. In this way, the kth DMU is excluded from the reference set.
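To make the role of $\varepsilon_x$ concrete, the following Python sketch evaluates the EBM objective for a given radial value and given input slacks. It does not solve the linear program; the function name `ebm_objective` and all numbers (including weights that sum to one) are hypothetical and used only for illustration.

```python
def ebm_objective(theta, slacks, x_k, weights, eps_x):
    """Objective value: theta - eps_x * sum_i(weight_i * slack_i / x_ik)."""
    non_radial = sum(w * s / x for w, s, x in zip(weights, slacks, x_k))
    return theta - eps_x * non_radial

# Hypothetical DMU k with three inputs (staff, equipment, cost).
x_k = [100.0, 50.0, 20.0]
slacks = [10.0, 0.0, 5.0]     # assumed input slacks
weights = [0.5, 0.3, 0.2]     # assumed relative importance of the inputs
theta = 0.9                   # assumed radial value

score_mixed = ebm_objective(theta, slacks, x_k, weights, eps_x=0.4)
score_radial = ebm_objective(theta, slacks, x_k, weights, eps_x=0.0)
print(score_mixed, score_radial)
```

With $\varepsilon_x = 0$ the slacks no longer affect the score and only the radial part remains, whereas a larger $\varepsilon_x$ gives the non-radial slacks more weight.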

5.2. Result Analysis

Super efficiency EBM was used to evaluate the innovation efficiency of the 760 samples, and the results are shown in Figure 4. The figure shows that innovation efficiency is generally low. The innovation efficiency of roughly the first 100 samples is high, because these samples are mainly concentrated in the Pearl River Delta region of Guangdong Province, where the economy is relatively developed and rural commercial banks are relatively large. In the non-Pearl River Delta region, the economy is less developed and the overall level of innovation efficiency is low, although banks in these areas also perform outstandingly in individual years, with relatively high innovation efficiency.
Figure 5 shows the trend of annual average efficiency. From the perspective of development trends: Firstly, there was a decline in 2011, but an upward trend is apparent over the three consecutive years afterward. This is mainly because the Chinese government adopted relatively large stimulus measures after the subprime mortgage crisis in the United States, which promoted the development of rural commercial banks. Secondly, after 2015, innovation efficiency began to decline again, which may be related to the country’s economic growth. From 2015 to 2018, the economic growth situation was more complicated, and nonperforming loans increased, which had a certain impact on small and medium banks such as rural commercial banks, thus affecting their innovation efficiency.

6. Further Discussion

6.1. The Impact of Input Variables on Output and Efficiency

After completing the calculation of efficiency, we compared the relationship between input variables and output and between input variables and efficiency, respectively, so as to identify input variables that may have inconsistent impact directions. If such a variable existed, the economic significance behind it was explored.
A static effect panel model was used to analyze the impact of input on innovation efficiency. Since the unit of input variables is relatively large, all input variables were divided by 10,000. The F test result was significant, so the mixed regression (pooled OLS) model was rejected, and the individual fixed effect (FE) model was accepted. Furthermore, the Hausman test was significant, so the random effect model was rejected, and the FE model was similarly accepted. The results of the FE model are shown in Table 9.
Table 9 shows that Staff, Equipment, and Cost all have a significant impact on innovation efficiency. The impact of Equipment and Staff is positive (consistent with their impact on output), whereas the impact of Cost is negative (inconsistent with its impact on output). This reveals an important point: some input variables do affect output and efficiency in opposite directions, which confirms the conjecture stated above.

6.2. Returns to Scale

In Section 4, we showed that the production function of input and output exhibits increasing returns to scale. At the same time, the super-efficiency EBM results indicate the returns-to-scale stage of each DMU. Our super-efficiency EBM model is based on variable returns to scale, under which a DMU can be in the state of increasing returns to scale (IRS), constant returns to scale (CRS), or decreasing returns to scale (DRS). In our results, 693 samples are in the IRS state (about 91% of the total sample) and 67 are in the DRS state (about 9% of the total sample). IRS therefore accounts for the vast majority, which is basically consistent with the conclusion from the production function. However, in the efficiency evaluation, how do the input variables affect the returns-to-scale stage of the DMUs? This needs to be further explored.
We introduced a dummy variable coded 1 for the IRS state and 0 for the DRS state, so the explained variable is binary. There are two main types of discrete panel data models: the panel Probit model and the panel Logit model. The Probit model requires the random error term to follow a normal distribution, whereas the Logit model does not (it assumes a logistic distribution instead). To facilitate the analysis, we used the Logit model. The results of the Logit regression are shown in Table 10.
Table 10 shows that an increase in the number of employees reduces the probability that a DMU is in the IRS state, whereas increases in equipment and cost raise this probability.
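To make the direction of these effects concrete, the following sketch plugs the Table 10 coefficients into the logistic function. It assumes that the inputs were divided by 10,000 as in Section 6.1 and ignores any bank-specific effects of the panel model; the example bank and the function name `irs_probability` are hypothetical.

```python
import math

# Coefficients reported in Table 10.
COEF = {"const": 7.287, "staff": -75.552, "equipment": 4.427, "cost": 8.868}


def irs_probability(staff, equipment, cost, scale=10_000.0):
    """Logistic probability of the IRS state (inputs rescaled by `scale`, an assumption)."""
    z = (COEF["const"]
         + COEF["staff"] * staff / scale
         + COEF["equipment"] * equipment / scale
         + COEF["cost"] * cost / scale)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical bank: 500 employees, equipment of 2000 and cost of 300 (10,000 yuan).
p_base = irs_probability(500, 2000, 300)
p_more_staff = irs_probability(1500, 2000, 300)
p_more_equipment = irs_probability(500, 4000, 300)
p_more_cost = irs_probability(500, 2000, 600)
```

Under these assumptions, adding staff lowers the predicted probability of IRS, while adding equipment or cost raises it, matching the signs in Table 10.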

6.3. The Weight of Input Variables

In Section 4, we obtained the elasticity coefficient of each input variable in the production function: α = 0.695; β = 0.071; γ = 0.245. The differences among these coefficients are relatively large. From the perspective of optimal production efficiency, does this arrangement need to be optimized? Is there a benchmark value? These are the issues discussed in this section.
The DEA model is data-oriented, so $\varepsilon$ and $w$ depend on the input-output data set $(X, Y)$. Let $a \in R_+^n$ and $b \in R_+^n$ be two n-dimensional positive vectors representing the observed values of the input variables of the DMUs. Tone defines an affinity index $S(a, b)$ with the following characteristics [32,33]: (1) $S(a, a) = 1$; (2) $S(a, b) = S(b, a)$; (3) $S(ta, b) = S(a, b)$ for any $t > 0$; (4) $0 \le S(a, b) \le 1$.
The following is further defined:
$$c_j = \ln\left(\frac{b_j}{a_j}\right), \quad \bar{c} = \frac{1}{n}\sum_{j=1}^{n} c_j, \quad c_{max} = \max_j \{c_j\}, \quad c_{min} = \min_j \{c_j\}.$$
Furthermore, Tone defines a diversity index D ( a , b ) :
When $c_{max} > c_{min}$, $D(a, b) = \frac{\sum_{j=1}^{n} |c_j - \bar{c}|}{n (c_{max} - c_{min})}$.
When $c_{max} = c_{min}$, $D(a, b) = 0$.
Moreover, $0 \le D(a, b) = D(b, a) \le \frac{1}{2}$, and $S(a, b) = 1 - 2D(a, b)$.
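The definitions above translate directly into a few lines of code. The following is a minimal sketch, with hypothetical function names and sample vectors, that can be used to check the four stated properties of the affinity index numerically.

```python
import math


def diversity(a, b):
    """Diversity index D(a, b) built from the log-ratios c_j = ln(b_j / a_j)."""
    c = [math.log(bj / aj) for aj, bj in zip(a, b)]
    n = len(c)
    c_bar = sum(c) / n
    spread = max(c) - min(c)
    # D is defined as 0 when all log-ratios coincide (a and b are proportional).
    if math.isclose(spread, 0.0, abs_tol=1e-12):
        return 0.0
    return sum(abs(cj - c_bar) for cj in c) / (n * spread)


def affinity(a, b):
    """Affinity index S(a, b) = 1 - 2 D(a, b)."""
    return 1.0 - 2.0 * diversity(a, b)


# Hypothetical positive vectors.
a = [1.0, 2.0, 4.0]
b = [3.0, 1.0, 2.0]

s_ab = affinity(a, b)
s_ba = affinity(b, a)
s_scaled = affinity([10.0 * v for v in a], b)
s_aa = affinity(a, a)
```

Because multiplying `a` by a positive constant shifts every log-ratio by the same amount, the deviations from the mean are unchanged, which is why property (3) holds.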
Through the above definitions, the parameters can be calculated.
Firstly, all the decision-making units are projected onto the frontier. In most DEA models, the production frontier is formed by a small number of the most efficient DMUs. To improve the accuracy of the calculation, all the DMUs are projected under the VRS (variable returns to scale) model. Using the optimal slack variables $s^{-*}$ and $s^{+*}$, the projected values of the input and output of a decision-making unit can be measured:
$$\bar{x}_{io} = x_{io} - s^{-*}, \quad \bar{y}_{io} = y_{io} + s^{+*}$$
Secondly, a similarity matrix is constructed. $S = [s_{ij}] \in R^{m \times m}$ is based on the projected values of each input variable: $s_{ij} = S(\bar{x}_i, \bar{x}_j)$ and $0 \le s_{ij} \le 1$.
Thirdly, the maximum eigenvalue and the corresponding eigenvector of the similarity matrix are calculated. The similarity matrix $S$ is symmetric and non-negative, its diagonal elements are all 1, and it has $m$ pairs of eigenvalues and eigenvectors. According to the Perron–Frobenius theorem, the non-negative similarity matrix $S$ has a largest eigenvalue $\rho_x$ with an associated non-negative eigenvector $w_x$, and this eigenvector gives the weights of the input variables. Moreover, $1 \le \rho_x \le m$.
Finally, $\varepsilon_x$ and $w$ of the EBM model are calculated.
When $m > 1$, $\varepsilon_x = \frac{m - \rho_x}{m - 1}$; when $m = 1$, $\varepsilon_x = 0$.
$$w = \frac{w_x}{\sum_{i=1}^{m} w_{xi}}.$$
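The steps above can be sketched as follows, assuming that the projected input values are already available (the slack-based projection itself requires solving a DEA linear program, which is omitted here). Power iteration is used as one simple way to obtain the Perron eigenpair; the data and function names are hypothetical and are not the article's implementation.

```python
import math


def affinity(a, b):
    """Affinity index S(a, b) = 1 - 2 D(a, b) for two positive vectors."""
    c = [math.log(bj / aj) for aj, bj in zip(a, b)]
    c_bar = sum(c) / len(c)
    spread = max(c) - min(c)
    if math.isclose(spread, 0.0, abs_tol=1e-12):
        return 1.0  # D = 0 when the vectors are proportional
    d = sum(abs(cj - c_bar) for cj in c) / (len(c) * spread)
    return 1.0 - 2.0 * d


def perron_pair(matrix, iterations=500):
    """Largest eigenvalue and non-negative eigenvector via power iteration."""
    m = len(matrix)
    v = [1.0 / m] * m  # start from a positive vector whose entries sum to 1
    rho = 0.0
    for _ in range(iterations):
        nxt = [sum(row[j] * v[j] for j in range(m)) for row in matrix]
        rho = sum(nxt)  # because sum(v) == 1, this estimates the eigenvalue
        v = [x / rho for x in nxt]
    return rho, v


def ebm_parameters(projected_inputs):
    """Return (epsilon_x, w); projected_inputs[i] lists input i across all DMUs."""
    m = len(projected_inputs)
    s = [[affinity(projected_inputs[i], projected_inputs[j]) for j in range(m)]
         for i in range(m)]
    rho_x, w_x = perron_pair(s)
    epsilon_x = (m - rho_x) / (m - 1) if m > 1 else 0.0
    total = sum(w_x)
    return epsilon_x, [wi / total for wi in w_x]


# Hypothetical projected values of three inputs for four DMUs.
inputs = [
    [100.0, 200.0, 300.0, 400.0],  # staff
    [50.0, 90.0, 160.0, 210.0],    # equipment
    [10.0, 30.0, 20.0, 60.0],      # cost
]
epsilon_x, w = ebm_parameters(inputs)

# Two perfectly proportional inputs give S = all ones, so epsilon_x is 0.
eps_proportional, w_proportional = ebm_parameters([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
```

When the inputs move almost proportionally across DMUs, the similarity matrix is close to a matrix of ones, $\rho_x$ approaches $m$, and $\varepsilon_x$ approaches 0 (a mostly radial model); dissimilar inputs push $\varepsilon_x$ toward 1.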
Through the above formulas, we can calculate the $\varepsilon_x$ and $w$ of the super-efficiency EBM as follows:
1. $\varepsilon_x = 0.223$. Since $\varepsilon_x$ is the core parameter representing the importance of the non-radial part of the model, this low value indicates a weak non-radial tendency and a strong radial tendency. In economic terms, the input variables are highly interdependent: increasing or decreasing one input largely requires a corresponding adjustment of the others.
2. $w_{Staff} = 0.311$, $w_{Equipment} = 0.343$, and $w_{Cost} = 0.346$. These are the proportions obtained under the target optimization and can be used as benchmark values for the optimization path. Along the efficiency-optimizing path, the weights of Staff, Equipment, and Cost should be similar. In the production function, however, the coefficient of Staff is 0.695, which is much higher than $w_{Staff}$, and the coefficient of Equipment is 0.071, which is much smaller than $w_{Equipment}$. This shows that, in terms of improving efficiency, personnel investment needs to be further optimized and the role of equipment investment needs to be strengthened. The broader economic implication is that rural commercial banks should abandon “human wave tactics” and increase the role of financial technology.
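The comparison in the two points above can be reproduced with simple arithmetic. The sketch below inverts the $\varepsilon_x$ formula to recover the implied largest eigenvalue for $m = 3$ and compares the EBM weights with the production-function elasticities rescaled to sum to 1. The rescaling is an assumption made here only to put the two sets of numbers on a common footing; it is not a step described in the article.

```python
m = 3
epsilon_x = 0.223

# Invert epsilon_x = (m - rho_x) / (m - 1) to get the implied largest eigenvalue.
rho_x = m - epsilon_x * (m - 1)

# EBM weights reported in the text and elasticities from the production function.
weights = {"Staff": 0.311, "Equipment": 0.343, "Cost": 0.346}
elasticities = {"Staff": 0.695, "Equipment": 0.071, "Cost": 0.245}

# Assumption: rescale elasticities to shares so they are comparable with weights.
total = sum(elasticities.values())
shares = {name: value / total for name, value in elasticities.items()}

# Positive gap: the input carries more weight in production than the benchmark.
gaps = {name: shares[name] - weights[name] for name in weights}
```

Under this rough comparison, Staff lies well above its benchmark weight and Equipment well below it, which is the imbalance the text describes.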

6.4. Conclusion of This Section

Based on the analyses in Sections 6.1, 6.2, and 6.3, we can conclude the following:
1. Human input is currently driving the innovation of rural commercial banks, but this is unsustainable in the long run. As mentioned earlier, these small and medium-sized banks are not as successful as large banks in terms of financial technology development and high-quality talent training. Their innovation therefore still relies to a large extent on “manpower innovation”, that is, on improving some traditional businesses. Such expansion strongly promotes innovation output in the short term. In the early stages of the development of rural commercial banks, human input can make a relatively large contribution, because it can quickly expand the scale of innovation output, complete capital accumulation, and lay the foundation for subsequent innovation. In the long run, however, manpower investment tends to reduce the returns to scale, which slows the improvement of efficiency and is ultimately likely to drag down innovation efficiency. The manpower-driven innovation model is therefore not sustainable.
2. Equipment input has a positive effect on output and efficiency. This effect is still relatively small at present and should be significantly improved in the long run. There is also a possibility that the current financial technology innovation of rural commercial banks mainly imitates big banks and adopts a follow-up strategy, so the role of equipment investment has not been well manifested. However, from a long-term perspective, the production effect of equipment input should be enhanced, and the role of financial technology should be prominent.
3. The cost of intermediate business promotes the output of intermediate business (innovation output) but reduces the efficiency of innovation. This may seem counterintuitive. Numerically, intermediate business output grows more slowly than intermediate business cost input. In economic terms, intermediate business costs, which include marketing expenses, office expenses, and financial expenses, have on average not achieved an effective output, which points to weak business management in rural commercial banks. This is also where management needs to be strengthened in the future.
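The third point can be illustrated with a small worked calculation, assuming a log-linear production function in which the cost elasticity is 0.245 (Table 6) and the other inputs are held fixed. Income per unit of cost is used here only as a simple stand-in for productivity; it is not the EBM efficiency score, and the 10% growth figure is hypothetical.

```python
# Cost elasticity of income from the Lasso regression (Table 6).
gamma = 0.245

# Hypothetical scenario: cost rises by 10% with staff and equipment unchanged.
cost_growth = 0.10

# Income scales with cost ** gamma, so it grows, but by less than cost does.
income_growth = (1.0 + cost_growth) ** gamma - 1.0

# Income per unit of cost scales with cost ** (gamma - 1), so it falls.
income_per_cost_change = (1.0 + cost_growth) ** (gamma - 1.0) - 1.0
```

Because the elasticity is below 1, extra cost still raises income but lowers income per unit of cost, which is one way to read the opposite signs found for output and efficiency.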
From the above analyses, the main challenges faced by the rural commercial banks in terms of innovative business development are reducing the reliance on human capital investment, strengthening technological innovation, and improving the efficiency of intermediate business cost management. These banks should carefully check their business management strategy implementation and team management. Otherwise, they may lose their advantage in future Fintech competition.

7. Conclusions

The relationship between input and output is often ignored by traditional data envelopment analysis methods. This article proposes a new research method to address this problem, using black box theory and production function theory to reveal the relationship between input and output, which makes it possible to exclude input variables that have no actual production relationship with output. This article also suggests that practical significance must be considered when variables are chosen: some variables cannot be eliminated merely because multicollinearity exists, and there are ways to alleviate this multicollinearity. Finally, the input-output relationship and the input-efficiency relationship are compared from multiple angles, so as to identify input variables that may have inconsistent impact directions.
Taking rural commercial banks in China as examples to evaluate their innovation efficiency, this article shows the following: (1) with the black box theory and production function theory, the staff, equipment, and intermediate business cost are suitable as innovation input variables, and the intermediate business income is suitable as an innovation output variable, and they show a good production function relationship. (2) The investment in manpower is still an important driving force for the innovation of rural commercial banks, but it is unsustainable in the long run. (3) Equipment input has a positive effect on output and efficiency, and it should play a more prominent role. (4) The cost of intermediate business promotes the output of intermediate business but reduces the efficiency of innovation. The main problems faced by rural commercial banks in terms of innovative business development are reducing the reliance on human capital investment, strengthening technological innovation, and improving the efficiency of intermediate business cost management. Therefore, these banks should carefully evaluate their business management strategy implementation and team management. The case of rural commercial banks has well verified the method proposed in this article.
In general, data envelopment analysis will continue to play an important role in evaluating operating efficiency and in informing operating management decisions. The method proposed in this article supplements data envelopment analysis and may be useful in wider applications.

Author Contributions

Conceptualization, K.Z.; methodology, K.Z.; writing—original draft preparation, K.Z.; writing—review and editing, K.Z., C.L. and Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

  1. Gaston, N.; Trefler, D. The Labour Market Consequences of the Canada-U.S. Free Trade Agreement. Can. J. Econ. 1997, 30, 18–41.
  2. Qiao, S.; Shen, T.; Zhang, R.R.; Chen, H.H. The impact of various factor market distortions and innovation efficiencies on profit sustainable growth: From the view of China’s renewable energy industry. Energy Strategy Rev. 2021, 38, 100746.
  3. Carayannis, E.G.; Grigoroudis, E.; Goletsis, Y. A multilevel and Multistage Efficiency Evaluation of Innovation Systems: A Multiobjective DEA Approach. Expert Syst. Appl. 2016, 62, 63–80.
  4. Charnes, A.; Cooper, W.W.; Rhodes, E. Measuring the Efficiency of Decision Making Units. Eur. J. Oper. Res. 1978, 2, 429–444.
  5. Sherzod, B.; Kim, K.-R.; Lee, S.H. Agricultural Transition and Technical Efficiency: An Empirical Analysis of Wheat-Cultivating Farms in Samarkand Region, Uzbekistan. Sustainability 2018, 10, 3232.
  6. Huang, Y.; Luo, S.; Xu, G.; Zhou, G. Quantitative Analysis and Evaluation of Enterprise Group Financial Company Efficiency in China. Sustainability 2018, 10, 3210.
  7. Zhao, H.; Zhao, H.; Guo, S. Operational Efficiency of Chinese Provincial Electricity Grid Enterprises: An Evaluation Employing a Three-Stage Data Envelopment Analysis (DEA) Model. Sustainability 2018, 10, 3168.
  8. Moreno, P.; Lozano, S. Fuzzy Ranking Network DEA with General Structure. Mathematics 2020, 8, 2222.
  9. Villa, G.; Lozano, S.; Redondo, S. Data Envelopment Analysis Approach to Energy-Saving Projects Selection in an Energy Service Company. Mathematics 2021, 9, 200.
  10. Ratner, S.; Lychev, A.; Rozhnov, A.; Lobanov, I. Efficiency Evaluation of Regional Environmental Management Systems in Russia Using Data Envelopment Analysis. Mathematics 2021, 9, 2210.
  11. Wang, Q.; Hang, Y.; Sun, L.; Zhao, Z. Two-stage Innovation Efficiency of New Energy Enterprises in China: A Non-radial DEA Approach. Technol. Forecast. Soc. Chang. 2016, 112, 254–261.
  12. Li, H.; He, H.; Shan, J.; Cai, J. Innovation Efficiency of Semiconductor Industry in China: A New Framework based on Generalized Three-stage DEA Analysis. Socio-Econ. Plan. Sci. 2019, 66, 136–148.
  13. Ma, X.; Liu, Z.; Gao, Y.; Na, L. Innovation Efficiency Evaluation of Listed Companies Based on the DEA Method. Procedia Comput. Sci. 2020, 174, 382–386.
  14. Wang, Y.; Pan, J.F.; Pei, R.M.; Yi, B.W.; Yang, G.L. Assessing the Technological Innovation Efficiency of China’s High-tech Industries with A Two-stage Network DEA Approach. Socio-Econ. Plan. Sci. 2020, 71.
  15. Zhong, K.Y.; Wang, Y.F.; Pei, J.M.; Tang, S.M.; Han, Z.L. Super Efficiency SBM-DEA and Neural Network for Performance Evaluation. Inf. Process. Manag. 2021, 58, 102728.
  16. Staub, R.B.; Souza, G.; Tabak, B.M. Evolution of bank efficiency in Brazil: A DEA approach. Eur. J. Oper. Res. 2010, 202, 204–213.
  17. Shyu, J.; Chiang, T. Measuring the true managerial efficiency of bank branches in Taiwan: A three-stage DEA analysis. Expert Syst. Appl. 2012, 39, 11494–11502.
  18. Ohsato, S.; Takahashi, M. Management efficiency in Japanese regional banks: A network DEA. Procedia-Soc. Behav. Sci. 2015, 172, 511–518.
  19. Li, Y. Analyzing efficiencies of city commercial banks in China: An application of the bootstrapped DEA approach. Pacific-Basin Financ. J. 2020, 62, 101372.
  20. Shao, L.; You, J.; Xu, T.; Shao, Y. Non-Parametric Model for Evaluating the Performance of Chinese Commercial Banks’ Product Innovation. Sustainability 2020, 12, 1523.
  21. Li, X.L.; Guo, Y.; Zhan, L.G. An empirical study of the output function of commercial banks in China. Stat. Res. 2005, 5, 59–62.
  22. Wanke, P.; Azad, M.A.K.; Barros, C.P. Efficiency factors in OECD banks: A ten-year analysis. Expert Syst. Appl. 2016, 64, 208–227.
  23. Silva, T.P.D.; Leite, M.; Guse, J.C.; Gollo, V. Financial and economic performance of major Brazilian credit cooperatives. Contad. Admin. 2017, 62, 1442–1459.
  24. Tone, K.; Tsutsui, M. Network DEA: A slack-based measure approach. Eur. J. Oper. Res. 2009, 197, 243–252.
  25. Azad, A.K.; Kian-Teng, K.; Talib, M.A. Unveiling black-box of bank efficiency: An adaptive network data envelopment analysis approach. Int. J. Islamic Middle East. Financ. Manag. 2017, 10, 149–169.
  26. Fukuyama, H.; Weber, W.L. Measuring Bank Performance: From Static Black Box to Dynamic Network Models. In Handbook of Operations Analytics Using Data Envelopment Analysis; Hwang, S.N., Lee, H.S., Zhu, J., Eds.; International Series in Operations Research & Management Science; Springer: Boston, MA, USA, 2016; Volume 239.
  27. Berger, A.N.; Hunter, W.C.; Timme, S.G. The Efficiency of Financial Institutions: A Review and Preview of Research Past, Present, and Future. J. Bank. Financ. 1993, 17, 221–249.
  28. Ruan, X.M.; Zhen, X. Research on Measurement and Evaluation of Innovation Efficiency of Chinese City Commercial Banks. Jiang-huai Trib. 2015, 2, 39–45.
  29. Lv, X.M. Research on the Evaluation of Listed Banks’ Innovation Ability under the Impact of Internet Finance—Based on the Panel Data Generalized DEA Model. Account. Econ. Res. 2016, 5, 96–114.
  30. Hu, W. Research on the Impact of Internet Finance on the Innovation Performance of City Commercial Banks. Financ. Theory Pract. 2017, 8, 19–23.
  31. Liu, C.Y. Analysis of the Technology Gap Ratio of Listed Banks’ Innovation Ability—Based on the Perspective of Generalized DEA Efficiency. Account. Econ. Res. 2018, 3, 119–130.
  32. Tone, K.; Tsutsui, M. An Epsilon-based Measure of Efficiency in DEA—A Third Pole of Technical efficiency. Eur. J. Oper. Res. 2010, 207, 1554–1563.
  33. Shi, B.; Shen, K. Urbanization, Industrial Agglomeration and EBM Energy Efficiency. Ind. Econ. Res. 2012, 6, 10–16.
Figure 1. Fitted scatter plot of each input variable and Income.
Figure 2. Predicted values of four variables.
Figure 3. Comparison of MSE and RMSE values of three models.
Figure 4. Efficiency score distribution of all samples.
Figure 5. Trend of annual average efficiency.
Table 1. Data statistics.

| Category | Variable | Meaning | Sample | Minimum Value | Maximum Value | Average Value | Standard Deviation |
|---|---|---|---|---|---|---|---|
| Innovation input (three variables) | Staff | Number of employees | 760 | 93 | 8051 | 720.026 | 1056.533 |
| | Equipment | Electronic equipment (10,000 yuan) | 760 | 17 | 74,314 | 3607.795 | 10,359.898 |
| | Cost | Intermediate business cost (10,000 yuan) | 760 | 3 | 25,708 | 567.162 | 2170.641 |
| Innovation output (one variable) | Income | Intermediate business income (10,000 yuan) | 760 | 3 | 330,360 | 35103.257 | 23,881.986 |

Table 2. Correlation test results of Staff, Equipment, Cost, and Income.

| Variable | Pearson Test (with Income) | Variable (ln) | Pearson Test (with lnIncome) |
|---|---|---|---|
| Staff | 0.8645 | lnStaff | 0.8906 |
| Equipment | 0.8837 | lnEquipment | 0.8179 |
| Cost | 0.8645 | lnCost | 0.8543 |

Table 3. Correlation test results.

Pearson Test (without logarithm)

| | Staff | Equipment | Cost |
|---|---|---|---|
| Staff | 1.0000 | | |
| Equipment | 0.9206 | 1.0000 | |
| Cost | 0.8458 | 0.8574 | 1.0000 |

Pearson Test (logarithm, ln)

| | lnStaff | lnEquipment | lnCost |
|---|---|---|---|
| lnStaff | 1.0000 | | |
| lnEquipment | 0.8193 | 1.0000 | |
| lnCost | 0.7783 | 0.7415 | 1.0000 |

Table 4. Mathematical relationship between principal components and research items.

| Term | The Ratio of Items to Principal Components |
|---|---|
| lnStaff | 0.602 |
| lnEquipment | 0.552 |
| lnCost | 0.577 |
| lnIncome | 1.000 |

Table 5. Regression coefficients between explanatory variables and explained variables.

| Term | lnIncome | lnIncome (Standardization) |
|---|---|---|
| Constant | −0.844 | 0.000 |
| lnStaff | 0.584 | 0.348 |
| lnEquipment | 0.336 | 0.320 |
| lnCost | 0.298 | 0.334 |

Table 6. Lasso regression results.

| Variable | lnIncome |
|---|---|
| lnStaff | 0.695 *** (14.36) |
| lnEquipment | 0.071 ** (2.50) |
| lnCost | 0.245 *** (10.97) |
| Constant | 0.854 *** (4.64) |
| R² | 0.816 |
| F-test | 1115.252 *** |

Note: t-values are reported in parentheses. ** p < 0.05; *** p < 0.01.

Table 7. Ridge regression results.

| Variable | lnIncome |
|---|---|
| lnStaff | 0.643 *** (27.808) |
| lnEquipment | 0.227 *** (15.709) |
| lnCost | 0.299 *** (24.739) |
| Constant | −0.185 (1.46) |
| R² | 0.860 |
| F-test | 1546.000 *** |

Note: t-values are reported in parentheses. *** p < 0.01.

Table 8. MSE and RMSE values of three models.

| Terms | PLS | Lasso | Ridge |
|---|---|---|---|
| MSE | 0.947269 | 0.327471 | 0.369476 |
| RMSE | 0.973278 | 0.57225 | 0.607845 |

Table 9. Impact of input variables on innovation efficiency.

| Variable | Result (FE) |
|---|---|
| Staff | 1.408 * (1.75) |
| Equipment | 0.071 *** (2.93) |
| Cost | −0.104 ** (2.12) |
| Constant | 0.076 (1.46) |
| R² | 0.234 |
| F-test | 8.80 *** |
| Hausman test | 9.51 ** |

Note: t-values are reported in parentheses. * p < 0.1; ** p < 0.05; *** p < 0.01.

Table 10. Logit regression result.

| Term | Coefficient |
|---|---|
| Constant | 7.287 *** (10.94) |
| Staff | −75.552 *** (8.47) |
| Equipment | 4.427 *** (7.05) |
| Cost | 8.868 *** (5.92) |

Note: z-values are reported in parentheses. *** p < 0.01.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhong, K.; Li, C.; Wang, Q. Evaluation of Bank Innovation Efficiency with Data Envelopment Analysis: From the Perspective of Uncovering the Black Box between Input and Output. Mathematics 2021, 9, 3318. https://doi.org/10.3390/math9243318

AMA Style

Zhong K, Li C, Wang Q. Evaluation of Bank Innovation Efficiency with Data Envelopment Analysis: From the Perspective of Uncovering the Black Box between Input and Output. Mathematics. 2021; 9(24):3318. https://doi.org/10.3390/math9243318

Chicago/Turabian Style

Zhong, Kaiyang, Chenglin Li, and Qing Wang. 2021. "Evaluation of Bank Innovation Efficiency with Data Envelopment Analysis: From the Perspective of Uncovering the Black Box between Input and Output" Mathematics 9, no. 24: 3318. https://doi.org/10.3390/math9243318

