Solving Regression Problems with Intelligent Machine Learner for Engineering Informatics

Chou, Jui-Sheng; Truong, Dinh-Nhat; Tsai, Chih-Fong

doi:10.3390/math9060686

Open AccessArticle

Solving Regression Problems with Intelligent Machine Learner for Engineering Informatics

by

Jui-Sheng Chou

^1,*

,

Dinh-Nhat Truong

^1,2

and

Chih-Fong Tsai

³

¹

Department of Civil and Construction Engineering, National Taiwan University of Science and Technology, Taipei City 106335, Taiwan

²

Department of Civil Engineering, University of Architecture Ho Chi Minh City (UAH), Ho Chi Minh City 700000, Vietnam

³

Department of Information Management, National Central University, Taoyuan City 320317, Taiwan

^*

Author to whom correspondence should be addressed.

Mathematics 2021, 9(6), 686; https://doi.org/10.3390/math9060686

Submission received: 18 February 2021 / Revised: 17 March 2021 / Accepted: 19 March 2021 / Published: 23 March 2021

(This article belongs to the Special Issue Advances in Artificial Intelligence: Models, Optimization, and Machine Learning)

Download

Browse Figures

Versions Notes

Abstract

:

Machine learning techniques have been used to develop many regression models to make predictions based on experience and historical data. They might be used singly or in ensembles. Single models are either classification or regression models that use one technique, while ensemble models combine various single models. To construct or find the best model is very complex and time-consuming, so this study develops a new platform, called intelligent Machine Learner (iML), to automatically build popular models and identify the best one. The iML platform is benchmarked with WEKA by analyzing publicly available datasets. After that, four industrial experiments are conducted to evaluate the performance of iML. In all cases, the best models determined by iML are superior to prior studies in terms of accuracy and computation time. Thus, the iML is a powerful and efficient tool for solving regression problems in engineering informatics.

Keywords:

applied machine learning; classification and regression; data mining; ensemble model; engineering informatics

1. Introduction

Machine Learning (ML)-based methods for building prediction models have attracted abundant scientific attention and are extensively used in industrial engineering [1,2,3], design optimization of electromagnetic devices, and other areas [4,5]. The ML-based methods have been confirmed to be effective for solving real-world engineering problems [6,7,8]. Various supervised ML techniques (e.g., artificial neural network, support vector machine, classification and regression tree, linear (ridge) regression, and logistic regression) are typically used individually to construct single models and ensemble models [9,10]. To construct a series of models and identify the best one among these ML techniques, users need a comprehensive knowledge of ML and spend a significant effort building advanced models.

The primary objective of this research is to develop a user-friendly and powerful ML platform, called intelligent Machine Learner (iML), to help its users to solve real-world engineering problems with a shorter training time and greater accuracy than before. The iML can automatically build and scan all regression models, and then identify the best one. Novice users with no experience of ML can easily use this system. Briefly, the iML (1) helps users to make prediction model easily; (2) provides an overview of the parameter settings for the purpose of making objective choices; and (3) yields clear performance indicators, facilitating reading and understanding of the results, on which decisions can be based.

Four experiments were carried out to evaluate the performance of iML and were compared with previous studies. In the first experiment, empirical data concerning enterprise resource planning (ERP) for software projects by a leading Taiwan software provider over the last five years were collected and analyzed [1]. The datasets in the other three experiments were published on the UCI website [11,12,13]. Specifically, the purpose of the second experiment was to train a regression model of comparing the performance of CPU processors by using some characteristics as input. The third experiment involved forecasting the demand supporting structured productivity and high levels of customer service, and the fourth experiment involved estimating the total bikes rented per day.

The rest of this paper is organized as follows. Section 2 reviews application of machine learning techniques in various disciplines. Section 3 presents the proposed methodology and iML framework. Section 4 introduces the evaluation metrics to measure accuracy of the developed system. Section 5 demonstrates iML’s interface. Section 6 shows benchmarks between iML and WEKA (a free, open source program). Section 7 exhibits the applicability of iML in numerical experiments. Section 8 draws conclusions, and provides managerial implications and suggestions for future research.

2. Literature Review

Numerous researchers in various fields, such as ecology [14,15], materials properties [16,17,18], water resource [19], energy management [20], and decision support [21,22], use data-mining techniques to solve regression problems, and especially project-related problems [23,24]. Artificial neural network (ANN), support vector machine/regression (SVM/SVR) classification and regression tree (CART), linear ridge regression (LRR), and logistic regression (LgR) are the most commonly used methods for this purpose and are all considered to be among the best machine learning techniques [25,26,27]. Similarly, four popular ensemble models, including voting, bagging, stacking and tiering [28,29,30], can be built based on the meta-combination rules of aforementioned single models.

Chou (2009) [31] developed a generalized linear model-based expert system for estimating the cost of transportation projects. Dandikas et al. (2018) [32] assessed the advantages and disadvantages of regression models for predicting potential of biomethane. The results indicated that the regression method could predict variations in the methane yield and could be used to rank substrates for production quality. However, least squares-based regression usually leads to overfitting a model, failure to find unique solutions, and issues dealing with multicollinearity among the predictors [33], so ridge regression, another type of regularized regression, is favorably integrated in this study to avoid the above problems. Additionally, Sentas and Angelis (2006) [34] investigated the possibility of using some machine learning methods for estimating categorical missing values in software cost databases. They concluded that multinomial logistic regression was the best for imputation owing to its superior accuracy.

The general regression neural network was originally designed chiefly to solve regression problems [24,35]. Caputo and Pelagagge (2008) [36] compared the ANN with the parametric methods for estimating the cost of manufacturing large, complex-shaped pressure vessels in engineer-to-order manufacturing systems. Their comparison demonstrated that the ANN was more effective than the parametric models, presumably because of its better mapping capabilities. Rocabruno-Valdés et al. (2015) [37] developed models based on ANN for predicting the density, dynamic viscosity, and cetane number of methyl esters and biodiesel. Similarly, Ganesan et al. (2015) [38] used ANN to predict the performance and exhaust emissions of a diesel electricity generator.

SVM was originally developed by Vapnik (1999) for classification (SVM) and regression (SVR) [39,40]. Jing et al. (2018) [41] used SVM to classify air balancing, which is a key element for heating, ventilating, air-conditioning (HAVC), and variable air volume (VAV) system installation, and is useful for improving the energy efficiency by minimizing unnecessary fresh air to the air-conditioned zones. The results demonstrated that SVM achieved 4.6% of relative error value and is a promising approach for air balancing. García-Floriano et al. (2018) [42] used SVR to model software maintenance (SM) effort prediction. The SVR model was superior to regression, neural networks, association rules and decision trees, with 95% confidence level.

The classification and regression tree method (CART), introduced by Breiman et al. (2017) [43], is an effective method to solve classification and regression problems [42]. Choi and Seo (2018) [44] predicted the fecal coliform in the North Han River, South Korea by CART models, the test results showed the total correct classification rates of the four models ranged from 83.7% to 93.0%. Ru et al. (2016) [45] used the CART model to predict cadmium enrichment levels in reclaimed coastal soils. The results showed that cadmium enrichment levels had an accuracy of 78.0%. Similarly, Li (2006) [16] used CART to predict materials properties and behavior. Chou et al. (2014, 2017) [26,46] utilized the CART method to modeling steel pitting risk and corrosion rate and forecasting project dispute resolutions.

In addition to the aforementioned single models, Elish (2013) [47] used voting ensemble for estimating software development effort. The ensemble model outperformed all the single models in terms of Mean Magnitude of Relative Error (MMRE), and achieved competitive percentage of observations whose Magnitude of Relative Error (MRE) is less than 0.25 (PRED (25)) and recently proposed Evaluation Function (EF) results. Wang at el. (2018) demonstrated that ensemble bagging tree (EBT) model could accurately predict hourly building energy usage with MAPE ranging from 2.97% to 4.63% [48]. Comparing to the conventional single prediction model, EBT is superior in prediction accuracy and stability. However, it requires more computation time and is short of interpretability owing to its sophisticated model structure.

Chen et al. (2019) [49] showed that the stacking model outperformed the individual models, achieving the highest

R^{2}

of 0.85, followed by XGBoost (0.84), AdaBoost (0.84) and random forest (0.82). For the estimation of hourly PM2.5 in China, the stacking model exhibited relatively high stability, with

R^{2}

ranging from 0.79 to 0.92. Basant at el. (2016) [50] proposed a three-tier quantitative structure-activity relationship (QSAR) model. This model can be used for the screening of chemicals for future drug design and development process and safety assessment of the chemicals. In comparison with previously studies, the QSAR models on the same endpoint property showed the encouraging statistical quality of the proposed models.

According to the reviewed literature, various machine learning platforms have been developed for the past decades, such as the Scikit-Learn Python libraries, Google’s TensorFlow, WEKA and Microsoft Research’s CNTK. Users can find it easy to use a machine learning tool and/or framework to solve numerous problems as per their needs [51]. ML-based approaches have been confirmed to be effective in providing decisive information. Since there is no best model suitable to predict all problems (the “No Free Lunch” theorem [52,53]), a comprehensive comparison of single and ensemble models embedded within an efficient forecasting platform for solving real-world engineering problems is imperatively needed. The iML platform proposed in this study can efficiently address this issue.

3. Applied Machine Learning

3.1. Classification and Regression Model

3.1.1. Artificial Neural Network (ANN)

Neural networks (or artificial neural networks) comprise information-processing units, which are similar to the neurons in the human brain, except that a neural network is composed of artificial neurons (Figure 1) [54]. Particular, back-propagation networks (BPNNs) are widely used, and are known to be the most effective network models [55,56].

Equation (1) uses sigmoid function to activate each neuron in a hidden output layer, and the Scaled Conjugate Gradient Algorithm is used to calculate the weights of the network. BPNNs will be trained until the stopping criteria is reached by default settings in MATLAB.

n e t_{k} = \sum w_{k j} O_{j} and y_{k} = f (n e t_{k}) = \frac{1}{1 + e^{- n e t_{k}}}

(1)

where

n e t_{k}

is the activation of the

k^{th}

neuron; j is the set of neurons in the preceding layer;

w_{k j}

is the weight of the connection between neuron k and neuron j;

O_{j}

is the output neuron j; and

y_{k}

is the sigmoid or logistic transfer function.

3.1.2. Support Vector Machine (SVM) and Support Vector Regression (SVR)

Developed by Cortes and Vapnik (1995) [57], SVM is used for binary classification problems. The SVM was created based on decision hyper-planes that determine decision boundaries in an input space or a high-dimensional feature space [40,58]. Binary classification can only classify samples into negative and positive while multi-class classification problems are complex (Figure 2). In this study, One Against All (OAA) is used to solve multiple classification problems.

The OAA-SVM constructs m SVM models for

m

-class classification problems, and the

i^{t h}

SVM model is trained based on the dataset of the

i^{t h}

class which includes a positive class and a negative class. In training, a set of

l

data points

{(x_{i}, y_{i})}_{i = 1}^{l}

, where

x_{i} \in R^{n}

the input data, and

y_{i} \in (1, 2, \dots, m)

is the class label of

x_{i}

; the

i^{t h}

SVM model is solved using the following optimization problem equation [59].

\min_{w^{i}, b, ξ} J (w^{i}, b, ξ) = \frac{1}{2} {(w^{i})}^{T} w^{i} + C \sum_{i = 1}^{l} ξ_{j}^{i}

(2)

subject to : {\begin{matrix} {(w^{i})}^{T} φ (x_{j}) + b^{i} \geq 1 - ξ_{j}^{i}, y_{j} = i, \\ {(w^{i})}^{T} φ (x_{j}) + b^{i} \leq - 1 + ξ_{j}^{i}, y_{j} \neq i, \\ ξ_{j}^{i} \geq 0, j = 1, \dots, l . \end{matrix}

(3)

When the SVM models have been solved, the class label of example

x

is predicted as follows:

y (x) = \arg \max_{i = 1 \dots m} ({(w^{i})}^{T} φ (x) + b^{i})

(4)

where

i

is the

i^{th}

SVM model;

w^{i}

is a vector normal to the hyper-plane;

b^{i}

is a bias,

φ (x

) is a nonlinear function that maps

x

to a high-dimension feature space,

ξ^{i}

is the error in misclassification, and

C \geq 0

is a constant that specifies the trade-off between the classification margin and the cost of misclassification.

To train the SVM model, radial basic function (RBF) kernel maps samples non-linearly into a feature space with more dimensions. In this study, the RBF kernel is used as SVM kernel function.

K (x_{i}, x_{j}) = \exp (\frac{- ‖ x_{i} - x_{j} ‖^{2}}{2 σ^{2}})

(5)

where

σ

is a positive parameter that controls the radius of RBF kernel function.

Support vector regression (SVR) [40] is one version of SVM. SVR computes a linear regression function for the new higher-dimensional feature space using

ε

-insensitive loss while simultaneously reducing model complexity of the model by minimizing

{‖ w ‖}^{2}

. This process can be implemented by introducing (non-negative) slack variables

ξ_{i}, ξ_{i}^{*}

to measure the deviation in training samples outside the

ε

-insensitive zone. The SVR can be formulated as the minimization of the following equation:

\min_{w, b, ξ} J (w, b, ξ) = \frac{1}{2} {(w)}^{T} w + C \sum_{i = 1}^{l} (ξ_{i} + ξ_{i}^{*})

(6)

subject to : {\begin{matrix} y_{i} - f (x_{i}, w) \leq ε + ξ_{i}^{*} \\ f (x_{i}, w) - y_{i} \leq ε + ξ_{i} \\ ξ_{i}^{*}, ξ_{i} \geq 0, i = 1, \dots, n \end{matrix}

(7)

When SVR model has been solved, the value of example

x

is predicted as follows.

f (x) = \sum (α_{i} - α_{i}^{*}) K (x_{i}, x) + b

(8)

where

K (x_{i}, x)

is the kernel function and

α_{i}^{*}, α_{i}

are Lagrange multipliers in the dual function.

3.1.3. Classification and Regression Tree (CART)

Classification and regression tree technique is described as a tree on which each internal (non-leaf) node represents a test of an attribute, each branch represents the test result, and each leaf (or terminal) node has a class label and class result (Figure 3) [60]. The tree is “trimmed” until total error is minimized to optimize the predictive accuracy of the tree by minimizing the number of branches. The training CART is constructed through the Gini index. The formulas are as follows.

g (t) = \sum_{j \neq i} p (j | t) p (i | t)

(9)

p (j | t) = \frac{p (j, t)}{p (t)}

(10)

p (j, t) = \frac{p (j) N_{j} (t)}{N_{j}}

(11)

p (t) = \sum_{j} p (j, t)

(12)

G i n i i n d e x = 1 - \sum p {(j, t)}^{2}

(13)

where

i

and

j

are the categorical variables in each item;

N_{j} (t)

is the recorded number of nodes

t

in category

j

; and

N_{j}

. is the recorded number of the root nodes in category

j

; and

p (j)

is the prior probability value for category

j

.

3.1.4. Linear Ridge Regression (LRR) and Logistic Regression (LgR)

Statistical models of the relationship between dependent variables (response variables) and independent variables (explanatory variables) are developed using linear regression (Figure 4). The general formula for multiple regression models is as follows.

y = f (x) = β_{o} + \sum_{j = 1}^{n} β_{j} x_{j} + ε

(14)

where

y

is a dependent variable;

β_{o}

is a constant;

β_{j}

is a regression coefficient (

j = 1, 2, \dots, n

), and

ε

is an error term.

Linear ridge regression (LRR) is a regularization technique that can be used together with generic regression algorithms to model highly correlated data [61,62]. Least squares method is a powerful technique for training the LRR model, which denotes

β

to minimize the Residual Sum Squares (RSS)-function. Therefore, the cost function is presented as below.

C o s t (β) = R S S (β) = \sum_{i = 1}^{l} {(y - y^{'})}^{2} + λ (\sum_{j = 1}^{n} β_{j}^{2})

(15)

y^{'} = β_{0} + \sum β_{j} x_{j}

(16)

where

λ

is a pre-chosen constant, which is the product of a penalty term and the squared norm in the

β

vector of regression method, and

y^{'}

is the predicted values.

Statistician David Cox developed logistic regression in 1958 [63]. An explanation of logistic regression begins with an explanation of the standard logistic function. Equation (17) mathematically represents the logistic regression model.

p (x) = \frac{1}{1 + e^{- (β_{o} + \sum_{j = 1}^{n} β_{j} x_{j})}}

(17)

where

p (x)

is the probability that the dependent variable equals a “success” or “case” rather than a failure or non-case.

β_{o}

and

β_{j}

are found by minimizing cost function defined in Equation (18).

C o s t (β) = - (\sum_{i = 1}^{l} (y_{i} \ln (p (x_{i})) + (1 - y_{i}) \ln (1 - p (x_{i})))) + \frac{λ}{2} \sum_{j = 1}^{n} β_{j}^{2}

(18)

where

y_{i}

is the observed outcome of case

x_{i}

, having 0 or 1 as possible values [64]

3.2. Ensemble Regression Model

In this study, several ensemble schemes, including voting, bagging, stacking, and tiering were investigated using the input data and described as below.

Voting: The voting ensemble model combines the outputs of the single models using a meta-rule. The mean of the output values is used in this study. According to the adopted ML models, 11 voting models are trained in this study, including (1) ANN + SVR, (2) ANN + CART, (3) ANN + LRR, (4) SVR + CART, (5) SVR + LRR, (6) CART + LRR, (7) ANN + SVR + CART, (8) ANN + CART + LRR, (9) ANN + CART + LRR, (10) SVR + CART + LRR, (11) ANN + SVR + CART + LRR. Figure 5a presents the voting ensemble model.
Bagging: The bagging ensemble model duplicates samples at random, and each regression model predicts values from the samples independently. The meta-rule is applied to all of the outputs in this study. Bagging ensemble model is depicted at Figure 5b.
Stacking: The stacking ensemble model is a two-stage model, and Figure 5c describes the principle of the model. In stage 1, each single model predicts one output value. Then, these outputs are used as inputs to train a model by these machine learning techniques again to make a meta-prediction in stage 2. There are four stacking models herein, including ANN (ANN, SVR, CART, LRR); SVR (ANN, SVR, CART, LRR); CART (ANN, SVR, CART, LRR); LRR (ANN, SVR, CART, LRR).
Tiering: Figure 5d illustrates the tiering ensemble model. There are two tiers inside a tiering ensemble model in this study. The first tier is to classify data into $k$ classes on the basis of $T$ value [18]. Machine learning technique in the first tier for classifying data needs to be identified. After classifying the data, the regression machine learning is used to train each data (Sub Data) of each class (second tier) to predict results. In the iML, we developed three types of models, including 2-class, 3-class, and 4-class. The equation for calculating $T$ value is:

$T = \frac{y_{m a x} + y_{m i n}}{k}$

(19)

where T is standard value, $k$ is the number of classes, and $y_{m a x}$ and $y_{m i n}$ are the maximum and minimum of actual values, respectively.

3.3. K-Fold Cross Validation

K-fold cross validation is used to compare two or more prediction models. This method randomly divides a sample into a training sample and a test sample by splitting into K subsets. K-1 subsets are selected to train the model while the other is used to test, and this training process is repeated K times (Figure 6). To compare models, the average of performance results (e.g., RMSE, and MAPE) is computed. Kohavi (1995) stated that K = 10 provides analytical validity, computational efficiency, and optimal deviation [65]. Thus, K = 10 is used in this study. Performance metrics will be explained in details Section 4.

3.4. Intelligent Machine Learner Framework

Figure 7 presents the structure of iML. In stage 1 (data preprocessing), the data is classified distinctly for particular use in the Tiering ensemble model. Meanwhile, all data is divided into two main data groups, namely, learning data and test data, and the learning data is duplicated for training ensemble models.

At the next stage, all retrieved data is automatically used for training models, which include single models (ANN, SVR, LRR, and CART), and ensemble models (voting, bagging, stacking, and tiering). Notably, the tiering ensemble model needs to employ a classification technique to assign a class label to the original input at the first tier. A corresponding regression model for the particular class is then adopted at the second tier to obtain the predictive value [17,26].

Finally, in stage 3 (find the best model), the predictive performances of all the models learned (trained) in stage 2 using test dataset are compared to identify the best models. Section 4 describes the performance evaluation metrics in detail.

4. Mathematical Formulas for Performance Measures

To measure the performance of classification models, the accuracy, precision, sensitivity, specificity and the area under the curve (AUC) are calculated. For the regression models, five-performance measures, (i.e., correlation coefficient (R), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), and total error rate (TER)) are calculated. Table 1 presents a confusion matrix and Table 2 exhibits those performance measures [17,66].

In Table 2, MAE is the mean absolute difference between the prediction and the actual value. MAPE represents the mean percentage error between prediction and actual value, the smaller value of MAPE, the better prediction result achieved by the model. The MAPE is the index typically used to evaluate the accuracy of prediction models. RMSE represents the dispersion of errors by a prediction model. The statistical index that shows the linear correlation between two variables is denoted as R. Lastly, TER is the total difference of predicted and actual values [17].

The goal is to identify the model that yields the lowest error of test data. To obtain a comprehensive performance measure, the five statistical measures (RMSE, MAE, MAPE, 1-R, and TER) were combined into a synthesis index (SI) using Equation (20). Based on the

SI

values, the best model is identified.

SI = \frac{1}{m_{p}} \sum_{i = 1}^{m_{p}} (\frac{P_{i} - P_{m i n, i}}{P_{m a x, i} - P_{m i n, i}})

(20)

where

m_{p}

= number of performance measures;

P_{i} = i^{th}

performance measure; and

P_{m i n, i}

and

P_{m a x, i}

are the maximum and minimum of

i^{th}

measure. The

SI

range is 0–1; the

SI

value close to 0 indicates a better accuracy of the predictive model.

5. Design and Implementation of iML Interface

The iML was developed in MATLAB R2016a on a PC with an Intel Core i5-750 CPU, a clock speed of 3.4 GHz, and 8 GB of RAM, running Windows 10. Figure 8 presents a user-friendly interface for iML. First, users select models on setting-parameters board and set the parameters for the chosen models, which will be trained and analyzed. Next, users choose whether to test with either “K-Fold Validation” or “Percentage Split” before uploading the data. Notably, if “Percentage Split” is selected, the user only has to input percentage value of learning data. Then, users click on the “Run” button to train the model. Finally, the “Make Report” function is to create a report containing performance metrics of all selected models and the identified best model. Figure 9 displays a snapshot of report file in notepad.

6. Benchmarks between iML and WEKA

6.1. Publicly Available Datasets

Table 3 shows the publicly available datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu/mL/index.php; accessed 1 March 2021). The iML is benchmarked with WEKA (a free, open source program) using hold-out validation and K-fold cross-validation on the target datasets. All algorithm parameters are set default for both iML and WEKA platforms.

6.2. Hold-Out Validation

In this test, datasets are randomly partitioned into 80% and 20% for learning and test, respectively. Table 4, Table 5, Table 6, Table 7 and Table 8 show the one-time performance results on these five datasets. A model with a normalized SI value of 0.000 is the best prediction model among all the models tested by iML and WEKA. Notably, the best model can be automatically identified by iML with “one-click”. To train models with WEKA, the users need to build each model individually. Moreover, iML gives better test results of single, voting and bagging models than those of WEKA. Based on the benchmark results, iML is effective to find the best model in the hold-out validation.

6.3. K-Fold Cross-Validation

Tenfold cross-validation is used to evaluate the generalized performance of WEKA and iML. Table 9, Table 10, Table 11, Table 12 and Table 13 show the average performance measures of five datasets, respectively. Similarly, iML identifies better models in single, voting, and bagging schemes than those trained by WEKA. The best model for each dataset is automatically determined by iML. Therefore, iML is a powerful tool to find the best model in the cross-fold validation.

6.4. Discussion

Single, voting, bagging, and stacking models are compared using WEKA and iML, except for the tiering method, which is not available in WEKA. Additionally, unlike manual construction of individual models in WEKA interface, iML can automatically build and identify the best model for the imported datasets. Hold-out validation and tenfold cross-validation are used to evaluate the performance results (R, MAE, RMSE, and MAPE) in each scheme (single, voting, bagging, and stacking). The analytical results of either validation show that most of the models trained by iML are superior to those trained by WEKA using the same datasets. Hence, iML is an effective platform to solve regression problems.

7. Numerical Experiments

This section validates iML by using various industrial datasets, including (1) enterprise resource planning data [1], (2) CPU computer performance data [12], (3) customer data for a logistics company [13], and (4) daily data bike rentals [11]. Table 14 presents the initial parameter settings for these problems.

7.1. Enterprise Resource Planning Software Development Effort

Enterprise Resource Planning (ERP) data for 182 software projects of a leading Taiwan software provider over the last five years was collected, analyzed, and tested with K-fold cross validation.

7.1.1. Variable Selection

Experienced in-house project managers were interviewed to identify factors that affect the ERP software development effort (SDE). There are 182 samples and 17 attributes, and Table 15 summarizes the descriptive statistical data in details. The input and output attributes are defined by Chou el at. (2012) [1].

7.1.2. iML Results

iML automatically trains the models and calculates the performance values. Then, it compares the

SI

values (

{SI}_{local} and {SI}_{globlal}

) among the selected modeling type (singe, voting ensemble, bagging ensemble, stacking ensemble and tiering ensemble). Table 16 presents the detailed results of iML and Figure 10 plots the RMSE of best models for the studied case. Both

{SI}_{local}

and

{SI}_{global}

values of bagging ANN ensemble are equal to zero, which indicate that the bagging ANN ensemble is the best model in terms of prediction accuracy.

Three models (single, voting, and bagging) provided better results in terms of R (0.94 to 0.99) than the tiering and stacking ensemble models, which had the R values of 0.58 to 0.95. Among these three best models, in terms of MAPE, the bagging model exhibited the best balance of MAPE results from learning and test data (21.45% and 19.50%, respectively). The single and voting models depicted un-balanced MAPEs for training and test data (19.91% and 30.65% for the single model; 16.83% and 33.90% for the voting model). Thus, the bagging model was the best model to predict ERP.

The first experiment indicates that, the iML not only identifies the best model, but also reports the performance values of all the training models. Chou et al. (2012) obtained training and testing MAPEs of 26.8% and 27.3%, and RMSEs of 234.0157 h and 97.2667 h using Evolutionary Support Vector Machine Inference Model (ESIM) [1]. The iML yields the bagging ensemble model with MAPEs of 21.45% and 19.50%, and RMSEs of 70.28hr and 65.58 h for the same training and test data, respectively. As a result, the iML is effective to find the best model among the popular regression models.

7.2. Experiments on Industrial Datasets

Three additional experiments were performed to evaluate iML. To ensure a fair comparison, 70 % of the data was used for learning whereas the remaining 30% was utilized for testing.

7.2.1. Performance of CPU Processors

This experiment is about the comparison of performance of CPU processors. The data for this experiment was taken from Maurya and Gupta (2015) [12]. This dataset contained 209 samples with a total of 6 attributes (Table 17). The descriptions of the attributes are as follows: X₁: Machine cycle time in nanoseconds (integer, input); X₂: Minimum main memory in kilobytes (integer, input); X₃: Maximum main memory in kilobytes (integer, input); X₄: Cache memory in kilobytes (integer, input); X₅: Minimum channels in units (integer, input); X₆: Maximum channels in units (integer, input); and Y: Estimated relative performance (integer, output).

7.2.2. Daily Demand Forecasting Orders

This experiment is about the daily demand forecasting orders. The data used in this experiment was taken from Ferreira et al. (2016) [13]. Table 18 shows a statistical analysis of the data. There were 60 samples with 12 attributes, including X₁: Week of the month (first week, second, third or fourth week of month, input); X₂: Day of the week (Monday to Friday, input); X₃: Urgent orders (integer, input); X₄: Non-urgent orders (integer, input); X₅: Type A orders (integer, input); X₆: Type B orders (integer, input); X₇: Orders of type C (integer, input); X₈: Orders from the tax sector (integer, input); X₉: Orders from the traffic controller sector (integer, input); X₁₀: Orders from the banking sector 1 (integer, input); X₁₁: Orders from the banking sector 2 (integer, input); X₁₂: Banking orders 3 (integer, input); and Y: Total orders (integer, output).

7.2.3. Total Hourly-Shared Bike Rental per Days

The experiment is about the total hourly-shared bike rental per days. The data was adopted from Fanaee-T and Gama (2014) [11], and statistically analyzed in Table 19. In total, there were 731 samples and 11 attributes, defined as follows: X₁: Season (1: spring, 2: summer, 3: fall, 4: winter, input); X₂: Month (1 to 12, input); X₃: Year (0:2011, 1:2012, input); X₄: Weather day is holiday or not (input); X₅: Day of the week (input); X₆: Working day if day is neither weekend nor holiday is 1, otherwise is 0 (input); X₇: Weather condition (1: Clear, Few clouds, partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog, input); X₈: Normalized temperature in Celsius. The values are divided to 41 (max) (input); X₉: Normalized feeling temperature in Celsius. The values are divided to 50 (max) (input); X₁₀: Normalized humidity. The values are divided to 100 (max) (input); X₁₁: Normalized wind speed. The values are divided to 67 (max) (input); and Y: Count of total rental bikes including both casual and registered (output).

In this study, to calculate MAPE, the output was normalized and 0.1 was added to prevent a zero value.

y_{i} = \frac{y_{i} - y_{m i n}}{y_{m a x} - y_{m i n}} + 0.1

(21)

where

y_{i}

,

y_{m i n}

, and

y_{m a x}

are actual value, minimum and maximum of actual value, respectively.

7.2.4. Performance Results

Table 20 presents the performance results of all models for the three additional datasets. Using the same dataset in the experiment No. 2, Maurya and Gupta (2015) [12] trained ANN models with the maximum R-learn and R-test values of 0.98146 and 0.98662, respectively. Meanwhile, the iML identifies the ANN single model as the best model with R-learn and R-test values of 0.99990 and 0.99629, respectively. The iML gives out a slightly better model than those of the previous research in this numerical experiment.

In the experiment No. 3, Ferreira et al. (2016) had an analytical result of MAPE 3.45% and iML confirms ANN single model as the best model, with MAPE values for learning and test of 0.023% and 0.093%, respectively [13]. The stacking ANN ensemble also performs well with the MAPEs for the learning and test data by 0.026% and 0.010%, respectively.

Finally, in the experiment No. 4, iML achieves R-learn and R-test values of 0.97660 and 0.94790, with bagging ANN as the best model. In contrast, Fanaee-T and Gama (2014) obtained a maximum R value of 0.91990 [11].

As shown in the above numerical experiments, iML trains and identifies the best models which are better than those in the previous studies.

8. Conclusions and Future Work

This study develops an iML platform to efficiently operate data-mining techniques. The iML is designed to be user-friendly, so users can get the results with only “One-Click”. The numerical experiments have demonstrated that iML is a powerful soft computing to identify the best prediction model by automating comparison among diverse machine learning techniques.

To benchmark the effectiveness of iML with WEKA, five datasets collected from the UCI Machine Learning Repository were analyzed via hold-out validation and tenfold cross validation. The performance results indicate that iML can find a more accurate model than that of WEKA in the publicly available datasets. The best prediction model identified by iML is also the best model among all the models trained by iML and WEKA. Notably, iML requires minimal effort from the users to build single, voting, bagging, and stacking models in comparison with WEKA.

Four industrial experiments were carried out to validate the performance of iML. The first experiment involved training a model for prediction of ERP development effort, in which iML yielded an RMSE for learning data with 70.28 h and for testing data with 65.58 h, by using the bagging ANN ensemble (best model). In contrast, Chou et al. (2012) [1] obtained training and testing RMSE values of 234.0157 h and 97.2667 h, respectively.

In the second experiment on performance of CPU processors, iML yielded 0.99990 for R-learning and 0.99629 for R-testing, which are better than those reported in Maurya and Gupta (2015) [12], and confirmed that single ANN was the best model. In the third experiment of daily demand forecasting orders, iML achieved MAPE values of 0.026% (learning) and 0.010% (testing). The results are as excellent as those obtained in Ferreira et al. (2016) [13]. In the fourth experiment for total hourly-shared bike rental, R-learning and R-testing values of 0.97660 and 0.94790 were reached using iML. The test performance was 6% better than that obtained by Fanaee-T and Gama (2014) [11]. In addition to the enhanced prediction performance, the iML possesses ability to determine the best models on the basis of multiple evaluation metrics.

In conclusion, the iML is a powerful and promising prediction platform for solving diverse engineering problems. Since the iML platform can only deal with regression problems, future research should upgrade iML for solving complex classification and time series problems by automatically presenting the alternative models for practical use in engineering applications, as well as adding some other advanced ML methods (such as deep learning models). Moreover, metaheuristic optimization algorithms could be integrated with the iML to help the users finetune the hyperparameters of chosen machine learning models.

Author Contributions

Conceptualization, J.-S.C.; data curation, D.-N.T.; formal analysis, J.-S.C. and D.-N.T.; funding acquisition, J.-S.C.; investigation, J.-S.C., D.-N.T. and C.-F.T.; methodology, J.-S.C. and C.-F.T.; project administration, J.-S.C.; resources, J.-S.C. and C.-F.T.; software, D.-N.T.; supervision, J.-S.C.; validation, J.-S.C., D.-N.T. and C.-F.T.; visualization, J.-S.C. and D.-N.T.; writing—original draft, J.-S.C., D.-N.T. and C.-F.T.; writing—review and editing, J.-S.C. and D.-N.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology, Taiwan, under grants 108-2221-E-011-003-MY3 and 107-2221-E-011-035-MY3.

Data Availability Statement

The data that support the findings of this study are available from the UCI Machine Learning Repository or corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the Ministry of Science and Technology, Taiwan, for financially supporting this research.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

Chou, J.-S.; Cheng, M.-Y.; Wu, Y.-W.; Wu, C.-C. Forecasting enterprise resource planning software effort using evolutionary support vector machine inference model. Int. J. Proj. Manag. 2012, 30, 967–977. [Google Scholar] [CrossRef]
Pham, A.-D.; Ngo, N.-T.; Nguyen, Q.-T.; Truong, N.-S. Hybrid machine learning for predicting strength of sustainable concrete. Soft Comput. 2020. [Google Scholar] [CrossRef]
Cheng, M.-Y.; Chou, J.-S.; Cao, M.-T. Nature-inspired metaheuristic multivariate adaptive regression splines for predicting refrigeration system performance. Soft Comput. 2015, 21, 477–489. [Google Scholar] [CrossRef]
Li, Y.; Lei, G.; Bramerdorfer, G.; Peng, S.; Sun, X.; Zhu, J. Machine Learning for Design Optimization of Electromagnetic Devices: Recent Developments and Future Directions. Appl. Sci. 2021, 11, 1627. [Google Scholar] [CrossRef]
Piersanti, S.; Orlandi, A.; Paulis, F.d. Electromagnetic Absorbing Materials Design by Optimization Using a Machine Learning Approach. IEEE Trans. Electromagn. Compat. 2018, 1–8. [Google Scholar] [CrossRef]
Chou, J.S.; Pham, A.D. Smart artificial firefly colony algorithm-based support vector regression for enhanced forecasting in civil engineering. Comput.-Aided Civ. Infrastruct. Eng. 2015, 30, 715–732. [Google Scholar] [CrossRef]
Cheng, M.-Y.; Prayogo, D.; Wu, Y.-W. A self-tuning least squares support vector machine for estimating the pavement rutting behavior of asphalt mixtures. Soft Comput. 2019, 23, 7755–7768. [Google Scholar] [CrossRef]
Al-Ali, H.; Cuzzocrea, A.; Damiani, E.; Mizouni, R.; Tello, G. A composite machine-learning-based framework for supporting low-level event logs to high-level business process model activities mappings enhanced by flexible BPMN model translation. Soft Comput. 2019. [Google Scholar] [CrossRef]
López, J.; Maldonado, S.; Carrasco, M. A novel multi-class SVM model using second-order cone constraints. Appl. Intell. 2016, 44, 457–469. [Google Scholar] [CrossRef]
Bogawar, P.S.; Bhoyar, K.K. An improved multiclass support vector machine classifier using reduced hyper-plane with skewed binary tree. Appl. Intell. 2018, 48, 4382–4391. [Google Scholar] [CrossRef]
Fanaee-T, H.; Gama, J. Event labeling combining ensemble detectors and background knowledge. Prog. Artif. Intell. 2014, 2, 113–127. [Google Scholar] [CrossRef] [Green Version]
Maurya, V.; Gupta, S.C. Comparative Analysis of Processors Performance Using ANN. In Proceedings of the 2015 5th International Conference on IT Convergence and Security (ICITCS), Kuala Lumpur, Malaysia, 24–27 August 2015; pp. 1–5. [Google Scholar]
Ferreira, R.P.; Martiniano, A.; Ferreira, A.; Ferreira, A.; Sassi, R.J. Study on Daily Demand Forecasting Orders using Artificial Neural Network. IEEE Lat. Am. Trans. 2016, 14, 1519–1525. [Google Scholar] [CrossRef]
De’ath, G.; Fabricius, K.E. Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology 2000, 81, 3178–3192. [Google Scholar] [CrossRef]
Li, H.; Wen, G. Modeling reverse thinking for machine learning. Soft Comput. 2020, 24, 1483–1496. [Google Scholar] [CrossRef] [Green Version]
Li, Y. Predicting materials properties and behavior using classification and regression trees. Mater. Sci. Eng. A 2006, 433, 261–268. [Google Scholar] [CrossRef]
Chou, J.-S.; Yang, K.-H.; Lin, J.-Y. Peak Shear Strength of Discrete Fiber-Reinforced Soils Computed by Machine Learning and Metaensemble Methods. J. Comput. Civ. Eng. 2016, 30, 04016036. [Google Scholar] [CrossRef] [Green Version]
Qi, C.; Tang, X. Slope stability prediction using integrated metaheuristic and machine learning approaches: A comparative study. Comput. Ind. Eng. 2018, 118, 112–122. [Google Scholar] [CrossRef]
Chou, J.-S.; Ho, C.-C.; Hoang, H.-S. Determining quality of water in reservoir using machine learning. Ecol. Inform. 2018, 44, 57–75. [Google Scholar] [CrossRef]
Chou, J.-S.; Bui, D.-K. Modeling heating and cooling loads by artificial intelligence for energy-efficient building design. Energy Build. 2014, 82, 437–446. [Google Scholar] [CrossRef]
Alkahtani, M.; Choudhary, A.; De, A.; Harding, J.A. A decision support system based on ontology and data mining to improve design using warranty data. Comput. Ind. Eng. 2018. [Google Scholar] [CrossRef] [Green Version]
Daras, G.; Agard, B.; Penz, B. A spatial data pre-processing tool to improve the quality of the analysis and to reduce preparation duration. Comput. Ind. Eng. 2018, 119, 219–232. [Google Scholar] [CrossRef]
Chou, J.-S.; Tsai, C.-F. Preliminary cost estimates for thin-film transistor liquid–crystal display inspection and repair equipment: A hybrid hierarchical approach. Comput. Ind. Eng. 2012, 62, 661–669. [Google Scholar] [CrossRef]
Chen, T. An ANN approach for modeling the multisource yield learning process with semiconductor manufacturing as an example. Comput. Ind. Eng. 2017, 103, 98–104. [Google Scholar] [CrossRef]
Wu, X.; Kumar, V.; Quinlan, J.R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Philip, S.Y. Top 10 algorithms in data mining. Knowl. Inf. Syst. 2008, 14, 1–37. [Google Scholar] [CrossRef] [Green Version]
Chou, J.-S.; Ngo, N.-T.; Chong, W.K. The use of artificial intelligence combiners for modeling steel pitting risk and corrosion rate. Eng. Appl. Artif. Intell. 2017, 65, 471–483. [Google Scholar] [CrossRef]
Das, D.; Pratihar, D.K.; Roy, G.G.; Pal, A.R. Phenomenological model-based study on electron beam welding process, and input-output modeling using neural networks trained by back-propagation algorithm, genetic algorithms, particle swarm optimization algorithm and bat algorithm. Appl. Intell. 2018, 48, 2698–2718. [Google Scholar] [CrossRef]
Tewari, S.; Dwivedi, U.D. Ensemble-based big data analytics of lithofacies for automatic development of petroleum reservoirs. Comput. Ind. Eng. 2018. [Google Scholar] [CrossRef]
Priore, P.; Ponte, B.; Puente, J.; Gómez, A. Learning-based scheduling of flexible manufacturing systems using ensemble methods. Comput. Ind. Eng. 2018, 126, 282–291. [Google Scholar] [CrossRef] [Green Version]
Fang, K.; Jiang, Y.; Song, M. Customer profitability forecasting using Big Data analytics: A case study of the insurance industry. Comput. Ind. Eng. 2016, 101, 554–564. [Google Scholar] [CrossRef]
Chou, J.-S. Generalized linear model-based expert system for estimating the cost of transportation projects. Expert Syst. Appl. 2009, 36, 4253–4267. [Google Scholar] [CrossRef]
Dandikas, V.; Heuwinkel, H.; Lichti, F.; Drewes, J.E.; Koch, K. Predicting methane yield by linear regression models: A validation study for grassland biomass. Bioresour. Technol. 2018, 265, 372–379. [Google Scholar] [CrossRef] [PubMed]
Ngo, S.H.; Kemény, S.; Deák, A. Performance of the ridge regression method as applied to complex linear and nonlinear models. Chemom. Intell. Lab. Syst. 2003, 67, 69–78. [Google Scholar] [CrossRef]
Sentas, P.; Angelis, L. Categorical missing data imputation for software cost estimation by multinomial logistic regression. J. Syst. Softw. 2006, 79, 404–414. [Google Scholar] [CrossRef]
Slowik, A. Application of an Adaptive Differential Evolution Algorithm With Multiple Trial Vectors to Artificial Neural Network Training. IEEE Trans. Ind. Electron. 2011, 58, 3160–3167. [Google Scholar] [CrossRef]
Caputo, A.C.; Pelagagge, P.M. Parametric and neural methods for cost estimation of process vessels. Int. J. Prod. Econ. 2008, 112, 934–954. [Google Scholar] [CrossRef]
Rocabruno-Valdés, C.I.; Ramírez-Verduzco, L.F.; Hernández, J.A. Artificial neural network models to predict density, dynamic viscosity, and cetane number of biodiesel. Fuel 2015, 147, 9–17. [Google Scholar] [CrossRef]
Ganesan, P.; Rajakarunakaran, S.; Thirugnanasambandam, M.; Devaraj, D. Artificial neural network model to predict the diesel electric generator performance and exhaust emissions. Energy 2015, 83, 115–124. [Google Scholar] [CrossRef]
Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999. [Google Scholar] [CrossRef] [Green Version]
Vapnik, V. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 2013. [Google Scholar]
Jing, G.; Cai, W.; Chen, H.; Zhai, D.; Cui, C.; Yin, X. An air balancing method using support vector machine for a ventilation system. Build. Environ. 2018, 143, 487–495. [Google Scholar] [CrossRef]
García-Floriano, A.; López-Martín, C.; Yáñez-Márquez, C.; Abran, A. Support vector regression for predicting software enhancement effort. Inf. Softw. Technol. 2018, 97, 99–109. [Google Scholar] [CrossRef]
Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Routledge: New York, NY, USA, 2017; p. 368. [Google Scholar] [CrossRef]
Choi, S.Y.; Seo, I.W. Prediction of fecal coliform using logistic regression and tree-based classification models in the North Han River, South Korea. J. Hydro-Environ. Res. 2018, 21, 96–108. [Google Scholar] [CrossRef]
Ru, F.; Yin, A.; Jin, J.; Zhang, X.; Yang, X.; Zhang, M.; Gao, C. Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree. Estuar. Coast. Shelf Sci. 2016, 177, 1–7. [Google Scholar] [CrossRef]
Chou, J.-S.; Tsai, C.-F.; Pham, A.-D.; Lu, Y.-H. Machine learning in concrete strength simulations: Multi-nation data analytics. Constr. Build. Mater. 2014, 73, 771–780. [Google Scholar] [CrossRef]
Elish, M.O. Assessment of voting ensemble for estimating software development effort. In Proceedings of the 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Singapore, 16–19 April 2013; pp. 316–321. [Google Scholar]
Wang, Z.; Wang, Y.; Srinivasan, R.S. A novel ensemble learning approach to support building energy use prediction. Energy Build. 2018, 159, 109–122. [Google Scholar] [CrossRef]
Chen, J.; Yin, J.; Zang, L.; Zhang, T.; Zhao, M. Stacking machine learning model for estimating hourly PM2.5 in China based on Himawari 8 aerosol optical depth data. Sci. Total Environ. 2019, 697, 134021. [Google Scholar] [CrossRef] [PubMed]
Basant, N.; Gupta, S.; Singh, K.P. A three-tier QSAR modeling strategy for estimating eye irritation potential of diverse chemicals in rabbit for regulatory purposes. Regul. Toxicol. Pharmacol. 2016, 77, 282–291. [Google Scholar] [CrossRef]
Lee, K.M.; Yoo, J.; Kim, S.-W.; Lee, J.-H.; Hong, J. Autonomic machine learning platform. Int. J. Inf. Manag. 2019, 49, 491–501. [Google Scholar] [CrossRef]
Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef] [Green Version]
Wolpert, D.H.; Macready, W.G. No Free Lunch Theorems for Search; Technical Report SFI-TR-95-02-010; Santa Fe Institute: Santa Fe, NM, USA, 1995. [Google Scholar]
Cheng, D.; Shi, Y.; Gwee, B.; Toh, K.; Lin, T. A Hierarchical Multiclassifier System for Automated Analysis of Delayered IC Images. IEEE Intell. Syst. 2019, 34, 36–43. [Google Scholar] [CrossRef]
Basheer, I.A.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods 2000, 43, 3–31. [Google Scholar] [CrossRef]
Jain, A.K.; Jianchang, M.; Mohiuddin, K.M. Artificial neural networks: A tutorial. Computer 1996, 29, 31–44. [Google Scholar] [CrossRef] [Green Version]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Chamasemani, F.F.; Singh, Y.P. Multi-class Support Vector Machine (SVM) Classifiers—An Application in Hypothyroid Detection and Classification. In Proceedings of the 2011 Sixth International Conference on Bio-Inspired Computing: Theories and Applications, Penang, Malaysia, 27–29 September 2011; pp. 351–356. [Google Scholar]
Yang, X.; Yu, Q.; He, L.; Guo, T. The one-against-all partition based binary tree support vector machine algorithms for multi-class classification. Neurocomputing 2013, 113, 1–7. [Google Scholar] [CrossRef]
Tuv, E.; Runger, G.C. Scoring levels of categorical variables with heterogeneous data. IEEE Intell. Syst. 2004, 19, 14–19. [Google Scholar] [CrossRef]
Chiang, W.; Liu, X.; Zhang, T.; Yang, B. A Study of Exact Ridge Regression for Big Data. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 3821–3830. [Google Scholar]
Marquardt, D.W.; Snee, R.D. Ridge Regression in Practice. Am. Stat. 1975, 29, 3–20. [Google Scholar] [CrossRef]
Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Society. Ser. B 1958, 20, 215–242. [Google Scholar] [CrossRef]
Jiang, F.; Guan, Z.; Li, Z.; Wang, X. A method of predicting visual detectability of low-velocity impact damage in composite structures based on logistic regression model. Chin. J. Aeronaut. 2021, 34, 296–308. [Google Scholar] [CrossRef]
Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence 1995, Montreal, QC, Canada, 20–25 August 1995; pp. 1137–1143. [Google Scholar]
Chou, J.; Truong, D.; Le, T. Interval Forecasting of Financial Time Series by Accelerated Particle Swarm-Optimized Multi-Output Machine Learning System. IEEE Access 2020, 8, 14798–14808. [Google Scholar] [CrossRef]
Yeh, I.-C. Analysis of Strength of Concrete Using Design of Experiments and Neural Networks. J. Mater. Civ. Eng. 2006, 18, 597–604. [Google Scholar] [CrossRef]
Yeh, I.C.; Hsu, T.-K. Building real estate valuation models with comparative approach through case-based reasoning. Appl. Soft Comput. 2018, 65, 260–271. [Google Scholar] [CrossRef]
Tsanas, A.; Xifara, A. Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy Build. 2012, 49, 560–567. [Google Scholar] [CrossRef]
Lau, K.; López, R. A Neural Networks Approach to Aerofoil Noise Prediction; International Center for Numerical Methods in Engineering: Barcelona, Spain, 2009. [Google Scholar]

Figure 1. Artificial neural network (ANN) model.

Figure 2. Support Vector Machine (SVM) and Support Vector Regression (SVR) models.

Figure 3. The classification and regression tree (CART) model.

Figure 4. Linear Ridge Regression (LRR) and Logistic Regression (LgR) models.

Figure 5. Ensemble models.

Figure 6. K-fold cross-validation method.

Figure 7. Intelligent machine leaner framework.

Figure 8. Snapshot of intelligent Machine Learner (iML) interface.

Figure 9. Snapshot of report file.

Figure 10. Root mean square errors of best models.

Table 1. Confusion matrix.

		Actual Class
		Positive	Negative
Predicted class	Positive	True positive	False Negative
Predicted class	Negative	False positive	True negative

Table 2. Mathematical formulas for performance measures.

Measure	Formula	Measure	Formula
Accuracy	$Accuracy = \frac{tp + tn}{tp + fp + tn + fn}$	Mean absolute error	$MAE = \frac{1}{n} \sum_{i = 1}^{n} \| y_{i} - y_{i}^{'} \|$
Precision	$Precision = \frac{tp}{tp + fp}$	Mean absolute percentage error	$MAPE = \frac{1}{n} \| \frac{y_{i} - y_{i}^{'}}{y_{i}} \|$
Sensitivity	$Sensitivity = \frac{tp}{tp + fn}$	Root mean square error	$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - y_{i}^{'})}^{2}}$
Specificity	$Specificity = \frac{tn}{tn + fp}$	Correlation coefficient	$R = \frac{n \sum y_{i} . y_{i}^{'} - (\sum y_{i}) (\sum y_{i}^{'})}{\sqrt{n (y_{i}^{2}) - {(\sum y_{i})}^{2}} \sqrt{n ({y_{i}^{'}}^{2}) - {(\sum y_{i}^{'})}^{2}}}$
Area under the curve	$AUC = \frac{1}{2} [(\frac{tp}{tp + fn}) + (\frac{tn}{tn + fp})]$	Total error rate	$TER = \frac{\| \sum_{i = 1}^{n} y_{i}^{'} - \sum_{i = 1}^{n} y_{i} \|}{\sum_{i = 1}^{n} y_{i}}$

tp is the true positives (number of correctly recognized class examples); tn is the true negatives (number of correctly recognized examples that do not belong to the class); fp is the number of false positives (number of examples that were incorrectly assigned to a class); fn is the number of false negatives (number of examples that were not assigned to a class);

y_{i}

is actual value;

y_{i}^{'}

is predicted value; n is sample size.

Table 3. Characteristic of data from UCI Machine Learning Repository.

UCI Data Set	No. of Samples	No. of Attributes	Output Information
Concrete Compressive Strength (Yeh (2006) [67])	1030	8	Concrete compressive strength (MPa)
Real estate valuation (Yeh and Hsu (2018) [68])	414	6	Y = house price of unit area (10,000 New Taiwan Dollar/Ping, where Ping is a local unit, 1 Ping = 3.3 m squared)
Energy efficiency (Tsanas and Xifara (2012) [69])	768	8	y1 Heating Load (kW)
Energy efficiency (Tsanas and Xifara (2012) [69])	768	8	y2 Cooling Load (kW)
Airfoil Self-Noise (Lau and López (2009) [70])	1503	5	Scaled sound pressure level (dB).

Table 4. Test results by WEKA and iML on concrete compressive strength dataset via hold-out validation.

Model	WEKA				SI (Ranking)	iML				SI (Ranking)
Model	R	RMSE (MPa)	MAE (MPa)	MAPE (%)	SI (Ranking)	R	RMSE (MPa)	MAE (MPa)	MAPE (%)	SI (Ranking)
I. Single	CART					ANN
	0.927	6.546	5.170	18.770	0.142 (7)	0.946	5.302	3.728	12.673	0.023 (3)
II. Voting	ANN + CART					ANN + CART
	0.936	6.202	4.930	19.090	0.124 (6)	0.956	4.771	3.550	12.723	0.000 (1)
III. Bagging	CART					ANN
	0.960	5.044	3.983	15.130	0.032 (4)	0.951	5.056	3.647	12.249	0.010 (2)
IV. Stacking	(*) CART					(*) LRR
	0.939	5.986	4.792	17.520	0.104 (5)	0.444	14.829	11.779	56.775	1.000 (8)