Article

Automatic Modeling for Concrete Compressive Strength Prediction Using Auto-Sklearn

1 Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
2 Department of Building and Real Estate, The Hong Kong Polytechnic University, Hong Kong SAR 999077, China
* Author to whom correspondence should be addressed.
Buildings 2022, 12(9), 1406; https://doi.org/10.3390/buildings12091406
Submission received: 14 July 2022 / Revised: 18 August 2022 / Accepted: 31 August 2022 / Published: 7 September 2022
(This article belongs to the Section Building Materials, and Repair & Renovation)

Highlights

What are the main findings?
  • It is feasible to use AutoML for concrete compressive strength prediction.
  • Concrete compressive strength models based on AutoML are more robust than those based on ML.
What is the implication of the main finding?
  • AutoML can reduce modeling time and reliance upon engineer modeling experience.

Abstract

Machine learning is widely used for predicting the compressive strength of concrete. However, the machine learning modeling process relies on expert experience. Automated machine learning (AutoML) aims to automatically select optimal data preprocessing methods, feature preprocessing methods, machine learning algorithms, and hyperparameters according to the datasets used, to obtain high-precision prediction models. However, the effectiveness of modeling concrete compressive strength using AutoML has not been verified. This study attempts to fill the above research gap. We construct a database comprising four different types of concrete datasets and compare one AutoML algorithm (Auto-Sklearn) against five machine learning (ML) algorithms. The results show that Auto-Sklearn can automatically build an accurate concrete compressive strength prediction model without relying on expert experience. In addition, Auto-Sklearn achieves the highest accuracy for all four datasets, with an average $R^2$ of 0.953; the average $R^2$ values of the ML models with tuned hyperparameters range from 0.909 to 0.943. This study verifies for the first time the feasibility of AutoML for concrete compressive strength prediction, allowing concrete engineers to easily build accurate concrete compressive strength prediction models without relying on a large amount of ML modeling experience.

1. Introduction

Concrete is a heterogeneous composite material comprising several materials with different properties (e.g., cement, water, and coarse and fine aggregates), which are mixed together [1,2]. Compared with other civil construction materials, concrete has the advantages of higher economy, plasticity, safety, durability, and so on. Therefore, it is widely used in projects such as housing construction, bridges, and roads. Compressive strength is an important indicator of concrete quality [3]. To ensure the safety of engineering construction, it is necessary to understand the development trends of concrete compressive strength during the planning, design, and construction stages [4]. Therefore, predicting the compressive strength of concrete is of great significance.
The compressive strength of concrete is affected by several factors. Studies have shown that it has a complex nonlinear relationship with the cement–mixing water ratio, the cement–aggregate ratio, and the gradation of aggregate particles [5,6]. In addition, in practical engineering, to achieve the two objectives of improving concrete strength and performance, certain admixtures are often added during the concrete preparation process, which also increases the complexity of concrete strength prediction [7,8,9,10]. The above complex conditions limit the accuracy of traditional empirical models and linear regression methods (LR) in the prediction of concrete compressive strength.
Machine learning (ML) algorithms have been widely applied for the compressive strength prediction of concrete, owing to their excellent nonlinear modeling abilities in complex problems [11,12,13,14,15,16]. The machine-learning-based prediction process for concrete compressive strength generally includes data preprocessing, feature preprocessing, ML algorithm selection, and hyperparameter optimization stages. Table 1 reviews the latest research on concrete compressive strength prediction from the perspective of the methods used in the various stages of the modeling process. As displayed in Table 1, in terms of data preprocessing methods, directly applying the raw data or using normalization [17] for prediction represent the primary methods. Owing to the powerful capabilities of ML algorithms, concrete researchers need not perform complex data preprocessing on concrete data [18,19]. In terms of feature preprocessing, most existing studies require human experts to analyze the factors affecting the compressive strength of concrete [20]. Algorithm selection and hyperparameter optimization constitute the focus of ML-based concrete compressive strength prediction research. In terms of model selection, the artificial neural network (ANN) [21,22,23,24], support vector regression (SVR) [11,25], random forest (RF) [13,26], adaptive boosting (AdaBoost) [27], Laplacian kernel ridge regression (LKRR) [28], light gradient boosting method (LGBM) [29], and extreme gradient boosting (XGBoost) [30,31] are widely used for concrete compressive strength prediction; however, different ML algorithms are suitable for different concrete datasets. For example, XGBoost performs best on steel-fiber-reinforced concrete datasets [32], and gradient boosting (GB) outperforms XGBoost [11] on recycled aggregate concrete datasets. This means that when dealing with new concrete datasets, concrete engineers must perform significant amounts of testing to select the optimal modeling method. 
In addition, several studies have integrated multiple ML models to develop models with higher accuracy. For example, on a concrete dataset containing recycled concrete aggregate (RCA) and ground granulated blast furnace slag (GGBFS), the accuracy of an integrated model comprising LR and RF exceeds that of a single model [33]. In terms of hyperparameter optimization, the choice of hyperparameters significantly affects the performance of ML models. Therefore, to improve the modeling ability of ML algorithms, researchers have used grid search (GS) or metaheuristics to optimize hyperparameters [34,35]. For example, SVR with GS-tuned hyperparameters outperformed plain SVR [12] on a common concrete dataset.
To summarize, when required to use ML to build a compressive strength prediction model for a new type of concrete, concrete engineers must optimize the parameters of numerous algorithms in the ML algorithm library and test the performance of each on the new concrete dataset. In addition, to obtain a higher prediction performance, concrete engineers must consider the possibility of an ensemble of multiple ML models; however, complex combination testing is time-consuming and highly dependent on human expertise. Therefore, it is difficult for concrete engineers who lack experience in ML modeling to build an accurate concrete compressive strength prediction model. Concrete engineers with ML modeling experience spend significant amounts of time conducting comparative experiments to select the optimal model. Time consumption and reliance upon ML modeling experience slow down the development of new concrete materials or the application of predictive models. Hence, automated ML methods that utilize computer programs are urgently required to free concrete engineers from the complex and time-consuming process of ML modeling, so that they can focus on concrete material research.
Automated ML (AutoML) is a research frontier at the intersection of automation technology and ML [41]. The goal of AutoML is to use computer programs to automate the complex algorithm selection and parameter optimization steps of ML, so that ML users can obtain accurate prediction models from their datasets in an end-to-end manner [42]. When using AutoML to build a concrete compressive strength prediction model, the data preprocessing, feature preprocessing, model selection, parameter optimization, and evaluation stages are encapsulated, and concrete engineers can automatically obtain a concrete compressive strength prediction model without focusing on the intermediate process. This greatly simplifies the concrete compressive strength modeling process and reduces the requirement for ML modeling experience. However, AutoML, as a new technology, has not been verified as a feasible approach for predicting concrete compressive strength.
To address the gaps in the existing research, this study makes the following three contributions.
(1) We conduct, for the first time in the literature, a feasibility study of AutoML for the prediction of concrete compressive strength.
(2) We obtain a database (containing four types of concrete datasets) from the literature, and we conduct a comprehensive comparison of one AutoML algorithm (i.e., Auto-Sklearn) against five ML algorithms (ANN, SVR, RF, AdaBoost, and XGBoost), to verify the superiority of AutoML over ML.
(3) We verify that Auto-Sklearn can automatically build an accurate concrete compressive strength prediction model without relying on expert experience, and the resulting method is more robust than traditional ML methods.
The remainder of this paper is organized as follows: First, the principles of the proposed method are given in Section 2. Then, an experimental case study is presented in Section 3, to validate the effectiveness of the proposed method. Finally, conclusions are drawn in Section 4.

2. Materials and Methods

To improve the reproducibility and practical applicability of this work, this section presents detailed materials and methods information, including the constructed concrete compressive strength database, the AutoML algorithm, the five ML algorithms used for comparison, and the performance evaluation metrics for the concrete compressive strength prediction models.

2.1. Concrete Database

Most concrete databases studied thus far contain only one type of concrete [33,43,44], which is not conducive to testing the performance of ML algorithms on multiple types of concrete. Hence, to test the robustness of the AutoML algorithm for predicting the compressive strength of various types of concrete, we collected four concrete compressive strength datasets via a literature survey. The database comprises four types of concrete: conventional concrete, rice husk ash concrete, high-strength concrete, and concrete with manufactured sand. The sample size, variable types, number of variables, and data distribution differ among the four datasets. All datasets were randomly divided into training and test sets at a ratio of 80%:20%.
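The random 80%:20% split can be sketched as follows. This is a minimal illustration using scikit-learn's `train_test_split`; the synthetic arrays `X` and `y`, the sample count, and the random seed are placeholders chosen for the example, not the data or settings of this study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for one concrete dataset:
# 100 samples, 8 mix-design features, 1 strength target (all synthetic).
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = rng.random(100)

# Random 80%:20% split; the seed is an arbitrary choice for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```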

2.1.1. Conventional Concrete Dataset

Conventional concrete (CC) is the most widely used building material for its purpose. The CC dataset adopted in this study was obtained experimentally by a research group at Chung Hwa University, Taiwan [45]. The dataset consists of 1030 samples, each of which includes eight independent variables and one dependent variable. The content range of each ingredient in the concrete is listed in Table 2. Figure 1 shows the correlation matrix for the dataset.

2.1.2. Rice Husk Ash Concrete Dataset

A large amount of agricultural waste has been used as a substitute for cement to produce sustainable concrete, which helps to reduce greenhouse gas emissions. The agricultural-waste-based concrete dataset used in this study is the rice husk ash (RHA) concrete compressive strength prediction dataset [46]. The dataset comprises 192 samples, each of which includes six independent variables and one dependent variable. The content range of each ingredient in the RHA dataset is shown in Table 3. Figure 2 shows the correlation matrix for the dataset.

2.1.3. High-Strength Concrete Dataset

High-strength concretes (HSCs) are widely used in the modern construction industry because of their superior strength and durability. The HSC compressive strength prediction dataset [47] used in this study consists of 357 pieces of data, each of which includes five independent variables and one dependent variable. The content range of each ingredient in the HSC dataset is listed in Table 4. Figure 3 shows the correlation matrix for the dataset.

2.1.4. Concrete with Manufactured Sand Dataset

Manufactured sand, which is made from crushed stone or gravel and is also known as machine-made sand, artificial sand, or gravelly sand, has been used as a substitute for natural sand in concrete, to preserve limited natural sand resources. Concrete with manufactured sand (MSC) has gradually become an indispensable green building material. The MSC dataset [48] used in this study comprises 280 samples, each of which includes 11 independent variables and one dependent variable. The content range of each ingredient in the MSC dataset is listed in Table 5. Figure 4 shows the correlation matrix for the dataset.

2.2. AutoML Algorithm

AutoML is a current research frontier in the computing community. The goal of AutoML is to automatically select the optimal data modeling pipeline in the data preprocessing, feature preprocessing, model selection, and hyperparameter optimization stages of the ML modeling process, without human intervention or time delays. Figure 5 shows the difference between AutoML and ML for the prediction of the concrete compressive strength.

2.2.1. Mathematical Model

In the mathematical description of AutoML, the dataset is denoted by $D$ and is divided into the disjoint training set $D_{train}$ and validation set $D_{valid}$. The configuration space of data preprocessing methods is $DP$, and each data preprocessing method can be defined as $dp \in DP$; the configuration space of feature preprocessing methods is $FP$, and each feature preprocessing method can be defined as $fp \in FP$; the configuration space of ML algorithms is $M$; each ML algorithm $m \in M$ has $N$ hyperparameters, and its hyperparameter space is $H = h_1 \times h_2 \times \cdots \times h_N$ (where each $h_i$ can be an integer, real, floating-point, or label value); the hyperparameter configuration of each $m$ can be defined as $h \in H$. The evaluation function for calculating the loss is defined as $Score$, and the optimal data pipeline is $P$, where
$$P = \underset{dp \in DP,\ fp \in FP,\ m \in M,\ h \in H}{\operatorname{argmin}}\ Score(D_{train}, D_{valid}, dp, fp, m, h) \qquad (1)$$
According to Equation (1), when the configuration spaces $DP$, $FP$, and $M$ are known, it is only necessary to input the training set $D_{train}$ and validation set $D_{valid}$; the optimal data pipeline is then obtained by minimizing the error of the model on the validation set $D_{valid}$. In addition to model and parameter selection, Equation (1) also considers the data preprocessing and feature preprocessing links in the ML pipeline; thus, the above problem can be defined as a generalized joint optimization problem of combined algorithm selection and hyperparameter tuning (CASH) [49]. Thus, the construction of the configuration space and the optimization of the generalized CASH are key steps for realizing AutoML.
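To make Equation (1) concrete, the sketch below solves a toy version of the generalized CASH problem by exhaustive search with scikit-learn. It is only an illustration under stated assumptions: the synthetic data, the two preprocessing choices, the two algorithms with one hyperparameter each, and the use of validation RMSE as the score are all choices made for this example, and the feature preprocessing space is omitted for brevity. Auto-Sklearn searches a far larger space and does not enumerate it exhaustively.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (placeholder for D_train and D_valid).
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = 3.0 * X[:, 0] + np.sin(4.0 * X[:, 1]) + 0.05 * rng.standard_normal(200)
X_train, y_train = X[:150], y[:150]
X_valid, y_valid = X[150:], y[150:]

# Toy configuration spaces: DP (data preprocessing) and M with H
# (each algorithm paired with one hyperparameter and its candidate values).
DP = {"standard": StandardScaler, "minmax": MinMaxScaler}
MH = [
    ("ridge", Ridge, "alpha", [0.01, 1.0]),
    ("tree", DecisionTreeRegressor, "max_depth", [2, 5]),
]


def score(dp_cls, model_cls, params):
    """Validation RMSE of one pipeline configuration (lower is better)."""
    pipe = make_pipeline(dp_cls(), model_cls(**params))
    pipe.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_valid, pipe.predict(X_valid))))


# Exhaustive argmin over the joint space, mirroring Equation (1).
candidates = []
for dp_name, dp_cls in DP.items():
    for m_name, model_cls, h_name, h_values in MH:
        for h in h_values:
            loss = score(dp_cls, model_cls, {h_name: h})
            candidates.append((loss, dp_name, m_name, {h_name: h}))

best = min(candidates, key=lambda c: c[0])
print(best)
```

Exhaustive enumeration is feasible here only because the toy space has eight configurations; for realistic spaces, a guided search over the same objective is needed.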

2.2.2. Auto-Sklearn

Auto-Sklearn [50] is the current “state-of-the-art” algorithm in AutoML research [51]. Auto-Sklearn first incorporates the entire ML pipeline design problem (including its structural design and hyperparameter configuration) into a custom hyperparameter space; then, it uses a Bayesian optimizer to solve the generalized CASH problem in this new hyperparameter space, to obtain the optimal predictive model. In addition, Auto-Sklearn integrates two techniques to further improve algorithm performance. First, a meta-learner is used to obtain the initial configurations according to prior information, to improve the efficiency of the algorithm; second, a model integrator is used to combine multiple ML pipelines, to improve the algorithm’s accuracy. The Auto-Sklearn algorithm framework, which consists of a configuration space, Bayesian optimizer, meta-learner, and model integrator, is shown in Figure 6.
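The model integrator combines several pipelines into one weighted predictor. One common way to obtain such weights is greedy ensemble selection on validation predictions, sketched below with NumPy. That this procedure matches the integrator used in this study is an assumption; the function name `greedy_ensemble_weights`, the toy predictions, and the number of rounds are hypothetical and serve only as an illustration.

```python
import numpy as np


def greedy_ensemble_weights(val_preds, y_valid, n_rounds=10):
    """Greedy selection with replacement; returns one weight per model."""
    val_preds = np.asarray(val_preds, dtype=float)  # (n_models, n_samples)
    y_valid = np.asarray(y_valid, dtype=float)
    counts = np.zeros(len(val_preds), dtype=int)
    running_sum = np.zeros(val_preds.shape[1])
    for k in range(1, n_rounds + 1):
        # Validation RMSE of the ensemble if each candidate were added next.
        trial = (running_sum + val_preds) / k
        rmse = np.sqrt(np.mean((trial - y_valid) ** 2, axis=1))
        chosen = int(np.argmin(rmse))
        counts[chosen] += 1
        running_sum += val_preds[chosen]
    # A model's weight is the fraction of rounds in which it was chosen.
    return counts / counts.sum()


# Toy validation predictions from three hypothetical pipelines:
# one biased high, one biased low, one far off.
y_valid = np.array([1.0, 2.0, 3.0, 4.0])
val_preds = np.array([y_valid + 0.5, y_valid - 0.5, y_valid + 2.0])

weights = greedy_ensemble_weights(val_preds, y_valid, n_rounds=4)
ensemble_pred = weights @ val_preds
print(weights, ensemble_pred)
```

In this toy case, the two pipelines with opposite biases receive equal weight and the poor pipeline receives none, so the weighted combination is more accurate than any single pipeline.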

2.3. Machine Learning Algorithms

We selected five ML algorithms (ANN, SVR, RF, AdaBoost, and XGBoost), widely used for concrete compressive strength prediction, as comparison algorithms for AutoML. This section briefly reviews the principles of the five ML algorithms.

2.3.1. Artificial Neural Network

By simulating the structure and function of a biological neural network (brain), an ANN connects a large number of artificial neurons to model complex relationships between data [52]. The focus of ANNs is to build artificial neuron models and network structures. For each artificial neuron, if we take the input values $\{X_1, X_2, \ldots, X_n\}$ and their weight coefficients $\{W_1, W_2, \ldots, W_n\}$, and we further assume that the bias of the neuron is $b$, then the activity value of the neuron is $a = (X_1 \times W_1) + (X_2 \times W_2) + \cdots + (X_i \times W_i) + \cdots + (X_n \times W_n) + b$. To obtain the output value of the neuron, its activity value is passed through the activation function. ANNs are composed of many neurons designed according to the above rules and combined with certain other rules.
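The computation performed by a single neuron can be sketched in a few lines of NumPy. The input values, weights, and bias below are arbitrary numbers, and the choice of `tanh` as the activation function is an assumption made for illustration, since the text does not fix a particular activation.

```python
import numpy as np


def neuron_output(x, w, b, activation=np.tanh):
    """Return the activity value a = sum(X_i * W_i) + b and the neuron output."""
    a = float(np.dot(x, w) + b)
    return a, float(activation(a))


x = np.array([1.0, 2.0, 3.0])      # inputs X_1..X_3 (arbitrary)
w = np.array([0.5, -1.0, 0.25])    # weights W_1..W_3 (arbitrary)
b = 1.0                            # bias (arbitrary)

a, out = neuron_output(x, w, b)
print(a, out)
```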

2.3.2. Support Vector Regression

SVR is obtained by generalizing the support vector machine (SVM) from classification problems to regression ones [53]. The principle of SVR in implementing data modeling is to identify a hyperplane that minimizes the distance to the sample point farthest from the hyperplane (whereas an SVM needs to maximize the distance to the sample point closest to the hyperplane). SVR transforms the process of identifying a hyperplane into a convex quadratic programming problem and obtains the hyperplane by solving this problem, thereby realizing nonlinear data modeling.

2.3.3. Random Forest

The core idea of RF is to combine single decision trees (DTs), which individually suffer from overfitting and local convergence problems, into a forest of multiple trees [54]. The bootstrap resampling method is used to extract multiple samples from the original samples; a DT is trained for each bootstrap sample, and the final result is obtained by arithmetically averaging the predicted values of the individual DTs. Assuming that $x$ is the input, $y$ represents the prediction result of the RF model, $n$ represents the number of DTs, and $y_i$ represents the prediction value of the $i$-th DT, the calculation formula for $y$ is
$$y = \frac{1}{n} \sum_{i=1}^{n} y_i(x).$$
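The averaging formula can be checked directly against scikit-learn's `RandomForestRegressor`, whose prediction is the mean of its individual trees' predictions. The data, the number of trees, and the seed below are arbitrary values chosen for this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (placeholder, not a concrete dataset).
rng = np.random.default_rng(0)
X = rng.random((120, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(120)

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# y_i(x): prediction of each individual tree for the first ten samples.
tree_preds = np.stack([tree.predict(X[:10]) for tree in forest.estimators_])

# y = (1/n) * sum_i y_i(x), compared with the forest's own prediction.
manual_average = tree_preds.mean(axis=0)
forest_pred = forest.predict(X[:10])
print(np.allclose(manual_average, forest_pred))
```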

2.3.4. Adaptive Boosting

AdaBoost is one of the most widely used boosting algorithms. Its core idea is to upgrade weak classifiers (with a classification accuracy slightly better than random guessing) into a strong classifier with a high classification accuracy [55]. The AdaBoost algorithm proceeds over multiple iterations. After each training round, it updates the weight of every sample in the dataset according to whether that sample was classified correctly and according to the overall accuracy of the previous classification. It then passes the reweighted dataset to the next classifier for training. Finally, the classifiers obtained from the training rounds are fused, resulting in a classifier more accurate than a weak classifier; this is used as the final decision classifier.

2.3.5. Extreme Gradient Boosting

XGBoost has been widely praised in academia and industry for its fast computational speed, good model performance, and excellent efficacy and efficiency in application practice [56]. XGBoost selects a DT as its weak learner. When training a single weak learner, it marginally increases the weight of the previous misclassified data, learns the current single weak learner, then adds a new weak learner to try to correct the residuals of all the previous weak learners; finally, the weighted summation of multiple learners is used to obtain the final prediction.
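The idea of adding weak learners that correct the residuals of the previous ones can be sketched with plain gradient boosting under a squared-error loss. This is a simplified illustration, not XGBoost itself: it omits XGBoost's regularization and second-order terms, and the data, tree depth, learning rate, and number of rounds are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (placeholder, not a concrete dataset).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = np.sin(6.0 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

learning_rate = 0.3
prediction = np.full_like(y, y.mean())  # start from the mean target
train_rmse = [float(np.sqrt(np.mean((y - prediction) ** 2)))]
trees = []

for _ in range(20):
    residual = y - prediction  # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add the new weak learner
    trees.append(tree)
    train_rmse.append(float(np.sqrt(np.mean((y - prediction) ** 2))))

print(train_rmse[0], train_rmse[-1])
```

The final prediction is the initial mean plus the scaled sum of all tree outputs, which corresponds to the weighted summation of multiple learners described above.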

2.4. Performance Evaluation Metrics

After training the models, several well-known metrics are used to evaluate their performances, including the root-mean-squared error (RMSE) [57], mean absolute error (MAE) [58], coefficient of determination ($R^2$) [59], and mean absolute percentage error (MAPE) [60].

2.4.1. Root-Mean-Squared Error

The RMSE is generally used as a loss function in regression, and it can be defined as
$$RMSE = \sqrt{\frac{1}{Z} \sum_{i=1}^{Z} (a_i - b_i)^2},$$
where a is the predicted output value, b is the actual value, and Z is the number of data samples. The higher the RMSE value, the larger the error is. Therefore, the RMSE value should be minimized to improve the performance of the model.

2.4.2. Mean Absolute Error

The MAE is the arithmetic mean of the deviation, which can be expressed as
$$MAE = \frac{1}{Z} \sum_{i=1}^{Z} |a_i - b_i|.$$
The optimal value for the MAE is 0.0.

2.4.3. Coefficient of Determination ($R^2$)

$R^2$ represents the level of accuracy. The higher the value of $R^2$, the higher the similarity between the predicted and actual values is. For models that predict at least as well as the mean of the actual values, $R^2$ ranges from 0 to 1; it is expressed as
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (Predicted_i - Actual_i)^2}{\sum_{i=1}^{N} (Actual_i - \overline{Actual})^2}$$
where $Predicted_i$ is the predicted strength of the $i$th sample, $Actual_i$ denotes the actual strength of the $i$th sample, $\overline{Actual}$ is the average of the actual strengths of all samples, and $N$ is the number of samples.

2.4.4. Mean Absolute Percentage Error

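The MAPE expresses the mean absolute error relative to the actual values, and it can be written as
$$MAPE = \frac{1}{Z} \sum_{i=1}^{Z} \left| \frac{Actual_i - Predicted_i}{Actual_i} \right| \times 100\%.$$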
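The four metrics of this section can be computed together as in the following NumPy sketch. The function name `regression_metrics` and the four strength values are hypothetical and serve only to show the arithmetic; they are not results from this study.

```python
import numpy as np


def regression_metrics(actual, predicted):
    """Compute RMSE, MAE, R^2, and MAPE (in percent) for two 1-D sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - actual
    rmse = np.sqrt(np.mean(error ** 2))
    mae = np.mean(np.abs(error))
    r2 = 1.0 - np.sum(error ** 2) / np.sum((actual - actual.mean()) ** 2)
    mape = np.mean(np.abs(error / actual)) * 100.0  # requires nonzero actuals
    return {"RMSE": float(rmse), "MAE": float(mae),
            "R2": float(r2), "MAPE": float(mape)}


# Hypothetical actual and predicted strengths (illustrative values only).
metrics = regression_metrics(
    actual=[20.0, 40.0, 50.0, 30.0],
    predicted=[22.0, 38.0, 50.0, 34.0],
)
print(metrics)
```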

3. Results and Discussion

To verify the efficacy of AutoML, this study first tested it on the constructed database, before testing the ML algorithms. Finally, the results of AutoML and ML are discussed. All experiments were performed on a computer with an integrated NVIDIA GTX 1080 graphics card (8-GB RAM), 32 GB RAM, and an Intel Core i7-6770 CPU. The algorithms used for the experiments were implemented in the Python programming language on an Ubuntu 16.04 operating system. The ML algorithms were implemented using the scikit-learn library (https://scikit-learn.org/) (accessed on 10 July 2022).

3.1. Concrete Compressive Strength Prediction Using AutoML

To verify the effectiveness of using AutoML for concrete compressive strength prediction, this study applied Auto-Sklearn, a representative AutoML algorithm, to conduct experiments on four concrete datasets. The maximum runtime of each Auto-Sklearn run was set to 2.0 h. To prevent overfitting, the ten-fold cross-validation method [61] was used to calculate the optimizer score.
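Ten-fold cross-validation scoring can be sketched with scikit-learn as follows. This only illustrates how a cross-validated $R^2$ score is computed for one candidate model; the synthetic data, the random forest stand-in, and the seeds are assumptions of this example and do not reproduce Auto-Sklearn's internal optimizer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (placeholder, not a concrete dataset).
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(100)

# Ten folds: each sample is used for validation exactly once.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
model = RandomForestRegressor(n_estimators=20, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

# The mean over folds is the score a search procedure would compare.
print(scores.mean())
```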
$R^2$ is an important index for evaluating concrete compressive strength predictions. To monitor the optimization process of Auto-Sklearn, we plotted the change curves of the training $R^2$, optimized $R^2$, and test $R^2$ for the optimal single model and the ensemble model during the training process, as shown in Figure 7. In the initial stage of optimization, each indicator shows a significant upward trend as the optimization progresses, which indicates that the performance of the model rapidly improves in the initial stage of optimization. On the CC dataset, Auto-Sklearn reaches a high level after ~15 min of optimization and then gradually converges. On the RHA, HSC, and MSC datasets, Auto-Sklearn converges within ~5 min, ~30 min, and ~3 min, respectively. The differences between the indices suggest that the accuracy of the ensemble model (i.e., a weighted combination of multiple models) identified by Auto-Sklearn for each dataset exceeded that of a single model, which indicates that the ensemble model is more suitable for accurate concrete compressive strength prediction; this corroborates the findings from previous research [33]. In addition, the performance of the model obtained via Auto-Sklearn optimization for the training set exceeds that for the test set, which accords with the general laws of ML modeling.
After training, Auto-Sklearn obtained four ensemble models, each of which was a weighted combination of multiple ML pipelines. Table 6 shows the detailed parameters of the four ensemble models; it can be seen that the ensemble models built for the CC, RHA, HSC, and MSC datasets consisted of 10, 9, 7, and 4 ML pipelines, respectively. Constructing such complex combinations manually is difficult, even for experienced concrete engineers.
Table 7 shows the performance evaluation results of the four ensemble models on the test set: all the test $R^2$ values exceeded 0.9. Among the models, the $R^2$ value of the ensemble MSC model was the highest, reaching 0.991. The predicted and real results of the four ensemble models for the four concrete datasets are shown in Figure 8, further validating the high performance of the Auto-Sklearn algorithm.
To summarize, Auto-Sklearn can automatically build accurate compressive strength prediction models for various types of concretes.

3.2. Prediction of Concrete Compressive Strength Using Machine Learning

Five ML algorithms were used to conduct experiments on the four datasets. In the data preprocessing stage, most algorithms did not perform data preprocessing. In terms of feature preprocessing, the features used were derived from features manually selected through expert experience; for ML algorithm selection, the ANN, SVR, RF, AdaBoost, and XGBoost algorithms were used; for hyperparameter selection, to obtain the optimal performance of each algorithm, the GS method was used [28,32] to select the hyperparameters.
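Hyperparameter selection by grid search can be sketched with scikit-learn's `GridSearchCV`, shown here for SVR. The synthetic data, the candidate values of `C` and `gamma`, the five-fold setting, and the $R^2$ scoring are assumptions made for this illustration; the grids actually used in this study are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic regression data (placeholder, not a concrete dataset).
rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = np.sin(5.0 * X[:, 0]) + X[:, 1] + 0.05 * rng.standard_normal(120)

# Hypothetical grid: every combination of C and gamma is evaluated.
param_grid = {"C": [1.0, 10.0, 100.0], "gamma": [0.1, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

The same pattern applies to the other four algorithms by swapping the estimator and its grid, which is why the total tuning effort grows with both the number of algorithms and the size of each grid.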
Through the experiments, we obtained the hyperparameters adopted by each model and the results of the model performance evaluation. As can be seen from Table 8, for Datasets 1, 2, and 4, the multiple performance evaluation metrics of the XGBoost algorithm were optimal. For the HSC dataset, the ANN algorithm achieved the best concrete compressive strength prediction performance. Thus, on the one hand, XGBoost is the most robust ML algorithm for concrete compressive strength prediction among the five ML algorithms, though on the other hand, none of the ML algorithms tested in this study can be used to build an optimal compressive strength prediction model for all concrete datasets. To construct accurate concrete compressive strength prediction models, concrete engineers must extensively test multiple ML algorithms.

3.3. Comparison of Concrete Compressive Strength Prediction Using AutoML and ML

Box plots were used to summarize the test results of the representative algorithms under the AutoML and ML methods, and the results are shown in Figure 9. Through comparison and analysis, we summarize the advantages of the AutoML representative algorithm, Auto-Sklearn, for building concrete compressive strength prediction models:
  • The accuracy of the Auto-Sklearn algorithm is higher. The multiple algorithm performance metrics presented in Figure 9 show that the Auto-Sklearn algorithm outperforms the five ML algorithms (ANN, SVR, RF, AdaBoost, and XGBoost) on all four datasets. This is because the Auto-Sklearn algorithm can both build complex ensemble models and optimize the entire ML pipeline (including data preprocessing methods, feature preprocessing methods, ML algorithms, and hyperparameters).
  • The Auto-Sklearn algorithm is more robust. By comparing the ranges of the box plots in Figure 9, it can be seen that the fluctuation range of each performance evaluation index of the Auto-Sklearn algorithm (applied to multiple datasets) is significantly smaller than that of the other five ML algorithms. Existing studies have shown that each machine-learning algorithm has a certain scope of application, and there is currently no ML algorithm that performs best on every dataset [50]. The Auto-Sklearn algorithm can automatically identify the optimal machine-learning pipelines for the dataset in the configuration space and combine them. Therefore, the Auto-Sklearn algorithm is more robust.
  • The Auto-Sklearn algorithm can reduce the modeling time and the dependence on concrete engineer expertise. When building a compressive strength prediction model based on a new concrete dataset, concrete engineers must comprehensively compare multiple ML algorithms and exhaustively optimize the hyperparameters. This results in a considerable time cost. This study shows that the Auto-Sklearn algorithm can train an accurate concrete compressive strength prediction model within a short time. In addition, once the Auto-Sklearn algorithm is run, there is no need for manual intervention from the concrete engineer, which means that the concrete engineer spends very little time performing modeling. Meanwhile, the automated modeling process means that concrete engineers do not need machine-learning modeling experience and can therefore devote more time to concrete research.
  • The Auto-Sklearn algorithm has better scalability. More advanced ML algorithms can be integrated into the configuration space of the Auto-Sklearn algorithm (in particular, numerous ML algorithms that perform well in concrete compressive strength prediction), to satisfy more complex modeling requirements. Traditional ML algorithms can only improve the model performance in a limited manner, by tuning the hyperparameters.

3.4. Comparison with Related Work

Table 9 presents a comparison of the present method and several previous methods reported in studies regarding the use of ML for predicting the compressive strength of concrete. Through comparison, it can be concluded that the advanced nature of the proposed method lies in the following:
  • High degree of automation; no reliance upon human experience. To a certain extent, existing research relies upon expert experience to select the hyperparameters. The selection of the hyperparameters is important, but difficult. The present method facilitates automated modeling without relying upon expert experience.
  • Stronger robustness. The proposed method achieves $R^2$ values greater than 0.9 on all datasets, and most of the accuracies approach or exceed those of well-tuned methods in existing studies.
Through comparison, it can be seen that the adopted method does not achieve the optimal performance on certain datasets. This shows that Auto-Sklearn did not find the optimal model within the limited search time. One reason is that expertise produces higher accuracies than automatic ML on certain datasets. For example, on the HSC dataset, existing studies have used RF to achieve higher accuracies, and the methods employed do not ensure that the parameters are optimal. Another reason is the limitation of the adopted method’s model library. For example, the best model on the CC dataset is LKRR, and the best model on the RHA dataset is GEP. Neither of these has yet been included in Auto-Sklearn’s configuration space.
To summarize, the greatest significance of the method adopted in this paper is to simplify the modeling process for concrete compressive strength and reduce the dependence upon engineer expertise. To further improve the performance of the Auto-Sklearn algorithm, it is necessary to further expand the configuration space and improve the performance of the optimizer in the future.

4. Conclusions and Future Work

This study aimed to verify—for the first time in the literature—the feasibility of using AutoML for concrete compressive strength prediction. We first collected four different types of concrete datasets, introduced the principles of AutoML and a representative algorithm (Auto-Sklearn), and compared this representative against five ML algorithms. The following conclusions were drawn:
  • Auto-Sklearn could automatically build compressive strength prediction models for various types of concrete (CC, RHA, HSC, and MSC).
  • The robustness of Auto-Sklearn for different concrete datasets surpassed that of the five ML algorithms (ANN, SVR, RF, AdaBoost, and XGBoost). For all datasets (CC, RHA, HSC, and MSC), Auto-Sklearn achieved the highest $R^2$ (0.938, 0.968, 0.914, 0.991, respectively), and the $R^2$ values of the ML algorithms with tuned hyperparameters were as follows (where the order CC, RHA, HSC, MSC is adopted): ANN (0.894, 0.931, 0.843, 0.981), SVR (0.854, 0.890, 0.866, 0.963), RF (0.890, 0.892, 0.845, 0.957), AdaBoost (0.893, 0.904, 0.833, 0.960), and XGBoost (0.929, 0.966, 0.899, 0.978). The average $R^2$ of Auto-Sklearn was 0.953, and the average $R^2$ values of the ML models were ANN (0.909), SVR (0.890), RF (0.896), AdaBoost (0.891), and XGBoost (0.943).
  • Concrete engineers could use the Auto-Sklearn algorithm to quickly build an accurate concrete compressive strength prediction model, without relying on ML modeling experience; thus, they could devote more time to concrete research.
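
One simple way to read the robustness claim in the conclusions is to look at each model's worst-case R² and the spread of its R² across the four datasets. The sketch below does this for the per-dataset values listed above. Using the minimum and the max-minus-min spread as robustness indicators is an assumption made here for illustration, not necessarily the authors' definition.

```python
# Per-dataset R² values from the conclusions, in the order CC, RHA, HSC, MSC.
R2_BY_MODEL = {
    "Auto-Sklearn": [0.938, 0.968, 0.914, 0.991],
    "ANN": [0.894, 0.931, 0.843, 0.981],
    "SVR": [0.854, 0.890, 0.866, 0.963],
    "RF": [0.890, 0.892, 0.845, 0.957],
    "AdaBoost": [0.893, 0.904, 0.833, 0.960],
    "XGBoost": [0.929, 0.966, 0.899, 0.978],
}


def worst_case_and_spread(scores):
    """Return (minimum, maximum - minimum) of a list of R² values."""
    return round(min(scores), 3), round(max(scores) - min(scores), 3)


def summarize(r2_by_model):
    """Map each model name to its (worst-case R², spread) pair."""
    return {name: worst_case_and_spread(s) for name, s in r2_by_model.items()}


SUMMARY = summarize(R2_BY_MODEL)

if __name__ == "__main__":
    for name, (worst, spread) in SUMMARY.items():
        print(f"{name:>12}: worst-case R2 = {worst:.3f}, spread = {spread:.3f}")
```

Under these illustrative indicators, a model is more robust when its worst-case R² is higher and its spread is smaller.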
Although Auto-Sklearn is a “state-of-the-art” algorithm in the current AutoML research field, the accuracy and computational efficiency of this algorithm still need to be improved. The following research should be conducted in the future:
  • More advanced ML algorithms should be integrated in the configuration space, especially those that have a proven superior performance in concrete compressive strength prediction, to further improve the accuracy of AutoML in this task.
  • Better optimization methods for ML pipelines should be investigated, to further improve the computational efficiency of AutoML for concrete compressive strength prediction.
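
The two directions above (a larger configuration space and a better optimizer) can be illustrated with a toy version of combined algorithm selection and hyperparameter tuning (CASH). The sketch below is not Auto-Sklearn's optimizer. It uses plain random search, and the two candidate algorithms, their hyperparameter ranges, and the synthetic data are all hypothetical stand-ins chosen so that the example runs with the standard library only.

```python
import math
import random


def make_data(n, rng):
    """Toy 1-D regression data (hypothetical stand-in for a concrete dataset)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return list(zip(xs, ys))


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kernel_predict(train, x, bandwidth):
    """Gaussian-kernel weighted average of the training targets."""
    weights = [math.exp(-(((px - x) / bandwidth) ** 2)) for px, _ in train]
    total = sum(weights)
    if total == 0.0:  # all weights underflowed; fall back to the global mean
        return sum(y for _, y in train) / len(train)
    return sum(w * y for w, (_, y) in zip(weights, train)) / total


def sample_configuration(rng):
    """Draw one (algorithm, hyperparameters) pair from the configuration space."""
    if rng.random() < 0.5:
        return ("knn", {"k": rng.randint(1, 15)})
    return ("kernel", {"bandwidth": rng.uniform(0.05, 2.0)})


def validation_rmse(config, train, valid):
    """Score a configuration by its RMSE on the held-out validation points."""
    algo, params = config
    if algo == "knn":
        preds = [knn_predict(train, x, params["k"]) for x, _ in valid]
    else:
        preds = [kernel_predict(train, x, params["bandwidth"]) for x, _ in valid]
    squared = [(p - y) ** 2 for p, (_, y) in zip(preds, valid)]
    return math.sqrt(sum(squared) / len(squared))


def random_search(n_trials=30, seed=0):
    """Return the best (configuration, RMSE) found in n_trials random draws."""
    rng = random.Random(seed)
    points = make_data(120, rng)
    train, valid = points[:90], points[90:]
    best_config, best_rmse = None, float("inf")
    for _ in range(n_trials):
        config = sample_configuration(rng)
        score = validation_rmse(config, train, valid)
        if score < best_rmse:
            best_config, best_rmse = config, score
    return best_config, best_rmse


if __name__ == "__main__":
    print(random_search())
```

Adding a third algorithm to `sample_configuration` corresponds to expanding the configuration space, while replacing the random draws with a model-guided proposal corresponds to improving the optimizer.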

Author Contributions

M.S.: conceptualization, methodology, writing. W.S.: resources, supervision, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 42107155), the Fundamental Research Funds for the Central Universities (No. 2682021CX061), and the research project of the Department of Natural Resources of Sichuan Province (Kj-2022-29).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AutoML: automated machine learning
ML: machine learning
LR: linear regression
ANN: artificial neural network
SVR: support vector regression
RF: random forest
AdaBoost: adaptive boosting
LKRR: Laplacian kernel ridge regression
LGBM: light gradient boosting machine
XGBoost: extreme gradient boosting
GB: gradient boosting
GEP: gene expression programming
RCA: recycled concrete aggregate
GGBFS: ground granulated blast furnace slag
GS: grid search
CASH: combined algorithm selection and hyperparameter tuning
DT: decision tree
RMSE: root-mean-squared error
MAE: mean absolute error
R²: coefficient of determination
MAPE: mean absolute percentage error
CC: conventional concrete
RHA: rice husk ash concrete
HSC: high-strength concrete
MSC: concrete with manufactured sand

References

  1. Das, R.N. High-performance concrete compressive strength’s mean-variance models. J. Mater. Civ. Eng. 2017, 29, 05016003.
  2. Shah, N.; Mavani, V.; Kumar, V.; Mungule, M.; Iyer, K.K. Impact assessment of plastic strips on compressive strength of concrete. J. Mater. Civ. Eng. 2019, 31, 04019148.
  3. Kumar Singh, A.; Nigam, M.; Srivastava, R.K. Study of stress profile in cement concrete road of expansive soil due to swell pressure. Mater. Today Proc. 2022, 56, 347–355.
  4. Ouyang, X.; Wu, Z.; Shan, B.; Chen, Q.; Shi, C. A critical review on compressive behavior and empirical constitutive models of concrete. Constr. Build. Mater. 2022, 323, 126572.
  5. Kim, K.-M.; Lee, S.; Cho, J.-Y. Influence of friction on the dynamic increase factor of concrete compressive strength in a split Hopkinson pressure bar test. Cem. Concr. Compos. 2022, 129, 104517.
  6. Cremonez, C.; Maria McCartney da Fonseca, J.; Carolina Seguro Cury, A.; Otto Ferreira, E.; Mazer, W. Analysis of the influence of the type of curing on the axial compressive strength of concrete. Mater. Today Proc. 2022, 58, 1211–1214.
  7. Miguel Solak, A.; José Tenza-Abril, A.; Eugenia García-Vera, V. Adopting an image analysis method to study the influence of segregation on the compressive strength of lightweight aggregate concretes. Constr. Build. Mater. 2022, 323, 126594.
  8. Suryanita, R.; Maizir, H.; Zulapriansyah, R.; Subagiono, Y.; Arshad, M.F. The effect of silica fume admixture on the compressive strength of the cellular lightweight concrete. Results Eng. 2022, 14, 100445.
  9. Benaicha, M.; Burtschell, Y.; Alaoui, A.H. Prediction of compressive strength at early age of concrete–Application of maturity. J. Build. Eng. 2016, 6, 119–125.
  10. Gong, J.; Wang, Y. Stochastic Development Model for Compressive Strength of Fly Ash High-Strength Concrete. J. Mater. Civ. Eng. 2021, 33, 04021367.
  11. Quan Tran, V.; Quoc Dang, V.; Si Ho, L. Evaluating compressive strength of concrete made with recycled concrete aggregates using machine learning approach. Constr. Build. Mater. 2022, 323, 126578.
  12. Wu, Y.; Zhou, Y. Hybrid machine learning model and Shapley additive explanations for compressive strength of sustainable concrete. Constr. Build. Mater. 2022, 330, 127298.
  13. de-Prado-Gil, J.; Palencia, C.; Silva-Monteiro, N.; Martínez-García, R. To predict the compressive strength of self compacting concrete with recycled aggregates utilizing ensemble machine learning models. Case Stud. Constr. Mater. 2022, 16, e01046.
  14. Kang, S.; Lloyd, Z.; Kim, T.; Ley, M.T. Predicting the compressive strength of fly ash concrete with the Particle Model. Cem. Concr. Res. 2020, 137, 106218.
  15. Hwang, K.; Noguchi, T.; Tomosawa, F. Prediction model of compressive strength development of fly-ash concrete. Cem. Concr. Res. 2004, 34, 2269–2276.
  16. Ren, Q.; Ding, L.; Dai, X.; Jiang, Z.; De Schutter, G. Prediction of compressive strength of concrete with manufactured sand by ensemble classification and regression tree method. J. Mater. Civ. Eng. 2021, 33, 04021135.
  17. Chen, H.; Yang, J.; Chen, X. A convolution-based deep learning approach for estimating compressive strength of fiber reinforced concrete at elevated temperatures. Constr. Build. Mater. 2021, 313, 125437.
  18. Yin, X.; Gao, F.; Wu, J.; Huang, X.; Pan, Y.; Liu, Q. Compressive strength prediction of sprayed concrete lining in tunnel engineering using hybrid machine learning techniques. Undergr. Space 2022, 7, 928–943.
  19. Cook, R.; Lapeyre, J.; Ma, H.; Kumar, A. Prediction of compressive strength of concrete: Critical comparison of performance of a hybrid machine learning model with standalone models. J. Mater. Civ. Eng. 2019, 31, 04019255.
  20. Dabiri, H.; Kioumarsi, M.; Kheyroddin, A.; Kandiri, A.; Sartipi, F. Compressive strength of concrete with recycled aggregate; a machine learning-based evaluation. Clean. Mater. 2022, 3, 100044.
  21. Zheng, Z.; Tian, C.; Wei, X.; Zeng, C. Numerical investigation and ANN-based prediction on compressive strength and size effect using the concrete mesoscale concretization model. Case Stud. Constr. Mater. 2022, 16, e01056.
  22. Mohamed, O.; Kewalramani, M.; Ati, M.; Hawat, W.A. Application of ANN for prediction of chloride penetration resistance and concrete compressive strength. Materialia 2021, 17, 101123.
  23. Ni, H.-G.; Wang, J.-Z. Prediction of compressive strength of concrete by neural networks. Cem. Concr. Res. 2000, 30, 1245–1250.
  24. Naderpour, H.; Rafiean, A.H.; Fakharian, P. Compressive strength prediction of environmentally friendly concrete using artificial neural networks. J. Build. Eng. 2018, 16, 213–219.
  25. Penido, R.E.-K.; da Paixão, R.C.F.; Costa, L.C.B.; Peixoto, R.A.F.; Cury, A.A.; Mendes, J.C. Predicting the compressive strength of steelmaking slag concrete with machine learning–Considerations on developing a mix design tool. Constr. Build. Mater. 2022, 341, 127896.
  26. Li, H.; Lin, J.; Lei, X.; Wei, T. Compressive strength prediction of basalt fiber reinforced concrete via random forest algorithm. Mater. Today Commun. 2022, 30, 103117.
  27. Feng, D.-C.; Liu, Z.-T.; Wang, X.-D.; Chen, Y.; Chang, J.-Q.; Wei, D.-F.; Jiang, Z.-M. Machine learning-based compressive strength prediction for concrete: An adaptive boosting approach. Constr. Build. Mater. 2020, 230, 117000.
  28. Ekanayake, I.U.; Meddage, D.P.P.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059.
  29. Nguyen, H.D.; Truong, G.T.; Shin, M. Development of extreme gradient boosting model for prediction of punching shear resistance of r/c interior slabs. Eng. Struct. 2021, 235, 112067.
  30. Ma, L.; Zhou, C.; Lee, D.; Zhang, J. Prediction of axial compressive capacity of CFRP-confined concrete-filled steel tubular short columns based on XGBoost algorithm. Eng. Struct. 2022, 260, 114239.
  31. Nguyen, N.-H.; Abellán-García, J.; Lee, S.; Garcia-Castano, E.; Vo, T.P. Efficient estimating compressive strength of ultra-high performance concrete using XGBoost model. J. Build. Eng. 2022, 52, 104302.
  32. Li, S.; Richard Liew, J.Y. Experimental and Data-Driven analysis on compressive strength of steel fibre reinforced high strength concrete and mortar at elevated temperature. Constr. Build. Mater. 2022, 341, 127845.
  33. Imran, H.; Ibrahim, M.; Al-Shoukry, S.; Rustam, F.; Ashraf, I. Latest concrete materials dataset and ensemble prediction model for concrete compressive strength containing RCA and GGBFS materials. Constr. Build. Mater. 2022, 325, 126525.
  34. Shadbahr, E.; Aminnejad, B.; Lork, A. Determining post-fire residual compressive strength of reinforced concrete shear walls using the BAT algorithm. Structures 2021, 32, 651–661.
  35. Khorshidi Paji, M.; Gordan, B.; Biklaryan, M.; Armaghani, D.J.; Zhou, J.; Jamshidi, M. Neuro-swarm and neuro-imperialism techniques to investigate the compressive strength of concrete constructed by freshwater and magnetic salty water. Measurement 2021, 182, 109720.
  36. Zeng, Z.; Zhu, Z.; Yao, W.; Wang, Z.; Wang, C.; Wei, Y.; Wei, Z.; Guan, X. Accurate prediction of concrete compressive strength based on explainable features using deep learning. Constr. Build. Mater. 2022, 329, 127082.
  37. Tam, V.W.Y.; Butera, A.; Le, K.N.; Silva, L.C.F.D.; Evangelista, A.C.J. A prediction model for compressive strength of CO2 concrete using regression analysis and artificial neural networks. Constr. Build. Mater. 2022, 324, 126689.
  38. Shahmansouri, A.A.; Yazdani, M.; Ghanbari, S.; Bengar, H.A.; Jafari, A.; Ghatte, H.F. Artificial neural network model to predict the compressive strength of eco-friendly geopolymer concrete incorporating silica fume and natural zeolite. J. Clean. Prod. 2021, 279, 123697.
  39. Shahmansouri, A.A.; Yazdani, M.; Hosseini, M.; Bengar, H.A.; Ghatte, H.F. The prediction analysis of compressive strength and electrical resistivity of environmentally friendly concrete incorporating natural zeolite using artificial neural network. Constr. Build. Mater. 2022, 317, 125876.
  40. Asteris, P.G.; Koopialipoor, M.; Armaghani, D.J.; Kotsonis, E.A.; Lourenço, P.B. Prediction of cement-based mortars compressive strength using machine learning techniques. Neural Comput. Appl. 2021, 33, 13089–13121.
  41. Schwen, L.O.; Schacherer, D.; Geißler, C.; Homeyer, A. Evaluating generic AutoML tools for computational pathology. Inform. Med. Unlocked 2022, 29, 100853.
  42. He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 2021, 212, 106622.
  43. Atoyebi, O.D.; Modupe, A.E.; Aladegboye, O.J.; Odeyemi, S.V. Dataset of the density, water absorption and compressive strength of lateritic earth moist concrete. Data Brief 2018, 19, 2340–2343.
  44. Ding, X.; Li, C.; Xu, Y.; Li, F.; Zhao, S. Dataset of long-term compressive strength of concrete with manufactured sand. Data Brief 2016, 6, 959–964.
  45. Yeh, I.-C. Modeling of strength of high-performance concrete using artificial neural networks. Cem. Concr. Res. 1998, 28, 1797–1808.
  46. Iftikhar, B.; Alih, S.C.; Vafaei, M.; Elkotb, M.A.; Shutaywi, M.; Javed, M.F.; Deebani, W.; Khan, M.I.; Aslam, F. Predictive modeling of compressive strength of sustainable rice husk ash concrete: Ensemble learner optimization and comparison. J. Clean. Prod. 2022, 348, 131285.
  47. Farooq, F.; Nasir Amin, M.; Khan, K.; Rehan Sadiq, M.; Faisal Javed, M.; Aslam, F.; Alyousef, R. A Comparative Study of Random Forest and Genetic Engineering Programming for the Prediction of Compressive Strength of High Strength Concrete (HSC). Appl. Sci. 2020, 10, 7330.
  48. Zhao, S.; Hu, F.; Ding, X.; Zhao, M.; Li, C.; Pei, S. Dataset of tensile strength development of concrete with manufactured sand. Data Brief 2017, 11, 469–472.
  49. Mu, T.; Wang, H.; Wang, C.; Liang, Z.; Shao, X. Auto-CASH: A meta-learning embedding approach for autonomous classification algorithm selection. Inf. Sci. 2022, 591, 344–364.
  50. Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; Hutter, F. Efficient and robust automated machine learning. Adv. Neural Inf. Process. Syst. 2015, 2, 2755–2763.
  51. Feurer, M.; Eggensperger, K.; Falkner, S.; Lindauer, M.; Hutter, F. Auto-sklearn 2.0: Hands-free automl via meta-learning. arXiv 2020, arXiv:2007.04074.
  52. Yegnanarayana, B. Artificial Neural Networks; PHI Learning Pvt. Ltd.: Delhi, India, 2009.
  53. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
  54. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  55. Sun, Y.; Liu, Z.; Todorovic, S.; Li, J. Adaptive boosting for SAR automatic target recognition. IEEE Trans. Aerosp. Electron. Syst. 2007, 43, 112–125.
  56. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  57. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
  58. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.
  59. Edwards, L.J.; Muller, K.E.; Wolfinger, R.D.; Qaqish, B.F.; Schabenberger, O. An R2 statistic for fixed effects in the linear mixed model. Stat. Med. 2008, 27, 6137–6157.
  60. Tayman, J.; Swanson, D.A. On the validity of MAPE as a measure of population forecast accuracy. Popul. Res. Policy Rev. 1999, 18, 299–322.
  61. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. Encycl. Database Syst. 2009, 5, 532–538.
  62. Chou, J.-S.; Chiu, C.-K.; Farfoura, M.; Al-Taharwa, I. Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. J. Comput. Civ. Eng. 2011, 25, 242–253.
Figure 1. Correlation coefficient matrix of each variable in the CC dataset.
Figure 2. Correlation coefficient matrix of each variable in the RHA dataset.
Figure 3. Correlation coefficient matrix of each variable in the HSC dataset.
Figure 4. Correlation coefficient matrix of each variable in the MSC dataset.
Figure 5. Differences between ML and AutoML for concrete compressive strength prediction.
Figure 6. Auto-Sklearn algorithm framework.
Figure 7. The R² curves of Auto-Sklearn during training on the four datasets. (a) CC dataset, (b) RHA dataset, (c) HSC dataset, and (d) MSC dataset.
Figure 8. Comparison of predicted and true values of Auto-Sklearn for the training and test set. (a) CC dataset, (b) RHA dataset, (c) HSC dataset, and (d) MSC dataset.
Figure 9. Box plot of four performance evaluation index values for AutoML and ML algorithms. (a) RMSE, (b) MAE, (c) R², and (d) MAPE.
Table 1. Representative studies on compressive strength prediction using ML algorithms.

| Ref. | Year | Concrete Type | Size | Parameters | Data Preprocessing Method | Feature Preprocessing Method | Hyperparameter Optimization Method | Algorithm (R²) |
|---|---|---|---|---|---|---|---|---|
| [33] | 2022 | Concrete with RCA and GGBFS | 125 | 5 | None | Human expert | Human expert | LR+RF (0.93); ANN (0.71); SVR (0.56); RF (0.81) |
| [36] | 2022 | Conventional concrete, high-strength concrete, and recycled aggregate concrete | 380 | 9 | Normalized | Human expert | GS | SVM (0.783); ANN (0.939); AdaBoost (0.950) |
| [37] | 2022 | CO2 concrete | 61 | 7 | None | Human expert | Human expert | LR (0.88); ANN (0.95) |
| [32] | 2022 | Steel-fiber-reinforced concrete | 674 | 10 | None | Human expert | GS | SVR (0.684); ANN (0.822); RF (0.851); AdaBoost (0.782); XGBoost (0.886) |
| [11] | 2022 | Recycled and aggregate concrete | 721 | 8 | None | Human expert | Human expert | SVR (0.451); XGBoost (0.850); GB (0.835) |
| [11] | 2022 | Recycled and aggregate concrete | 721 | 8 | None | Human expert | Particle swarm optimization | SVR (0.740); XGBoost (0.872); GB (0.875) |
| [12] | 2022 | Conventional concrete | 559 | 9 | None | Human expert | Human expert | SVR (0.853) |
| [12] | 2022 | Conventional concrete | 559 | 9 | None | Human expert | GS | SVR (0.931) |
| [28] | 2022 | Conventional concrete | 1030 | 9 | None | Human expert | GS | XGBoost (0.95); AdaBoost (0.93); LKRR (0.96); LGBM (0.95) |
| [38] | 2021 | Eco-friendly geopolymer concrete | 351 | 5 | None | Human expert | Human expert | ANN (0.960); GEP (0.920) |
| [39] | 2022 | Environmentally friendly concrete | 324 | 7 | None | Human expert | Growing algorithm | ANN (0.993); GEP (0.953) |
| [40] | 2021 | Cement-based mortars | 424 | 6 | None | Sensitivity analysis | Human expert | KNN (0.874); SVM (0.4023); RF (0.9439); DT (0.8526); AdaBoost (0.9473) |
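Table 2. Statistical information for the CC dataset (data source: [45]).

| Type | Variable | Unit | Minimum | Maximum | Average | Standard Deviation |
|---|---|---|---|---|---|---|
| Independent variable | Cement | kg/m³ | 102.00 | 540.00 | 281.10 | 104.54 |
| Independent variable | Blast furnace slag | kg/m³ | 0.00 | 359.40 | 73.97 | 86.29 |
| Independent variable | Fly ash | kg/m³ | 0.00 | 200.10 | 54.24 | 64.01 |
| Independent variable | Water | kg/m³ | 121.80 | 247.00 | 181.55 | 21.35 |
| Independent variable | Superplasticizer | kg/m³ | 0.00 | 32.20 | 6.21 | 5.97 |
| Independent variable | Coarse aggregate | kg/m³ | 801.00 | 1145.00 | 972.92 | 77.79 |
| Independent variable | Fine aggregate | kg/m³ | 594.00 | 992.60 | 773.58 | 80.21 |
| Independent variable | Age | days | 1.00 | 365.00 | 45.62 | 63.19 |
| Dependent variable | Compressive strength | MPa | 2.33 | 82.60 | 35.82 | 16.71 |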
Table 3. Statistics of the RHA dataset (data source: [46]).

| Type | Variable | Unit | Minimum | Maximum | Average | Standard Deviation |
|---|---|---|---|---|---|---|
| Independent variable | Age | days | 1.00 | 90.00 | 34.57 | 33.52 |
| Independent variable | Cement | kg/m³ | 249.00 | 783.00 | 409.02 | 105.47 |
| Independent variable | Rice husk ash | kg/m³ | 0.00 | 171.00 | 62.33 | 41.55 |
| Independent variable | Water | kg/m³ | 120.00 | 238.00 | 193.54 | 31.93 |
| Independent variable | Superplasticizer | kg/m³ | 0.00 | 11.25 | 3.34 | 3.52 |
| Independent variable | Aggregate | kg/m³ | 1040.00 | 1970.00 | 1621.51 | 267.77 |
| Dependent variable | Compressive strength | MPa | 16.00 | 104.10 | 48.14 | 17.54 |
Table 4. Statistics of the HSC dataset (data source: [47]).

| Type | Variable | Unit | Minimum | Maximum | Average | Standard Deviation |
|---|---|---|---|---|---|---|
| Independent variable | Cement | kg/m³ | 160.00 | 600.00 | 384.35 | 93.01 |
| Independent variable | Coarse aggregate | kg/m³ | 500.00 | 1486.00 | 860.32 | 102.21 |
| Independent variable | Fine aggregate | kg/m³ | 342.00 | 1135.00 | 806.21 | 113.61 |
| Independent variable | Water | kg/m³ | 132.00 | 302.08 | 173.57 | 15.56 |
| Independent variable | Superplasticizer | % | 0.00 | 12.00 | 2.35 | 2.69 |
| Dependent variable | Compressive strength | MPa | 39.50 | 91.30 | 52.01 | 10.15 |
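Table 5. Statistical information for the MSC dataset (data source: [48]).

| Type | Variable | Unit | Minimum | Maximum | Average | Standard Deviation |
|---|---|---|---|---|---|---|
| Independent variable | Compressive strength of cement | MPa | 38.20 | 55.20 | 48.07 | 3.68 |
| Independent variable | Tensile strength of cement | MPa | 6.90 | 9.10 | 8.22 | 0.48 |
| Independent variable | Curing age | days | 3.00 | 388.00 | 81.14 | 101.82 |
| Independent variable | Dmax (maximum grain size) of crushed stone | mm | 20.00 | 80.00 | 31.77 | 12.06 |
| Independent variable | Stone powder content in sand | % | 0.40 | 20.00 | 8.02 | 4.65 |
| Independent variable | Fineness modulus of sand | dimensionless | 2.30 | 3.34 | 3.05 | 0.26 |
| Independent variable | W/B (water–binder ratio) | dimensionless | 0.25 | 0.56 | 0.43 | 0.08 |
| Independent variable | Water–cement ratio | mw/mc | 0.31 | 0.67 | 0.46 | 0.08 |
| Independent variable | Water | kg/m³ | 104.00 | 291.00 | 172.69 | 20.76 |
| Independent variable | Sand ratio | % | 28.00 | 44.00 | 36.69 | 4.27 |
| Independent variable | Slump | mm | 11.00 | 260.00 | 89.03 | 62.09 |
| Dependent variable | Compressive strength | MPa | 18.40 | 87.20 | 54.40 | 15.84 |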
Table 6. Optimal models found by Auto-Sklearn for the four datasets.

| Dataset | No. | Weight | Data Preprocessing Method | Feature Preprocessing Method | Algorithm | Hyperparameters |
|---|---|---|---|---|---|---|
| CC | 1 | 0.22 | None | Feature agglomeration | ANN | 2 hidden layers with 210 neurons |
| CC | 2 | 0.18 | None | Random tree embedding | ANN | 3 hidden layers with 73 neurons |
| CC | 3 | 0.18 | None | Random tree embedding | ANN | 3 hidden layers with 80 neurons |
| CC | 4 | 0.12 | None | Feature agglomeration | ANN | 1 hidden layer with 220 neurons |
| CC | 5 | 0.10 | None | Polynomial | AdaBoost | max depth = 9, estimators = 127, learning rate = 0.07 |
| CC | 6 | 0.08 | None | Feature agglomeration | ANN | 2 hidden layers with 27 neurons |
| CC | 7 | 0.04 | None | Feature agglomeration | ANN | 2 hidden layers with 229 neurons |
| CC | 8 | 0.04 | None | Feature agglomeration | ANN | 3 hidden layers with 252 neurons |
| CC | 9 | 0.02 | None | Feature agglomeration | AdaBoost | max depth = 8, estimators = 301, learning rate = 0.21 |
| CC | 10 | 0.02 | None | None | ANN | 2 hidden layers with 189 neurons |
| RHA | 1 | 0.32 | None | Polynomial | ANN | 2 hidden layers with 261 neurons |
| RHA | 2 | 0.18 | None | Polynomial | ANN | 2 hidden layers with 264 neurons |
| RHA | 3 | 0.14 | None | Polynomial | ANN | 3 hidden layers with 257 neurons |
| RHA | 4 | 0.10 | None | Polynomial | ANN | 1 hidden layer with 226 neurons |
| RHA | 5 | 0.10 | None | Polynomial | ANN | 1 hidden layer with 141 neurons |
| RHA | 6 | 0.06 | None | Polynomial | ANN | 2 hidden layers with 257 neurons |
| RHA | 7 | 0.04 | None | Polynomial | ANN | 3 hidden layers with 256 neurons |
| RHA | 8 | 0.04 | None | Polynomial | ANN | 1 hidden layer with 230 neurons |
| RHA | 9 | 0.02 | None | Polynomial | ANN | 1 hidden layer with 226 neurons |
| HSC | 1 | 0.40 | None | None | ANN | 2 hidden layers with 37 neurons |
| HSC | 2 | 0.22 | None | Polynomial | Gaussian process | alpha = 0.011, thetaL = 4.609 × 10⁻⁷, thetaU = 1.02 |
| HSC | 3 | 0.12 | None | Polynomial | Gaussian process | alpha = 0.011, thetaL = 6.437 × 10⁻⁷, thetaU = 78.86 |
| HSC | 4 | 0.12 | None | Feature agglomeration | ANN | 3 hidden layers with 32 neurons |
| HSC | 5 | 0.10 | None | Feature agglomeration | ANN | 3 hidden layers with 35 neurons |
| HSC | 6 | 0.02 | None | None | Gaussian process | alpha = 0.011, thetaL = 7.733 × 10⁻⁷, thetaU = 2.796 |
| HSC | 7 | 0.02 | None | Feature agglomeration | ANN | 3 hidden layers with 29 neurons |
| MSC | 1 | 0.52 | None | Feature agglomeration | Gradient boosting | max leaf nodes = 5, learning rate = 0.08 |
| MSC | 2 | 0.18 | None | Feature agglomeration | Gradient boosting | max leaf nodes = 4, learning rate = 0.08 |
| MSC | 3 | 0.16 | None | Feature agglomeration | Gradient boosting | max leaf nodes = 4, learning rate = 0.08 |
| MSC | 4 | 0.14 | None | Feature agglomeration | Gradient boosting | max leaf nodes = 5, learning rate = 0.02 |
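
The weights in Table 6 sum to 1 for each dataset, which is consistent with the final prediction being a weighted average of the member models' predictions. The sketch below shows that combination step using the four MSC weights from Table 6. The member predictions are hypothetical numbers invented for illustration, and `ensemble_predict` is a helper defined here, not part of the Auto-Sklearn API.

```python
def ensemble_predict(weights, member_predictions):
    """Combine member predictions into one prediction per sample.

    weights: one weight per ensemble member (expected to sum to 1).
    member_predictions: one list of per-sample predictions per member.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("ensemble weights should sum to 1")
    if len(weights) != len(member_predictions):
        raise ValueError("need exactly one weight per member")
    n_samples = len(member_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, member_predictions))
        for i in range(n_samples)
    ]


# Weights of the four gradient boosting members on the MSC dataset (Table 6).
MSC_WEIGHTS = [0.52, 0.18, 0.16, 0.14]

# Hypothetical member predictions in MPa for two samples (not from the paper).
MEMBER_PREDICTIONS = [
    [50.0, 60.0],
    [52.0, 58.0],
    [48.0, 61.0],
    [51.0, 59.0],
]

ENSEMBLE = ensemble_predict(MSC_WEIGHTS, MEMBER_PREDICTIONS)

if __name__ == "__main__":
    print([round(value, 2) for value in ENSEMBLE])
```

Because the first member carries more than half of the total weight, the combined prediction stays close to that member's output.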
Table 7. Performance evaluation index values of Auto-Sklearn.

| Dataset | RMSE | MAE | R² | MAPE |
|---|---|---|---|---|
| CC | 3.767 | 2.636 | 0.938 | 0.097 |
| RHA | 4.133 | 3.327 | 0.968 | 0.073 |
| HSC | 3.092 | 1.579 | 0.914 | 0.028 |
| MSC | 1.558 | 1.139 | 0.991 | 0.025 |
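
For readers who want to reproduce the four indices in Table 7 on their own predictions, the sketch below computes RMSE, MAE, R², and MAPE from paired true and predicted strengths. It uses the standard textbook definitions, which are assumed here to match the paper's, and reports MAPE as a fraction rather than a percentage, in line with the magnitudes in Table 7. The sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R² and MAPE (as a fraction) for paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        # MAPE is undefined if any true value is zero.
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical compressive strengths in MPa (not taken from the paper).
EXAMPLE = regression_metrics(
    y_true=[30.0, 40.0, 50.0, 60.0],
    y_pred=[32.0, 38.0, 51.0, 59.0],
)

if __name__ == "__main__":
    for name, value in EXAMPLE.items():
        print(f"{name}: {value:.4f}")
```

RMSE and MAE are in the units of the target (MPa here), whereas R² and MAPE are dimensionless, which is why the latter two are easier to compare across datasets.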
Table 8. Performance evaluation index values of the five ML algorithms.

| Dataset | Algorithm | Hyperparameters | RMSE | MAE | R² | MAPE |
|---|---|---|---|---|---|---|
| CC | ANN | 3 hidden layers with 231 neurons | 4.914 | 3.469 | 0.894 | 0.122 |
| CC | SVR | C = 417, ε = 0.11 | 5.748 | 4.040 | 0.854 | 0.163 |
| CC | RF | max depth = 14, max features = 3 | 4.987 | 3.601 | 0.890 | 0.139 |
| CC | AdaBoost | max depth = 10, estimators = 62, learning rate = 0.07 | 4.929 | 3.514 | 0.893 | 0.140 |
| CC | XGBoost | max depth = 9, learning rate = 0.11 | 4.019 | 2.747 | 0.929 | 0.093 |
| RHA | ANN | 1 hidden layer with 24 neurons | 4.768 | 3.569 | 0.931 | 0.067 |
| RHA | SVR | C = 13, ε = 0.04 | 6.041 | 4.227 | 0.890 | 0.104 |
| RHA | RF | max depth = 10, max features = 7 | 6.029 | 4.200 | 0.892 | 0.096 |
| RHA | AdaBoost | max depth = 9, estimators = 143, learning rate = 1.22 | 5.608 | 4.421 | 0.904 | 0.102 |
| RHA | XGBoost | max depth = 13, learning rate = 0.75 | 3.356 | 2.457 | 0.966 | 0.056 |
| HSC | ANN | 3 hidden layers with 94 neurons | 4.175 | 1.777 | 0.843 | 0.0305 |
| HSC | SVR | C = 440, ε = 0.08 | 3.854 | 2.167 | 0.866 | 0.039 |
| HSC | RF | max depth = 12, max features = 5 | 4.153 | 2.556 | 0.845 | 0.047 |
| HSC | AdaBoost | max depth = 10, estimators = 414, learning rate = 0.03 | 4.305 | 2.305 | 0.833 | 0.041 |
| HSC | XGBoost | max depth = 11, learning rate = 0.02 | 3.35 | 1.75 | 0.899 | 0.03 |
| MSC | ANN | 2 hidden layers with 35 neurons | 2.320 | 1.749 | 0.981 | 0.036 |
| MSC | SVR | C = 195, ε = 0.09 | 3.199 | 2.466 | 0.963 | 0.054 |
| MSC | RF | max depth = 18, max features = 5 | 3.437 | 2.142 | 0.957 | 0.051 |
| MSC | AdaBoost | max depth = 7, estimators = 423, learning rate = 1.70 | 3.309 | 2.039 | 0.960 | 0.048 |
| MSC | XGBoost | max depth = 16, learning rate = 0.90 | 2.443 | 1.536 | 0.978 | 0.316 |
Table 9. Performance evaluation index values from related works.

| Dataset | Ref. | Data Preprocessing Method | Feature Preprocessing Method | Hyperparameter Optimization Method | Algorithm (R²) |
|---|---|---|---|---|---|
| CC | [45] | None | None | Human expert | ANN (0.922) |
| CC | [28] | None | Shapley additive explanations | GS | XGBoost (0.95); AdaBoost (0.93); Extra tree (0.94); Decision tree (0.86); LKRR (0.96); LGBM (0.95) |
| CC | [62] | None | None | Human expert | ANN (0.9025); SVM (0.8836); MART (0.9025) |
| CC | Present study | Automatic | Automatic | Automatic | Auto-Sklearn (0.938) |
| RHA | [46] | None | None | Human expert | GEP (0.940); RF (0.913) |
| RHA | Present study | Automatic | Automatic | Automatic | Auto-Sklearn (0.938) |
| HSC | [47] | None | None | Human expert | GEP (0.90); DT (0.90); ANN (0.89); RF (0.96) |
| HSC | Present study | Automatic | Automatic | Automatic | Auto-Sklearn (0.914) |
| MSC | [48] | None | None | None | Fitted formula (0.858) |
| MSC | Present study | Automatic | Automatic | Automatic | Auto-Sklearn (0.991) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Shi, M.; Shen, W. Automatic Modeling for Concrete Compressive Strength Prediction Using Auto-Sklearn. Buildings 2022, 12, 1406. https://doi.org/10.3390/buildings12091406
