Towards a Reliable Design of Geopolymer Concrete for Green Landscapes: A Comparative Study of Tree-Based and Regression-Based Models

Wang, Ranran; Zhang, Jun; Lu, Yijun; Ren, Shisong; Huang, Jiandong

doi:10.3390/buildings14030615



by Ranran Wang ¹, Jun Zhang ¹,*, Yijun Lu ², Shisong Ren ³,* and Jiandong Huang ²,*

¹ School of Fine Arts and Design, Guangzhou University, Guangzhou 510006, China
² School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
³ Faculty of Civil Engineering and Geosciences, Delft University of Technology, 2628 Delft, The Netherlands
* Authors to whom correspondence should be addressed.

Buildings 2024, 14(3), 615; https://doi.org/10.3390/buildings14030615

Submission received: 12 January 2024 / Revised: 21 February 2024 / Accepted: 24 February 2024 / Published: 26 February 2024

(This article belongs to the Section Building Materials, and Repair & Renovation)


Abstract

The design of geopolymer concrete for landscapes must meet more stringent requirements, so understanding and designing geopolymer concrete with a higher compressive strength is challenging. For predicting the compressive strength of geopolymer concrete, machine learning models have the advantage of being more accurate and faster. However, at present usually only a single machine learning model is used, ensemble learning models are rarely applied, and model optimization processes are lacking. Therefore, this paper proposes using the Firefly Algorithm (FA) as an optimization tool to tune the hyperparameters of Logistic Regression (LR), Multiple Logistic Regression (MLR), decision tree (DT), and Random Forest (RF) models. The reliability and efficiency of the four resulting ensemble learning models were then analyzed. The models were also used to analyze the factors influencing geopolymer concrete and to rank the strength of their influence. According to the experimental data, the RF-FA model had the lowest RMSE values: 4.0364 for the training set and 8.7202 for the test set. Its R values for the training and test sets were 0.9774 and 0.8915, respectively. Therefore, compared with the other three models, RF-FA has a stronger generalization ability and higher prediction accuracy. In addition, the molar concentration of NaOH was the most important influencing factor, and its influence was far greater than that of the other factors, including NaOH content. Therefore, NaOH molarity deserves particular attention when designing geopolymer concrete.

Keywords:

ensemble learning model; beetle antennae search; geopolymer concrete; NaOH molarity

1. Introduction

At present, cement is the main gelling agent (binder) in traditional concrete. The gel produced by the hydration of cement binds and consolidates the other materials in the concrete, forming a stable skeleton structure. However, cement production consumes a large amount of energy and emits a large amount of carbon dioxide. Geopolymer concrete instead uses geopolymer materials as the main gelling agent [1,2,3,4], which changes both its constituent materials and its preparation process compared with traditional cement concrete. Geopolymers are formed mainly from silicate, aluminate, and other alkali-active substances, and consist of the polymer produced by the reaction of silicic acid and aluminic acid [5,6]. In addition, unlike traditional cement, geopolymers can use industrial by-products that are currently treated as waste as feedstocks. This not only reduces carbon emissions but also reduces the need for waste disposal [7,8,9], and thus lowers both the demand for natural resources and the damage to the environment. Geopolymer concrete binds the mixture into a skeleton by forming a three-dimensional polymer network. Compared with traditional cement concrete, it has higher compressive strength, tensile strength, and impermeability, allowing it to meet more demanding structural requirements. Geopolymer concrete also has better corrosion resistance to acids, bases, salts, and other chemicals [10,11,12], so it maintains high durability in environments that are corrosive to concrete, such as acidic soils and high-salinity seawater. In summary, promoting the use of geopolymer concrete helps reduce carbon emissions and supports environmentally optimized construction.

The difference between decorative low-carbon concrete and ordinary concrete is that low-carbon concrete emphasizes artistic form, material specificity, and the reuse of solid waste [13]. From the point of view of concrete, artistry means standardizing and industrializing the concrete's distinctive character and aesthetic qualities. Material specificity refers to effects beyond the conventional texture of concrete products, such as stone, wood, and metal effects [14]. Besides aesthetics, low-carbon concrete design also considers structural, environmental, economic, safety, and construction-period factors, as well as the relationship with sound, lighting, ventilation, water features, and greenery. However, applying geopolymer concrete to artistic landscapes is difficult [15]. Landscape geopolymer concrete must satisfy the design needs of different architects, who often impose more stringent requirements on its mechanical properties; therefore, understanding and designing geopolymer concrete with a higher compressive strength is challenging [10,16,17,18].

Current research on geopolymer concrete focuses mainly on how its compressive strength changes in different environments. Tests have shown that the mix proportions and preparation process of geopolymer concrete strongly affect its compressive strength; by adjusting these two parameters, its strength can even exceed that of ordinary concrete. The type of geopolymer, the active substance, the gel content, and the curing time all affect the performance of the concrete. Because of these factors, the compressive strength of geopolymer concrete can typically be adjusted over a wide range, from 20 MPa to 100 MPa. Predicting the compressive strength of geopolymer concrete is therefore a complicated task involving many influencing factors, so a suitable model or method must be chosen for estimation and prediction. Table 1 shows some common methods.

Compared with traditional models, machine learning models achieve higher accuracy and reliability on complex, nonlinear prediction problems. Furthermore, machine learning models can select the features most relevant to the prediction task, reducing human intervention in feature selection. They can also provide more intuitive explanations and visualizations of the process, showing more clearly where the results come from. Finally, they can process new data in real time and update the model, which improves generalization. Consequently, the properties of geopolymer concrete can be predicted more rapidly and precisely, with lower labor and time costs than those of traditional statistical models [16]. Machine learning models can also provide more comprehensive predictions, which matters because compressive strength depends on many factors and a model suited to the task must be chosen [37]. Many factors affect the properties of geopolymer concrete, and the relationships among them are often nonlinear [39], which makes it difficult to uncover the underlying relationships with traditional statistical models. Machine learning models can be continuously trained and optimized to capture these nonlinear relationships and improve prediction accuracy [40]. However, despite their higher prediction efficiency and accuracy, machine learning models rely on a substantial amount of training data. The quality and quantity of the dataset limit the model's prediction ability: poor-quality or insufficient data make it difficult for the model to discover hidden relationships in the data during training [41,42,43,44].
In addition, if the training dataset is small or the machine learning model is too complex, the model may overfit [45,46,47,48]; that is, it fits the training data well but predicts poorly. Conversely, if the model is too simple or the data are of poor quality, with noise or outliers, the model underfits, which manifests as poor prediction performance on both the training and test sets, with predictions deviating from the actual values. To assess prediction models for the compressive strength of geopolymer concrete, Gupta et al. compared artificial neural network (ANN), multiple linear regression, and multiple nonlinear regression (MNLR) models. ANN had the highest prediction accuracy and reliability, and its predictive performance was far superior to that of the other two models [16]. Awoyera et al. compared genetic programming (GEP) with artificial neural network techniques to find the most accurate predictive model. Both GEP and ANN predicted well, and the ANN model had the smallest prediction error for the strength indices of geopolymer concrete [49]. Tanyildizi et al. predicted the geopolymerization process of fly-ash-based geopolymers, using deep long short-term memory (LSTM) networks and machine learning to analyze and predict the data [50]. Ayar et al. used gene expression programming (GBEP) models to predict the mechanical properties of dense polymer concrete, and the predicted results were very close to the actual values [51]. Ayana et al. prepared geopolymer concrete with different industrial wastes and curing methods and measured its strength; the experimental data were analyzed with a supervised machine learning model, which reached a prediction accuracy of 98.79% for 28-day curing [52]. Paruthi et al. used four machine learning models to determine the compressive strength of geopolymer concrete: gradient boosting machine (GBM), generalized linear model (GLM), extremely randomized trees (XRT), and deep learning (DL). The GBM model had the highest prediction accuracy [53]. Nguyen et al. used a Deep Neural Network (DNN) model to analyze the compressive strength of green fly ash geopolymer concrete. The model showed high prediction accuracy and can therefore be used effectively to adjust the mix proportions of fly-ash-based geopolymer concrete [54]. Ahmed et al. used five machine learning models to predict the compressive strength of nanoparticle-modified geopolymer concrete: a linear regression (LR) model, a multi-expression programming (MEP) model, a full quadratic (FQ) model, an artificial neural network (ANN), and an M5P-tree (M5P) model. The ANN model gave the best predictions [55].

In this study, the Firefly Algorithm (FA) was used as the hyperparameter tuning algorithm and was combined with each model used in the experiment to form a composite model. The four base models were Logistic Regression (LR), Multiple Logistic Regression (MLR), decision tree (DT), and Random Forest (RF). To determine the design parameters for geopolymer concrete effectively, the predictions of the four composite models were compared, and the importance of the input variables was evaluated.

Datasets often contain noise, outliers, or incomplete data, which are an important cause of reduced accuracy in traditional machine learning models [48,56]. Moreover, because traditional models learn only from limited data, their limited generalization ability can lead to underfitting or overfitting [44,47]. Therefore, when a single machine learning model is used, its parameters must be continuously adjusted to the actual situation, and the model may even need to be replaced [56,57,58,59]. Different problems may be better suited to different models, but selecting the appropriate model for a given problem demands substantial knowledge and experience. Identifying a model and testing its accuracy consume considerable time and resources, and an incorrect choice can lead to unsatisfactory results that fail to meet the problem's needs. Therefore, ensemble learning models were used to predict the compressive strength of geopolymer concrete. An ensemble learning model combines the predictions of multiple base models, using techniques such as majority voting and weighted averaging, to produce the final prediction. This mitigates the deviation caused by the limited predictive capability of any single model, thereby improving overall accuracy and stability. Combining multiple diverse models can significantly reduce overfitting and strengthen the model's ability to generalize to unseen data, and because the outputs of several models are combined, the influence of noise and outliers on the overall prediction is reduced.
Hence, for complex problems and large datasets, ensemble learning models can solve problems more effectively and offer greater expressiveness and predictive power.
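The two combination rules named above, weighted averaging for numeric outputs and majority voting for class labels, can be sketched as follows. This is a minimal illustration that assumes the base models' outputs are already available; the function names and example values are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_average(predictions, weights):
    """Combine the regression outputs of several base models by a weighted mean."""
    if len(predictions) != len(weights):
        raise ValueError("exactly one weight per base model is required")
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


def majority_vote(labels):
    """Return the class label predicted by the largest number of base models."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical strength predictions (MPa) from three base models for one mix
print(weighted_average([42.0, 45.0, 48.0], [0.2, 0.3, 0.5]))  # about 45.9
print(majority_vote(["high", "low", "high"]))  # high
```

Dividing by the sum of the weights means the weights need not be normalized beforehand.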

2. Research Overview

Figure 1 shows an overview of the research process. The materials for the geopolymer concrete were selected before the experiment, so that the influence of material factors on compressive strength could be controlled. To enhance the model's generalization capability, two distinct binders were chosen: fly ash and ground granulated blast-furnace slag (GGBS). NaOH and other alkaline materials were incorporated as activators to accelerate sample formation. The data were obtained from compressive strength tests. After evaluation and analysis, the 371 data points obtained were divided into two parts: 70% (260) formed the training set of the machine learning models and the remaining 30% (111) formed the test set. Because the strength of the associations among the input variables affects how well a machine learning model learns, a correlation analysis of the input variables was carried out before the data were fed into the models. Tree-based and regression-based models were then compared to identify the model with the most accurate predictions. Because the model hyperparameters would otherwise have to be set manually, which may reduce prediction accuracy, FA was introduced as a hyperparameter tuning tool, and 10-fold cross-validation (CV) was used to evaluate the effect of the tuning. Finally, the predictions of the two types of models were compared using Monte Carlo simulations and Taylor diagrams, and the model that best predicts the compressive strength of geopolymer concrete was identified. The first innovation of this study is that it is the first to compare regression- and tree-based models for predicting the properties of geopolymer materials.
There is a strong nonlinearity between the composition and properties of geopolymer materials; in particular, the database used in this study contains the content of each component material and the composition of the activator. Regression-based models can learn nonlinear relationships between features in the original feature space by automatically discovering generalizable patterns in the data, which greatly improves efficiency and accuracy compared with manual analysis; arranging such strong features (for example, the direct content of each component material in this study) in a cascading pattern helps improve model convergence. However, because regression models cannot analyze the residual sequence, one of these model families may show obvious disadvantages in prediction performance. This is a problem this study seeks to resolve and one of the reasons why tree-based and regression-based models were compared. This study is also the first to use a variable-step-size FA to tune the hyperparameters of tree-based models; to the authors' knowledge, no previous study has predicted the compressive strength of geopolymer concrete in this way. This choice is motivated by the strong nonlinear relationship between component parameters and performance noted above, which can easily cause traditional FA-RF and FA-DT to become trapped in local optima or to converge slowly. Moreover, previous studies using traditional evolutionary machine learning algorithms often emphasized improving global search while ignoring the balance between global and local search, which blurred the significance of comparing tree-based and regression-based models. Addressing this balance is the second innovation of this study.
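The 70%/30% split of the 371 samples described above can be sketched as a seeded random shuffle of sample indices. The text does not say how the split was drawn, so random shuffling and the fixed seed are assumptions, and the helper name is hypothetical.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed gives a reproducible split
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(371)
print(len(train_idx), len(test_idx))  # 260 111
```

Rounding 0.7 × 371 = 259.7 gives the 260 training samples reported in the text, leaving 111 for testing.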

3. Methodology

3.1. Determination of the Input Variables

Many factors affect the strength development of geopolymer concrete, among which the constituent materials are particularly important [32]. These include aggregates, gel materials (such as fly ash, slag, and metallurgical slag), and activators. In addition, different material combinations affect the compressive strength of geopolymer concrete differently. Therefore, to improve the compressive strength of geopolymer concrete effectively, the optimum mix of raw materials was explored. Gel materials play a crucial role in strength development. A suitable gel material forms a three-dimensional network structure in the mixture and absorbs water, thereby increasing viscosity. It also acts as a lubricant between particles, reducing friction between particles of different sizes. This improves the flow of the mixture, distributes it more uniformly, and increases the compressive strength of the material. Traditional cement concrete uses cement as the gel material, which requires substantial resource input during production and increases environmental pollution. In contrast, the geopolymer concrete discussed in this study uses silicate and other materials as gel materials, and an appropriate amount of activator effectively promotes the gel reaction. This helps form a robust skeletal structure and promotes strength development. The type and dosage of the activator significantly influence the compressive strength of the concrete, so the activator components and dosages must be chosen to suit each gel material in order to maximize its effectiveness and the compressive strength of the concrete. The size parameters of the aggregate also affect concrete strength.
Aggregates act as particle fillers in geopolymer concrete and constitute the primary load-bearing component of the concrete. Physical characteristics such as particle size distribution, shape, and surface characteristics play a key role in the formation of the internal structural skeleton of concrete, which further affects the compressive strength of the concrete.

Based on the above analysis, the material design parameters of geopolymer concrete were selected. Firstly, the contents of fly ash [9] and ground granulated blast-furnace slag (GGBS) [8] in the mixture were selected as influencing factors for the model. These two geopolymer materials act as gel materials in geopolymer concrete, and their main components are CaSiO₃ and CaO·Al₂O₃·2SiO₂. The Na₂SiO₃ content [7], NaOH content [7], and NaOH molarity [60] were the activator design parameters, while the contents of fine aggregate, gravel (4/10 mm), and gravel (10/20 mm) [60] and the water–solid ratio were the other design parameters affecting concrete strength. These indices were used as inputs to the compressive strength model. Table 2 summarizes the input variables, and Table 3 shows the numerical ranges of the input parameters used in this study.

3.2. Data Pre-Processing

Data preprocessing is an indispensable part of data mining and machine learning projects.

In practical systems, the data in a database are easily affected by external and internal factors such as noise, missing data, and inconsistent data types. As a result, raw data usually contain anomalous values that can affect the prediction model. The raw data were collected from multiple sources, which store and organize data in different ways, so the data are presented differently. Duplicate data refers to two or more similar or identical records in the database; as highly similar records accumulate, they bias the predictions and degrade the model's prediction ability. Incomplete data mainly arise from insufficiently accurate measurements and from data lost during transmission and recording.

Therefore, the data must be preprocessed to improve model accuracy. Noisy data are commonly handled by binning (box segmentation), regression, and clustering. Binning smooths sorted data values locally using the neighboring values within each bin; the regression method smooths the data with a regression function or model; and the clustering method groups similar values into clusters, so that values falling outside every cluster can be treated as noise or outliers and simply removed. These steps effectively reduce overfitting or underfitting of the prediction model.
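Binning can be sketched as follows. The text does not state which binning variant was used, so equal-frequency bins with smoothing by bin means are an assumption here; the function name and the strength values are hypothetical.

```python
def smooth_by_bin_means(values, n_bins):
    """Equal-frequency binning: sort the data, cut it into n_bins consecutive
    bins, and replace every value by the mean of its bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    size = len(values) / n_bins
    for b in range(n_bins):
        members = order[round(b * size):round((b + 1) * size)]
        if not members:
            continue
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            smoothed[i] = mean  # write back at the record's original position
    return smoothed


# Hypothetical strengths (MPa); 4.0 is a suspicious low reading
strengths = [21.0, 24.0, 25.0, 34.0, 28.0, 26.0, 4.0, 35.0, 29.0]
print(smooth_by_bin_means(strengths, 3))
```

The outlier 4.0 is pulled toward its neighbors (its bin mean is 49/3 ≈ 16.3), which is the local smoothing effect described above.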

3.3. Data Collection and Analysis

Table 1 delineates the various types of input variables and integrates and categorizes the design parameters and compressive strength data of geopolymer concrete obtained by previous researchers. This enlarges the dataset used for model training and thereby improves the prediction accuracy of the model [62,63]. The Pearson correlation coefficient was used to test the correlation between the input variables and to guard against excessive correlation among them. Figure 2 shows the results.

Correlation coefficients measure the extent of linear correlation between variables. In a machine learning model, multicollinearity can lead to instability and reduced predictive accuracy. As seen in Figure 2, the correlation coefficient of each variable with itself is 1. Using the absolute value of the correlation coefficient, |r|, as the criterion, the closer |r| between two different variables is to 1, the stronger their correlation. Specifically, the three correlation coefficients indicated by arrows in Figure 2 exceed 0.5 in absolute value, and the coefficient between gravel 4/10 mm and gravel 10/20 mm is −0.81, indicating a notably strong negative correlation between these two parameters. Overall, most correlations were below 0.4 in absolute value, indicating weak linear correlation between the variables, so the machine learning prediction model is unlikely to become unstable. More importantly, however, the absence of simple linear relationships between the input parameters does not rule out nonlinear correlations, which could affect the model's predictions. In some cases, variables are related not only by the linear relationships visible in the data but also by hidden nonlinear relationships, so the magnitude of the correlation coefficient alone cannot show whether two variables are correlated. Based on Figure 2, the input variables in the selected database are generally independent of each other, making them suitable for the machine learning models used in this study. Nonlinear relationships may still exist (for example, constraints among the independent variables), but given the low correlations between the input parameters, they are not expected to affect the machine learning process.
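The screening described above, computing Pearson's r for every pair of inputs and flagging pairs with |r| > 0.5, can be sketched as follows. The column names and values are hypothetical stand-ins, not the study's database.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongly_correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair of inputs with |r| > threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical mix designs (kg/m^3 for gravel, M for NaOH molarity)
columns = {
    "gravel_4_10": [500, 550, 600, 650, 700],
    "gravel_10_20": [800, 760, 700, 660, 610],
    "naoh_molarity": [10, 14, 8, 16, 12],
}
print(strongly_correlated_pairs(columns))
```

In this toy data only the two gravel fractions are flagged, with a strongly negative r, mirroring the kind of pair the text highlights; the check says nothing about nonlinear dependence.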

3.4. Description of ML Models

3.4.1. Single Models

DT can be used for both classification and regression [64,65] and is a commonly used machine learning algorithm. It distinguishes and classifies input data by representing the decision process as a tree structure consisting of a root node, internal nodes, and leaf nodes. Data enter the model through the root node, are evaluated and split at internal nodes, and the outcomes are passed on to the next internal node or to a leaf node. Each internal node tests a feature of the data, and each leaf node represents a category or a predicted value [65]. The decision tree thus sorts the input feature values step by step; after multiple tests, each sample is assigned to a category or predicted value that represents its characteristics [66].
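The root, internal, and leaf structure described above can be sketched as a minimal regression tree that chooses each split by minimizing the summed squared error. This is a simplified CART-style illustration, not the study's implementation; the parameter names min_samples_split and min_samples_leaf are borrowed from the hyperparameters discussed in Section 4.2, and the data are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean: the impurity of a regression node."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def build_tree(X, y, min_samples_split=2, min_samples_leaf=1):
    """Grow a regression tree recursively.

    Internal nodes store a (feature, threshold) test; leaves store the mean target.
    """
    if len(y) < min_samples_split or len(set(y)) == 1:
        return {"leaf": sum(y) / len(y)}
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue  # the split would create a leaf that is too small
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # no admissible split: this node becomes a leaf
        return {"leaf": sum(y) / len(y)}
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          min_samples_split, min_samples_leaf)

    return {"feature": f, "threshold": t, "left": grow(left), "right": grow(right)}


def predict(tree, row):
    """Route one sample from the root to a leaf and return the leaf value."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Hypothetical data: NaOH molarity (M) -> compressive strength (MPa)
X = [[8], [10], [12], [14], [16]]
y = [30.0, 32.0, 45.0, 47.0, 48.0]
tree = build_tree(X, y)
print(predict(tree, [9]), predict(tree, [13]))  # 32.0 47.0
coarse = build_tree(X, y, min_samples_leaf=2)
print(predict(coarse, [9]))  # 31.0
```

Raising min_samples_leaf stops the tree from isolating single samples, so the coarser tree predicts the mean of a two-sample leaf instead.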

The RF model is an ensemble learning method based on decision trees [67,68]. A random forest generates numerous decision trees and combines their predictions to perform classification and regression tasks. It creates diverse training subsets by randomly sampling from the original training data, and each subset is used to grow one decision tree. Each tree produces its own prediction, and the forest aggregates these predictions (by majority vote for classification or by averaging for regression) to obtain the final result [67,68,69].
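The random forest procedure above (bootstrap sampling, one tree per subset, and averaging for regression) can be sketched as follows. To keep the example short, each tree is reduced to a single split (a stump) and draws its own random feature subset; both simplifications are assumptions of this sketch, and the data and function names are hypothetical.

```python
import random


def fit_stump(X, y, features):
    """Fit a one-split regression tree, searching only the given feature indices."""
    mean_all = sum(y) / len(y)
    best = (sum((v - mean_all) ** 2 for v in y), None, None, mean_all, mean_all)
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:  # keep both sides non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            cost = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if cost < best[0]:
                best = (cost, f, t, ml, mr)
    return best[1:]  # (feature, threshold, left mean, right mean)


def predict_stump(stump, row):
    f, t, ml, mr = stump
    if f is None:  # no split reduced the error: predict the node mean
        return ml
    return ml if row[f] <= t else mr


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Grow each tree on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """For regression, the forest averages the predictions of its trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical data: feature 0 drives strength, feature 1 is noise
X = [[i, (7 * i) % 5] for i in range(20)]
y = [10.0 if i < 10 else 40.0 for i in range(20)]
forest = fit_forest(X, y)
print(predict_forest(forest, [2, 0]), predict_forest(forest, [17, 0]))
```

Trees that happen to draw only the noise feature contribute nearly the same value for both rows, so the separation between the two predictions comes from the trees that see feature 0.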

Logistic Regression (LR) is a generalized linear model used primarily for binary classification. In traditional linear regression, the output can range from negative to positive infinity, whereas LR applies a nonlinear transformation (the logistic, or sigmoid, function) that confines the output to the interval (0, 1). A threshold of 0.5 is then used for categorization: outputs below 0.5 are assigned to category 0, and outputs greater than or equal to 0.5 to category 1. The final prediction results are obtained by classifying and statistically analyzing these two outcomes [70].
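The sigmoid transformation and the 0.5 threshold described above can be sketched as follows. The weights and inputs are hypothetical, and fitting the weights is outside the scope of this sketch.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


def lr_predict(weights, bias, x, threshold=0.5):
    """Return (probability, class) for one sample under a logistic model."""
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    return p, (1 if p >= threshold else 0)


# Hypothetical weights for two standardized inputs
print(lr_predict([1.5, -0.5], 0.0, [1.0, 1.0]))   # p ≈ 0.73 -> class 1
print(lr_predict([1.5, -0.5], 0.0, [-1.0, 1.0]))  # p ≈ 0.12 -> class 0
```

A linear score of exactly 0 maps to p = 0.5, which the "greater than or equal to" rule assigns to category 1.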

Multiple Logistic Regression (MLR) extends the LR model. When the classification space is inherently nonlinear, MLR divides it into multiple regions, fits each region with a linear model [71,72,73], and obtains its output as a weighted average of the results from all regions. Each region corresponds to an independent LR model [74,75]. Because MLR combines multiple LR models, it has good nonlinear fitting capability and generalizes well to unseen values when the number of regions (i.e., the number of LR models) is sufficient. However, traditional gradient descent algorithms are not suitable for LR and MLR models, and these models may fail to converge during data processing [70].
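The region-wise weighted average described above can be sketched by assuming a softmax gate that softly assigns each input to a region, a common formulation of mixed logistic regression; the text does not specify the gating function, so this choice, the helper names, and the weights are assumptions.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def mlr_predict(gate_weights, lr_weights, x):
    """Mixed logistic regression: a softmax gate softly assigns x to regions,
    each region has its own LR model, and the output is the gate-weighted average."""
    gates = softmax([dot(u, x) for u in gate_weights])
    return sum(g * sigmoid(dot(w, x)) for g, w in zip(gates, lr_weights))


# Two hypothetical regions; x = [1.0 (bias term), x1]
gate_weights = [[0.0, -5.0], [0.0, 5.0]]  # region 0 for x1 < 0, region 1 for x1 > 0
lr_weights = [[0.0, -3.0], [0.0, 3.0]]    # each region's own LR model
for x1 in (-2.0, 0.0, 2.0):
    print(x1, round(mlr_predict(gate_weights, lr_weights, [1.0, x1]), 4))
```

The output is high at both ends and 0.5 in the middle. A single LR model, being monotonic in x1, cannot produce this U-shape, which illustrates why combining regional LR models adds nonlinear fitting capability.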

3.4.2. Ensemble Models

The ensemble models were built by using a variable-step-size FA to tune the hyperparameters of the tree-based models. The FA is based on how fireflies attract one another through the intensity of their light and change their trajectories accordingly. A firefly with weaker luminance flies toward the brightest firefly within its detection range; if two fireflies are equally bright, they move randomly. The attraction of light decreases with increasing distance, and the brightest firefly also moves randomly. After multiple iterations of position updates, the fireflies converge toward the brightest firefly, which corresponds to the optimal solution of the objective function.

The specific calculation process is as follows:

I = I_{0} e^{-\gamma r_{ij}}    (1)

where I is the relative fluorescence intensity of a firefly; I₀ is the brightness of the brightest firefly, that is, its own fluorescence brightness at r = 0; γ is the light absorption coefficient, a constant that encodes the underlying assumption that the attraction of fluorescence decreases with distance; and r_{ij} is the distance between fireflies i and j.

The mutual attraction degree β is calculated as follows:

\beta(r_{ij}) = \beta_{0} e^{-\gamma r_{ij}^{2}}    (2)

where β₀ is the attractiveness at the light source (r = 0).

The position of firefly i, when it is attracted to a brighter firefly j, is updated as follows:

x_{i}(t+1) = x_{i}(t) + \beta(r_{ij}) \left( x_{j}(t) - x_{i}(t) \right) + \alpha \left( \mathrm{rand} - \tfrac{1}{2} \right)    (3)

where x_{j} and x_{i} are the spatial positions of fireflies j and i, respectively; α is the step factor; and rand is a random number uniformly distributed in [0, 1].

Each firefly is initially distributed at random in the search space of the objective function, and its brightness is determined by the fitness value of the objective function at its location: the brighter the firefly, the better the objective function value at its position. Each firefly is attracted to brighter fireflies and moves toward them to find better solutions. As the population evolves, an effective solution to the optimization problem is obtained. In this way, optimized hyperparameters can be searched for over a wider range.
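A compact version of this procedure, using Eqs. (2) and (3) with a step factor α that shrinks every generation, can be sketched as follows. The geometric decay of α, scaling the random term by each bound's width, clipping to the bounds, and all parameter values are assumptions of this sketch rather than settings reported in the study, and a toy function stands in for the cross-validated RMSE that would be minimized during hyperparameter tuning.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=60, beta0=1.0,
                     gamma=0.1, alpha0=0.5, alpha_decay=0.95, seed=1):
    """Minimize `objective` over box `bounds`; a lower cost means a brighter firefly."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    k = min(range(n_fireflies), key=lambda i: cost[i])
    best_pos, best_cost = list(pos[k]), cost[k]
    alpha = alpha0

    def move(i, target, beta):
        # Eq. (3); scaling the random term by each bound's width is an assumption
        new = []
        for d, (lo, hi) in enumerate(bounds):
            x = pos[i][d] + beta * (target[d] - pos[i][d])
            x += alpha * (rng.random() - 0.5) * (hi - lo)
            new.append(min(max(x, lo), hi))  # clip to the search space
        return new

    for _ in range(n_iter):
        for i in range(n_fireflies):
            moved = False
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward it
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    pos[i] = move(i, pos[j], beta0 * math.exp(-gamma * r2))  # Eq. (2)
                    cost[i] = objective(pos[i])
                    moved = True
                    if cost[i] < best_cost:
                        best_pos, best_cost = list(pos[i]), cost[i]
            if not moved:  # no brighter firefly: random walk only (beta = 0)
                pos[i] = move(i, pos[i], 0.0)
                cost[i] = objective(pos[i])
                if cost[i] < best_cost:
                    best_pos, best_cost = list(pos[i]), cost[i]
        alpha *= alpha_decay  # variable step size: shrink alpha each generation
    return best_pos, best_cost


def sphere(p):
    """Toy objective with its minimum 0 at the origin (stand-in for CV RMSE)."""
    return sum(v * v for v in p)


best, value = firefly_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, value)
```

A large α early on favors global exploration, while the shrinking α later favors local refinement around the brightest fireflies, which is one simple way to balance global and local search.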

3.5. Evaluation of Predictive Performance

The performance and generalization ability of machine learning models are difficult to judge directly from the prediction results [76]; thus, n-fold cross-validation was used to evaluate the models. This method divides the original data into n equal subsets, uses one subset to validate the predictions and the rest to train the model, and repeats the process n times so that each subset serves once for validation. The average of the n evaluation values is taken as the model's performance indicator. In this study, 10-fold cross-validation was used to improve the reliability of the evaluation. For quantitative comparison, the standard deviation [77], correlation coefficient (R) [78], and root mean square error (RMSE) [51] were used to compare the models' ability to predict the compressive strength of geopolymer concrete. Because grid search and random search can be computationally expensive, this study combined a heuristic algorithm with 10-fold cross-validation for hyperparameter tuning [79].
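The n-fold procedure above can be sketched with contiguous folds scored by RMSE. Contiguous folds assume the samples were shuffled beforehand, and the one-feature least-squares line is only a hypothetical stand-in for the DT, RF, LR, or MLR models; all helper names are illustrative.

```python
import math


def kfold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k contiguous folds of nearly equal size."""
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def cross_validate(fit, predict, X, y, k=10):
    """Hold each fold out once for validation; return mean and per-fold RMSE."""
    scores = []
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in fold]
        scores.append(rmse([y[i] for i in fold], preds))
    return sum(scores) / len(scores), scores


def fit_line(X, y):
    """Toy model: one-feature least-squares line."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def predict_line(model, row):
    slope, intercept = model
    return slope * row[0] + intercept


X = [[float(i)] for i in range(30)]
y = [2.0 * i + 1.0 for i in range(30)]
mean_rmse, fold_scores = cross_validate(fit_line, predict_line, X, y)
print(len(fold_scores), mean_rmse)
```

With 371 samples and k = 10, the first fold holds 38 samples and the other nine hold 37 each.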

4. Analysis of Results

4.1. Results of the 10-Fold Cross-Validation

Ten-fold cross-validation can effectively evaluate the generalization ability of a model, that is, its ability to perform well on previously unseen data. Comparing the results across folds reveals this ability and helps detect overfitting or underfitting. Underfitting manifests as poor agreement between the model's predictions and the training data, together with a gap between the predictions and the actual values on the test data. Overfitting means that the model fits the training data well, but the gap between its predictions and the actual values in the test stage is large. Figure 3 was generated using the LR model values as the reference, so the LR model has a value equal or close to 1 in every fold. The area enclosed by each curve in Figure 3 is the most intuitive representation of a model's predictive performance: the larger the enclosed area, the worse the predicted values fit the actual values. The MLR model encloses a smaller area than the LR model, indicating better predictive performance. Relative to the LR model, the RF model had RMSE values very close to 0, indicating very small prediction deviations and better generalization performance.
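The "area enclosed by the curve" can be made concrete under one assumption: that Figure 3 is a radar chart with one equally spaced axis per fold (the figure itself is not reproduced here). Two neighbouring radii r_i and r_{i+1} then bound a triangle of area ½·r_i·r_{i+1}·sin(2π/k), and the polygon area is the sum of these triangles. The sketch normalizes each fold's RMSE by the LR benchmark, so LR traces a regular polygon of radius 1; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_to_benchmark(model_rmse: Sequence[float],
                           benchmark_rmse: Sequence[float]) -> List[float]:
    """Divide each fold's RMSE by the benchmark (here, LR) RMSE of the same fold."""
    return [m / b for m, b in zip(model_rmse, benchmark_rmse)]


def radar_area(radii: Sequence[float]) -> float:
    """Area of the polygon drawn on a radar chart with equally spaced axes.

    Consecutive radii r_i, r_{i+1} bound a triangle with apex angle 2*pi/k,
    whose area is 0.5 * r_i * r_{i+1} * sin(2*pi/k).
    """
    k = len(radii)
    step = 2.0 * math.pi / k
    return 0.5 * math.sin(step) * sum(radii[i] * radii[(i + 1) % k] for i in range(k))
```

Because the area scales with the product of neighbouring radii, halving every normalized RMSE shrinks the enclosed area to a quarter, which is why a model with uniformly small errors relative to LR encloses a much smaller region.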

4.2. RMSE Values for Increasing Iteration Times

Figure 4 shows the RMSE values of the DT-FA, RF-FA, LR-FA, and MLR-FA models. As can be seen from the figure, the regression-based models did not benefit from hyperparameter tuning, because the LR and MLR models do not rely on traditional convergence algorithms; their RMSE values therefore remained constant as the number of iterations increased. In contrast, the RMSE values of the DT and RF models changed during Firefly Algorithm tuning. In the early iterations, the RMSE of both the DT and RF models decreased rapidly, and it stabilized once the number of iterations reached a certain level. The shape of the RMSE curve reflects how the model learns: a rapid decline means that the model is progressively fitting the data. A plateau has two possible explanations: either the search is trapped in a local optimum, or further iterations bring only marginal performance improvements. When FA reaches the maximum number of generations, the best fitness value (the lowest RMSE) has been found, and the corresponding hyperparameters are taken as the optimal ones. The whole training set is then used to train the regression model with the optimal hyperparameters, and its predictive performance is evaluated on the test set.

For the proposed DT-FA model, min_samples_split (empirical range [1, 10]) and min_samples_leaf (empirical range [2, 10]) are the hyperparameters to be optimized. The parameter min_samples_split is the minimum number of samples required to split an internal node, while min_samples_leaf is the minimum number of samples required at a leaf node. Their initial values were set to 25 and 50, respectively, and they were optimized to 2 and 3. For the proposed RF-FA model, tree_num (empirical range [1, 10]) and min_samples_leaf (empirical range [1, 10]) are the hyperparameters to be optimized. The parameter tree_num is the number of trees in the forest, and min_samples_leaf specifies that each child node produced by a split must contain at least min_samples_leaf training samples; otherwise, the split is not made. The initial values of both parameters were set to 30, and tree_num and min_samples_leaf were optimized to 7 and 1, respectively. For the LR model, tol and C_inverse, which represent the tolerance for the stopping criterion and the inverse of the regularization strength, were searched within [1 × 10⁻⁵, 1 × 10⁻³] and [0.1, 10], respectively. The optimized values of tol and C_inverse were 1.3 × 10⁻⁴ and 8, respectively.
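FA moves fireflies through a continuous space, whereas min_samples_split, min_samples_leaf, and tree_num are integers. The text does not state how positions are mapped to these values; one common choice, assumed in this sketch, is to round each coordinate and clip it to its range. The dictionary keys follow the keyword names of scikit-learn's `DecisionTreeRegressor` and `RandomForestRegressor` (`n_estimators` corresponds to tree_num), and the ranges are copied from the text except where noted in the comments.

```python
from typing import Dict, Sequence, Tuple

# Search ranges quoted in the text. min_samples_split starts at 2 here because
# scikit-learn rejects integer values below 2 for that parameter.
DT_SPACE: Dict[str, Tuple[int, int]] = {
    "min_samples_split": (2, 10),
    "min_samples_leaf": (2, 10),
}
RF_SPACE: Dict[str, Tuple[int, int]] = {
    "n_estimators": (1, 10),      # "tree_num" in the text
    "min_samples_leaf": (1, 10),
}


def decode(position: Sequence[float], space: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Map a continuous firefly position onto integer hyperparameters.

    Each coordinate is rounded to the nearest integer and clipped to its range,
    so every firefly corresponds to a valid model configuration.
    """
    params: Dict[str, int] = {}
    for value, (name, (lo, hi)) in zip(position, space.items()):
        params[name] = int(min(max(round(value), lo), hi))
    return params
```

A decoded dictionary such as `decode([2.4, 3.1], DT_SPACE)` gives `{"min_samples_split": 2, "min_samples_leaf": 3}`, which could be unpacked into `DecisionTreeRegressor(**params)` before computing the cross-validated RMSE used as the firefly's fitness.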

4.3. Predictive Results of the DT-FA, RF-FA, LR-FA, and MLR-FA Models

Figure 5 compares the compressive strength predicted by the DT-FA, RF-FA, LR-FA, and MLR-FA models with the actual compressive strength. The two tree-based models predicted more accurately than the two regression-based models, with their predictions lying closer to the 1:1 line. The RMSE values of the regression-based models were all greater than those of the tree-based models, meaning that their prediction errors were larger and their prediction accuracy lower.

As shown in the figure, the data points of the regression-based models are relatively scattered and generally concentrated at low values. For these two models, the R value of the test set was higher than that of the training set, which may be related to overfitting: if the model is too complex, it may fit the training data too closely and fail to generalize to new data. In addition, the predictions of the LR-FA model were mostly distributed in the low compressive strength region, which indicates that the LR-FA model may tend to predict lower compressive strength values.

The scatter plot in Figure 5 also shows a few outliers that deviate from the expected values. To reduce the influence of such values on model accuracy, the data were first preprocessed. Common methods for handling this type of noisy data include binning (box division), regression, and clustering. Binning smooths sorted data locally using the neighboring values within each bin; regression smooths the data by fitting regression functions or models; and clustering groups similar values into clusters, so that values falling outside all clusters are regarded as noise or outliers and can be deleted directly.
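Smoothing by bin means, the simplest form of the binning method, can be sketched as follows. Equal-frequency bins of a fixed size are an assumption; the text does not state which binning variant, if any, was applied to this dataset.

```python
from typing import List, Sequence


def smooth_by_bin_means(values: Sequence[float], bin_size: int) -> List[float]:
    """Replace each value by the mean of its equal-frequency bin.

    Values are sorted first, partitioned into consecutive bins of `bin_size`
    items (the last bin may be smaller), and every item in a bin is replaced
    by that bin's mean. The result is returned in sorted order.
    """
    ordered = sorted(values)
    smoothed: List[float] = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed
```

For example, `smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3)` forms the bins (4, 8, 15), (21, 21, 24), and (25, 28, 34) and returns `[9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]`; an isolated extreme value is pulled toward its neighbours rather than deleted.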

In contrast, the data points of the two tree-based models are evenly distributed near the 1:1 line, and their R values are closer to 1. As can be seen from Table 4, the RF-FA model had the largest R values for both the training and test sets, at 0.9774 and 0.8915, respectively. This indicates that the model has good prospects for predicting the strength of geopolymer concrete.

As is evident from Figure 6, the R values of the tree-based and regression-based models differ considerably on both the training and the test sets. On the training set, the R values of the regression-based models were concentrated in [0.4, 0.6], mostly around 0.53, whereas those of the DT-FA and RF-FA models were mostly concentrated in [0.9, 1.0]. On the test set, the R values of the regression-based models improved slightly, with the maximum R increasing to 0.65, while those of the DT-FA and RF-FA models decreased slightly. This suggests that the regression-based models overfit the data during training, resulting in test-set R values that exceed the training-set values. These results also confirm that the tree-based models predict the compressive strength of geopolymer concrete more accurately.

Figure 7 compares the residuals of the four models. The residuals of the DT and RF models are generally lower than those of the LR and MLR models. Although the DT and RF models had similar medians, the RF model residuals were generally smaller. Therefore, of the four models evaluated, the Random Forest (RF) model performed best in predicting the compressive strength of geopolymer concrete.
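Box plots of this kind summarise each model's residuals by their median and interquartile range (IQR). A minimal sketch, assuming residuals are defined as actual minus predicted strength and using Python's `statistics.quantiles` with its default "exclusive" method (plotting libraries may interpolate quartiles slightly differently):

```python
import statistics
from typing import Dict, Sequence


def residual_summary(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Median, quartiles, and IQR of residuals (actual - predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    q1, median, q3 = statistics.quantiles(residuals, n=4)
    return {"median": median, "q1": q1, "q3": q3, "iqr": q3 - q1}
```

Two models can share a similar median while differing in IQR; a narrower box, as described for the RF model, means its residuals are more tightly concentrated around that median.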

4.4. Models Comparison

Figure 8 compares the four models in a Taylor diagram, which presents the RMSE, standard deviation, and correlation coefficient of each model in a single plot.

Using polar coordinates, a Taylor diagram displays how the predictions of multiple models differ from the observed data. The position of each model point encodes its standard deviation, correlation coefficient, and RMSE simultaneously. The closer a model's point is to the observed (reference) point, the smaller the dispersion of its predictions relative to the observations and the higher its correlation coefficient. Comparing the positions of the model points therefore gives an intuitive comparison of the models' predictive ability and correlation and identifies the most suitable model for the problem. As shown in the figure, the standard deviations of the DT-FA and RF-FA models were larger than those of the MLR-FA and LR-FA models, meaning that the predictions of the former two models are slightly more scattered. Otherwise, the overall comparison is consistent with the previous figures: the predictions of the RF-FA and DT-FA models were closer to the actual values, and their correlation coefficients were closer to 1. Both models therefore performed better in predicting the compressive strength of geopolymer concrete, in terms of both accuracy and reliability. Overall, the RF-FA model had the best predictive performance, with the smallest difference between predicted and actual compressive strength, although its standard deviation was slightly worse than those of the MLR-FA and LR-FA models.
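The three quantities on a Taylor diagram are not independent. The centered RMS difference E′ between predictions p and observations o satisfies E′² = σ_p² + σ_o² − 2σ_pσ_oR, which is the law of cosines with angle arccos R; this is why one point can encode all three. The sketch below computes them with population (1/n) statistics and places a model at radius σ_p and angle arccos R. Note that E′ is the RMSE after removing the mean bias, so it equals the plain RMSE only when the means of p and o coincide.

```python
import math
from typing import Sequence, Tuple


def taylor_stats(pred: Sequence[float], obs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (sigma_pred, sigma_obs, R, centered RMS difference E')."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    dp = [p - mp for p in pred]   # prediction anomalies
    do = [o - mo for o in obs]    # observation anomalies
    sp = math.sqrt(sum(d * d for d in dp) / n)
    so = math.sqrt(sum(d * d for d in do) / n)
    r = sum(a * b for a, b in zip(dp, do)) / (n * sp * so)
    e_centered = math.sqrt(sum((a - b) ** 2 for a, b in zip(dp, do)) / n)
    return sp, so, r, e_centered


def polar_position(sigma_pred: float, r: float) -> Tuple[float, float]:
    """Cartesian (x, y) of a model point: radius sigma_pred at angle arccos(R)."""
    theta = math.acos(max(-1.0, min(1.0, r)))
    return sigma_pred * math.cos(theta), sigma_pred * math.sin(theta)
```

The observed data sit at (σ_o, 0) on the horizontal axis, and the straight-line distance from a model's point to that reference point equals E′, so "closer to the reference point" means simultaneously a more accurate spread, a higher correlation, and a smaller centered error.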

To further compare the ability of the four models to predict the compressive strength of geopolymer concrete on the training and test sets, a Monte Carlo simulation was used to calculate their RMSE and R values. As can be seen from Figure 5, the RF model tuned with FA had the highest R value, indicating the best predictive ability. The MLR model had the lowest R value and the largest prediction deviation, consistent with Figure 9. The MLR model had the lowest R value on the training set, but its R value increased rather than decreased on the test set, suggesting that the MLR model may have overfitted the training set or may be too complex for the problem. After this re-evaluation, the RF-FA model remained the best of the four machine learning models in predicting the compressive strength of geopolymer concrete.
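The text does not specify the Monte Carlo procedure. One common reading, assumed in this sketch, is repeated random train/test splitting: each run draws a fresh split, refits the model, and records R on the held-out part, producing a distribution of scores rather than a single value. `fit_line` is again a hypothetical placeholder standing in for any of the four models.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient R."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def monte_carlo_r(X: Sequence[Row], y: Sequence[float],
                  fit: Callable[[List[Row], List[float]], Model],
                  n_runs: int = 100, test_fraction: float = 0.3,
                  seed: int = 0) -> List[float]:
    """Test-set R of `n_runs` independent random train/test splits."""
    rng = random.Random(seed)
    n_test = max(2, int(round(test_fraction * len(y))))
    scores = []
    for _ in range(n_runs):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(pearson_r([y[i] for i in test], [model(X[i]) for i in test]))
    return scores


def fit_line(X_train: List[Row], y_train: List[float]) -> Model:
    """Placeholder model (hypothetical): 1-D least-squares line on feature 0."""
    xs = [row[0] for row in X_train]
    mx, my = statistics.fmean(xs), statistics.fmean(y_train)
    slope = (sum((x - mx) * (t - my) for x, t in zip(xs, y_train))
             / sum((x - mx) ** 2 for x in xs))
    return lambda row: slope * row[0] + (my - slope * mx)
```

The spread of the returned scores (for example, their minimum, maximum, or standard deviation) is what a Monte Carlo comparison adds over a single split: a model whose R stays high across many random splits is more reliably accurate.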

This study also compared the optimized models with the tree-based models proposed in previous studies; Table 5 gives the results. As can be seen from the table, compared with traditional single machine learning models (e.g., DT and RF), the FA-optimized models proposed in this study achieved higher accuracy. This improvement comes from the hyperparameter tuning in the proposed models, indicating that the proposed modified FA is effective for tuning hyperparameters in the compressive strength prediction of geopolymer concrete.

4.5. Importance Analysis and Sensitivity Analysis

Figure 10 shows the importance scores of the input variables affecting the compressive strength of geopolymer concrete. To quantify the contribution of each feature to the prediction, an importance index was used as the evaluation metric. This index identifies which features have the most significant effect on the predicted compressive strength: a higher importance index means that the feature contributes more to the prediction and that the model depends more strongly on changes in that feature. Features can therefore be ranked by their importance index, and unimportant features can be eliminated or important features weighted more heavily, which helps to further optimize the model.
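The text does not state how the importance index in Figure 10 was computed (RF implementations commonly report impurity-based importances). As an illustration of the general idea, not the method behind Figure 10, permutation importance measures how much the RMSE grows when one feature's column is randomly shuffled, which breaks that feature's link to the target while leaving the others intact. The function below is a hypothetical, model-agnostic sketch.

```python
import math
import random
from typing import Callable, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict: Callable[[Sequence[float]], float],
                           X: Sequence[Sequence[float]], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in RMSE when each feature column is randomly permuted."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by its shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model relies on produces a large error increase when shuffled, matching the interpretation of a high importance index given above.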

NaOH molarity had the highest importance score (4.1184) among all input variables, indicating that the molar concentration of NaOH has the most significant influence on the compressive strength of geopolymer concrete. A high-molarity NaOH solution acts as an activator in the formation of geopolymer concrete and effectively increases the reaction rate of the binder materials. It promotes the formation of the polymer skeleton in the concrete, thereby increasing its compressive strength. However, when the NaOH molarity is too high, the reaction proceeds too quickly, which adversely affects strength development. The NaOH molarity therefore needs to be adjusted to suit strength development. Na₂SiO₃ plays a similar role to NaOH in strength development, as both act as activators that promote the polymerization of materials in geopolymer concrete. Its main function in the reaction is to react with calcium ions in water to form a hydrated silicate gel, which contributes to the concrete skeleton and improves compressive strength. High concentrations of Na₂SiO₃, however, may reduce strength: Na₂SiO₃ reacts with calcium hydroxide (Ca(OH)₂) in the concrete, and the resulting products reduce the hardness and strength of the concrete. An excessively high Na₂SiO₃ concentration can also cause cracking and spalling. Notably, the importance score of NaOH molarity is much higher than that of Na₂SiO₃ molarity (4.1184 versus 1.1728). The NaOH molarity should therefore be given priority in mix design.

Apart from NaOH molarity, the importance scores of the other materials were similar. GGBS and fly ash have similar functions: both fill the pores and micro-cracks in concrete, increasing its compactness and overall performance. In addition, GGBS can undergo secondary hydration reactions with cement hydration products to produce more hydration products, further improving the compressive strength of the mixture. Because the two materials play similar roles in strength development, their importance scores were relatively high and similar. The effect of fine aggregate was relatively low; its main function is to fill the gaps between larger particles and improve the workability of the mixture.

Figure 11 gives the SHapley Additive exPlanations (SHAP) results for the input parameters. SHAP explains model predictions on the basis of Shapley values and provides a comprehensive way to interpret the output of any machine learning model. Applying SHAP to the RF model helps us understand and interpret its predictions, and hence apply the model more effectively to classification and regression problems. Figure 11a presents the force plot of the input parameters. A force plot explains the prediction for a single sample: it visualizes each feature's SHAP value as a force that increases or decreases the prediction. The explanation starts from the base value, i.e., the expected model output, and each feature's attribution is drawn as an arrow that increases (positive) or decreases (negative) the prediction. As can be seen, gravel 4/10 mm and GGBS are the features that increase the predicted compressive strength, and the length of each arrow indicates its degree of influence. The largest contribution came from gravel 4/10 mm = 0, and GGBS = 400 also had a significant effect on the prediction. The features that decreased the predicted compressive strength were the water/solid ratio and fine aggregate.
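A force plot rests on the additivity property of Shapley values: the base value plus all feature attributions equals the model output for that sample. The sketch below computes exact Shapley values by enumerating every coalition of features, with "absent" features replaced by a single baseline sample. That replacement is a simplifying assumption: the SHAP library averages over background data and uses faster algorithms (such as TreeSHAP for tree ensembles), and exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition take their values from x, the rest from baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For the toy model f(z) = 2z₀ + 3z₁z₂ at x = (1, 2, 4) with a zero baseline, the attributions are (2, 12, 12): the linear term is credited entirely to z₀, the interaction term is split equally between z₁ and z₂, and base value f(0) = 0 plus 26 in attributions recovers f(x) = 26, exactly as the arrows of a force plot add up.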

Figure 11b presents the SHAP values of the input parameters. GGBS and NaOH molarity are the two most important parameters for predicting the compressive strength of geopolymer concrete, which is broadly consistent with the importance analysis in this subsection. Figure 11c gives the beeswarm plot of these inputs. Most fine aggregate samples have SHAP values close to 0, indicating that fine aggregate values mostly lie in a region with little influence on the predicted compressive strength; for GGBS and NaOH molarity, by contrast, a substantial portion of the samples have a strong effect on the predicted compressive strength. This is in line with previous studies that demonstrated the effects of GGBS using scanning electron microscopy and infrared spectroscopy: calcium silicate hydrate is produced during the hydration of GGBS [81,82], and the improved properties of GGBS and fly ash polymers result from the joint action of calcium silicate hydrate and calcium aluminosilicate hydrate [83,84]. These findings are consistent with Figure 11c, which shows that GGBS, NaOH molarity, and fly ash have a greater degree of influence. Fine aggregate plays a weak role in predicting compressive strength because it mainly acts as a filler, whereas coarse aggregate forms the skeleton structure that affects the compressive strength of geopolymer concrete [85,86,87].

5. Conclusions

The design of geopolymer concrete for landscapes must meet more stringent requirements for mechanical properties, so understanding and designing geopolymer concrete with higher compressive strength is challenging. Compressive strength is therefore considered the main criterion. However, improving the accuracy of compressive strength prediction for geopolymer concrete is difficult because of the many influencing factors; the relationships among these factors must be considered and an appropriate model selected for estimation. In this study, four hybrid learning models were proposed and their hyperparameters were tuned using FA, which improved the prediction accuracy for the compressive strength of geopolymer concrete. The reliability and prediction efficiency of the four optimized models were also examined. By contrasting tree-based and regression-based models, this study offers a new perspective for building prediction models of geopolymer concrete compressive strength. Based on the models constructed above, the input data can be used to explore in depth the specific effects of the various influencing factors on the compressive strength of geopolymer concrete. The main conclusions are as follows.

In the 10-fold cross-validation, the LR-FA model was used as the benchmark. The RF-FA model had the lowest RMSE value, very close to 0, indicating that the RF-FA model provides the most accurate predictions and can be employed to predict the compressive strength of geopolymer concrete used in the design of green landscapes.
Hyperparameter tuning with FA produced four hybrid models with different convergence behaviors. In the initial iterations of the DT and RF models, the RMSE decreased rapidly as the number of iterations increased and then plateaued after a certain number of iterations, indicating that the search may have reached a local optimum or that further iterations bring little performance improvement. The RMSE values of the MLR and LR models did not decrease as the number of iterations increased; because these two models are not affected by the traditional convergence algorithm, their RMSE values appear as straight lines in the graph. This indicates that FA is more effective in optimizing the DT and RF models and can effectively improve their predictive performance.
The DT-FA and RF-FA models had better predictive performance, with predictions lying very close to the 1:1 line. Quantitatively, the RMSE values of the DT-FA model on the training and test sets were 7.0062 and 10.2038, respectively, and those of the RF-FA model were 4.0364 and 8.7202, respectively. These RMSE values show that the prediction errors of the two models are small and their predictive performance is good.
The analysis showed that the molar concentration of NaOH was the most influential single factor, and the importance index confirmed that it had the most significant influence on the model's prediction of the compressive strength of geopolymer concrete. Therefore, in the future design of landscape concrete buildings, designers should focus on the molar concentration of NaOH.

Overall, the findings of this study not only advance concrete compressive strength prediction models but also provide an effective tool for future concrete design problems in landscape architecture, and thus have practical significance for the development of landscape architecture. For future work, a graphical user interface (GUI) for the optimized model should be developed for practical engineering use. In addition, different optimization algorithms may improve the model predictions to different degrees. Because the main research goal of this study was to compare the predictive performance of tree-based and regression-based models, FA was used as the sole optimization algorithm. In future research, we will include different optimization algorithms in the comparison to create more comprehensive predictive models. Moreover, next-generation hyperparameter optimization frameworks (e.g., Optuna [88,89,90,91]) could be considered for modeling the compressive strength of geopolymer concrete, which would be of great significance for strength prediction and more reliable material design of geopolymer concrete in the future.

Author Contributions

Conceptualization, R.W. and J.Z.; Methodology, J.H.; Software, R.W., S.R. and J.H.; Validation, J.H.; Resources, J.Z.; Data curation, R.W., J.Z., Y.L. and S.R.; Writing—original draft, R.W., Y.L. and J.H.; Writing—review & editing, J.Z., Y.L., S.R. and J.H.; Supervision, S.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Guangdong Provincial Department of Education Innovative Strong School Youth Innovative Talent Project (Social Science) (funding number: 2022WQNCX055) and China Postdoctoral Science Foundation (funding number: 2022M720878).

Data Availability Statement

The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Abbreviations

LR	Logistic Regression
MLR	Multiple Logistic Regression
DT	Decision Tree
RF	Random Forest
FA	Firefly Algorithm
GGBS	Ground granulated blast-furnace slag
ANN	Artificial neural network
MNLR	Multiple nonlinear regression
GEP	Gene expression programming
RMSE	Root Mean Square Error
R	Correlation coefficient

References

Garces, J.I.T.; Beltran, A.B.; Tan, R.R.; Ongpeng, J.M.C.; Promentilla, M.A.B. Carbon footprint of self-healing geopolymer concrete with variable mix model. Clean. Chem. Eng. 2022, 2, 100027.
Le, H.-B.; Bui, Q.-B. Predicting the compressive strength of geopolymer concrete: An empirical model for both recycled and natural aggregates. In CIGOS 2021, Emerging Technologies and Applications for Green Infrastructure: Proceedings of the 6th International Conference on Geotechnics, Civil Engineering and Structures; Springer: Berlin/Heidelberg, Germany, 2022; pp. 793–802.
Chen, S.; Zhou, M.; Shi, X.; Huang, J. A novel MBAS-RF approach to predict mechanical properties of geopolymer-based compositions. Gels 2023, 9, 434.
Wang, R.; Zhang, J.; Lu, Y.; Huang, J. Towards designing durable sculptural elements: Ensemble learning in predicting compressive strength of fiber-reinforced nano-silica modified concrete. Buildings 2024, 14, 396.
Kishore, Y.; Nadimpalli, S.G.D.; Potnuru, A.K.; Vemuri, J.; Khan, M.A. Statistical analysis of sustainable geopolymer concrete. Mater. Today Proc. 2022, 61, 212–223.
Zhu, F.; Wu, X.; Lu, Y.; Huang, J. Strength reduction due to acid attack in cement mortar containing waste eggshell and glass: A machine learning-based modeling study. Buildings 2024, 14, 225.
Mohseni, E. Assessment of Na₂SiO₃ to NaOH ratio impact on the performance of polypropylene fiber-reinforced geopolymer composites. Constr. Build. Mater. 2018, 186, 904–911.
Mehta, A.; Siddique, R. Sustainable geopolymer concrete using ground granulated blast furnace slag and rice husk ash: Strength and permeability properties. J. Clean. Prod. 2018, 205, 49–57.
Lloyd, N.; Rangan, V. Geopolymer concrete with fly ash. In Proceedings of the Second International Conference on Sustainable Construction Materials and Technologies, Ancona, Italy, 28–30 June 2010; pp. 1493–1504.
Rahmati, M.; Toufigh, V. Evaluation of geopolymer concrete at high temperatures: An experimental study using machine learning. J. Clean. Prod. 2022, 372, 133608.
Zhou, J.; Su, Z.; Hosseini, S.; Tian, Q.; Lu, Y.; Luo, H.; Xu, X.; Chen, C.; Huang, J. Decision tree models for the estimation of geo-polymer concrete compressive strength. Math. Biosci. Eng. 2024, 21, 1413–1444.
Zhu, F.; Wu, X.; Lu, Y.; Huang, J. Strength estimation and feature interaction of carbon nanotubes-modified concrete using artificial intelligence-based boosting ensembles. Buildings 2024, 14, 134.
Pape, T.; Dickson, J. S19 Geopolymer Concrete Performance Review; NACOE: Fortitude Valley, QLD, Australia, 2016.
Patel, M.J.; Patel, A.D. Effect of cupola slag as a partial replacement of coarse aggregate on mechanical properties of geopolymer. GRD J. Eng. 2021, 6, 7–11.
Blasiak, G. Investigating Liquid-to-Solid and Na₂SiO₃-to-NaOH Ratios in Geopolymer Concrete for Artificial Reef Construction; Murdoch University: Perth, WA, Australia, 2022.
Gupta, T.; Rao, M.C. Prediction of compressive strength of geopolymer concrete using machine learning techniques. Struct. Concr. 2022, 23, 3073–3090.
Cao, V.D.; Pilehvar, S.; Salas-Bringas, C.; Szczotok, A.M.; Bui, T.Q.; Carmona, M.; Rodriguez, J.F.; Kjøniksen, A.-L. Thermal performance and numerical simulation of geopolymer concrete containing different types of thermoregulating materials for passive building applications. Energy Build. 2018, 173, 678–688.
Lavanya, G.; Jegan, J. Evaluation of relationship between split tensile strength and compressive strength for geopolymer concrete of varying grades and molarity. Int. J. Appl. Eng. Res. 2015, 10, 35523–35527.
Sudhir, M.; Chen, S.; Rai, S.; Jain, D. An empirical model for geopolymer reactions involving fly ash and GGBS. Adv. Mater. Sci. Eng. 2022, 2022, 8801294.
Özbayrak, A.; Kucukgoncu, H.; Atas, O.; Aslanbay, H.H.; Aslanbay, Y.G.; Altun, F. Determination of stress-strain relationship based on alkali activator ratios in geopolymer concretes and development of empirical formulations. Structures 2023, 48, 2048–2061.
Jonbi, J.; Fulazzaky, M.A. Modeling the water absorption and compressive strength of geopolymer paving block: An empirical approach. Measurement 2020, 158, 107695.
Rai, B.; Roy, L.; Rajjak, M. A statistical investigation of different parameters influencing compressive strength of fly ash induced geopolymer concrete. Struct. Concr. 2018, 19, 1268–1279.
Bellum, R.R.; Muniraj, K.; Madduru, S.R.C. Empirical relationships on mechanical properties of class-F fly ash and GGBS based geopolymer concrete. Ann. Chim.–Sci. Matér. 2019, 43, 189–197.
Dolamary, P.Y.; Dilshad, J.; Arbili, M.M.; Karpuzcu, M. Validation of Feret regression model for fly ash based geopolymer concrete. Polytech. J. 2018, 8, 173–189.
Ali, A.A.; Al-Attar, T.S.; Abbas, W.A. A statistical model to predict the strength development of geopolymer concrete based on SiO₂/Al₂O₃ ratio variation. Civ. Eng. J. 2022, 8, 454–471.
Veerapandian, V.; Pandulu, G.; Jayaseelan, R.; Sathish Kumar, V.; Murali, G.; Vatin, N.I. Numerical modelling of geopolymer concrete in-filled fibre-reinforced polymer composite columns subjected to axial compression loading. Materials 2022, 15, 3390.
Chen, C.; Zhang, X.; Hao, H.; Cui, J. Discussion on the suitability of dynamic constitutive models for prediction of geopolymer concrete structural responses under blast and impact loading. Int. J. Impact Eng. 2022, 160, 104064.
Zhang, P.; Gao, Z.; Wang, J.; Wang, K. Numerical modeling of rebar-matrix bond behaviors of nano-SiO₂ and PVA fiber reinforced geopolymer composites. Ceram. Int. 2021, 47, 11727–11737.
Meng, Q.; Wu, C.; Su, Y.; Li, J.; Liu, J.; Pang, J. Experimental and numerical investigation of blast resistant capacity of high performance geopolymer concrete panels. Compos. Part B Eng. 2019, 171, 9–19.
Colangelo, F.; De Luca, G.; Ferone, C.; Mauro, A. Experimental and numerical analysis of thermal and hygrometric characteristics of building structures employing recycled plastic aggregates and geopolymer concrete. Energies 2013, 6, 6077–6101.
Ahmad, A.; Ahmad, W.; Chaiyasarn, K.; Ostrowski, K.A.; Aslam, F.; Zajdel, P.; Joyklad, P. Prediction of geopolymer concrete compressive strength using novel machine learning algorithms. Polymers 2021, 13, 3389.
Ahmed, H.U.; Mohammed, A.S.; Qaidi, S.M.A.; Faraj, R.H.; Hamah Sor, N.; Mohammed, A.A. Compressive strength of geopolymer concrete composites: A systematic comprehensive review, analysis and modeling. Eur. J. Environ. Civ. Eng. 2023, 27, 1383–1428.
Nguyen, M.H.; Mai, H.-V.T.; Trinh, S.H.; Ly, H.-B. A comparative assessment of tree-based predictive models to estimate geopolymer concrete compressive strength. Neural Comput. Appl. 2023, 35, 6569–6588.
Verma, M. Prediction of compressive strength of geopolymer concrete using random forest machine and deep learning. Asian J. Civ. Eng. 2023, 24, 2659–2668.
Bhogayata, A.; Kakadiya, S.; Makwana, R. Neural network for mixture design optimization of geopolymer concrete. ACI Mater. J. 2021, 118, 91–96.
Rahman, S.K.; Al-Ameri, R. Experimental investigation and artificial neural network based prediction of bond strength in self-compacting geopolymer concrete reinforced with basalt FRP bars. Appl. Sci. 2021, 11, 4889.
Rahman, S.K.; Al-Ameri, R. Experimental and artificial neural network-based study on the sorptivity characteristics of geopolymer concrete with recycled cementitious materials and basalt fibres. Recycling 2022, 7, 55.
Sharma, U.; Gupta, N.; Verma, M. Prediction of the compressive strength of flyash and GGBS incorporated geopolymer concrete using artificial neural network. Asian J. Civ. Eng. 2023, 24, 2837–2850.
Choudhary, R.; Gianey, H.K. Comprehensive review on supervised machine learning algorithms. In Proceedings of the 2017 International Conference on Machine Learning and Data Science (MLDS), Noida, India, 14–15 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 37–43.
Grazzi, R.; Franceschi, L.; Pontil, M.; Salzo, S. On the iteration complexity of hypergradient computation. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; pp. 3748–3758.
Huang, J.; Zhou, M.; Yuan, H.; Sabri, M.M.S.; Li, X. Prediction of the compressive strength for cement-based materials with metakaolin based on the hybrid machine learning method. Materials 2022, 15, 3500.
Huang, J.; Zhou, M.; Yuan, H.; Sabri, M.M.S.; Li, X. Towards sustainable construction materials: A comparative study of prediction models for green concrete with metakaolin. Buildings 2022, 12, 772.
Huang, J.; Zhou, M.; Sabri, M.M.S.; Yuan, H. A novel neural computing model applied to estimate the dynamic modulus (DM) of asphalt mixtures by the improved beetle antennae search. Sustainability 2022, 14, 5938.
Huang, J.; Zhang, J.; Li, X.; Qiao, Y.; Zhang, R.; Kumar, G.S. Investigating the effects of ensemble and weight optimization approaches on neural networks’ performance to estimate the dynamic modulus of asphalt concrete. Road Mater. Pavement Des. 2022, 24, 1939–1959. [Google Scholar] [CrossRef]
Zhu, F.; Wu, X.; Zhou, M.; Sabri, M.M.S.; Huang, J. Intelligent design of building materials: Development of an ai-based method for cement-slag concrete design. Materials 2022, 15, 3833. [Google Scholar] [CrossRef]
Shi, X.; Chen, S.; Wang, Q.; Lu, Y.; Ren, S.; Huang, J. Mechanical Framework for Geopolymer Gels Construction: An Optimized LSTM Technique to Predict Compressive Strength of Fly Ash-Based Geopolymer Gels Concrete. Gels 2024, 10, 148. [Google Scholar] [CrossRef]
Huang, J.; Zhou, M.; Zhang, J.; Ren, J.; Vatin, N.I.; Sabri, M.M.S. The use of ga and pso in evaluating the shear strength of steel fiber reinforced concrete beams. KSCE J. Civ. Eng. 2022, 26, 3918–3931. [Google Scholar] [CrossRef]
Huang, J.; Zhou, M.; Zhang, J.; Ren, J.; Vatin, N.I.; Sabri, M.M.S. Development of a new stacking model to evaluate the strength parameters of concrete samples in laboratory. Iran. J. Sci. Technol. Trans. Civ. Eng. 2022, 46, 4355–4370. [Google Scholar] [CrossRef]
Awoyera, P.O.; Kirgiz, M.S.; Viloria, A.; Ovallos-Gazabon, D. Estimating strength properties of geopolymer self-compacting concrete using machine learning techniques. J. Mater. Res. Technol. 2020, 9, 9016–9028. [Google Scholar] [CrossRef]
Tanyildizi, H. Predicting the geopolymerization process of fly ash-based geopolymer using deep long short-term memory and machine learning. Cem. Concr. Compos. 2021, 123, 104177. [Google Scholar] [CrossRef]
Chai, T.; Draxler, R.R. Root mean square error (rmse) or mean absolute error (mae)?—Arguments against avoiding rmse in the literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
Ghosh, A.; Ransinchung, G.D. Application of machine learning algorithm to assess the efficacy of varying industrial wastes and curing methods on strength development of geopolymer concrete. Constr. Build. Mater. 2022, 341, 127828. [Google Scholar] [CrossRef]
Paruthi, S.; Rahman, I.; Husain, A. Comparative studies of different machine learning algorithms in predicting the compressive strength of geopolymer concrete. Comput. Concr. 2023, 32, 607–613. [Google Scholar]
Nguyen, K.T.; Nguyen, Q.D.; Le, T.A.; Shin, J.; Lee, K. Analyzing the compressive strength of green fly ash based geopolymer concrete using experiment and machine learning approaches. Constr. Build. Mater. 2020, 247, 118581. [Google Scholar] [CrossRef]
Ahmed, H.U.; Mohammed, A.S.; Faraj, R.H.; Abdalla, A.A.; Qaidi, S.M.A.; Sor, N.H.; Mohammed, A.A. Innovative modeling techniques including mep, ann and fq to forecast the compressive strength of geopolymer concrete modified with nanoparticles. Neural Comput. Appl. 2023, 35, 12453–12479. [Google Scholar] [CrossRef]
Huang, J.; Xue, J. Optimization of svr functions for flyrock evaluation in mine blasting operations. Environ. Earth Sci. 2022, 81, 434. [Google Scholar] [CrossRef]
Huang, J.; Zhang, J.; Gao, Y. Evaluating the clogging behavior of pervious concrete (pc) using the machine learning techniques. CMES-Comput. Model. Eng. Sci. 2022, 130, 805–821. [Google Scholar] [CrossRef]
Huang, J.; Sabri, M.M.S.; Ulrikh, D.V.; Ahmad, M.; Alsaffar, K.A.M. Predicting the compressive strength of the cement-fly ash–slag ternary concrete using the firefly algorithm (fa) and random forest (rf) hybrid machine-learning method. Materials 2022, 15, 4193. [Google Scholar] [CrossRef]
Huang, J.; Kumar, G.S.; Ren, J.; Zhang, J.; Sun, Y. Accurately predicting dynamic modulus of asphalt mixtures in low-temperature regions using hybrid artificial intelligence model. Constr. Build. Mater. 2021, 297, 123655. [Google Scholar] [CrossRef]
Ahmed, H.U.; Mohammed, A.A.; Rafiq, S.; Mohammed, A.S.; Mosavi, A.; Sor, N.H.; Qaidi, S. Compressive strength of sustainable geopolymer concrete composites: A state-of-the-art review. Sustainability 2021, 13, 13502. [Google Scholar] [CrossRef]
Zhang, J.; Wang, R.; Lu, Y.; Huang, J. Prediction of Compressive Strength of Geopolymer Concrete Landscape Design: Application of the Novel Hybrid RF–GWO–XGBoost Algorithm. Buildings 2024, 14, 591. [Google Scholar] [CrossRef]
Zou, Y.; Zheng, C.; Alzahrani, A.M.; Ahmad, W.; Ahmad, A.; Mohamed, A.M.; Khallaf, R.; Elattar, S. Evaluation of artificial intelligence methods to estimate the compressive strength of geopolymers. Gels 2022, 8, 271. [Google Scholar] [CrossRef] [PubMed]
Ji, Z.; Zhou, M.; Wang, Q.; Huang, J. Predicting the international roughness index of jpcp and crcp rigid pavement: A random forest (rf) model hybridized with modified beetle antennae search (mbas) for higher accuracy. Comput. Model. Eng. Sci. 2024, 139, 1557–1582. [Google Scholar] [CrossRef]
Song, Y.-Y.; Lu, Y. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130.
Myles, A.J.; Feudale, R.N.; Liu, Y.; Woody, N.A.; Brown, S.D. An introduction to decision tree modeling. J. Chemom. 2004, 18, 275–285.
Wang, Q.; Cheng, T.; Lu, Y.; Liu, H.; Zhang, R.; Huang, J. Underground Mine Safety and Health: A Hybrid MEREC–CoCoSo System for the Selection of Best Sensor. Sensors 2024, 24, 1285.
Cutler, A.; Cutler, D.R.; Stevens, J.R. Random forests. In Ensemble Machine Learning: Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 157–175.
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101.
Kasza, J.; Wolfe, R. Interpretation of commonly used statistical regression models. Respirology 2014, 19, 14–21.
Wang, P.; Puterman, M.L. Mixed logistic regression models. J. Agric. Biol. Environ. Stat. 1998, 3, 175–200.
Menard, S. Coefficients of determination for multiple logistic regression analysis. Am. Stat. 2000, 54, 17–24.
Hosmer, D.W.; Lemeshow, S. Goodness of fit tests for the multiple logistic regression model. Commun. Stat.-Theory Methods 1980, 9, 1043–1069.
Wang, Q.; Yu, S.; Qi, X.; Hu, Y.; Zheng, W.; Shi, J.; Yao, H. Overview of logistic regression model analysis and application. Zhonghua Yu Fang Yi Xue Za Zhi-Chin. J. Prev. Med. 2019, 53, 955–960.
Lee, J. Covariance adjustment of rates based on the multiple logistic regression model. J. Chronic Dis. 1981, 34, 415–426.
Malhotra, R.; Meena, S. Empirical validation of cross-version and 10-fold cross-validation for defect prediction. In Proceedings of the 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 4–6 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 431–438.
Lee, D.K.; In, J.; Lee, S. Standard deviation and standard error of the mean. Korean J. Anesthesiol. 2015, 68, 220–223.
Benesty, J.; Chen, J.; Huang, Y. On the importance of the Pearson correlation coefficient in noise reduction. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 757–765.
Wakjira, T.G.; Ibrahim, M.; Ebead, U.; Alam, M.S. Explainable machine learning model and reliability analysis for flexural capacity prediction of RC beams strengthened in flexure with FRCM. Eng. Struct. 2022, 255, 113903.
Nazar, S.; Yang, J.; Amin, M.N.; Khan, K.; Ashraf, M.; Aslam, F.; Javed, M.F.; Eldin, S.M. Machine learning interpretable-prediction models to evaluate the slump and strength of fly ash-based geopolymer. J. Mater. Res. Technol. 2023, 24, 100–124.
Bouaissi, A.; Li, L.-y.; Abdullah, M.M.A.B.; Bui, Q.-B. Mechanical properties and microstructure analysis of FA-GGBS-HMNS based geopolymer concrete. Constr. Build. Mater. 2019, 210, 198–209.
Nagajothi, S.; Elavenil, S. Effect of GGBS addition on reactivity and microstructure properties of ambient cured fly ash based geopolymer concrete. Silicon 2021, 13, 507–516.
Rajini, B.; Rao, A.N.; Sashidhar, C. Micro-level studies of fly ash and GGBS–based geopolymer concrete using Fourier transform infra-red. Mater. Today Proc. 2021, 46, 586–589.
Revathi, T.; Jeyalakshmi, R. Fly ash–GGBS geopolymer in boron environment: A study on rheology and microstructure by ATR FT-IR and MAS NMR. Constr. Build. Mater. 2021, 267, 120965.
Abdullahi, M. Effect of aggregate type on compressive strength of concrete. Int. J. Civ. Struct. Eng. 2012, 2, 791–800.
Yu, F.; Sun, D.; Wang, J.; Hu, M. Influence of aggregate size on compressive strength of pervious concrete. Constr. Build. Mater. 2019, 209, 463–475.
Bogas, J.A.; Gomes, A. Compressive behavior and failure modes of structural lightweight aggregate concrete–characterization and strength prediction. Mater. Des. (1980–2015) 2013, 46, 832–841.
Wakjira, T.G.; Alam, M.S. Peak and ultimate stress-strain model of confined ultra-high-performance concrete (UHPC) using hybrid machine learning model with conditional tabular generative adversarial network. Appl. Soft Comput. 2024, 154, 111353.
Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
Srinivas, P.; Katarya, R. Hyoptxg: Optuna hyper-parameter optimization framework for predicting cardiovascular disease using XGBoost. Biomed. Signal Process. Control 2022, 73, 103456.
Lai, J.-P.; Lin, Y.-L.; Lin, H.-C.; Shih, C.-Y.; Wang, Y.-P.; Pai, P.-F. Tree-based machine learning models with Optuna in predicting impedance values for circuit analysis. Micromachines 2023, 14, 265.

Figure 1. Flow diagram.

Figure 2. Correlation coefficient of the input parameters.

Figure 3. Ten-fold cross-validation results.
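Figure 3 summarises a ten-fold cross-validation. The sketch below shows one common way of forming such folds; it is not the authors' procedure. The helper name `k_fold_indices`, the single shuffle, and the fixed seed are illustrative assumptions. The 371 samples reported in Table 5 are split so that each sample is used for validation exactly once.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: samples are shuffled once, then split into k
    nearly equal folds; each fold serves as the validation set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 371 samples (the dataset size given in Table 5), 10 folds
folds = list(k_fold_indices(371, k=10))
print([len(test) for _, test in folds])  # → [38, 37, 37, 37, 37, 37, 37, 37, 37, 37]
```

With this split, each model is trained ten times on about 334 samples and validated on the held-out fold. The ten validation scores are then summarised, for example as a mean.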

Figure 4. RMSE values for DT-IBAS, RF-IBAS, and KNN-IBAS models.

Figure 5. Predicted and actual compressive strength of the geopolymer concrete from the developed models. (a) LR-FA (training set RMSE = 15.4094; training set R = 0.5321; test set RMSE = 16.4778; test set R = 0.598); (b) MLR-FA (training set RMSE = 14.8172; training set R = 0.5207; test set RMSE = 15.4861; test set R = 0.5858); (c) DT-FA (training set RMSE = 7.0026; training set R = 0.915; test set RMSE = 10.2038; test set R = 0.842); (d) RF-FA (training set RMSE = 4.0364; training set R = 0.9774; test set RMSE = 8.7202; test set R = 0.8915).
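The "-FA" suffix in the model names indicates that hyperparameters were tuned with the firefly algorithm. In this algorithm, a dimmer firefly moves toward a brighter one. Its attractiveness is β0·exp(−γr²), where r is the distance between the two fireflies, and a small random step is added to each move.

The sketch below is a simplified illustration of that update rule, not the authors' code. It rests on these assumptions:
- The quadratic `stand_in_rmse` is a hypothetical placeholder for the cross-validated RMSE that a real tuning run would evaluate for each hyperparameter vector.
- The parameter values (β0 = 1, γ = 1, α = 0.2, 15 fireflies, 50 iterations) are chosen for illustration only.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimise `objective` over a box with a basic firefly algorithm.

    Lower cost means a brighter firefly. Each firefly moves toward every
    brighter one; a firefly holding the current best cost does not move
    in this simplified variant, so the best cost never gets worse.
    """
    rng = random.Random(seed)

    def clip(x):
        # Keep candidate positions inside the search bounds.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    history = []  # best cost after each iteration
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i is attracted to j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = clip([
                        xi + beta * (xj - xi) + alpha * (rng.random() - 0.5) * (hi - lo)
                        for xi, xj, (lo, hi) in zip(swarm[i], swarm[j], bounds)
                    ])
                    cost[i] = objective(swarm[i])
        history.append(min(cost))
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best], history


def stand_in_rmse(x):
    # Hypothetical smooth objective with its minimum at (3, -1); a real run
    # would train a model with hyperparameters x and return its validation RMSE.
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best_x, best_cost, history = firefly_minimize(stand_in_rmse, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_cost)
```

The returned `history` records the best cost after each iteration. This is the kind of curve produced when RMSE is tracked against the number of iterations.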

Figure 6. Comparison of training set and test set results.

Figure 7. Analysis of the parameters of four models.

Figure 8. Comparison from the perspectives of the correlation coefficient, standard deviation, and RMSE values.
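Figure 8 compares the models through three statistics at once: the correlation coefficient, the standard deviation, and the RMSE. These statistics are not independent, which is what makes a single joint comparison such as a Taylor diagram possible. The notation is as follows, with all quantities computed with a 1/n normalisation:
- p_i and o_i are the predicted and observed strengths.
- σ_p and σ_o are their standard deviations.
- R is their correlation coefficient.
- E′ is the centred RMSE.

The identities below then hold. This section does not state whether Figure 8 plots the centred RMSE or the full RMSE.

```latex
E'^{2} = \frac{1}{n}\sum_{i=1}^{n}\left[(p_i-\bar{p})-(o_i-\bar{o})\right]^{2}
       = \sigma_p^{2} + \sigma_o^{2} - 2\,\sigma_p\,\sigma_o\,R,
\qquad
\mathrm{RMSE}^{2} = E'^{2} + (\bar{p}-\bar{o})^{2}
```

The first identity is the law of cosines, so a model lies closer to the observations as its R approaches 1 and its σ_p approaches σ_o. The second identity shows that the full RMSE adds the squared mean bias on top of the centred RMSE.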

Figure 9. RMSE comparison of training set and test set results.

Figure 10. Importance analysis.

Figure 11. SHAP analysis.

Table 1. Commonly used tools for predicting the compressive strength of geopolymer concrete.

Prediction Methods	Detailed Approach	Features
Empirical models [1,2,19]	An approximate compressive strength estimate is obtained from statistical data and empirical rules [19,20,21].	These models are applicable only to the specific problems for which they were developed [19,22,23].
Statistical models [22,23,24]	The statistical methods commonly used at present to predict the compressive strength of geopolymer concrete include regression analysis, artificial neural networks, and support vector machines [22,24].	These models can account for more factors and therefore predict the range of compressive strength more fully; however, they require large amounts of experimental data [5,18,25].
Physical models and numerical simulations [17,26,27,28]	A prediction model is built from physical models and numerical simulations, using mechanical relationships and corresponding calculation methods [28,29,30].	These models can consider more influencing factors, but each factor requires experimental data and model validation, which demands more time and resources [30].
Machine learning algorithms [10,31,32]	Machine learning algorithms such as decision trees [33], random forests [34], and neural networks [35] are trained on data describing the influencing factors together with the properties of known materials. Once trained on large datasets, the model can predict the compressive strength of geopolymer concrete [36,37].	This method can uncover relationships among potential influencing factors from large amounts of relevant data and thereby build a prediction model with high accuracy [16,38].

Table 2. Determination of the input variables.

Design Consideration	Input Variables
Cementing materials	Fly ash [23,61] and ground granulated blast-furnace slag (GGBS) content [8,61]
Activator design parameters	Na₂SiO₃ [7] and NaOH [7] content; molarity of NaOH [7]
Concrete design parameters	Fine aggregate, gravel (4/10 mm), and gravel (10/20 mm) content; water/solid ratio [18,19,36]

Table 3. Range of the input variables.

Parameter	Fly Ash (kg/m³)	GGBS (kg/m³)	Na₂SiO₃ (kg/m³)	NaOH (kg/m³)	Fine Aggregate (kg/m³)	Gravel 4/10 mm (kg/m³)	Gravel 10/20 mm (kg/m³)	Water/Solid Ratio	NaOH Molarity (M)
Mean	174.34	225.15	111.66	53.74	729.88	288.39	737.37	0.34	8.14
Mode	0	0	108	64	651	0	0	0.53	10
Median	120	300	108	56	728	208	789	0.34	9.2
Standard Deviation	167.95	162.27	48.15	31.91	130.97	372.31	358.55	0.11	4.56
Maximum	523	450	342	147	1360	1293.4	1298	0.63	20
Minimum	0	0	18	3.5	459	0	0	0	1
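Each column of Table 3 summarises one input variable with six statistics. The sketch below computes those rows for a single variable using Python's standard `statistics` module. The helper name `describe` and the five molarity values are hypothetical and are not data from the study. The source also does not say whether the sample or the population standard deviation was reported, so the sample version used here is an assumption.

```python
import statistics


def describe(values):
    """Return the summary rows used in Table 3 for one input variable.

    Assumption: sample standard deviation (n - 1 denominator); use
    statistics.pstdev instead for the population definition.
    """
    return {
        "Mean": statistics.mean(values),
        "Mode": statistics.mode(values),  # Python 3.8+: first mode if several tie
        "Median": statistics.median(values),
        "Standard Deviation": statistics.stdev(values),
        "Maximum": max(values),
        "Minimum": min(values),
    }


# Hypothetical NaOH molarity values (M) for five mixes, not data from the study.
molarity = [8, 10, 10, 12, 14]
for row, value in describe(molarity).items():
    print(f"{row}: {value:.2f}")
```

A mode of 0, as for fly ash, GGBS, and both gravel fractions in Table 3, simply means that mixes omitting that constituent are the most frequent value in the dataset.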

Table 4. Machine learning models’ prediction outcomes.

Machine Learning Model	DT-FA	RF-FA	LR-FA	MLR-FA
Training Set RMSE	7.0026	4.0364	15.4094	14.8172
Test Set RMSE	10.2038	8.7202	16.4778	15.4861
Training Set R	0.915	0.9774	0.5321	0.5207
Test Set R	0.842	0.8915	0.598	0.5858
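Table 4 scores each model with two metrics, RMSE and the correlation coefficient R, on both the training set and the test set. The sketch below computes the two metrics in plain Python. The function names and the four-sample example are illustrative assumptions, and the paper does not state how its metrics were implemented.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted strengths."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical compressive strengths (MPa), not values from the study.
observed = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 51.0, 63.0]
print(f"RMSE = {rmse(observed, predicted):.4f}")   # → RMSE = 2.1213
print(f"R = {pearson_r(observed, predicted):.4f}")  # → R = 0.9893
```

Applying both functions separately to the training and test predictions gives the four rows of Table 4. A test RMSE well above the training RMSE, as for RF-FA (8.7202 versus 4.0364), indicates some loss of accuracy on unseen data.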

Table 5. Comparison of the optimized model with previous tree-based models.

Machine Learning Models	Dataset	Source	Correlation Coefficient
RF-FA model in present study	371 samples with 9 input parameters	present study	0.952
DT-FA model in present study	371 samples with 9 input parameters	present study	0.893
DT model	110 samples with 14 input parameters	[33]	0.827
RF model	110 samples with 14 input parameters	[33]	0.870
Deep learning model	61 samples with 11 input parameters	[34]	0.725
RF model	61 samples with 11 input parameters	[34]	0.932
GEP model	245 samples with 17 input parameters	[80]	0.913
ANFIS model	245 samples with 17 input parameters	[80]	0.941
ANN model	245 samples with 17 input parameters	[80]	0.948

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Wang, R.; Zhang, J.; Lu, Y.; Ren, S.; Huang, J. Towards a Reliable Design of Geopolymer Concrete for Green Landscapes: A Comparative Study of Tree-Based and Regression-Based Models. Buildings 2024, 14, 615. https://doi.org/10.3390/buildings14030615

