Article

Using Machine Learning to Predict the Performance of a Cross-Flow Ultrafiltration Membrane in Xylose Reductase Separation

1 Department of Civil and Environmental Engineering, Faculty of Engineering, Prince of Songkla University, Hat Yai Campus, Songkhla 90110, Thailand
2 Faculty of Civil Engineering Technology, University of Malaysia Pahang, Lebuhraya Tun Razak, Gambang 26300, Malaysia
3 PSU Energy Systems Research Institute, Research and Development Office, Prince of Songkla University, Songkhla 90110, Thailand
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(5), 4245; https://doi.org/10.3390/su15054245
Submission received: 17 December 2022 / Revised: 19 February 2023 / Accepted: 22 February 2023 / Published: 27 February 2023
(This article belongs to the Special Issue Sustainable Food Waste Valorisation by Membrane Technology)

Abstract

This study provides a new perspective on the separation of the xylose reductase enzyme from the reaction mixtures—obtained in the production of xylitol—by means of machine learning techniques for large-scale production. Two types of machine learning models, an adaptive neuro-fuzzy inference system based on grid partitioning of the input space and a boosted regression tree, were developed, validated, and tested. The models' inputs were cross-flow velocity, transmembrane pressure, and filtration time, whereas the membrane permeability (called membrane flux) and xylitol concentration were the outputs. According to the results, the boosted regression tree model demonstrated the highest predictive performance in forecasting the membrane flux and the amount of xylitol produced, with coefficients of determination of 0.994 and 0.967, respectively, against 0.985 and 0.946 for the grid partitioning-based adaptive neuro-fuzzy inference system, 0.865 and 0.820 for the best nonlinear regression picked from among 143 different equations, and 0.815 and 0.752 for the linear regression. The boosted regression tree modeling approach thus demonstrated superior predictive accuracy for the critical separation performance measures of the enzymatic cross-flow ultrafiltration membrane used in xylitol synthesis.

1. Introduction

Xylose reductase (XR) is a member of the aldose reductase or aldehyde reductase (ALR) family (EC 1.1.1.21), which belongs to the aldo-keto reductase (AKR) superfamily of enzymes [1,2]. It catalyzes the reduction of xylose (found in hemicellulose hydrolysates from lignocellulosic biomass) to xylitol [3], which has enormous applications in the pharmaceutical, food, and beverage industries [4] as its global market size is expected to increase from USD 1 billion in 2022 [5] to USD 1.37 billion by 2025 [6].
XR has been reported to be found in the cytoplasm of a wide variety of microorganisms, including bacteria, molds, algae, and yeasts [5,7,8]. However, as reported in the literature, only yeast species have been extensively studied. Some examples of yeast XRs include those from Candida shehatae [9], Candida tropicalis [10,11,12], Candida guilliermondii [13,14], Pichia fermentans [6], Chaetomium thermophilum [15], Spathaspora arborariae/passalidarum [16], Barnettozyma populi/california/salicaria and Cyberlindnera mrakii [7], Neurospora crassa [17], and Kluyveromyces sp. [18,19]. In general, XR prefers NADPH (the reduced form of nicotinamide adenine dinucleotide phosphate) as a coenzyme (donor of hydrogen atoms) to convert xylose to xylitol. However, dual coenzyme specificity for NADPH and NADH has also been reported [8,15,16].
One of the major challenges associated with the enzymatic xylitol production process is downstream processing. This refers to the further processing of the reaction mixture to recover the targeted component of interest; for example, the separation of the XR enzyme, which is not available commercially [14], is highly desired. Given that downstream processes are potential cost drivers [20], a proper downstream process design is required to offer a good purification yield for the component of interest while being as simple and cost-efficient as possible, so that practical application becomes feasible in large-scale industrial processes. In this regard, membrane filters have proven to be valuable tools because they are energy-efficient technologies and their scale-up is relatively easy [21,22,23]. Membrane filters are categorized according to the type of driving force and the size of the pores, among which tangential-flow filtration (more often known as cross-flow (CF) filtration) with an ultrafiltration (UF) membrane has become the most attractive and promising technology [24] because it is economically viable and offers high productivity along with a good purity level for the targeted product of interest. The performance of such a process is strongly influenced by several operating parameters, most notably the CF velocity, the transmembrane pressure (i.e., the difference between the applied and the osmotic pressure), and the filtration time (referred to hereafter as CFV, TMP, and FT, respectively) [20]. Hence, the development of a model that accurately predicts the performance of the CF–UF membrane process is highly desired, as it could help engineers and asset managers to better control the operating parameters.

Background

Numerous mathematical models describing the mechanism of solute transport through the CF–UF membrane system are available in the literature. The first one—the gel layer model—was introduced by Michaels [25]; however, this model does not take into account the osmotic pressures of macromolecular solutes. Hence, such a model seems inappropriate for the ultrafiltration of protein solutions, because Vilker et al. [26] observed that concentrated solutions of bovine serum albumin can exert osmotic pressure. Bellara and Cui [27] proposed a nonparameterized model to predict the permeate flux for protein (i.e., bovine serum albumin) solutions in a CF–UF tubular membrane. The proposed model was a combination of a hydrodynamic model describing the growth of the boundary layer and the Maxwell–Stefan equations [28] describing the electrostatic interactions generated through filtration of charged molecules. The model was solved numerically with (and without) taking into account viscous interactions within the concentration polarization layer and tested against data (TMP = 40 kPa, pH of 5.4 and 7.4). When the model considered viscous interactions, an excellent correlation between the model predictions and the measured data was observed. One concern associated with Bellara and Cui's model [27] is that it neglects electro-viscous effects in order to keep the model simple. Ahmad et al. [29] developed a model to predict the permeate flux volume, the gel layer resistance, and the rejection of each solute in a multiple-solute solution using a pilot-scale CF–UF membrane, where the feed was a complex biological solution (i.e., pretreated palm oil mill effluent containing ammoniacal nitrogen, carbohydrate constituents, and crude protein as solutes). The model was constructed based on the osmotic pressure/resistance-in-series/gel-polarization model, taking into account a mass balance analysis. The authors estimated the model parameters (e.g., membrane resistance, mass transfer and back-transport coefficients of the solutes, and permeating coefficient) by means of the Levenberg–Marquardt and Gauss–Newton algorithms. The model predictions were in good agreement with the experimental results. In the study conducted by Karasu et al. [30], a model was developed for the separation of protein from a synthetic whey concentrate suspension in a CF–UF system by modifying the compressive yield stress model—introduced in [31,32,33]—for permeate flux through a fouling layer on a dead-end UF membrane. In Karasu et al.'s model [30], it was assumed that (i) some of the solid particles on the fouling layer surface are continuously detached by the shear stress created by the influent flow, and the rate of particle removal (volume per unit time and per unit membrane area) is constant; (ii) the effect of the concentration polarization layer on the shear stress on the top part of the fouling layer is negligible; and (iii) shear stress and cake compression are functions of CF velocity and TMP, respectively. The authors used the removal rate of the fouling layer as a fitting factor, estimated by a trial-and-error method so that the model-predicted ratios of filtration time to volume of permeate per unit membrane area best fitted the corresponding measured values. The results indicated that the permeate flux and filtration time predicted by the proposed model correlated well with the measured data under steady-state conditions. However, the model was not able to adequately fit the experimental data at the beginning of the time course of the permeate.
Nguyen et al. [34] modified Karasu et al.'s model [30] by considering that membrane fouling (using a pore blockage model) and the growth of the cake layer (using a compressive yield stress model) take place simultaneously. Their results demonstrated that such a combined model described the entire time course of the permeate well. Kirschner et al. [35] modified Hermia's models [36] (i.e., the intermediate pore blocking and cake filtration models reported for filtration and fouling in dead-end membranes under constant pressure) to describe fouling during filtration using a polyether sulfone flat-sheet UF membrane operated under constant CF filtration; intermediate pore blocking refers to a phenomenon where solid particles either deposit onto an unobstructed area of the membrane surface or on top of previously deposited solid particles, whereas, in cake formation, solid particles fully cover the membrane surface in several layers. The authors also used Field et al.'s model [37] to take the CF filtration term into account in their model. A latex bead suspension (200 ppm, 0.22 μm) and a soybean oil emulsion (200 ppm) were separately utilized as foulants. According to the simulation results, below the threshold flux (i.e., the flux below which cake formation is insignificant and beyond which cake formation governs the fouling mechanism), the intermediate pore blocking model fitted the experimental values well. However, above the threshold flux, the combination of the intermediate pore blocking model and the cake filtration model produced the best fit.
Even though various mathematical models have been developed to predict the efficiency of ultrafiltration of a complex mixture, they might not be able to successfully model the permeation flux decline, because a wide range of different and complicated phenomena occur during fouling of the UF membrane; these could be associated with (unknown) interactions between the feed composition, the membrane itself (i.e., its nature and surface properties), and the operating/hydrodynamic conditions [38,39]. In addition, the equations involved in the mathematical models may be complex, and simplifying assumptions may be needed; in such cases, if the assumptions are far from reality, they could lead to under- or overestimation of the model-predicted values. Furthermore, the mathematical models describing the behavior of UF processes often suffer from a number of limitations. For instance, they are not able to describe the flux–time behavior under unsteady-state conditions. Moreover, a given mathematical model can be applied only to a particular feed under certain conditions [40]. For these reasons, alternative methods (much simpler and more flexible than mathematically established models, with high prediction accuracy) for predicting UF process performance are nowadays in high demand. In this context, one approach that can be recruited is machine learning (ML) modeling (known as an easy-to-use black-box technique). ML models do not require detailed/theoretical knowledge of the mechanisms involved in the process or of the relationships between the parameters that govern it. They are constructed based only on a measured input–output dataset. Such a modeling approach is a robust and powerful tool possessing high generalization ability [41].
Wei et al. [42] used a single-hidden-layer feedforward backpropagation neural network (FBPNN) to predict the dynamic permeate flux of cross-flow ultrafiltration of colloidal suspensions (i.e., spherical silica colloids with a particle diameter of 65 nm) as a function of TMP, FT, ionic strength, and zeta potential (data taken from Bowen et al. [43]). The results showed that the FBPNN model satisfactorily described the nonlinear behavior of the permeate flux, offering training and testing R2 values of 0.997 and 0.914, respectively, against the linear multiple regression (MR) and nonlinear MR with training and testing R2 values of 0.752 and 0.800, and 0.809 and 0.848, respectively. In another study, Krippl et al. [39] developed a single-hidden-layer FBPNN model and successfully predicted the permeate flux of a cross-flow ultrafiltration process as a function of CF (100–300 mL min−1), TMP (1.3–2.5 bar), and the initial protein bulk concentration (1.9–23.2 g L−1) (i.e., bovine serum albumin and lysozyme from chicken egg white), where two types of membranes were used (i.e., a hydrophobic polyether sulfone and a hydrophilic stabilized cellulose-based membrane). The authors further combined the FBPNN model with a mechanistic model such that the hybrid model showed high accuracy in predicting the duration of filtration in both batch and fed-batch continuous modes.
Yet, to the best of the authors' knowledge, the application of machine learning models to predicting the performance of the CF–UF membrane process in XR enzyme purification has never been explored. This inspired the authors to conduct this study, aimed at developing, validating, and testing two different machine learning models: an adaptive neuro-fuzzy inference system based on grid partitioning of the input space (ANFIS-GP) (an advanced ML model possessing the advantages of both numerical and linguistic knowledge, which is more transparent to the user/reader than other neural network models), and boosted regression trees (an easy-to-understand ML model even for people without an analytical background), to predict the performance of a CF–UF membrane process in the separation of XR as a function of the operating parameters CFV, TMP, and FT. The predictive performances of these two models were compared with each other, with that of the best nonlinear MR picked from among 143 different regressions, and with that of a linear MR.
In this study, the dataset is first described, followed by the modeling approaches, including ANFIS-GP, boosted regression trees, and (non)linear regression. Next, the modeling results are discussed and compared. Then, a sensitivity analysis for the best predictive model is provided. Finally, the study ends with conclusions.

2. Methodology

A diagram illustrating the workflow of this study is depicted in Figure 1; refer to the text for further details.

2.1. Dataset

The data used in this study were obtained from Krishnan et al. [20], who operated a lab-scale CF–UF membrane for separation of XR enzyme from the reaction mixture (i.e., xylitol, xylose, glucose, arabinose, acetic acid, and XR) during xylitol synthesis. The reaction mixture was subjected to ultrafiltration, where a filtration process was performed at TMP and CFV values of 1.2 bar and 1.06 cm s−1, respectively, to assess how the filtration time—ranging from 10 min to 100 min—influenced the process performance (i.e., the membrane permeability—called membrane flux—and the amount of xylitol produced). The authors also conducted two other series of experiments for the purpose of investigating the effects of TMP and CFV on the process performance: (i) TMP varied between 0.8 and 1.6 bar at a constant CFV value of 1.06 cm s−1, and (ii) CFV varied between 0.58 and 1.2 cm s−1 while TMP was held constant at 1.2 bar. The experimental conditions in all the runs are presented in Table 1.

2.2. Modeling Approaches

To construct the predictive models (described in Section 2.2.1, Section 2.2.2, and Section 2.2.3), FT (x1), TMP (x2), and CFV (x3) were considered as the inputs, while the normalized flux (y1) and xylitol concentration (y2) were the outputs. As seen in Table 1, the raw dataset consists of a total of 90 input–output data pairs (referred to hereafter as observations). Before use, the raw dataset was randomized using Excel (version 2016, Microsoft Corp., Redmond, WA, USA) and subsequently split into two disjoint subsets: a training subset and a testing subset. Of the 90 observations, 70 (about 78% of the dataset) were used as the training subset to develop the models, while the remaining 22% (i.e., 20 observations) served as the testing subset to evaluate the prediction power of the trained (developed) models; note that the training subset was further divided into five subsets in order to perform a cross-validation (CV) technique to prevent the models from overfitting the training data. The training and testing subsets were stored in the workspace of MATLAB® (trial version, R2020a) (MathWorks Inc., Natick, MA, USA) in the form of arrays, in which each row represented an observation.
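Although the randomization was performed in Excel, the same split can be scripted directly in MATLAB. The following is a minimal sketch (our illustration, not the authors' code; the variable names and the assumed column layout of data are ours):

```matlab
% Sketch: randomize the 90 observations and split them 70/20, as described above.
% "data" is assumed to be a 90x5 array with columns [FT TMP CFV flux xylitol].
rng(1);                               % fix the seed so the split is reproducible
idx = randperm(size(data, 1));        % random permutation of the 90 rows
Tr  = data(idx(1:70),  :);            % training subset (~78% of the dataset)
Ts  = data(idx(71:90), :);            % testing subset (~22% of the dataset)
Xtr = Tr(:, 1:3);  ytr = Tr(:, 4);    % inputs and one output (e.g., normalized flux)
Xts = Ts(:, 1:3);  yts = Ts(:, 4);
```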
The following subsections provide different modeling approaches used for predicting the performance of the UF membrane process in the separation of XR from the reaction mixture—obtained in xylitol synthesis—as a function of FT, TMP, and CFV.

2.2.1. ANFIS Model

ANFIS, a computational intelligence technique, integrates the learning ability of an artificial neural network (ANN) with the knowledge-interpretation ability of an FIS. The ANFIS modeling technique is used to model complex tasks with nonlinear behavior that are not easy to model mathematically. Such models are built from if-then rules and a collection of input–output data pairs [44,45].
Let us suppose that an FIS is composed of two inputs (xi; i = 1 and 2)—each characterized by two fuzzy sets Ai and Bi that are specified with an appropriate membership function (MF)—and one output f(x1,x2). Furthermore, let us suppose that the FIS is in the form of the first-order Takagi–Sugeno FIS [46]. The relationship between the inputs and the output can be described according to the following rule sets, each consisting of an "if part" (called the premise or antecedent part) and a "then part" (called the conclusion or consequent part):
Rule (1):
$$\text{If } (x_1 \text{ is } A_1 \text{ and } x_2 \text{ is } A_2), \text{ then } f_1(x_1, x_2) = a_1 x_1 + b_1 x_2 + c_1 \tag{1}$$
Rule (2):
$$\text{If } (x_1 \text{ is } B_1 \text{ and } x_2 \text{ is } B_2), \text{ then } f_2(x_1, x_2) = a_2 x_1 + b_2 x_2 + c_2 \tag{2}$$
where Ai and Bi denote the fuzzy sets pertaining to xi (i = 1 and 2), each specified by an appropriate MF (note that the parameters associated with Ai and Bi are referred to as antecedent parameters); fi(x1,x2) is the output function of the i-th rule (i = 1 and 2); and ai, bi, and ci are the consequent parameters corresponding to the i-th output function (i = 1 and 2).
A schematic representation of a two-input one-output first-order Takagi–Sugeno FIS with two fuzzy rules is displayed in Figure 2.
It can be observed from Figure 2 that the ANFIS model is a six-layer feed-forward neural network wherein each layer, except layer 1, consists of processing units (more often called nodes) that perform particular functions on their incoming signals and generate outputs, which are considered as inputs for the next layer; layer 1, called input layer, has no processing units; in other words, the input layer nodes only transfer the input data to the next layer. The reader is referred to [47,48] for a detailed description of each layer.
  • ANFIS-GP
One of the most popular techniques developed to identify the structure of the ANFIS model is based on grid partitioning (GP) of the input space. This technique—with respect to the number/type of MFs assigned to each dimension of a given input space—splits the input space into a number of subspaces (called fuzzy regions each characterized by a fuzzy if-then rule) using an axis-paralleled partition. The total number of fuzzy rules equals the number of all possible combinations of all the inputs [49]. Figure 3 illustrates an example of grid partitioning of a two-dimensional input space with three triangular MFs assigned to each dimension.
Let us suppose that an input space contains a collection of n inputs {x1, x2, …, xn}. Applying the GP technique to such an input space yields m1 × m2 × … × mn fuzzy rules, where mi stands for the number of MFs associated with the i-th input xi. It follows that the number of fuzzy rules rises rapidly as the number of MFs increases; therefore, training the ANFIS model becomes computationally expensive because the number of fitting parameters increases. According to Jang [50], GP is an appropriate technique only for problems with a small number of input variables (e.g., fewer than six). The problem at hand involves three inputs; hence, applying ANFIS-GP is suitable.
  • ANFIS Parameters
ANFIS models contain two parameter sets: (1) the antecedent parameter set and (2) the consequent parameter set. Let us suppose an ANFIS is created based on the first-order Takagi–Sugeno FIS. The numbers of antecedent and consequent parameters can be computed according to Equations (3) and (4):
$$N_A = N \times P \times M \tag{3}$$
$$N_C = N_R \times (N + 1) \tag{4}$$
where NA is the number of antecedent parameters; N is the number of inputs to the ANFIS model; P is the number of parameters of each input MF (for example, in the case of the generalized bell-shaped MF, P equals 3); M is the number of MFs assigned to each input; NC is the number of consequent parameters; and NR is the number of fuzzy rules.
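As a quick worked example (ours), using the settings adopted later in this study: for N = 3 inputs, M = 2 triangular MFs per input (a trimf has P = 3 parameters), and $N_R = 2^3 = 8$ rules, Equations (3) and (4) give $N_A = 3 \times 3 \times 2 = 18$ and $N_C = 8 \times (3 + 1) = 32$, i.e., 50 trainable parameters in total. With M = 3 MFs per input, the total jumps to $3 \times 3 \times 3 + 27 \times (3 + 1) = 135$ parameters, which already exceeds the 70 observations in the training subset; this is why the number of MFs per input is restricted to two later in this section.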
A large body of literature has reported that a combination of the error backpropagation (BP) and the least squares estimation (LSE) methods, called a hybrid algorithm, is very efficient to optimize ANFIS parameters [50,51,52,53]. Each iteration of the hybrid algorithm includes two passes as follows:
(1) Forward pass: The input signals go forward through the network, layer by layer, until the defuzzification layer, wherein the LSE method is applied to determine the consequent parameters while the antecedent parameters remain constant.
(2) Backward pass: The error signals are back-propagated from the output layer to the input layer, and gradient descent is used to adjust the antecedent parameters while the consequent parameters remain constant.
The ANFIS output is calculated using the consequent parameters determined in the forward pass.
In this study, an ANFIS-GP model was implemented in MATLAB® (trial version, R2020a) (MathWorks Inc., Natick, MA, USA) using the ANFIS Editor GUI (graphical user interface), which can be displayed by typing "anfisedit" in the MATLAB command window. Initially, the training and testing subsets were loaded from the MATLAB workspace, and then an FIS in the form of the first-order Takagi–Sugeno FIS was created; a five-fold CV technique was applied to secure the model against overfitting the data.
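For readers who prefer scripting over the anfisedit GUI, a roughly equivalent programmatic workflow is sketched below (our sketch, assuming the Fuzzy Logic Toolbox; Xtr, ytr, and Xts follow the split sketch in Section 2.1, the MF settings are illustrative, and the five-fold CV step is omitted for brevity):

```matlab
% Sketch: grid-partitioned first-order Sugeno FIS trained with the hybrid algorithm.
opt = genfisOptions('GridPartition');
opt.NumMembershipFunctions      = 2;        % two MFs per input, as in this study
opt.InputMembershipFunctionType = 'trimf';  % e.g., triangular MFs
initialFis = genfis(Xtr, ytr, opt);         % 2^3 = 8 fuzzy rules for three inputs

anfisOpt = anfisOptions('InitialFIS', initialFis, ...
                        'EpochNumber', 100, ...    % iteration cap used in this study
                        'OptimizationMethod', 1);  % 1 = hybrid (LSE + backpropagation)
trainedFis = anfis([Xtr ytr], anfisOpt);

yPred = evalfis(trainedFis, Xts);           % predictions on the testing subset
```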
For the problem at hand, various ANFIS models differing in the types of input MFs were developed in order to choose the best model as the one with the minimum validation error. Six types of input MFs were examined: the Gaussian MF (gaussmf), generalized bell-shaped MF (gbellmf), trapezoidal MF (trapmf), triangular MF (trimf), the product of two sigmoidal MFs (psigmf), and a combination of two Gaussian MFs (gauss2mf) (refer to the study of Clemente [54] for the mathematical definitions of these MFs). The number of MFs per input was set to two; note that an increase beyond two was found to be inappropriate because it created an ANFIS model in which the total number of trainable parameters exceeded the number of observations in the training subset. Each ANFIS-GP model was trained using the hybrid algorithm, and the training error goal was set to zero. The algorithm stopped learning when the training error reached zero. Otherwise, the algorithm continued iterating up to a certain iteration beyond which the training error no longer decreased. If neither of these two stopping criteria was satisfied, iteration progressed up to the predetermined number of iterations (i.e., 100 iterations in this study). Once the model training process was complete, the model was tested on the testing subset (unseen during the training process). The performance of the ANFIS-GP models was assessed using three statistical indices, namely the coefficient of determination (R2), root mean squared error (RMSE), and Nash–Sutcliffe efficiency coefficient (NSE), defined by Equations (5), (6), and (7), respectively [55,56,57].
$$R^2 = \left( \frac{\sum_{i=1}^{n} (y_i - \bar{y})\,(y_i^p - \overline{y^p})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \; \sum_{i=1}^{n} (y_i^p - \overline{y^p})^2}} \right)^{2}, \qquad 0 \le R^2 \le 1 \tag{5}$$
$$RMSE = \left( \frac{1}{n} \sum_{i=1}^{n} (y_i - y_i^p)^2 \right)^{0.5}, \qquad 0 \le RMSE < \infty \tag{6}$$
$$NSE = 1 - \frac{\sum_{i=1}^{n} (y_i - y_i^p)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad -\infty < NSE \le 1 \tag{7}$$
where $y_i$ and $y_i^p$ represent the measured value of the output and the model-predicted value for the i-th observation, respectively; $\bar{y}$ and $\overline{y^p}$ represent the average values of $y_i$ and $y_i^p$, respectively (i = 1, 2, …, n); and n is the total number of observations (in the training or testing subset) on which R2, RMSE, and NSE are calculated.
From Equations (5)–(7), a perfect model would be expected to achieve R2 and NSE values equal to unity and an RMSE value of zero (i.e., $y_i^p = y_i$).
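For convenience, Equations (5)–(7) can be evaluated with a small helper function such as the following sketch (our code, not part of the original study; y and yp are assumed to be column vectors of measured and predicted values):

```matlab
function [R2, RMSE, NSE] = fitStats(y, yp)
% Compute Equations (5)-(7) for measured outputs y and model predictions yp.
r    = corrcoef(y, yp);
R2   = r(1, 2)^2;                                       % Equation (5)
RMSE = sqrt(mean((y - yp).^2));                         % Equation (6)
NSE  = 1 - sum((y - yp).^2) / sum((y - mean(y)).^2);    % Equation (7)
end
```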

2.2.2. Boosted Regression Trees

Boosted regression trees is a modeling approach composed of two algorithms: regression trees (RT) and boosting [58]. The following subsections provide a brief introduction to the RT and boosting algorithms.
  • RT Algorithm
RT, a decision support tool dealing with continuous (numeric) predictors [59,60], is among the most common machine learning algorithms and has been widely used in various fields (e.g., engineering, science, and finance) due to its simplicity and high interpretability [61,62,63]. The RT algorithm was first introduced by Breiman et al. [64], and it consists of a series of decision nodes on input variables called explanatory or predictor variables. A decision node is represented by a conditional statement, i.e., a binary question that is either "true" or "false". This results in the creation of two arcs (branches) from each node; if the conditional statement is true, the corresponding observations fall to the right branch; otherwise, they fall to the left branch. Each branch may lead to another decision node, which is branched out again, or may end in an unsplit node called a terminal (output) node or leaf, depending on the minimum number of data points a node must contain in order to be split. Since this approach forms a tree-shaped structure, it is known as RT. Most RTs are drawn upside down, which means that the parent node (called the root node) is the topmost, and the leaves are at the bottom of the tree [65,66,67]. The algorithm examines binary splits over all predictors to find the corresponding cut-off values that offer the best split. The best predictor, which is assigned to the root node, is selected according to the sum of squared residuals criterion after data splitting; this process is repeated for new nodes (i.e., child nodes generated from the root node, grandchild nodes, etc.) until a stopping condition is satisfied (for instance, the minimum leaf size (mls) is used as a stopping parameter): the splitting of a node is terminated if the number of data points in the node becomes smaller than the mls. The mathematical background of the RT model is given in detail below [65,68].
Let us suppose that a given system is described by a training dataset consisting of n observations $\{x_i, y_i\}_{i=1}^{n}$ with $x \in \mathbb{R}^m$ and $y \in \mathbb{R}$, where x represents an m-element vector (m denotes the number of predictor variables) and y is the corresponding output variable. The objective is to use this dataset to find a function, f(x), mapping x into y, which minimizes the expected value of a particular loss function L(y, f(x)) on the training dataset; a commonly used loss function is the sum of squared residuals (SSR), where the residuals are the differences between y and f(x). This function, f(x), can then be used to predict the output of query observations for which only the predictor variables are known.
First, all observations are sorted according to the values of each predictor [69]. Let us suppose that {x1, x2, …, xn} are the sorted values of a given predictor xj. Any value between xi and xi+1 splits the set into the same two subsets, and thus the number of possible splits that needs to be tested is n − 1. In general, the midpoint of each interval [xi, xi+1] is taken as the splitting point. The algorithm then starts with the root node where, for a given predictor xj, we choose the cut-off point (c) such that binary splitting of the observations into the right-hand node with xj ≥ c and the left-hand node with xj < c yields the greatest possible reduction in the cumulative SSR over both the right- and left-hand nodes (the SSR index is used as a criterion to assess the quality of a given split). All predictors and all possible c values corresponding to each predictor are examined. The particular predictor with the c value that produces the overall minimal SSR is selected as the variable that should reside at the root node. Mathematically, this can be expressed as follows:
$$SSR_j = \sum_{i \in LN} (y_i - y_{LN}^*)^2 + \sum_{i \in RN} (y_i - y_{RN}^*)^2 \tag{8}$$
where SSRj is the sum of squared residuals corresponding to the j-th predictor, and $y_{LN}^*$ and $y_{RN}^*$ represent the averaged outputs of the observations falling into the left-hand node and the right-hand node, respectively.
This process is repeated to find the best predictor and c value to split the dataset further so that the SSR in each newly created node is minimized. The process continues until a stopping condition is satisfied; for example, splitting stops once no leaf contains more data points than the mls set by the user. When the tree is formed, the output for a given observation, i.e., a new (test) observation, is obtained by averaging the outputs of all the training observations in the leaf to which the test observation belongs.
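To make the split search concrete, the sketch below (our illustration, not the authors' code) scans a single predictor for the cut-off value that minimizes Equation (8):

```matlab
function [bestC, bestSSR] = bestSplit(xj, y)
% Sketch: find the best cut-off c for one predictor column xj (n x 1) against
% outputs y (n x 1) by minimizing Equation (8) over all interval midpoints.
xs = sort(unique(xj));
bestSSR = inf;  bestC = NaN;
for i = 1:numel(xs) - 1
    c = (xs(i) + xs(i+1)) / 2;           % midpoint of each interval [xi, xi+1]
    L = y(xj <  c);  R = y(xj >= c);     % left- and right-hand node outputs
    ssr = sum((L - mean(L)).^2) + sum((R - mean(R)).^2);   % Equation (8)
    if ssr < bestSSR
        bestSSR = ssr;  bestC = c;
    end
end
end
```

Repeating this search over all predictors and assigning the winner to the current node, then recursing on the two child nodes until the mls condition is met, yields the full tree described above.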
  • Boosting
Single RT-based models—despite their widespread use owing to their simplicity, high interpretability, ability to deal with missing variables and to handle a mixture of continuous and categorical predictor variables, and insensitivity to outliers [58,67]—suffer from low prediction accuracy [60]. In addition, they are often unstable, which means that small changes in a training dataset may lead to a tree with a different series of splits [70,71,72,73]. Hence, the boosting algorithm, first proposed by Schapire [74] and later developed by Freund [75] and Freund and Schapire [76], can serve as a means to enhance the predictive performance of RT models [58,77]. Boosting of RT (referred to hereafter as BRT) employs an iterative algorithm to build a final model, sequentially adding many—typically hundreds or thousands of—single RTs; a single RT is referred to as a base learner or weak learner due to its low prediction power [78]. The concept of the BRT model, shown graphically in Figure 4, is that each additional RT further reduces the prediction error [67].
Figure 5 displays a general schematic diagram depicting how BRT works; note that only two RTs are shown and, for simplicity, the RTs are assumed to be shallow, with only one internal node each. In other words, one of the two nodes originating from the root node (here the left-hand one) and the nodes descending from the single internal node are unsplit nodes (leaves). The output of a BRT model is calculated as the sum of the outputs of all trees multiplied by a learning rate.
Mathematically, the BRT's output (F) can be computed in accordance with Equation (9) [58].
$$F(x) = \sum_{k=1}^{N_t} \alpha \, f_k(x) \tag{9}$$
where fk(x) is the function of the k-th RT, built on the dataset {X, r}, in which X is the predictor space and r is the vector of residuals produced by the previously grown RT (when k equals one, r equals y, the vector of outputs of the training observations; in other words, the first RT is grown on the dataset {X, y}); α is the learning rate; and Nt is the total number of RTs. It is worth mentioning that each tree in a BRT model has a quite different structure from the others, as the trees grow on different datasets.
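A bare-bones version of this residual-fitting loop (our sketch of the general least-squares boosting idea, not the authors' implementation; Xtr and ytr follow the earlier split sketch) could read:

```matlab
% Sketch of the boosting loop behind Equation (9): each new shallow tree is
% fit to the residuals of the current ensemble prediction, then added with
% a shrunken (learning-rate-scaled) contribution.
alpha = 0.1;  Nt = 100;                 % illustrative tuning values
F = zeros(size(ytr));                   % current ensemble prediction
trees = cell(Nt, 1);
for k = 1:Nt
    r = ytr - F;                        % residuals of the ensemble so far
    trees{k} = fitrtree(Xtr, r, 'MinLeafSize', 5);   % weak learner on {X, r}
    F = F + alpha * predict(trees{k}, Xtr);          % Equation (9), term by term
end
```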
  • BRT Tuning Parameters
The BRT model has several tuning parameters, which should be adjusted in order to optimize the model accuracy. The key parameters are as follows [58,67,79]:
Number of trees (Ntr): If Ntr is set to a small value, the model may be subject to underfitting, which means that the model is unable to fit even the training dataset. In contrast, the more trees there are, the better the model learns; however, the training time increases and the model tends toward overfitting. This means that the model learns noise (irregular patterns) rather than relevant patterns; in other words, the model offers good predictions on the training dataset but poor prediction performance on the test dataset. Hence, this parameter should be carefully controlled. One possible way to avoid under-/over-fitting is to use the CV technique, which allows the optimum value of Ntr to be selected according to the CV performance. For a detailed description of the CV technique, the reader is referred to the studies of Carslaw and Taylor [80], Fedotenkova [81], and Salehi and Lestari [82].
Depth of trees: The deeper the trees grow, the more the model is prone to overfitting.
Learning rate (α): It is often called step size or shrinkage factor, which takes a value in the range (0,1]; α is a measure of how fast or slow the model converges. Note that in case of setting α to a very small value (<<1), a very large value of Ntr is needed to attain high performance; however, this comes at a cost, which means that training the model might become computationally too expensive.
In this study, the BRT model was implemented in MATLAB using the Regression Learner App, a subdivision of the Machine Learning and Deep Learning group, which can be found by clicking on the Apps tab in the MATLAB toolbar. Initially, the training and testing subsets were loaded from the MATLAB workspace, and then a five-fold CV was selected as a preventative technique against overfitting.
There are some parameters, i.e., Ntr, mls, and α, that are critical to the development of an accurate BRT model. For the problem under consideration here, Ntr, mls, and α were varied in the following ranges: Ntr from 1 up to 600 (with a total of 23 grid points), mls from 2 to 8 with an increment of 1, and α from 0.1 to 1.0 with an increment of 0.1. Only one of these three parameters was varied at a time, while the other two were held constant. This procedure resulted in the creation of 1610 different BRT models for each output (y1 and y2), among which the best model was picked as the one that yielded the least validation error; the smaller the validation error, the better the model's generalization power. The performance of the BRT model was assessed using the statistical indices (R2, NSE, and RMSE) given by Equations (5)–(7).
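The Regression Learner App performs this tuning interactively; a scripted equivalent for a single (Ntr, mls, α) setting is sketched below (our sketch, assuming fitrensemble from the Statistics and Machine Learning Toolbox). Looping it over the grids stated above would reproduce the 1610-model search:

```matlab
% Sketch: one 5-fold cross-validated LSBoost model at a given (Ntr, mls, alpha).
t   = templateTree('MinLeafSize', 5);                     % mls
mdl = fitrensemble(Xtr, ytr, 'Method', 'LSBoost', ...
                   'NumLearningCycles', 200, ...          % Ntr
                   'LearnRate', 0.3, ...                  % alpha
                   'Learners', t);
cvm    = crossval(mdl, 'KFold', 5);                       % five-fold CV, as in the text
rmseCV = sqrt(kfoldLoss(cvm));     % validation RMSE (kfoldLoss returns MSE by default)
```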

2.2.3. Multiple Regression Analysis

Multiple regression is a statistical technique widely used in numerous problems to establish a relationship between a series of predictor variables and a dependent variable called the output variable. Consider an output variable y that depends on m predictor variables (x1, x2, …, xm). For n observations, if the relationship between these variables is given by Equation (10), it is called a multiple linear regression (MLR).
$$y_i = a_0 + \sum_{j=1}^{m} a_j x_{ji}, \qquad i = 1, 2, \ldots, n \tag{10}$$
where xji denotes the i-th observation of xj; yi represents the i-th observation of the response; a0 is a constant (the model intercept); and the aj's are the linear regression coefficients, which can be estimated using the LSE method. The computational procedure to determine the regression coefficients is well described in [83,84].
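In MATLAB, fitting Equation (10) by least squares and inspecting the coefficients takes only a few lines (a sketch; the variable names are ours):

```matlab
% Sketch: estimate the MLR coefficients a0..a3 of Equation (10) by least squares.
mdlLin = fitlm(Xtr, ytr);         % columns of Xtr: FT, TMP, CFV
disp(mdlLin.Coefficients)         % intercept a0 and slopes a1..a3, with statistics
ypLin = predict(mdlLin, Xts);     % predictions on the testing subset
```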
An extended form of MLR is multiple nonlinear regression (MnLR), in which the output variable is described by a function that is a nonlinear combination of the predictors. Although MLR is often used, there are many problems in which MnLR is more suitable because the output behaves nonlinearly with changes in the predictors. A challenge often associated with the use of MnLR is deciding which type of nonlinear function best describes the relationship between the predictors and the output, because MnLR can take many different functional forms.
In this study, MLR analysis was first performed to appraise the performance of the conventional regression method for predicting the performance of the UF membrane process in the separation of XR. The training subset was fitted to an MLR using MATLAB to estimate the MLR coefficients. Then, the testing subset was used to assess the prediction accuracy of the fitted MLR in terms of R2, RMSE, and NSE given by Equations (5)–(7). Second, MnLR analysis was performed using the curve-fitting module of the DataFit software (trial version 9.1.32, Copyright© 1995–2014, Oakdale Engineering, USA). The training subset was fitted to various MnLR models (a total of 143 different models) to estimate the models' coefficients. The fitted models were then fed with the testing subset to pick the best model as the one with the highest prediction accuracy (the higher the R2 and NSE, and the smaller the RMSE).

2.2.4. Sensitivity Analysis

A sensitivity analysis was performed to investigate how changes in the input variables influence the output of a given model. In this regard, a tornado chart (sometimes called a butterfly chart) was constructed, which is based on one-at-a-time input variation [85,86,87]. The tornado chart demonstrates the sensitivity of the output in terms of "swings" displayed as bars; the swing corresponding to the input variable Xi is defined as the absolute difference between the output values (Yi) calculated at Xi,max and Xi,min, while all other input variables (Xj, j ≠ i) are held at their median values. The created swings are sorted horizontally in descending order of width, so that the overall chart (i.e., the stacked horizontal bars) resembles the shape of a typical tornado. The wider the swing, the higher the model's sensitivity to the input variable tested; in other words, the most important input variable affecting the model output is at the top of the chart, and the least important is at the bottom. Figure 6 illustrates the procedure for building a tornado chart for a given output variable (Y) as a function of input variables Xi (i = 1 to n). The blue and red bars—extending from the vertical dashed line (the Y value when medians are used for all input variables)—represent the values of $Y_i|_{X_{i,\min}}$ and $Y_i|_{X_{i,\max}}$, respectively. The green bars represent the swings, which are then ordered so that the widest bar is placed at the top of the chart, followed by the second-widest bar, and so on.
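The swing computation itself is straightforward; the following sketch (our illustration, for any trained model mdl exposing a predict method, e.g., the fitted ensemble above) implements the one-at-a-time procedure of Figure 6:

```matlab
% Sketch: one-at-a-time swing widths for a tornado chart.
% X is the full input matrix (columns: FT, TMP, CFV); mdl is a trained model.
med = median(X);                        % baseline: all inputs at their medians
swing = zeros(1, size(X, 2));
for i = 1:size(X, 2)
    lo = med;  lo(i) = min(X(:, i));    % input i at its minimum, others at median
    hi = med;  hi(i) = max(X(:, i));    % input i at its maximum, others at median
    swing(i) = abs(predict(mdl, hi) - predict(mdl, lo));
end
[~, order] = sort(swing, 'descend');    % widest swing goes at the top of the chart
```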

3. Results and Discussion

3.1. Evaluation of ANFIS-GP Model

Figure 7 shows the validation performance, in terms of R2 and RMSE, of the ANFIS-GP when the model output was the normalized flux. It is clear from Figure 7 that changes in the type of input MF influence the validation results. The best input MF, out of the six MFs (i.e., trimf, trapmf, gbellmf, gaussmf, gauss2mf, and psigmf) examined to fit the training subset, was selected as the one that yielded the highest R2 and the least RMSE. As seen in Figure 7, the use of trimf offered the best results (R2 = 0.9649, RMSE = 0.0154). Hereafter, the ANFIS-GP model with the normalized flux as the model output and trimf as the best input MF will be referred to as ANFIS-GP1.
The structure of the ANFIS-GP1 is graphically represented in Figure 8. There are three nodes in the input layer, corresponding to the three inputs. In the second layer, two nodes are connected to each input node (six nodes in total), which correspond to two MFs, in the form of trimf, for each input. The third layer contains eight (i.e., 2³) nodes, equivalent to eight if-then rules. The rule-processing results are given by eight nodes in the next layer, and then a weighted-average method is applied to obtain the output. A detailed description of the fuzzy if-then rules and the optimal values of the input MF parameters (i.e., antecedent parameters) and output MF parameters (i.e., consequent parameters) of the ANFIS-GP1 are given in the Supplementary Materials (Tables S1–S3).
Regarding the ANFIS-GP model with xylitol concentration as the model output, gaussmf was selected as the best input MF, giving R2 and RMSE values equal to 0.8366 and 0.2394, respectively (see Figure 9). Hereafter, the ANFIS-GP model with xylitol concentration as the model output and gaussmf as the best input MF will be referred to as ANFIS-GP2. The structure of the ANFIS-GP2 is the same as that of the ANFIS-GP1 shown in Figure 8, except that the input MFs were in the form of gaussmf instead of trimf. A detailed description of the fuzzy if-then rules and the optimal values of the input/output MF parameters of the ANFIS-GP2 are given in the Supplementary Materials (Tables S4–S6).
To visualize the prediction accuracy of the ANFIS-GP1 and ANFIS-GP2 on the entire dataset (i.e., both the training and testing subsets), a scatter diagram—in which the measured data are plotted against the models' predicted values—was constructed (Figure 10); a distinction first needs to be made between the 1:1 line and the line of best fit (regression line). The 1:1 line (often called the 45° line or 100% correlation line) is used as a reference when comparing two datasets in a two-dimensional scatter plot: if the corresponding data points from the two datasets are equal to each other, the corresponding scatters lie exactly on the 1:1 line. The line of best fit refers to a straight line through a scatter plot of data points that best represents the relationship between them; the equation for such a line is typically obtained using the least squares method.
It can be observed from Figure 10 that the data points cluster closely around the 1:1 line, with R2 and NSE values of 0.9845 and 0.9843 for the ANFIS-GP1 and of 0.9459 and 0.9453 for the ANFIS-GP2, respectively. These results indicate the high prediction accuracy of the models, as they satisfactorily explain the variability in the outputs; the ANFIS-GP1 and ANFIS-GP2 leave only about 1.6% and 5.5%, respectively, of the total variability in the outputs unexplained.

3.2. Evaluation of BRT Model

Figure 11 shows the validation error of the BRT model (with the normalized flux as the output) as a function of mls and Ntr, while α was set to its optimal value (α = 0.3), obtained via a trial-and-error method. As seen in Figure 11, the smallest error (RMSE = 0.0184) was achieved when mls = 5 and Ntr = 200 (note that a further increase in Ntr beyond 200 had very little effect on decreasing the error); hereafter, the optimal BRT model with the normalized flux as the output will be referred to as BRT1. As an example, the structure of tree no. 200 in the trained BRT1 is illustrated in Figure 12.
Figure 13 shows the validation error of the BRT model (with xylitol concentration as the output) as a function of mls and Ntr, while α was set to its optimal value (α = 0.1). It can be observed in Figure 13 that the smallest error (RMSE = 0.2566) was obtained when mls = 2 and Ntr = 80; hereafter, the optimal BRT model with xylitol concentration as the output will be referred to as BRT2. As an example, the structure of tree no. 80 in the trained BRT2 is illustrated in Figure 14.
Figure 15 shows the prediction accuracy of models BRT1 and BRT2 on the entire dataset (i.e., both the training and testing subsets) as a scatter plot of the measured and model-predicted values. From Figure 15, the BRT1 and BRT2 offered R2 values of 0.9944 and 0.9669, respectively, and NSE values of 0.9937 and 0.9664, respectively, indicating that the established models predict the output variables well; the BRT1 and BRT2 leave only about 0.6% and 3.3%, respectively, of the total variability in the outputs unexplained.

3.3. Evaluation of Regression Models

The MLR and MnLR equations obtained using the training subset have the following form:
MLR1:
  y1 = −0.0019x1 + 0.2167x2 + 0.1745x3 + 0.1845
MLR2:
y2 = −0.0150x1 + 1.5183x2 + 0.9941x3 + 13.2684
MnLR1:
$y_1 = 0.1547 + 2.3579\ln(A) - 16.0759/A^2$, where $A = 0.3905x_1 - 35.6471x_2 - 36.1295x_3 + 92.6519$
MnLR2:
$y_2 = 13.1374 + 7.0660 \, A\ln(A) + 0.6351A$, where $A = 0.1041x_1 + 9.9640x_2 + 8.2077x_3 - 5.3888$
where y1 and y2 represent the outputs (the normalized flux and xylitol concentration, respectively); x1: FT; x2: TMP; and x3: CFV (refer to Table 1 for FT, TMP, and CFV values).
Figure 16 shows a scatter plot of the measured and predicted values using the MLR1, MLR2, MnLR1, and MnLR2 models on the entire dataset (i.e., both the training and testing subsets). From this figure, the MLR1 and MLR2 produced a relatively poor linear correlation, with R2 values of 0.8149 and 0.7519 and NSE values of 0.8145 and 0.7507, respectively. It can also be seen from Figure 16 that the MnLR1 and MnLR2 produced a linear correlation with R2 values of 0.8653 and 0.8199 and NSE values of 0.8641 and 0.8190, respectively. These findings indicate that (non)linear regression is not accurate enough to correctly predict the normalized flux and xylitol concentration for the system under consideration in this study.

3.4. Comparison of the Models

A performance comparison of the models developed (i.e., ANFIS-GP, BRT, and M(n)LR) on the entire dataset is summarized in Table 2; three statistical indices (R2, NSE, and RMSE) were used to make the comparison. It can be seen from Table 2 that, when the normalized flux was used as the model's output (the ANFIS-GP1 model), an R2 value of 0.9845 and an NSE value of 0.9843—corresponding to an RMSE value of 0.0109—were obtained. When xylitol concentration was used as the model's output (ANFIS-GP2), R2, NSE, and RMSE were found to be 0.9459, 0.9453, and 0.1494 g L−1, respectively. These findings demonstrate that the ANFIS-GP model performs better than the M(n)LR models, which yielded R2, NSE, and RMSE values in the ranges of 0.7519–0.8653, 0.7507–0.8641, and 0.0319–0.3191, respectively. In addition to the high prediction accuracy of the ANFIS-GP model, its development was straightforward because the number of input variables—for the problem presented in this study—was as small as three. However, for a problem involving many inputs, ANFIS-GP modeling may not be a feasible method because the number of fuzzy rules increases exponentially with the number of inputs, and consequently the number of trainable parameters grows until the model becomes computationally expensive. For instance, an ANFIS-GP model fed with n input variables would generate $M^n$ fuzzy rules, where M denotes the number of MFs for each input.
When comparing the performance of the developed ANFIS-GP and BRT models (see Table 2), it can be observed that the BRT produced R2 values of 0.9669–0.9944 and NSE values of 0.9664–0.9937, which are greater than those obtained using the ANFIS-GP models. In addition, the BRT models gave RMSE values of 0.0069–0.1171, approximately 22–37% smaller than those produced by the ANFIS-GP models.
Overall, it can be concluded that, among the models developed, the BRT model offered the highest R2 and NSE values and produced the least error; hence, it is the best choice for predicting the normalized flux and xylitol concentration for the system under consideration in this study. However, it should be noted that BRT models suffer from two major limitations: (i) they are highly unstable, since a small change in the training dataset can lead to a completely different structure yielding different results, and (ii) such models are prone to overfitting the training data and can be computationally expensive. To mitigate these issues, care must be taken to properly set the leaf node size (i.e., the number of data points assigned to each leaf node) for any given problem.

3.5. Sensitivity Analysis for Model BRT

The sensitivity analysis, in the form of tornado charts, for the BRT models—which produced smaller RMSE values (higher R2) than the ANFIS-GP, MnLR, and MLR models (see Table 2)—is shown in Figure 17. As seen in Figure 17a (tornado chart for model BRT1), the widest swing, 0.199, is associated with input variable X1. This implies that the highest impact on the model output comes from variable X1, followed by input variables X2 and X3 with lower impacts, showing swing widths of 0.165 and 0.110, respectively. Figure 17b (tornado chart for model BRT2), similar to Figure 17a, demonstrates that input variable X1 has the highest impact on the model output, with a swing width of 1.60 g L−1, compared to X2 and X3 with swing widths of 1.00 g L−1 and 0.77 g L−1, respectively.

3.6. Model Calibration and Validation Need

Although the modeling approaches and the models developed have demonstrated meaningful predictive power, the need for model calibration and validation has to be addressed. The machine learning models presented in this study were trained on, and used to make predictions for, the data taken from the experimental runs conducted in our previous work [20]. As the size of the original (raw) dataset was small (only 90 observations, called patterns), the authors initially decided to randomly split the raw dataset into only two disjoint subsets (i.e., training (Tr): 70 patterns, and testing (Ts): 20 patterns). Cross-validation (a very common, useful, and proven technique in the field of data mining) was then performed to avoid overfitting the training data and to tune the models' parameters. The prediction accuracy of the developed (trained) model was then assessed on the Ts subset.
No pattern in the Ts subset was seen during the training phase: at least one of the data points (input variables) of the j-th pattern (j = 1–20) in the Ts subset had a different value from those of the i-th pattern (i = 1–70) in the Tr subset. In this sense, the Tr and Ts subsets are composed of independent patterns, and a reasonable judgment on the prediction capability of the trained model can be made by testing the model on the Ts subset. If no data point of the j-th pattern in Ts shared the same value as that of the i-th pattern in Tr, the Tr and Ts subsets would be fully independent, and an even better judgment could be made on the prediction accuracy of the model. Recognizing the limited number of experimental runs in our work, we therefore recommend that additional datasets from truly independent experiments be obtained when mathematical/statistical modeling work is to be initiated from the existing data.

4. Conclusions

This study, for the first time, demonstrated the application of a machine learning (ML) modeling approach to predict the performance of a cross-flow ultrafiltration (CF-UF) membrane for the separation of xylose reductase (XR) from the reaction mixture obtained in the production of xylitol, as a function of cross-flow velocity, transmembrane pressure, and filtration time. Two types of ML-based models, a grid partitioning-based adaptive neuro-fuzzy inference system (ANFIS-GP) and boosted regression trees (BRT), were developed, validated, and tested. The prediction accuracies of the BRT and ANFIS-GP models were compared with each other and with those of the (non)linear regressions. The modeling results clearly indicated that the BRT model yielded superior predictive performance for the membrane permeability, with an excellent R2 value of 0.994, and for the amount of xylitol produced, with an R2 value of 0.967. This implies that BRT can be considered a valuable computational tool for predicting the performance of a CF–UF membrane for the enzymatic production of xylitol. Future research should be directed towards other types of commercial enzymes and a scale-up study to determine the suitability of the separation process for industrial applications, particularly for long-term use with respect to high yield and productivity.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su15054245/s1, Table S1: Fuzzy if-then rules for the ANFIS-GP1 used in this study; Table S2: Antecedent parameters of the ANFIS-GP1 used in this study; Table S3: Consequent parameters of the ANFIS-GP1 used in this study; Table S4: Fuzzy if-then rules for the ANFIS-GP2 used in this study; Table S5: Antecedent parameters of the ANFIS-GP2 used in this study; Table S6: Consequent parameters of the ANFIS-GP2 used in this study.

Author Contributions

Writing—original draft, validation, formal analysis, interpretation, writing—review and editing, R.S.; Writing—original draft, investigation, formal analysis, resources, writing—review and editing, S.K.; Writing—review and editing, M.N.; Supervision, project administration, S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a Postdoctoral Fellowship from Prince of Songkla University, Thailand. This study was also supported by a research grant (RDU2203102) from the University of Malaysia Pahang, Malaysia.

Data Availability Statement

All data supporting the conclusions of this study are included in the paper or in its Supplementary Materials.

Acknowledgments

The authors thank the support from Biogas and Biorefinery Laboratory at the Faculty of Engineering and PSU Energy Systems Research Institute, Prince of Songkla University, Thailand.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

AKR: aldo-keto reductase
ALR: aldehyde reductase
ANFIS-GP: adaptive neuro-fuzzy inference system based on grid partitioning of the input space
ANN: artificial neural network
Ai, Bi: fuzzy sets pertaining to input variable xi
ai, bi, ci: consequent parameters corresponding to the i-th output function
BP: backpropagation
BRT: boosted regression tree
c: cut-off point
CF-UF: cross-flow ultrafiltration
CV: cross-validation
FBPNN: feedforward backpropagation neural network
FIS: fuzzy inference system
fi: output function of the i-th fuzzy rule
GP: grid partitioning
GUI: graphical user interface
gaussmf: Gaussian membership function
gauss2mf: combination of two Gaussian membership functions
gbellmf: generalized bell-shaped membership function
LSE: least squares estimation
M: number of membership functions assigned to each input
MF: membership function
MLR: multiple linear regression
mls: minimum leaf size
MnLR: multiple nonlinear regression
N: number of inputs to the ANFIS model
NA: number of antecedent parameters
NC: number of consequent parameters
NR: number of fuzzy rules
n: total number of observations
NADPH: reduced form of nicotinamide adenine dinucleotide phosphate (donor of hydrogen atoms)
NSE: Nash–Sutcliffe efficiency coefficient
P: number of parameters of each input MF
psigmf: product of two sigmoidal membership functions
R2: coefficient of determination
RMSE: root mean squared error
r: vector of residuals
RT: regression tree
SSR: sum of squared residuals
trapmf: trapezoidal membership function
trimf: triangular membership function
UF: ultrafiltration
wi: synaptic weight assigned to the signal leaving the i-th neuron within the hidden layer
XR: xylose reductase
x1: FT (filtration time; min)
x2: TMP (transmembrane pressure; bar)
x3: CFV (cross-flow velocity; cm s−1)
y1: normalized flux (i.e., ratio of the permeate flux to the pure water flux)
y2: xylitol concentration (g L−1)
$y_i^p$: model-predicted value for the i-th observation
$\bar{y}$: average value of yi
$y|_{x_{i,\min}}$: output value computed with xi set to its minimum value
$y|_{x_{i,\max}}$: output value computed with xi set to its maximum value
μji(xi): membership function of the j-th fuzzy set associated with input variable xi
α: learning rate (step size or shrinkage factor)

References

  1. Jez, J.M.; Penning, T.M. The aldo-keto reductase (AKR) superfamily: An update. Chem. Biol. Interact. 2001, 130–132, 499–525. [Google Scholar] [CrossRef]
  2. Krishnan, S.; Nasrullah, M.; Kamyab, H.; Suzana, N.; Munaim, M.S.A.; Wahid, Z.A.; Ali, I.H.; Salehi, R.; Chaiprapat, S. Fouling characteristics and cleaning approach of ultrafiltration membrane during xylose reductase separation. Bioprocess Biosyst. Eng. 2022, 45, 1125–1136. [Google Scholar] [CrossRef]
  3. Azizah, N. Biotransformation of xylitol production from xylose of lignocellulose biomass using xylose reductase enzyme: Review. J. Food Life Sci. 2019, 3, 103–112. [Google Scholar]
  4. Umai, D.; Kayalvizhi, R.; Kumar, V.; Jacob, S. Xylitol: Bioproduction and applications-A review. Front. Sustain. 2022, 3, 826190. [Google Scholar] [CrossRef]
  5. Lugani, Y.; Sooch, B.S. Fermentative production of xylitol from a newly isolated xylose reductase producing Pseudomonas putida BSX-46. LWT-Food Sci. Technol. 2020, 134, 109988. [Google Scholar] [CrossRef]
  6. Narisetty, V.; Castro, E.; Durgapal, S.; Coulon, F.; Jacob, S.; Kumar, D.; Awasthi, M.K.; Pant, K.K.; Parameswaran, B.; Kumar, V. High level xylitol production by Pichia fermentans using non-detoxified xylose-rich sugarcane bagasse and olive pits hydrolysates. Bioresour. Technol. 2021, 342, 126005. [Google Scholar] [CrossRef]
  7. Saha, B.C.; Kennedy, G.J. Production of xylitol from mixed sugars of xylose and arabinose without co-producing arabitol. Biocatal. Agric. Biotechnol. 2020, 29, 101786. [Google Scholar] [CrossRef]
  8. Lugani, Y.; Puri, M.; Sooch, B.S. Recent insights, applications and prospects of xylose reductase: A futuristic enzyme for xylitol production. Eur. Food Res. Technol. 2021, 247, 921–946. [Google Scholar] [CrossRef]
  9. Ho, N.W.Y.; Lin, F.P.; Huang, S.; Andrewst, P.C.; Tsao, G.T. Purification, characterization, and amino terminal sequence of xylose reductase from Candida shehatae. Enzym. Microb. Technol. 1990, 12, 33–39. [Google Scholar] [CrossRef]
  10. Eryasar, K.; Karasu-Yalcin, S. Evaluation of some lignocellulosic byproducts of food industry for microbial xylitol production by Candida tropicalis. 3 Biotech 2016, 6, 202. [Google Scholar] [CrossRef] [Green Version]
  11. Ariyan, M.; Uthandi, S. Xylitol production by xylose reductase over producing recombinant Escherichia coli M15. Madras Agric. J. 2019, 106, 205–209. [Google Scholar] [CrossRef]
  12. Kim, S.; Lee, J.; Sung, B.H. Isolation and characterization of the stress-tolerant Candida tropicalis YHJ1 and evaluation of its xylose reductase for xylitol production from acid pre-treatment wastewater. Front. Bioeng. Biotechnol. 2019, 7, 12. [Google Scholar] [CrossRef]
  13. Walsh, M.K.; Khliaf, H.F.; Shakir, K.A. Production of xylitol from agricultural waste by enzymatic methods. Am. J. Agric. Biol. Sci. 2018, 13, 1–8. [Google Scholar] [CrossRef] [Green Version]
  14. Kklaif, H.F.; Nasser, J.M.; Shakir, K.A. Production of xylose reductase and xylitol by Candida guilliermondii using wheat straw hydrolysates. Iraqi J. Agric. Sci. 2020, 51, 1653–1660. [Google Scholar]
  15. Quehenberger, J.; Reichenbach, T.; Baumann, N.; Rettenbacher, L.; Divne, C.; Spadiut, O. Kinetics and predicted structure of a novel xylose reductase from Chaetomium thermophilum. Int. J. Mol. Sci. 2019, 20, 185. [Google Scholar] [CrossRef] [Green Version]
  16. Mouro, A.; dos Santos, A.A.; Agnolo, D.D.; Gubert, G.F.; Bon, E.P.S.; Rosa, C.A.; Fonseca, C.; Stambuk, B.U. Combining xylose reductase from Spathaspora arborariae with xylitol dehydrogenase from Spathaspora passalidarum to promote xylose consumption and fermentation into xylitol by Saccharomyces cerevisiae. Fermentation 2020, 6, 72. [Google Scholar] [CrossRef]
  17. Woodyer, R.; Simurdiak, M.; van der Donk, W.A.; Zhao, H. Heterologous expression, purification, and characterization of a highly active xylose reductase from Neurospora crassa. Appl. Environ. Microbiol. 2005, 71, 1642–1647. [Google Scholar] [CrossRef] [Green Version]
  18. Mueller, M.; Wilkins, M.R.; Banat, I.M. Production of xylitol by the thermotolerant Kluyveromyces marxianus IMB strains. Bioprocess. Biotech. 2011, 1, 1000102e. [Google Scholar] [CrossRef] [Green Version]
  19. Dasgupta, D.; Ghosh, D.; Bandhu, S.; Agrawal, D.; Suman, S.K.; Adhikari, D.K. Purification, characterization and molecular docking study of NADPH dependent xylose reductase from thermotolerant Kluyveromyces sp. IIPE453. Process Biochem. 2016, 51, 124–133. [Google Scholar] [CrossRef]
  20. Krishnan, S.; Suzan, B.N.; Wahid, Z.A.; Nasrullah, M.; Munaim, M.S.A.; MD Din, M.F.B.; Taib, S.M.; Li, Y.Y. Optimization of operating parameters for xylose reductase separation through ultrafiltration membrane using response surface methodology. Biotechnol. Rep. 2020, 27, e00498. [Google Scholar] [CrossRef]
  21. Desiriani, R.; Kresnowati, M.T.A.P.; Wenten, I.G. Membrane-based downstream processing of microbial xylitol production. Int. J. Technol. 2017, 8, 1393–1401. [Google Scholar] [CrossRef]
  22. Conidi, C.; Drioli, E.; Cassano, A. Membrane-based agro-food production processes for polyphenol separation, purification and concentration. Curr. Opin. Food Sci. 2018, 23, 149–164. [Google Scholar] [CrossRef]
  23. Cordova, A.; Astudillo, C.; Illanes, A. Membrane technology for the purification of enzymatically produced oligosaccharides. In Separation of Functional Molecules in Food by Membrane Technology; Elsevier: Amsterdam, The Netherlands, 2019; pp. 113–153. [Google Scholar]
  24. Mukherjee, S. Isolation and purification of industrial enzymes: Advances in enzyme technology. Adv. Enzym. Technol. (Biomass Biofuels Biochem.) 2019, 41–70. [Google Scholar] [CrossRef]
  25. Michaels, A.S. New separation techniques for the CPI. Chem. Eng. Prog. 1968, 64, 31–42. [Google Scholar]
  26. Vilker, V.L.; Colton, C.K.; Smith, K.A.; Green, D.L. The osmotic pressure of concentrated protein and lipoprotein solutions and its significance to ultrafiltration. J. Membr. Sci. 1984, 20, 63–77. [Google Scholar] [CrossRef]
  27. Bellara, S.R.; Cui, Z. A Maxwell-Stefan approach to modeling the cross flow ultrafiltration of protein solutions in tubular membranes. Chem. Eng. Sci. 1998, 53, 2153–2166. [Google Scholar] [CrossRef]
  28. Wesselingh, J.A.; Vonk, P. Ultrafiltration of a large polyelectrolyte. J. Membr. Sci. 1995, 99, 21–27. [Google Scholar] [CrossRef]
  29. Ahmad, A.L.; Chong, M.F.; Bhatia, S. Ultrafiltration modeling of multiple solutes system for continuous cross flow process. Chem. Eng. Sci. 2006, 61, 5057–5069. [Google Scholar] [CrossRef]
  30. Karasu, K.; Yoshikawa, S.; Kentish, S.E.; Stevens, G.W. A model for cross flow filtration of dairy whey based on the rheology of the compressible cake. J. Membr. Sci. 2009, 341, 252–260. [Google Scholar] [CrossRef]
  31. Landman, K.A.; Sirakoff, C.; White, L.R. Dewatering of flocculated suspensions by pressure filtration. Phys. Fluids Fluid Dyn. 1991, 3, 1495–1509. [Google Scholar] [CrossRef] [Green Version]
  32. Landman, K.A.; White, L.R.; Eberl, M. Pressure filtration of flocculated suspensions. AIChE J. 1995, 41, 1687–1700. [Google Scholar] [CrossRef]
  33. Lawrence, N.D. Characterisation and Performance of Ultrafiltration Membranes in the Dairy Industry. Ph.D. Thesis, University of Melbourne, Melbourne, Australia, 1998. [Google Scholar]
  34. Nguyen, T.A.; Yoshikawa, S.; Karasu, K.; Ookawara, S. A simple combination model for filtrate flux in cross flow ultrafiltration of protein suspension. J. Membr. Sci. 2012, 403–404, 84–93. [Google Scholar] [CrossRef]
  35. Kirschner, A.Y.; Cheng, Y.H.; Paul, D.R.; Field, R.W.; Freeman, B.D. Fouling mechanisms in constant flux cross flow ultrafiltration. J. Membr. Sci. 2019, 574, 65–75. [Google Scholar] [CrossRef] [Green Version]
  36. Hermia, J. Constant pressure blocking filtration laws—Application to power-law non-Newtonian fluids. Trans. Inst. Chem. Eng. 1982, 60, 183–187. [Google Scholar]
  37. Field, R.W.; Wu, J.J. Modeling of permeability loss in membrane filtration: Re-examination of fundamental fouling equations and their link to critical flux. Desalination 2011, 283, 68–74. [Google Scholar] [CrossRef]
  38. Cheryan, M. Ultrafiltration and Microfiltration Handbook; CRC Press: Boca Raton, FL, USA, 1998. [Google Scholar]
  39. Krippl, M.; Dürauer, A.; Duerkop, M. Hybrid modeling of cross-flow filtration: Predicting the flux evolution and duration of ultrafiltration processes. Sep. Purif. Technol. 2020, 248, 117064. [Google Scholar] [CrossRef]
  40. Sargolzaei, J.; Saghatoleslami, N.; Mosavi, S.M.; Khoshnoodi, M. Comparative study of artificial neural network (ANN) and statistical methods for predicting the performance of ultrafiltration process in the milk industry. Iran J. Chem. Chem. Eng. 2006, 25, 67–76. [Google Scholar]
  41. Teodosiu, C.; Pastravanu, O.; Macoveanu, M. Neural network models for ultrafiltration and backwashing. Water Res. 2000, 34, 4371–4380. [Google Scholar] [CrossRef]
  42. Wei, A.L.; Zeng, G.M.; Huang, G.H.; Liang, J.; Li, X.D. Modeling of a permeate flux of cross-flow membrane filtration of colloidal suspensions: A wavelet network approach. Int. J. Environ. Sci. Technol. 2009, 6, 395–406. [Google Scholar] [CrossRef] [Green Version]
  43. Bowen, W.R.; Jones, M.G.; Yousef, H.N.S. Prediction of the rate of cross flow membrane ultrafiltration of colloids: A neural network approach. Chem. Eng. Sci. 1998, 53, 3793–3802. [Google Scholar] [CrossRef]
  44. Hussain, W.; Merigo, J.M.; Reza, M.R. Predictive intelligence using ANFIS-induced OWAWA for complex stock market prediction. Int. J. Intell. Syst. 2022, 37, 4586–4611. [Google Scholar] [CrossRef]
  45. Al-qaness, M.A.A.; Ewees, A.A.; Fan, H.; Abualigah, L.; Abd Elaziz, M. Boosted ANFIS model using augmented marine predator algorithm with mutation operators for wind power forecasting. Appl. Energy 2022, 314, 118851. [Google Scholar] [CrossRef]
  46. Takagi, T.; Sugeno, M. Fuzzy identification of systems and its application to modeling and control. IEEE Trans. Syst. Man Cybern. 1985, 15, 116–132. [Google Scholar] [CrossRef]
  47. Zivkovic, M.; Bacanin, N.; Venkatachalam, K.; Nayyar, A.; Djordjevic, A.; Strumbeger, I.; Al-Turjman, F. COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain. Cities Soc. 2021, 66, 102669. [Google Scholar] [CrossRef]
  48. Salehi, R.; Chaiprapat, S. Predicting H2S emission from gravity sewer using an adaptive neuro-fuzzy inference system. Water Qual. Res. J. 2022, 57, 20–39. [Google Scholar] [CrossRef]
  49. Khaled, B.; Abdellah, A.; Noureddine, D.; Salim, H.; Sabeha, A. Modelling of biochemical oxygen demand from limited water quality variable by ANFIS using two partition methods. Water Qual. Res. J. 2018, 53, 24–40. [Google Scholar] [CrossRef]
  50. Jang, J.S.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685. [Google Scholar] [CrossRef]
  51. Jang, J.S.R.; Sun, C.T.; Mizutani, E. Neuro-Fuzzy and Soft Computing (A Computational Approach to Learning and Machine Intelligence); Prentice Hall: Hoboken, NJ, USA, 1997. [Google Scholar]
  52. Karim, R.; Dilwar, F.; Siddique, R.A. Predictive modeling of surface roughness in MQL assisted turning of SiC-Al alloy composites using artificial neural network and adaptive neuro fuzzy inference system. J. Adv. Res. Manuf. Mater. Sci. Metall. Eng. 2018, 5, 12–28. [Google Scholar]
  53. Keneni, B.M. Evolving Rule Based Explainable Artificial Intelligence for Decision Support System of Unmanned Aerial Vehicles. Master’s Thesis, University of Toledo, Electrical and Computer Science Engineering Department, Toledo, OH, USA, 2018. [Google Scholar]
  54. Clemente, L. ANFIS Modelling of PIV Tests. Master’s Thesis, Department of Chemical and Materials Engineering, Polytechnic of Turin, Turin, Italy, 2019. [Google Scholar]
  55. Deo, R.C.; Downs, N.J.; Adamowski, J.F.; Parisi, A.V. Adaptive neuro-fuzzy inference system integrated with solar zenith angle for forecasting sub-tropical photo-synthetically active radiation. Food Energy Secur. 2019, 8, e00151. [Google Scholar] [CrossRef] [Green Version]
  56. Dehghani, M.; Riahi-Madvar, H.; Hooshyaripor, F.; Mosavi, A.; Shamshirband, S.; Zavadskas, E.K.; Chau, K.W. Prediction of hydropower generation using grey wolf optimization adaptive neuro-fuzzy inference system. Energies 2019, 12, 289. [Google Scholar] [CrossRef] [Green Version]
  57. Belvederesi, C.; Dominic, J.A.; Hassan, Q.K.; Gupta, A.; Achari, G. Predicting river flow using an Al-based sequential adaptive neuro-fuzzy inference system. Water 2020, 12, 1622. [Google Scholar] [CrossRef]
  58. Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802–813. [Google Scholar] [CrossRef]
  59. Carty, D.M. An Analysis of Boosted Regression Trees to Predict the Strength Properties of Wood Composites. Master’s Thesis, Graduate School at Tennessee Research Creative Exchange, University of Tennessee, Knoxville, TN, USA, 2011. [Google Scholar]
  60. Friedman, J.H.; Meulman, J.J. Multiple additive regression trees with application in epidemiology. Stat. Med. 2003, 22, 1365–1381. [Google Scholar]
  61. Chang, Y. Economic forecasting by a piecewise regression tree method. Int. J. Manag. Appl. Sci. 2017, 3, 12–15. [Google Scholar]
  62. Koc, Y.; Eyduran, E.; Akbulut, O. Application of regression tree method for different data from animal science. Pak. J. Zool. 2017, 49, 599–607. [Google Scholar] [CrossRef]
  63. Naumets, S.; Lu, M. Investigation into explainable regression trees for construction engineering applications. J. Constr. Eng. Manag. 2021, 147, 04021084. [Google Scholar] [CrossRef]
  64. Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; Chapman and Hall/CRC: London, UK, 1984. [Google Scholar]
  65. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Application in R; Springer Science + Business Media: New York, NY, USA, 2013. [Google Scholar]
  66. Das, S. Reconstruction of Gamma-Ray Direction Using Boosted Decision Trees and the Disp Parameter. Master’s Thesis, Department of Physics, McGill University, Montreal, QC, Canada, 2019. [Google Scholar]
  67. Akalin, A. Computational Genomics with R; Chapman and Hall/CRC: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  68. Ceballos-Santos, S.; González-Pardo, J.; Carslaw, D.C.; Santurtún, A.; Santibáñez, M.; Fernández-Olmo, I. Meteorological normalisation using boosted regression trees to estimate the impact of COVID-19 restrictions on air quality levels. Int. J. Environ. Res. Public Health 2021, 18, 13347. [Google Scholar] [CrossRef]
  69. Mehta, M.; Agrawal, R.; Rissanen, J. SLIQ: A fast scalable classifier for data mining. In International Conference on Extending Database Technology; Springer: Berlin/Heidelberg, Germany, 1996. [Google Scholar] [CrossRef]
  70. Sutton, C.D. Classification and regression trees, bagging, and boosting. Handb. Stat. 2005, 24, 303–329. [Google Scholar]
  71. Bastos, J. Credit Scoring with Boosted Decision Trees; CEMAPRE, School of Economics and Management (ISEG), Technical University of Lisbon: Lisbon, Portugal, 2008. [Google Scholar]
  72. Roe, B.P.; Yang, H.J.; Zhu, J.; Liu, Y.; Stancu, I.; McGregor, G. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl. Instrum. Methods Phys. Res. A 2005, 543, 577–584. [Google Scholar] [CrossRef] [Green Version]
  73. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2001. [Google Scholar]
  74. Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef] [Green Version]
  75. Freund, Y. Boosting a weak learning algorithm by majority. Inf. Comput. 1995, 121, 256–285. [Google Scholar] [CrossRef] [Green Version]
  76. Freund, Y.; Schapire, R.E. A Short introduction to boosting. J. Jpn. Soc. Artif. Intell. 1999, 14, 771–780. (In Japanese) [Google Scholar]
  77. Ridgeway, G. The state of boosting. Comput. Sci. Stat. 1999, 31, 172–181. [Google Scholar]
  78. Latif, S.D. Developing a boosted decision tree regression prediction model as a sustainable tool for compressive strength of environmentally friendly concrete. Environ. Sci. Pollut. Res. 2021, 28, 65935–65944. [Google Scholar] [CrossRef] [PubMed]
  79. Xie, Y.; Zhu, C.; Lu, Y.; Zhu, Z. Towards optimization of boosting models for formation lithology identification. Math. Probl. Eng. 2019, 13, 5309852. [Google Scholar] [CrossRef] [Green Version]
  80. Carslaw, D.C.; Taylor, P.J. Analysis of air pollution data at a mixed source location using boosted regression trees. Atmos. Environ. 2009, 43, 3563–3570. [Google Scholar] [CrossRef]
  81. Fedotenkova, M. Extraction of Multivariate Components in Brain Signals Obtained during General Anesthesia. Ph.D. Thesis, Faculty of Science and Technology, University of Lorraine, Nancy, France, 2016. [Google Scholar]
  82. Salehi, R.; Lestari, R.A.S. Predicting the performance of a desulfurizing bio-filter using an artificial neural network (ANN) model. Environ. Eng. Res. 2021, 26, 200462. [Google Scholar] [CrossRef]
  83. Draper, N.R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: New York, NY, USA, 1998. [Google Scholar]
  84. Montgomery, D.C. Design and Analysis of Experiments; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
  85. Eschenbach, T.G. Spiderplots versus tornado diagrams for sensitivity analysis. Interfaces 1992, 22, 40–46. [Google Scholar] [CrossRef]
  86. Bassam, A.; Tzuc, M.; Soberanis, M.E.; Ricalde, L.J.; Cruz, B. Temperature estimation for photovoltaic array using an adaptive neuro fuzzy inference system. Sustainability 2017, 9, 1399. [Google Scholar] [CrossRef] [Green Version]
  87. Tran, Q.H.; Huh, J.; Nguyen, V.B.; Kang, C.; Ahn, J.H.; Park, I.J. Sensitivity analysis for ship-to-shore container crane design. Appl. Sci. 2018, 8, 1667. [Google Scholar] [CrossRef] [Green Version]
Figure 1. A schematic illustration of the workflow of this study (Tr: training subset; Ts: testing subset; ANFIS: adaptive neuro-fuzzy inference system; BRT: boosted regression tree; M(n)LR: multiple (non)linear regression; see text for further details).
Figure 2. A schematic representation of a two-input, one-output first-order Takagi–Sugeno FIS with two fuzzy rules. x1 and x2 are the inputs; Ai and Bi are the fuzzy sets pertaining to input xi (i = 1 and 2); μAi(xi) and μBi(xi) are the membership functions of fuzzy sets A and B associated with xi; the nodes marked "π" apply a product operation; the nodes marked "N" normalize the firing strengths wi (i = 1 and 2); the single node marked "∑" performs a summation; fi(x1, x2) represents the output function of the i-th rule (i = 1 and 2); and a rectangle represents an adaptive node (with modifiable parameters), while a circle represents a fixed node (without modifiable parameters).
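A numeric toy version of this two-rule inference may help. In the sketch below, all membership-function centers/widths and the first-order consequent coefficients are invented for illustration; only the layer structure follows Figure 2.

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_two_rules(x1, x2):
    # Layer 1: membership grades (centers/widths are illustrative only)
    mu_A1, mu_A2 = gaussmf(x1, 10.0, 20.0), gaussmf(x1, 100.0, 20.0)
    mu_B1, mu_B2 = gaussmf(x2, 0.8, 0.3), gaussmf(x2, 1.6, 0.3)
    # Layer 2: firing strengths via the product ("pi") nodes
    w1, w2 = mu_A1 * mu_B1, mu_A2 * mu_B2
    # Layer 3: normalization ("N" nodes)
    w1n, w2n = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layer 4: first-order consequents f_i = p_i*x1 + q_i*x2 + r_i
    f1 = 0.001 * x1 + 0.1 * x2 + 0.3   # invented consequent parameters
    f2 = -0.002 * x1 + 0.2 * x2 + 0.5
    # Layer 5: weighted sum (the single summation node)
    return w1n * f1 + w2n * f2

print(sugeno_two_rules(55.0, 1.2))
```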
Figure 3. Grid partitioning of an input domain composed of two input variables (X1 and X2), each assigned three triangular membership functions. Ai and Bi are the fuzzy sets pertaining to input Xi (i = 1 and 2), and μji(Xi) is the membership function of the j-th fuzzy set (A or B) associated with Xi (i = 1 and 2).
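Because grid partitioning crosses every membership function of every input, the rule count grows as NR = M^N, so the 3 × 3 grid of Figure 3 yields 9 rules, and a three-input model with three MFs per input would yield 27. A small sketch of that enumeration (pure Python, illustrative only):

```python
# Grid partitioning: with N inputs and M membership functions per input,
# the rule base enumerates every MF combination, i.e. NR = M**N rules.
from itertools import product

def grid_rules(n_inputs=2, n_mfs=3):
    return list(product(range(n_mfs), repeat=n_inputs))

rules = grid_rules()               # the 3x3 grid of Figure 3
print(len(rules))                  # 9 fuzzy if-then rules
print(len(grid_rules(3, 3)))       # a 3-input, 3-MF grid gives 27 rules
```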
Figure 4. A graphical representation of the BRT model concept.
Figure 5. A general schematic diagram depicting the components of a BRT. The black circle corresponds to the root node of a tree, whereas a brown circle corresponds to an internal node. The green circles represent unsplit nodes (leaves). A solid blue arrow indicates that the conditional statement in the preceding node is satisfied, while a dashed blue arrow indicates that it has failed. The circle labeled "Σ" applies a summation to its incoming signals, where each signal is the weighted output of an RT (the weighting coefficient, known as the learning rate (α), takes values smaller than 1.0).
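In equation form, the ensemble of Figure 5 predicts F(x) = f0 + α Σm hm(x), where hm is the m-th regression tree fitted to the current residuals. The toy sketch below implements that loop for squared-error boosting; scikit-learn's DecisionTreeRegressor as the weak learner, the synthetic data, the depth, and the tree count are all assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=50, alpha=0.3, max_depth=2):
    """Plain squared-error gradient boosting, as sketched in Figure 5."""
    pred = np.full(len(y), y.mean())   # f0: constant initial model
    trees = []
    for _ in range(n_trees):
        residual = y - pred                        # what the next tree must fix
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += alpha * t.predict(X)               # shrunken (weighted) update
        trees.append(t)
    return trees, pred

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 3))                      # invented toy data
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=60)
trees, fitted = boost(X, y)
print(np.mean((y - fitted) ** 2))   # training MSE shrinks as trees are added
```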
Figure 6. Graphical illustration of sensitivity analysis using a tornado chart (note: the swing associated with input variable Xi (i = 1 to n) is |y|Xi,max − y|Xi,min|, computed while all other input variables Xj (j ≠ i) are held at their median values).
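A compact sketch of how those swings can be computed, assuming any fitted model that exposes a scikit-learn-style predict; the small linear model and data rows are invented stand-ins.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def tornado_swings(model, X):
    """One-at-a-time swings for a tornado chart: sweep each input from its
    minimum to its maximum while holding the others at their medians."""
    med = np.median(X, axis=0)
    swings = {}
    for i in range(X.shape[1]):
        lo, hi = med.copy(), med.copy()
        lo[i], hi[i] = X[:, i].min(), X[:, i].max()
        y_lo = model.predict(lo.reshape(1, -1))[0]
        y_hi = model.predict(hi.reshape(1, -1))[0]
        swings[i] = abs(y_hi - y_lo)
    return swings   # sort descending to draw the tornado bars

X = np.array([[10, 0.8, 0.58], [55, 1.2, 1.06],
              [100, 1.6, 1.20], [40, 1.0, 0.82]])   # FT, TMP, CFV
y = np.array([0.53, 0.46, 0.53, 0.49])              # invented responses
print(tornado_swings(LinearRegression().fit(X, y), X))
```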
Figure 7. Validation results for the ANFIS-GP (with normalized flux as the model output) as a function of the types of membership functions.
Figure 8. Structure of the ANFIS-GP1 (x1: FT, x2: TMP, x3: CFV (refer to Table 1 for FT, TMP, and CFV values); y1: predicted normalized flux; "MF" and "R" denote membership function and fuzzy if-then rule, respectively; refer to Tables S1–S3 in the Supplementary Materials for a detailed description of the fuzzy if-then rules and the optimal values of the input/output MF parameters).
Figure 9. Validation results for the ANFIS-GP (with xylitol concentration as the model output) as a function of the types of membership functions.
Figure 10. Scatter plot of the measured and predicted (a) normalized flux using ANFIS-GP1 and (b) xylitol concentration using ANFIS-GP2 on the entire dataset.
Figure 11. Validation RMSE curves for the BRT model (with normalized flux as the output) as a function of the number of trees (Ntr) and the minimum leaf size (mls) (α = 0.3).
Figure 12. Graphical representation of the structure of weak learner (tree) no. 200 in the trained BRT1. (Notes: the total number of nodes in the tree is 21; a brown circle indicates a node that can be branched out, whereas a green circle indicates a terminal node (an unsplit or leaf node) that cannot; the number inside each node is the node number; the number under each node is the node size (the number of observations satisfying the node's condition); xi on top of a node denotes the cut predictor; and the number in parentheses is the cut-off point (c) associated with xi: for each node, the right-hand branch is chosen if xi ≥ c, and the left-hand branch if xi < c.)
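The branching rule in the caption translates directly into code. The sketch below uses an invented two-leaf toy tree (not tree no. 200 itself) to show the descent from root to leaf.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None    # index of the cut predictor x_i
    cut: Optional[float] = None      # cut-off point c
    value: Optional[float] = None    # prediction, for leaf (unsplit) nodes
    left: Optional["Node"] = None    # branch taken when x_i < c
    right: Optional["Node"] = None   # branch taken when x_i >= c

def predict(node: Node, x) -> float:
    while node.value is None:                 # descend until a leaf
        node = node.right if x[node.feature] >= node.cut else node.left
    return node.value

# Invented toy tree: split on filtration time (x1) at 55 min
tree = Node(feature=0, cut=55.0,
            left=Node(value=0.55), right=Node(value=0.44))
print(predict(tree, [70.0, 1.2, 1.06]))   # -> 0.44
```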
Figure 13. Validation RMSE curves for the BRT model (with xylitol concentration as the output) as a function of the number of trees (Ntr) and the minimum leaf size (mls) (α = 0.1).
Figure 14. Graphical representation of the structure of weak learner (tree) no. 80 in the trained BRT2 (see the notes in the caption of Figure 12).
Figure 15. Scatter plot of the measured and predicted (a) normalized flux using BRT1 and (b) xylitol concentration using BRT2 on the entire dataset.
Figure 16. Scatter plot of the measured and predicted (a) normalized flux using M(n)LR1 and (b) xylitol concentration using M(n)LR2 on the entire dataset.
Figure 17. Sensitivity analysis for the assessment of input variable importance on the estimation of (a) y1 using model BRT1 and (b) y2 using model BRT2. (Notes: the minimum, median, and maximum values of the input variables X1, X2, and X3 are (10, 55, 100), (0.8, 1.2, 1.6), and (0.58, 1.06, 1.20), respectively; X1: filtration time (min); X2: transmembrane pressure (bar); X3: cross-flow velocity (cm s−1); y1: normalized flux (i.e., ratio of the permeate flux to the pure water flux); y2: xylitol concentration (g L−1); vertical dashed line: the y1 (or y2) value with all inputs at their medians.)
Table 1. Operating conditions used in the experimental UF membrane runs [20]. TMP: transmembrane pressure; CFV: cross-flow velocity; FT: filtration time; y1: normalized flux (i.e., ratio of the permeate flux to the pure water flux); y2: xylitol concentration (g L−1). Each run was sampled at FT = 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 min; the y1 and y2 series below are listed in that order.

E1 (TMP = 1.2 bar, CFV = 0.58 cm s−1)
y1: 0.5321, 0.4865, 0.4586, 0.4351, 0.4232, 0.4253, 0.4054, 0.3967, 0.3932, 0.3937
y2: 15.51, 15.48, 15.34, 15.12, 15.00, 14.89, 14.56, 14.23, 14.00, 13.00

E2 (TMP = 1.2 bar, CFV = 0.70 cm s−1)
y1: 0.5976, 0.5439, 0.4859, 0.4553, 0.4478, 0.4497, 0.4467, 0.4397, 0.4387, 0.4365
y2: 15.66, 15.61, 15.52, 15.36, 15.20, 15.07, 14.76, 14.50, 14.33, 14.30

E3 (TMP = 1.2 bar, CFV = 0.82 cm s−1)
y1: 0.6397, 0.5623, 0.5157, 0.4906, 0.4940, 0.4764, 0.4781, 0.4718, 0.4684, 0.4673
y2: 15.80, 15.84, 15.70, 15.62, 15.45, 15.40, 15.37, 15.26, 15.20, 15.00

E4 (TMP = 1.2 bar, CFV = 1.20 cm s−1)
y1: 0.7643, 0.6753, 0.6381, 0.5923, 0.5529, 0.5343, 0.5319, 0.5307, 0.5318, 0.5287
y2: 16.27, 16.10, 16.00, 15.92, 15.85, 15.63, 15.55, 15.54, 15.40, 15.42

E5 (TMP = 1.2 bar, CFV = 1.06 cm s−1)
y1: 0.6639, 0.5572, 0.5233, 0.5018, 0.4808, 0.4674, 0.4585, 0.4634, 0.4593, 0.4572
y2: 15.97, 15.71, 15.31, 15.14, 14.94, 14.63, 14.57, 14.34, 14.17, 13.95

E6 (TMP = 0.8 bar, CFV = 1.06 cm s−1)
y1: 0.5296, 0.4820, 0.4570, 0.4183, 0.4176, 0.3932, 0.3723, 0.3685, 0.3652, 0.3622
y2: 15.40, 15.23, 15.00, 14.65, 14.60, 14.58, 14.56, 14.50, 14.20, 13.24

E7 (TMP = 1.0 bar, CFV = 1.06 cm s−1)
y1: 0.5941, 0.5412, 0.5165, 0.4953, 0.4618, 0.4482, 0.4371, 0.4382, 0.4359, 0.4382
y2: 15.66, 15.61, 15.52, 15.36, 15.20, 15.07, 14.76, 14.50, 14.33, 14.30

E8 (TMP = 1.4 bar, CFV = 1.06 cm s−1)
y1: 0.7188, 0.6320, 0.6072, 0.5532, 0.5395, 0.5064, 0.4841, 0.4591, 0.4586, 0.4573
y2: 16.10, 16.06, 15.89, 15.72, 15.60, 15.46, 15.55, 15.36, 15.20, 15.21

E9 (TMP = 1.6 bar, CFV = 1.06 cm s−1)
y1: 0.7832, 0.6732, 0.6532, 0.6238, 0.5844, 0.5583, 0.5367, 0.5347, 0.5328, 0.5320
y2: 16.25, 16.00, 15.90, 15.76, 15.65, 15.63, 15.55, 15.54, 15.40, 15.42
Table 2. Performance comparison of the developed models in this study.

Model output                     Model        R2       NSE      RMSE
Normalized flux (dimensionless)  BRT1         0.9944   0.9937   0.0069
                                 ANFIS-GP1    0.9845   0.9843   0.0109
                                 MnLR1        0.8653   0.8641   0.0319
                                 MLR1         0.8149   0.8145   0.0373
Xylitol concentration (g L−1)    BRT2         0.9669   0.9664   0.1171
                                 ANFIS-GP2    0.9459   0.9453   0.1494
                                 MnLR2        0.8199   0.8190   0.2719
                                 MLR2         0.7519   0.7507   0.3191

RMSE is expressed in the unit of the corresponding model output; R2 and NSE are dimensionless.
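For reference, the three scores in Table 2 can be computed as in the sketch below. The toy measured/predicted vectors are invented, and R2 is taken here as the squared Pearson correlation, one common convention.

```python
import numpy as np

def scores(y, y_hat):
    """R2, NSE, and RMSE as tabulated above. NSE is the Nash-Sutcliffe
    efficiency, 1 - SSR / total sum of squares about the mean."""
    r = y - y_hat                                   # vector of residuals
    ssr = np.sum(r ** 2)                            # sum of squared residuals
    nse = 1.0 - ssr / np.sum((y - y.mean()) ** 2)
    r2 = np.corrcoef(y, y_hat)[0, 1] ** 2           # squared Pearson correlation
    rmse = np.sqrt(ssr / len(y))
    return r2, nse, rmse

y = np.array([0.53, 0.49, 0.46, 0.44, 0.42])        # measured (toy values)
y_hat = np.array([0.52, 0.50, 0.45, 0.44, 0.43])    # predicted (toy values)
print(scores(y, y_hat))
```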
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
