Article

Outlet Liquid Material Concentration Prediction of an Evaporation Process Based on Knowledge and Data Information

1
School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China
2
Institute of Intelligence Science and Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
*
Author to whom correspondence should be addressed.
Processes 2022, 10(12), 2525; https://doi.org/10.3390/pr10122525
Submission received: 2 October 2022 / Revised: 22 November 2022 / Accepted: 25 November 2022 / Published: 28 November 2022

Abstract

The outlet liquid material concentration is a key production indicator for evaluating evaporation quality and an important basis for adjusting the evaporation operation parameters. However, online concentration analyzers are expensive and have strict installation conditions, so it is difficult to obtain the liquid material concentration in time. Field workers therefore usually adjust operations imprecisely based on time-delayed information. In addition, the process data contain errors, which affects the accuracy and timeliness of process optimization and control. Therefore, a hybrid concentration prediction model based on data reconciliation is presented in this paper. First, to obtain high-quality process data, the data reconciliation method is applied for preprocessing. Then, the process mechanistic model is constructed by utilizing process knowledge and the balance principle. Taking into account the volatility and nonlinearity of the data, a data-driven model combining the autoregressive integrated moving average (ARIMA) model with the generalized autoregressive conditional heteroscedasticity (GARCH) model is established, and a support vector regression (SVR) model is built to compensate the prediction residual. Furthermore, the prediction results of the mechanistic model and the data-driven model are balanced through weighting. Finally, an industrial evaporation process is selected for simulation verification. The results demonstrate that the proposed hybrid prediction model improves prediction performance under different production conditions.

1. Introduction

With the continuous development of China’s economy, the demand for aluminum, one of the important raw materials for industrial production, is also increasing [1]. At present, bauxite resources are insufficient, the price of imported bauxite continues to rise, and the manufacturing industry is paying increasing attention to environmental protection. Alumina is an important raw material for aluminum smelting, so improving the quality of alumina is of great significance. Evaporation is a key process of alumina production; the outlet liquid material concentration is the production index of evaporation quality and an important guide for adjusting the operation parameters of the evaporation process [2]. However, because sodium aluminate solution precipitates easily and is highly viscous and strongly corrosive, it is difficult to measure the component concentration online, and it is usually obtained by manual sampling and laboratory analysis [3]. As a result, the solution concentration information lags seriously and cannot guide adjustment in time, which affects the timeliness of real-time optimization and control of the process. Therefore, establishing an outlet concentration prediction model is crucial for the whole alumina production process: it helps improve the quality and yield of alumina, reduce consumption and labor intensity, and raise the management level.
As a virtual sensing technology, soft sensors combine mathematical models, data processing, and software techniques to estimate process variables that are difficult to measure from other available measurements and process parameters. They provide a feasible and economical alternative to physical sensors that are expensive or impractical [4]. In general, soft sensor modeling methods are divided into mechanistic modeling, data-driven modeling, and hybrid modeling [5]. Mechanistic modeling builds a mathematical model from the internal mechanism of the production process or the transfer mechanism of the material flow. However, the complexity and uncertainty of industrial processes lower the precision of mechanistic models. Data-driven models have therefore emerged to make up for these deficiencies. The data-driven approach emphasizes the relationship between input and output and does not require detailed and accurate prior knowledge; it relies only on process operation data to achieve real-time prediction of process parameters. With the development of artificial intelligence and machine learning, various data-driven soft sensor modeling methods are now available, including regression analysis, machine learning, multivariate statistics, and fuzzy logic. Regression analysis methods include principal component regression (PCR) [6,7,8], multiple stepwise regression (MSR) [9,10,11], and multiple linear regression (MLR) [12,13,14]. Machine learning methods include artificial neural networks [15,16,17], extreme learning machines [18,19,20], support vector machines [21,22,23], and Gaussian process regression [24,25,26].
Data-driven soft sensor modeling has been developed and widely used in industrial production and other fields. However, the poor environment of industrial production leads to inaccurate information or even partially missing data, which affects the accuracy of data-driven soft sensor models. Combining a mechanistic model with a data-driven model can overcome the limitations and low adaptability of a single modeling method. Hamilton et al. replaced a subset of the mechanistic model equations with corresponding nonparametric representations to form a hybrid modeling scheme for dynamic prediction of the system. The proposed hybrid approach gave more robust parameter estimation and improved short-term prediction performance when the model parameters were highly uncertain [27]. Kuo et al. constructed a COVID-19 case prediction model from county-level population, environment, and mobility data using a hybrid framework of machine learning and a generalized linear model (GLM). The experimental results showed that the predictions were highly correlated with the reported daily and cumulative cases, and that the hybrid technique, which combined all the modeling results by adjusting their weights, further improved performance [28]. Hu et al. proposed a hybrid method consisting of data preprocessing and wind speed prediction to improve the accuracy of short-term wind speed prediction. Based on 12 months of wind speed data from two meteorological towers in Yan’an City, the validity and accuracy of the proposed multi-step wind speed prediction method were verified, and the prediction accuracy was improved significantly [29]. Li et al. presented a hybrid Elman–LSTM method for predicting the remaining life of batteries by combining the empirical mode decomposition algorithm with long short-term memory (LSTM) and Elman neural networks. A comprehensive battery test data set was collected for model parameterization and performance evaluation, and the results showed that the hybrid Elman–LSTM model predicted the remaining service life of the battery with higher accuracy than other similar models [30]. Li and Willems combined a spatial heterogeneity concept with logistic regression to predict flood probability. Compared with the traditional one-dimensional hydrodynamic model, the accuracy of the flood warning obtained by the hybrid model was found to be as high as 86% [31].
For the evaporation process itself, Xie et al. introduced industrial production condition analysis and fuzzy expert rules, applied data reconciliation technology, and used the mechanistic model to predict the outlet concentration [32]. The predicted results of the mechanistic model were compensated by a kernel extreme learning machine (KELM). Although the experiments showed that the model achieved high prediction accuracy, the KELM error compensation model has difficulty dealing with raw concentration data that fluctuate sharply, it is very sensitive to changes in its two hyperparameters, and its parameter tuning process is complex. Wang et al. proposed a soft sensor model based on R-PLS to predict the concentration of sodium aluminate solution in the evaporation process of alumina production [33]. However, a single data-driven prediction model does not have good interpretability, and its prediction accuracy is limited by the quality of the original data.
The above soft sensor modeling methods provide inspiration for the research of this paper. However, the actual evaporation process has particular industrial characteristics, and the soft sensor modeling methods mentioned above have certain limitations. First, the detection instruments of the evaporation process are affected by the harsh environment, so the detection information contains errors and some important parameters are not measured; the process data are therefore inaccurate and incomplete. Second, the correlation and coupling between the production process parameters are strong, which leads to low prediction accuracy and insufficient generalization ability of the model. Third, the mechanistic model reflects the internal structure of process operation, but improving the effectiveness of a mechanistic model built on assumptions remains a difficult problem in soft sensor modeling. Therefore, the innovations of this paper include: (1) based on the study of the data characteristics of industrial production, a data reconciliation model is constructed to correct the data containing random errors and estimate the unmeasured parameters; (2) after a detailed analysis of the evaporation process, the process mechanistic models are developed for prediction through equilibrium principles, running states, and parameter identification; (3) the errors of the prediction results obtained by the autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models are compensated by the support vector regression (SVR) model; (4) mechanistic modeling is combined with data-driven modeling, and the model adaptively adjusts the weights of the prediction results calculated by the different models to obtain the final prediction results. As a result, the proposed model can maintain good prediction accuracy whether the input data are relatively stable or fluctuate violently.

2. Process Analysis and Problem Formulation

The evaporation process is a key process of alumina production, which evaporates the excess water in the mother liquor and washing filtrate to ensure that the concentration of the sodium aluminate solution in the circulating material liquid meets the production requirements and reduces the pollution of the environment caused by the discharge of waste lye. The entire evaporation production process is described in Figure 1, which consists of multiple evaporators, preheaters, flash evaporators, and condensed water tanks. The discharge concentration mainly refers to the concentration of the alkaline substance in the solution, which is an important indicator in the evaporation process and is related to whether the dissolution process of alumina production is carried out smoothly. Due to the complex and harsh production environment, it is not suitable for online instrument detection. In the actual production process, it relies on the manual sampling, analysis, and detection, resulting in a lag in the discharge concentration detection and operation adjustment of the evaporation process. In addition, there are many parameters influencing the outlet material concentration and the coupling is serious, which increases the energy consumption and resource consumption of the whole process. Therefore, in order to obtain the discharge concentration in time and reduce the fluctuation of the discharge concentration in the evaporation process, as well as ensure the stable operation of the evaporation process and alumina production, it is of great significance for the alumina production to study the prediction model of the discharge concentration of the evaporation process.
In order to improve the effectiveness of process detection information, a new prediction modeling method based on data reconciliation is introduced in this paper, and the framework is shown in Figure 2. Considering the inaccuracy and incompleteness of process data, the data reconciliation model is applied for improving data quality, and the measurement is corrected and the unmeasured parameters are estimated. Then, the outlet concentration estimation model based on mechanistic and domain knowledge is established. Moreover, the data-driven model is established through ARIMA-GARCH, and its prediction error is compensated by SVR to reduce the time series error. In addition, the mechanistic model and the data-driven model are balanced for concentration prediction.

3. Model Structure

3.1. Data Preprocessing

In general, the measured data of the evaporation process include liquid material flow rate, liquid material temperature, and steam temperature. These measured data inevitably contain errors, so the measured variables do not satisfy the material balance, heat balance, and other physical and chemical mechanism requirements of the process; that is, the measurements are inaccurate. In addition, the measurement data are incomplete because of expensive instrument installation or testing, infeasible measurement technology, harsh conditions that do not allow sampling, or instrument failure. These data characteristics cause inaccurate process modeling, optimization, and control, and can even lead to biased decision making. The process measurement data therefore need to be corrected to obtain more accurate process data. Data reconciliation makes use of the material balance and heat balance relationships in the process to correct the measurements while minimizing the deviation between the reconciled data and the measured values, and it also estimates the unmeasured parameters. At present, data reconciliation has been applied in chemical reaction systems [34], mineral and metal processes [35], power plants [36], and other fields.
In the evaporation production process, given the cascaded structure of the equipment, the data reconciliation problem is formulated with a layered strategy to ensure measurement redundancy [37], and is described as follows.
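$$
\begin{cases}
\min f_1 = \sum\limits_{i=1}^{l} \sum\limits_{j=1}^{n} f(x_j^i, \hat{x}_j^i) = \sum\limits_{i=1}^{l} \sum\limits_{j=1}^{n} \dfrac{(x_j^i - \hat{x}_j^i)^2}{\sigma_i^2} \\
\text{s.t.} \quad G_{\text{mass}}(\hat{X}, U) = 0 \\
\qquad\; \hat{x}_L^i \le \hat{x}^i \le \hat{x}_U^i, \quad i = 1, 2, \ldots, l \\
\qquad\; u_L^q \le u^q \le u_U^q, \quad q = 1, 2, \ldots, L
\end{cases}
\tag{1}
$$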
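$$
\begin{cases}
\min f_2 = \sum\limits_{i=1}^{m-l} \sum\limits_{j=1}^{n} f(x_j^i, \hat{x}_j^i) = \sum\limits_{i=1}^{m-l} \sum\limits_{j=1}^{n} \dfrac{(x_j^i - \hat{x}_j^i)^2}{\sigma_i^2} \\
\text{s.t.} \quad G_{\text{heat}}(\hat{X}, U) = 0 \\
\qquad\; \hat{x}_L^i \le \hat{x}^i \le \hat{x}_U^i, \quad i = 1, 2, \ldots, m-l \\
\qquad\; u_L^q \le u^q \le u_U^q, \quad q = 1, 2, \ldots, M-L-m
\end{cases}
\tag{2}
$$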
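To make the structure of the reconciliation objective concrete, the following minimal sketch reconciles three flow measurements under a single linear balance. It is an illustration only: the function name, the flow values, and the standard deviations are hypothetical, the variable bounds are ignored, and the closed-form Lagrange solution used here applies only to a linear constraint. The paper itself solves the full constrained problem with the state transition algorithm.

```python
def reconcile_single_balance(measured, sigma, coeffs):
    """Weighted least-squares reconciliation under one linear balance.

    Minimizes sum((x_i - xhat_i)^2 / sigma_i^2) subject to
    sum(coeffs_i * xhat_i) = 0 using the closed-form Lagrange solution.
    Variable bounds are ignored in this sketch.
    """
    variances = [s * s for s in sigma]
    imbalance = sum(a * x for a, x in zip(coeffs, measured))
    scale = sum(a * a * v for a, v in zip(coeffs, variances))
    return [x - v * a * imbalance / scale
            for x, v, a in zip(measured, variances, coeffs)]


# Hypothetical flow measurements: feed, concentrated liquor, secondary steam.
measured = [100.0, 62.0, 41.0]
sigma = [2.0, 1.0, 1.0]        # assumed measurement standard deviations
balance = [1.0, -1.0, -1.0]    # feed - liquor - steam = 0

reconciled = reconcile_single_balance(measured, sigma, balance)
closure = sum(a * x for a, x in zip(balance, reconciled))
print(reconciled, closure)
```

The less precise measurement (larger standard deviation) absorbs the larger share of the correction, which is the intended effect of the weighting by the inverse variance.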

3.2. Mechanistic Modeling

Based on the mechanistic analysis of the evaporation process for alumina production, the mechanistic model of each piece of equipment in the process is established. Moreover, combined with the changing trend of the liquid material and steam over the whole process, the mechanistic model of the whole evaporation process is constructed, and the outlet concentration is estimated. Because a large number of equipment units are involved in the evaporation process, the correlations between the process variables are complex and strongly coupled, so it is difficult to describe the relationships between the variables quantitatively. Therefore, on the basis of mass conservation and heat balance, the mechanistic models of the complex material flow for three typical pieces of equipment are listed in Table 1 under the following assumptions. The mechanistic models of other equipment can be deduced by analogy.
(1) The variation of solute mass caused by scaling is negligible.
(2) The steam in the process is saturated and does not contain non-condensable gas.
(3) The liquid material or steam is distributed evenly in the heating tube.
(4) The liquid material and steam in the preheater mix thoroughly.
Moreover, the mechanistic model of the whole evaporation process is expressed as
$$
W_z = F_0 \rho_0 \left( 1 - \frac{C_0 \rho_{4s}}{C_{4s} \rho_0} \right) = \sum_{i=1}^{6} V_i
\tag{3}
$$
The concentration of the outlet liquid material is related to the moisture in the liquid material. Thus, the established mechanistic models are applied to calculate the concentration from the amount of water evaporated by each evaporator. The specific steps are as follows:
(1) Based on the data reconciliation model, steam/water mixing, leakage, and venting are considered, and the error accuracy, correlation coefficient, and amount of evaporated water are set.
(2) The coupling relationships between variables are analyzed, physical parameters such as density and specific heat are calculated, and the outlet concentration and flow rate of each evaporator are obtained.
(3) The mechanism equations are solved simultaneously to calculate the amount of steam required by the evaporation equipment.
(4) If the error between the calculated steam flow rate and the actual required amount of steam meets the operating requirements, the estimation stops; otherwise, the amount of steam used in the previous calculation is replaced by the calculated amount, and the procedure returns to step (2).
(5) According to the inlet liquid material concentration and the flow rate of the outlet liquid material, the outlet liquid material concentration is obtained from the total amount of steam.
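The overall balance can be rearranged to recover the outlet concentration once the total evaporated water is known. The sketch below shows only this last algebraic step, not the iterative procedure above. The function names and numerical values are hypothetical, units are assumed to be mutually consistent, and the outlet density is treated as a known input even though in practice it depends on the concentration.

```python
def total_evaporated_water(F0, rho0, C0, C4s, rho4s):
    """Overall balance: W_z = F0 * rho0 * (1 - C0 * rho4s / (C4s * rho0))."""
    return F0 * rho0 * (1.0 - (C0 * rho4s) / (C4s * rho0))


def outlet_concentration(F0, rho0, C0, rho4s, W_z):
    """Invert the overall balance for the outlet concentration C4s."""
    remaining = 1.0 - W_z / (F0 * rho0)
    if remaining <= 0.0:
        raise ValueError("evaporated water must be less than the feed mass flow")
    return C0 * rho4s / (rho0 * remaining)


# Hypothetical operating point (illustrative values only).
F0, rho0, C0 = 100.0, 1.2, 150.0
C4s_true, rho4s = 240.0, 1.35

W_z = total_evaporated_water(F0, rho0, C0, C4s_true, rho4s)
C4s_back = outlet_concentration(F0, rho0, C0, rho4s, W_z)
print(W_z, C4s_back)
```

The round trip returns the concentration that was used to generate the evaporated water, which is a quick consistency check on the rearranged formula.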

3.3. Data-Based Combined Prediction Model with Error Compensation

Due to the complex and changeable operating environment of the evaporation process, the assumptions made in mechanistic modeling limit its use in actual production and cause deviations in the prediction results. At the same time, the production process is uncertain, which disturbs the prediction model. Therefore, building on previous data-based modeling methods, ARIMA-GARCH is applied to construct a data-driven prediction model, SVR is used to compensate its prediction error, and the result is then combined with the prediction of the mechanistic model. This overcomes the defects of the mechanistic model caused by the uncertainty of the industrial process.

3.3.1. ARIMA-GARCH Data-Driven Model

In time series modeling, ARIMA consists of an autoregressive model, a moving average model, and a differencing process. Whereas the autoregressive model regresses the current value on its own past values, the moving average model focuses on the accumulation of past error terms. The p-order autoregressive model and the q-order moving average model are defined as
$$
y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \varepsilon_t
\tag{4}
$$
$$
y_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}
\tag{5}
$$
A random process represented by a weighted sum of the elements of a white noise sequence is called a moving average process, and the number of parameters in the process is called the order of the moving average process. When a stationary random process has the characteristics of both an autoregressive process and a moving average process, it is no longer suitable to use the autoregressive model or the moving average model alone. In this case, the autoregressive moving average model is used; it needs only a low order to fit the actual data and thus has great practical value in prediction. It is expressed as
$$
y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}
\tag{6}
$$
For non-stationary series, the ARIMA model extends the analysis of stationary time series to the non-stationary case. However, the noise sequence of the traditional ARIMA model needs to satisfy zero mean ($E(\varepsilon_t) = 0$), pure randomness ($\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-i}) = 0$, $i \ge 1$), and homogeneity of variance ($\mathrm{Var}(\varepsilon_t) = \sigma^2$). When the time series is highly volatile, the variance cannot be kept constant. To solve this problem, the autoregressive conditional heteroscedasticity (ARCH) model was developed, as shown in Equation (7). It treats the variance as a variable that changes over time, and this time-varying variance is a linear combination of the squared values of a finite number of past noise terms. Here, $v_t$ and $\varepsilon_{t-i}$ ($i \ge 1$) are independent, and $v_t$ is a sequence of independent identically distributed random variables with $E(v_t) = 0$ and $\mathrm{Var}(v_t) = 1$.
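$$
\begin{cases}
y_t = a_0 + x_t + \varepsilon_t \\
\varepsilon_t = v_t \sigma_t \\
\sigma_t^2 = \alpha_0 + \sum\limits_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 \\
\alpha_0 > 0, \quad \alpha_i \ge 0 \\
\sum\limits_{i=1}^{q} \alpha_i < 1 \\
i = 1, 2, \ldots, q
\end{cases}
\tag{7}
$$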
To further improve prediction performance under disturbances, the generalized autoregressive conditional heteroscedasticity (GARCH) model describes the conditional variance at each time point as a linear combination of the squared residuals at the latest q time points plus a linear combination of the conditional variances at the latest p time points, which is expressed as
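$$
\begin{cases}
y_t = a_0 + x_t + \varepsilon_t \\
\varepsilon_t = v_t \sigma_t \\
\sigma_t^2 = \alpha_0 + \sum\limits_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum\limits_{j=1}^{p} \beta_j \sigma_{t-j}^2 \\
\alpha_0 > 0, \quad \alpha_i, \beta_j \ge 0 \\
\sum\limits_{i=1}^{q} \alpha_i + \sum\limits_{j=1}^{p} \beta_j < 1 \\
i = 1, 2, \ldots, q, \quad j = 1, 2, \ldots, p
\end{cases}
\tag{8}
$$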
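The variance recursion in the GARCH model can be evaluated directly once the coefficients are known. The following sketch does this for a given residual sequence. The function name and the coefficient values are hypothetical, and starting the recursion from the unconditional variance is a common convention assumed here rather than something stated in the paper. Parameter estimation itself is not shown.

```python
def garch_variance(residuals, alpha0, alpha, beta):
    """Conditional variances of a GARCH(p, q) model for a residual sequence.

    sigma_t^2 = alpha0 + sum_i alpha[i-1] * eps_{t-i}^2
                       + sum_j beta[j-1] * sigma_{t-j}^2
    Terms before the start of the sample are replaced by the
    unconditional variance (an assumed initialization).
    """
    persistence = sum(alpha) + sum(beta)
    if alpha0 <= 0 or any(c < 0 for c in alpha + beta) or persistence >= 1:
        raise ValueError("coefficients violate the GARCH constraints")
    unconditional = alpha0 / (1.0 - persistence)
    sigma2 = []
    for t in range(len(residuals)):
        value = alpha0
        for i, a in enumerate(alpha, start=1):
            past_sq = residuals[t - i] ** 2 if t - i >= 0 else unconditional
            value += a * past_sq
        for j, b in enumerate(beta, start=1):
            past_var = sigma2[t - j] if t - j >= 0 else unconditional
            value += b * past_var
        sigma2.append(value)
    return sigma2


# Hypothetical GARCH(1,1) coefficients and residuals with one large shock.
variances = garch_variance([0.0, 2.0, 0.0], alpha0=0.1, alpha=[0.2], beta=[0.7])
print(variances)
```

The large residual at the second time point raises the conditional variance at the third, which is the volatility clustering behavior that motivates the model.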
Hence, the data-driven modeling process combined with ARIMA and GARCH is as follows:
Step 1: Stationarity test. The stationarity of the data is preliminarily judged from the time series plot, the autocorrelation function, and the partial autocorrelation function.
Step 2: Stationarization. The non-stationary time series is differenced until the processed data pass the stationarity test based on the sequence plot and the unit root test.
Step 3: Model order determination and parameter estimation. The number of differencing operations gives parameter d, and parameters p and q are determined from the truncation behavior of the correlograms.
Step 4: Heteroscedasticity test. The ARIMA model is established, and its residual sequence is tested for heteroscedasticity. If heteroscedasticity is present, the order of the GARCH model is determined using the AIC criterion.
Step 5: Model validation. The residual sequence is checked to see whether it is a white noise sequence, the model residuals are tested for independence, and the fitting results of the model are verified.
Step 6: Prediction. The validated model is applied for prediction.
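The first three steps rest on two elementary operations, differencing and the sample autocorrelation function. The sketch below shows them in plain Python, with hypothetical function names and a hypothetical concentration series. The unit root test, the heteroscedasticity test, and model fitting would normally come from a statistics package and are not reproduced here.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def autocorrelation(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    acf = []
    for k in range(max_lag + 1):
        ck = sum((series[t] - mean) * (series[t - k] - mean)
                 for t in range(k, n))
        acf.append(ck / c0)
    return acf


# Hypothetical trending concentration series (illustrative values only).
concentration = [200.0, 201.5, 203.5, 204.0, 206.5, 207.0, 209.5]
diffed = difference(concentration, d=1)
acf_raw = autocorrelation(concentration, max_lag=3)
acf_diff = autocorrelation(diffed, max_lag=3)
print(diffed, acf_raw, acf_diff)
```

A slowly decaying autocorrelation in the raw series and a rapidly decaying one after differencing is the usual visual indication that one round of differencing is sufficient.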

3.3.2. Error Compensation Based on SVR Model

Considering the linear and nonlinear characteristics of the time series, SVR is utilized to compensate the errors of the ARIMA-GARCH data-driven prediction model. As an extension of the support vector machine, SVR is based on structural risk minimization: it makes all samples approach the regression hyperplane and minimizes the total deviation between the samples and the hyperplane. In the SVR model, the input sample $x$ is mapped to a high-dimensional feature space by the nonlinear mapping $\varphi(x)$, and a linear regression model is then established in the feature space, as shown in Equation (9).
$$
f(x) = w^T \varphi(x) + b
\tag{9}
$$
The smoothest function is sought by minimizing the squared norm of the weight vector while requiring the prediction error for each training sample to be less than $\varepsilon$. If the prediction error is greater than $\varepsilon$, it is penalized through the $\varepsilon$-insensitive loss function, as shown in Equations (10) and (11), where $n$ represents the sample size, $\xi_i$ and $\xi_i^*$ represent the training errors outside the insensitive tube $|y_i - w^T \varphi(x_i) - b| < \varepsilon$, and $C > 0$ is the regularization factor.
$$
\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)
\tag{10}
$$
$$
\text{s.t.} \;
\begin{cases}
y_i - w^T \varphi(x_i) - b \le \varepsilon + \xi_i \\
w^T \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i \ge 0, \quad \xi_i^* \ge 0
\end{cases}
\tag{11}
$$
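At the optimum, the sum of the two slack variables for a sample equals its $\varepsilon$-insensitive loss, so the primal objective can be evaluated directly for any candidate $w$ and $b$. The sketch below does this under the simplifying assumption of an identity feature map $\varphi(x) = x$; the function names and data are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_primal_objective(w, b, X, y, C, eps):
    """0.5 * ||w||^2 + C * sum of eps-insensitive losses.

    Assumes the identity feature map phi(x) = x for illustration.
    """
    norm_sq = sum(wi * wi for wi in w)
    total_loss = 0.0
    for xi, yi in zip(X, y):
        pred = sum(wi * xij for wi, xij in zip(w, xi)) + b
        total_loss += eps_insensitive_loss(yi, pred, eps)
    return 0.5 * norm_sq + C * total_loss


# Hypothetical one-dimensional training data.
X = [[1.0], [2.0], [3.0]]
y = [1.05, 2.5, 2.0]
objective = svr_primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.1)
print(objective)
```

The first sample lies inside the tube and contributes nothing, which is why SVR solutions depend only on the samples on or outside the tube boundary.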
Then, the constrained optimization problem is transformed into a dual problem by using Lagrange multiplier, as shown in Equation (12).
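$$
\max_{\alpha_i, \alpha_i^*} \; \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) k(x_i, x_j)
\tag{12}
$$
$$
\text{s.t.} \;
\begin{cases}
\sum\limits_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0 \\
0 \le \alpha_i, \alpha_i^* \le C
\end{cases}
\tag{13}
$$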
Quadratic programming is applied to solve the constrained dual problem, and the bias is then calculated from the optimal weights to obtain the prediction function, which is expressed as
$$
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) k(x_i, x) + b
\tag{14}
$$
where $k(x_i, x_j)$ is a kernel function, here the RBF kernel $k(x_i, x_j) = \exp(-g \|x_i - x_j\|^2)$.
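Once the dual problem has been solved, prediction reduces to a kernel-weighted sum over the training samples. The sketch below evaluates the RBF kernel and the prediction function for given dual coefficients. The function names, the coefficient values, and the kernel parameter are hypothetical, and the quadratic programming step that would produce the coefficients is not shown.

```python
import math


def rbf_kernel(xi, xj, g):
    """RBF kernel k(xi, xj) = exp(-g * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-g * sq_dist)


def svr_predict(x, train_points, dual_coefs, b, g):
    """f(x) = sum_i dual_coefs[i] * k(x_i, x) + b.

    dual_coefs[i] stands for (alpha_i - alpha_i^*) from the dual solution.
    """
    return sum(c * rbf_kernel(xi, x, g)
               for xi, c in zip(train_points, dual_coefs)) + b


# Hypothetical dual solution with two training samples.
train_points = [[0.0], [1.0]]
dual_coefs = [2.0, -1.0]
prediction = svr_predict([0.0], train_points, dual_coefs, b=0.5, g=1.0)
print(prediction)
```

Samples whose dual coefficient is zero drop out of the sum, so only the support vectors need to be stored for prediction.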

3.3.3. ARIMA-GARCH-SVR Data-Driven Prediction Model

The operating environment of actual industrial evaporation production is complex and changeable, and is easily disturbed by uncertain conditions. The process data have both linear and nonlinear characteristics. Thus, a single ARIMA time series prediction model cannot accurately describe the nonlinear characteristics of the outlet liquid material concentration data series. Therefore, the linear part is modeled by the ARIMA-GARCH data-driven model, and the error sequence is optimized and compensated by the SVR model.
In detail, the ARIMA-GARCH model is used to predict the concentration time series $A_t$ from the actual historical concentration $O_t$. The residual $e_{At} = O_t - A_t$ between the predicted concentration $A_t$ and the actual concentration $O_t$ is then calculated, and the SVR model is used to predict this residual sequence, giving the predicted residual $e_{St}$. Finally, the predicted residual is used to compensate the predicted concentration, and the data-driven prediction result $H_t = A_t + e_{St}$ is obtained.
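The bookkeeping of the compensation scheme can be sketched as follows. The function names and numerical values are hypothetical, and the use of a sliding window of past residuals as the SVR input is an assumption made for illustration, since the input structure of the SVR model is not specified here. The residual predictions are placeholders standing in for the output of a trained SVR model.

```python
def residuals(observed, predicted):
    """e_At = O_t - A_t for each time point."""
    return [o - a for o, a in zip(observed, predicted)]


def lagged_pairs(series, window):
    """Build (past window -> next value) training pairs (assumed input layout)."""
    inputs = [series[t - window:t] for t in range(window, len(series))]
    targets = [series[t] for t in range(window, len(series))]
    return inputs, targets


def compensate(base_pred, residual_pred):
    """H_t = A_t + e_St for each time point."""
    return [a + e for a, e in zip(base_pred, residual_pred)]


# Hypothetical values: observed concentration and ARIMA-GARCH prediction.
observed = [240.0, 242.0, 241.0]
arima_garch = [239.0, 243.0, 240.5]
e_A = residuals(observed, arima_garch)

# Placeholder residual predictions standing in for a trained SVR model.
e_S = [0.8, -0.7, 0.4]
H = compensate(arima_garch, e_S)
print(e_A, H)
```

In this toy case each compensated value is closer to the observation than the uncompensated one, which is the effect the residual model is meant to achieve.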

3.4. Hybrid Prediction Modeling Based on Mechanistic and Data-Driven Model

In prediction modeling of actual industrial production processes, the mechanistic model can interpret the production mechanism, but its prediction effectiveness is limited by the uncertainty and complexity of the industrial process. The data-driven model relies on process data to predict parameters; however, large fluctuations in production conditions cause unpredictable data changes, which reduce the effectiveness of data-driven predictive modeling. To overcome the defects of a single mechanistic prediction model that depends on prior knowledge and of a single data-driven prediction method that depends on data information, and to address the limitations and low adaptability of a single prediction model, the mechanistic model and the data-driven model are combined. The detailed process is shown in Figure 3 and consists of the following steps.
Step 1: To improve the accuracy and completeness of the process data used in prediction modeling, the data reconciliation model is established to calibrate the measured variables and estimate the unmeasured parameter.
Step 2: The process mechanism is analyzed, and the mechanistic models of the production equipment and the industrial process are set up through domain knowledge and the balance principle. The outlet liquid material concentration is calculated in the mechanistic model from the mass of water produced by evaporating one ton of liquid material.
Step 3: A data-driven prediction method based on the autoregressive integrated moving average model and the generalized autoregressive conditional heteroscedasticity model is constructed, and the residual of the data-driven prediction is optimized and compensated by the support vector regression model.
Step 4: To balance the prediction results of the mechanistic model and the data-driven model, the reciprocal variance method is applied, which gives a small weight to prediction results with large fluctuations and a large weight to those with small fluctuations. The weight coefficients are determined as follows, where $\omega_i$ is the weight of the $i$th prediction model, $E_i$ is the prediction variance of the $i$th single prediction model, and $m$ is the total number of models.
$$
\begin{cases}
\omega_i = \dfrac{E_i^{-1}}{\sum\limits_{j=1}^{m} E_j^{-1}} \\
\sum\limits_{i=1}^{m} \omega_i = 1 \\
\omega_i \ge 0 \\
i = 1, 2, 3, \ldots, m
\end{cases}
\tag{15}
$$
The prediction results $M_t$ of the mechanistic model and the prediction results $D_t$ of the data-driven model are then given the corresponding weights ($\omega_1$, $\omega_2$) and combined into the final prediction results $F_t$.
$$
F_t = \omega_1 \times M_t + \omega_2 \times D_t
\tag{16}
$$
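The reciprocal variance weighting is simple enough to verify by hand. The sketch below computes the weights and the combined prediction for two models; the function names, the variances, and the predictions are hypothetical values chosen so that the arithmetic is easy to follow.

```python
def reciprocal_variance_weights(variances):
    """omega_i = (1 / E_i) / sum_j (1 / E_j)."""
    inverse = [1.0 / e for e in variances]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(weights, predictions):
    """Weighted combination of the single-model predictions at one time point."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical prediction variances: mechanistic model, data-driven model.
weights = reciprocal_variance_weights([1.0, 3.0])

# Hypothetical predictions M_t and D_t at one time point.
F_t = combine(weights, [240.0, 244.0])
print(weights, F_t)
```

With variances of 1 and 3, the more stable model receives a weight of 0.75 and the other 0.25, so the combined prediction is 241.0.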

4. Industrial Application

4.1. Data and Model Parameter Description

In this paper, to validate the feasibility of the proposed hybrid concentration prediction modeling method, 1500 concentration samples from the sodium aluminate solution evaporation process of an alumina production plant in China were selected for industrial application verification. The first 1200 samples are used as the training set, and the last 300 samples are used as the test set for concentration prediction. The data reconciliation model, as an optimization model, is solved by the state transition algorithm (STA) [38]. The experimental programs are written and run in MATLAB R2020b on a computer (Hewlett-Packard, Beijing, China) with an Intel Core i5-7300HQ processor (2.5 GHz, 6 MB L3 cache). Moreover, a BP neural network, a kernel extreme learning machine (KELM), a long short-term memory (LSTM) network, and a support vector regression (SVR) model are used in a comparison experiment to demonstrate the performance of the proposed prediction model. For the proposed prediction model, ARIMA(4,1,6) and GARCH(1,1) are determined; the weight of mechanistic modeling is 0.772 and the weight of data-driven modeling is 0.228 when data fluctuation is serious. By comparison, the weight is 0.4323 for mechanistic modeling and 0.5677 for data-driven modeling when the production condition is relatively smooth. The number of hidden layers of the BP neural network is set to 6 and 8. The regularization parameter and kernel function parameter of the KELM model are determined to be 8555.2 and 121.2675, respectively. The number of hidden layer units in the LSTM model is 200, and the maximum number of iterations is 250. The learning rate is reduced by a decline factor of 0.2 after 125 iterations.

4.2. Data Preprocessing

Data reconciliation is introduced to preprocess production data in the proposed prediction model. Besides the standard deviation of the measured data, the standard deviation of reconciled data and the relative standard deviation of the 34 measured variables in the evaporation process are used to verify that the quality of the processed data has improved, as shown in Figure 4. Obviously, for 34 measured variables, the reconciled standard deviation is lower than the measured standard deviation to varying degrees, indicating that the fluctuation range of reconciled data around the real value is smaller and closer to the real value. In particular, for the second and the third flash evaporator, the relative standard deviation of the liquid material temperature, the secondary steam temperature and the condensate water temperature are larger, and the reconciliation effect is more obvious, which can ensure the accuracy of prediction modeling.

4.3. Comparison of Prediction Results of Different Models

From Figure 5, it is obvious that the trend of the proposed hybrid prediction model is more consistent with the trend of the actual concentration than those of the mechanistic model and the other concentration prediction models. The prediction results of the other models deviate considerably, whereas most prediction results of the proposed method match the actual concentration well. The prediction performance is significantly improved.
The prediction relative errors of different prediction models are compared in Figure 6. It can be seen that the curve obtained from the proposed prediction model is more stable than other curves, indicating that the relative error of the proposed prediction method is the smallest. Moreover, the relative error of the proposed hybrid prediction method does not increase suddenly at a certain point, which shows that the proposed prediction method has good stability and can more accurately predict the liquid material concentration with uncertain fluctuations.
Figure 7 shows the prediction error distribution of different prediction models. In Figure 7, the prediction error of the proposed prediction method has a wider distribution within a small error range than the prediction results of the mechanistic model and other prediction models. It indicates that the data preprocessing with data reconciliation and the combination of the data-driven prediction results by using SVR to compensate the prediction error of the ARIMA-GARCH and the mechanistic model estimation results based on domain knowledge can improve the reliability of prediction results. In addition, when the data fluctuate smoothly, the prediction error distribution of different prediction models is shown in Figure 8. The error distribution of the proposed model is also more balanced than the other models. In the case of uncertain data fluctuation, the proposed hybrid prediction modeling method has better feasibility.

4.4. Evaluating Indicator and Analysis

In addition, to further evaluate the feasibility and accuracy of the proposed hybrid prediction model, three quantitative performance indicators are used, namely root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), which are defined by Equations (17)–(19). The comparison results of different prediction models under different production conditions are shown in Table 2 and Table 3.
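$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}} \tag{17}$$
$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|}{N} \tag{18}$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \tag{19}$$
where $\hat{y}_i$ is the predicted value of the ith output, $y_i$ is the true value of the ith output, and $N$ is the number of samples.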
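For reference, the three indicators can be computed as in the following Python sketch. The function names and the sample values are illustrative assumptions and are not plant data; the MAPE function assumes that no true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n * 100.0


# Illustrative values only.
y_true = [240.0, 250.0, 245.0]
y_pred = [238.0, 253.0, 245.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

RMSE penalizes large deviations more strongly than MAE, while MAPE expresses the error relative to the concentration level, which is why the three indicators are reported together.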
Table 2 shows that the proposed prediction model has the smallest RMSE, MAE, and MAPE. In detail, compared with the LSTM model and the SVR model, the RMSE of the proposed prediction model decreases by 54.46% and 53.11%, respectively. Compared with the BP model and the KELM model, the MAE decreases by 48.46% and 49.45%, respectively. The MAPE of the proposed model is likewise lower than that of every other model, to varying degrees. This indicates that the proposed prediction model is clearly superior to the other models: its prediction accuracy is high, and the actual concentration is tracked well. However, when the data fluctuate smoothly, Table 3 shows that the mechanistic model performs worse on all three metrics than the four data-driven models. In this case the production condition is relatively stable, and the predictions of the data-driven models are less volatile and fluctuate around the overall mean; because the data themselves span a smaller range, lower volatility also implies smaller errors. In both cases, the predictions of the proposed model are more consistent with the variation of the actual concentration values and outperform the predictions of the other models on every metric. Hence, when production conditions cannot be foreseen, the hybrid prediction model proposed in this paper can handle different data characteristics and maintain a satisfactory prediction performance.
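The percentage reductions quoted above can be checked against the values in Table 2 with a short calculation. The sketch below assumes the reduction is defined as (baseline − proposed) / baseline × 100; the variable names are illustrative. With the tabulated, rounded values, the RMSE reductions reproduce as 54.46% and 53.11%, and the MAE reductions agree with the quoted 48.46% and 49.45% to within about 0.01 percentage points, presumably because of rounding in the table.

```python
def percent_reduction(baseline, proposed):
    """Relative decrease of `proposed` with respect to `baseline`, in percent."""
    return (baseline - proposed) / baseline * 100.0


# Values copied from Table 2 (data fluctuate greatly).
rmse_table = {"Proposed": 2.8178, "LSTM": 6.1879, "SVR": 6.0097}
mae_table = {"Proposed": 2.372, "BP": 4.6017, "KELM": 4.6919}

rmse_vs_lstm = percent_reduction(rmse_table["LSTM"], rmse_table["Proposed"])
rmse_vs_svr = percent_reduction(rmse_table["SVR"], rmse_table["Proposed"])
mae_vs_bp = percent_reduction(mae_table["BP"], mae_table["Proposed"])
mae_vs_kelm = percent_reduction(mae_table["KELM"], mae_table["Proposed"])

for name, value in [("RMSE vs LSTM", rmse_vs_lstm), ("RMSE vs SVR", rmse_vs_svr),
                    ("MAE vs BP", mae_vs_bp), ("MAE vs KELM", mae_vs_kelm)]:
    print(f"{name}: {value:.2f}%")
```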
Furthermore, Figure 9 and Figure 10 show that, compared with the individual data-driven prediction models under different production conditions, the hybrid model achieves a large improvement in both MAPE and RMSE. However, in Figure 9, the improvement over the single mechanistic prediction model is not obvious. Because the actual production conditions are complex, the distribution of the collected concentration data is uncertain (the variance, mean value, and other indicators change with time): sometimes the data are relatively stable, and sometimes they fluctuate violently. When data fluctuation is serious, it is difficult for data-driven modeling methods to obtain acceptable prediction results. For this reason, the mechanistic model is combined with the data-driven model to form an adaptive prediction model: each model is assigned a weight according to its prediction error, and the two predictions are then combined. In Figure 10, when the data are stable, the data-driven model learns linear and nonlinear relationships from the historical data, its results are given a larger weight, and the final prediction is closer to that of the data-driven model. When the data fluctuate violently, the prediction of the mechanistic model is given a greater weight, and the final result is closer to that of the mechanistic model. This illustrates that the proposed method combines the advantages of the mechanistic model and the data-driven model: it better matches the trend of the actual concentration values and also has smaller errors. Good prediction results can be maintained whether the data are highly fluctuating or relatively flat.
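The error-based weighting can be sketched as follows. The exact weighting formula is not restated in this section, so the fragment assumes a common choice, in which each weight is proportional to the inverse of the corresponding model's prediction error variance and the weights sum to one. The function names and the numerical values are hypothetical and do not reproduce the weights reported above.

```python
def inverse_variance_weights(error_variances):
    """Weights proportional to 1 / E_i, normalized to sum to one (assumed rule)."""
    if any(e <= 0 for e in error_variances):
        raise ValueError("error variances must be positive")
    inverse = [1.0 / e for e in error_variances]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine_predictions(predictions, weights):
    """Weighted sum of the single-model predictions for one time step."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical error variances: mechanistic model, data-driven model.
error_variances = [1.0, 3.0]
weights = inverse_variance_weights(error_variances)

# Hypothetical single-step predictions from the two models.
predictions = [240.0, 248.0]
hybrid = combine_predictions(predictions, weights)
print(weights, hybrid)
```

Under this rule, the model with the smaller recent error dominates the combined output, which matches the behavior described above: the mechanistic model dominates under violent fluctuation and the data-driven model dominates under stable conditions.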

5. Conclusions

In this paper, a hybrid prediction modeling method has been established for the evaporation process to realize online detection of the outlet liquid material concentration. Data reconciliation was first adopted to preprocess the measured data and improve the measurement quality. Then, the mechanistic models of the production equipment and the evaporation process were constructed according to domain knowledge. Moreover, an ARIMA-GARCH model was established to extract the linear information, and an SVR model was applied for nonlinear error compensation. Furthermore, the estimated results from the mechanistic model and the prediction results from the data-driven model were weighted for concentration prediction. Finally, the experiments showed that the proposed prediction model outperforms the compared prediction models on RMSE, MAE, and MAPE under both strongly fluctuating and smooth production conditions. It is more accurate, more sensitive to changes in the outlet liquid material concentration, and better at tracking the concentration trend. It is an effective prediction modeling method for the outlet liquid material concentration with a wider scope of application.

Author Contributions

Conceptualization, S.X. and X.J.; methodology, S.X. and Y.H.; software, Y.H.; validation, Y.H. and X.J.; writing—original draft preparation, Y.H.; writing—review and editing, S.X.; funding acquisition, S.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (61903257), the Post-doctoral Later-stage Foundation Project of Shenzhen Polytechnic (6022271004K), and the Scientific Research Project of Shenzhen Polytechnic (6022312036K).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

$m$: Measured variables
$l$: Measurement variables for mass balance
$m-l$: Measurement variables for heat balance
$x_{ji}$: Measured data for the ith measured variable
$\hat{x}_{ji}$: Reconciled data for the ith measured variable
$\sigma_i$: Standard deviation
$U$: Unmeasured variables
$G$: Constraint function
$\hat{x}_{Ui}$: Upper bound for reconciled variables
$\hat{x}_{Li}$: Lower bound for reconciled variables
$u_{Uq}$: Upper bound for unmeasured variables
$u_{Lq}$: Lower bound for unmeasured variables
$T$: Liquid material temperature for the evaporator
$T^{s}$: Liquid material temperature for the flash evaporator
$c_p$: Specific heat for the evaporator
$c_p^{s}$: Specific heat for the flash evaporator
$F$: Liquid material flow rate for the evaporator
$F^{s}$: Liquid material flow rate for the flash evaporator
$\rho$: Liquid material density for the evaporator
$\rho^{s}$: Liquid material density for the flash evaporator
$T_n$: Condensate water temperature
$V$: Secondary steam flow rate for the evaporator
$V_j^{s}$: Secondary steam flow rate for the flash evaporator
$H$: Steam enthalpy for the evaporator
$H_j^{s}$: Steam enthalpy for the flash evaporator
$F_0$: Feed flow rate
$c_{pw}$: Water specific heat
$V_0$: Flow rate for live steam
$H_0$: Enthalpy
$T_{Vi}$: Secondary steam temperature
$A_i$: Heat transfer area
$k_i$: Heat transfer coefficient
$Q_{\mathrm{loss}}$: Heat loss
$y_t$: Current data
$y_{t-i}$: Historical data
$\mu$: Constant term
$\theta_i$: Moving average coefficient
$\gamma_i$: Autocorrelation coefficient
$\epsilon_t$: Order of white noise
$p$: Order of autoregressive model
$q$: Order of moving average model
$\sigma_t$: Volatility
$\alpha$: Coefficient
$\beta$: Coefficient
$f(x)$: Output of the prediction
$\varphi(x)$: Function of feature
$\omega$: Weight vector
$b$: Offset value
$\alpha_{i,j}^{*}$: Lagrange multipliers
$E_i$: Prediction variance of the ith single prediction model
$\omega_i$: Weight of the ith prediction model

References

1. Li, Q.; Dai, T.; Gao, T.; Zhong, W.; Wen, B.; Li, T.; Zhou, Y. Aluminum Material Flow Analysis for Production, Consumption, and Trade in China from 2008 to 2017. J. Clean. Prod. 2021, 296, 126444.
2. Chai, Q.Q.; Yang, C.H.; Teo, K.L.; Gui, W.H. Optimal Control of an Industrial-Scale Evaporation Process: Sodium Aluminate Solution. Control Eng. Pract. 2012, 20, 618–628.
3. Wang, W.; Wang, D. Prediction of Component Concentrations in Sodium Aluminate Liquor Using Stochastic Configuration Networks. Neural Comput. Appl. 2020, 32, 13625–13638.
4. Yan, W.; Tang, D.; Lin, Y. A Data-Driven Soft Sensor Modeling Method Based on Deep Learning and Its Application. IEEE Trans. Ind. Electron. 2017, 64, 4237–4245.
5. Hu, X.; Shi, L.; Lin, G.; Lin, L. Comparison of Physical-Based, Data-Driven and Hybrid Modeling Approaches for Evapotranspiration Estimation. J. Hydrol. 2021, 601, 126592.
6. Febrero-Bande, M.; Galeano, P.; González-Manteiga, W. Functional Principal Component Regression and Functional Partial Least-Squares Regression: An Overview and a Comparative Study. Int. Stat. Rev. 2017, 85, 61–83.
7. Tao, Y.; Shi, H.; Song, B.; Tan, S. Parallel Quality-Related Dynamic Principal Component Regression Method for Chemical Process Monitoring. J. Process Contr. 2019, 73, 33–45.
8. Chen, M.; Luo, Y.; Shen, Y.; Han, Z.; Cui, Y. Driving Force Analysis of Irrigation Water Consumption Using Principal Component Regression Analysis. Agr. Water Manag. 2020, 234, 106089.
9. Smith, G. Step Away from Stepwise. J. Big Data 2018, 5, 32.
10. Liu, Y.; Heuvelink, G.B.M.; Bai, Z.; He, P.; Xu, X.; Ding, W.; Huang, S. Analysis of Spatio-Temporal Variation of Crop Yield in China Using Stepwise Multiple Linear Regression. Field Crop. Res. 2021, 264, 108098.
11. Trivedi, A. Logistics Management Awareness and the Implementation of Restaurant Business: An Application of Stepwise Multiple Regression. Asian Adm. Manage. Rev. 2018, 1, 1–8.
12. Pandey, M.; Zakwan, M.; Sharma, P.K.; Ahmad, Z. Multiple Linear Regression and Genetic Algorithm Approaches to Predict Temporal Scour Depth near Circular Pier in Non-Cohesive Sediment. ISH J. Hydr. Eng. 2020, 26, 96–103.
13. DeForest, D.K.; Brix, K.V.; Tear, L.M.; Adams, W.J. Multiple Linear Regression Models for Predicting Chronic Aluminum Toxicity to Freshwater Aquatic Organisms and Developing Water Quality Guidelines. Environ. Toxicol. Chem. 2018, 37, 80–90.
14. Khademi, F.; Akbari, M.; Jamal, S.M.; Nikoo, M. Multiple Linear Regression, Artificial Neural Network, and Fuzzy Logic Prediction of 28 Days Compressive Strength of Concrete. Front. Struct. Civ. Eng. 2017, 11, 90–99.
15. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-Art in Artificial Neural Network Applications: A Survey. Heliyon 2018, 4, e00938.
16. Ti, Z.; Deng, X.W.; Zhang, M. Artificial Neural Networks Based Wake Model for Power Prediction of Wind Farm. Renew. Energy 2021, 172, 618–631.
17. Van Hung, T.; Alkhamis, H.H.; Alrefaei, A.F.; Sohret, Y.; Brindhadevi, K. Prediction of Emission Characteristics of a Diesel Engine Using Experimental and Artificial Neural Networks. Appl. Nanosci. 2021.
18. Yaseen, Z.M.; Sulaiman, S.O.; Deo, R.C.; Chau, K.-W. An Enhanced Extreme Learning Machine Model for River Flow Forecasting: State-of-the-Art, Practical Applications in Water Resource Engineering Area and Future Research Direction. J. Hydrol. 2019, 569, 387–408.
19. Pan, Z.; Meng, Z.; Chen, Z.; Gao, W.; Shi, Y. A Two-Stage Method Based on Extreme Learning Machine for Predicting the Remaining Useful Life of Rolling-Element Bearings. Mech. Syst. Signal Process. 2020, 144, 106899.
20. Fayaz, M.; Kim, D. A Prediction Methodology of Energy Consumption Based on Deep Extreme Learning Machine and Comparative Analysis in Residential Buildings. Electronics 2018, 7, 222.
21. Mehraein, I.; Riahi, S. The QSPR Models to Predict the Solubility of CO2 in Ionic Liquids Based on Least-Squares Support Vector Machines and Genetic Algorithm-Multi Linear Regression. J. Mol. Liq. 2017, 225, 521–530.
22. Deiss, L.; Margenot, A.J.; Culman, S.W.; Demyan, M.S. Tuning Support Vector Machines Regression Models Improves Prediction Accuracy of Soil Properties in MIR Spectroscopy. Geoderma 2020, 365, 114227.
23. Battineni, G.; Chintalapudi, N.; Amenta, F. Machine Learning in Medicine: Performance Calculation of Dementia Prediction by Support Vector Machines (SVM). Infor. Med. Unlock. 2019, 16, 100200.
24. Schulz, E.; Speekenbrink, M.; Krause, A. A Tutorial on Gaussian Process Regression: Modelling, Exploring, and Exploiting Functions. J. Math. Psychol. 2018, 85, 1–16.
25. Hewing, L.; Kabzan, J.; Zeilinger, M.N. Cautious Model Predictive Control Using Gaussian Process Regression. IEEE Trans. Contr. Syst. Tech. 2020, 28, 2736–2743.
26. Kong, D.; Chen, Y.; Li, N. Gaussian Process Regression for Tool Wear Prediction. Mech. Syst. Signal Process. 2018, 104, 556–574.
27. Hamilton, F.; Lloyd, A.L.; Flores, K.B. Hybrid Modeling and Prediction of Dynamical Systems. PLOS Comput. Biol. 2017, 13, e1005655.
28. Kuo, C.-P.; Fu, J.S. Evaluating the Impact of Mobility on COVID-19 Pandemic with Machine Learning Hybrid Predictions. Sci. Total Environ. 2021, 758, 144151.
29. Hu, W.; Yang, Q.; Chen, H.-P.; Yuan, Z.; Li, C.; Shao, S.; Zhang, J. New Hybrid Approach for Short-Term Wind Speed Predictions Based on Preprocessing Algorithm and Optimization Theory. Renew. Energy 2021, 179, 2174–2186.
30. Li, X.; Zhang, L.; Wang, Z.; Dong, P. Remaining Useful Life Prediction for Lithium-ion Batteries Based on a Hybrid Model Combining the Long Short-term Memory and Elman Neural Networks. J. Energy Storage 2019, 21, 510–518.
31. Li, X.; Willems, P. A Hybrid Model for Fast and Probabilistic Urban Pluvial Flood Prediction. Water Resour. Res. 2020, 56, e2019WR025128.
32. Da Cunha, A.S.; Peixoto, F.C.; Prata, D.M. Robust data reconciliation in chemical reactors. Comput. Chem. Eng. 2021, 145, 107170.
33. Xie, S.; Wang, H.; Peng, J. A hybrid prediction model of recycled sodium aluminate solution concentration in evaporation process. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
34. Wang, Y.; Ding, J.; Chai, T. Soft-sensor for alkaline solution concentration of evaporation process. In Proceedings of the 2008 7th World Congress on Intelligent Control and Automation, Chongqing, China, 25–27 June 2008; pp. 3476–3480.
35. Vasebi, A.; Poulin, É.; Hodouin, D. Selecting proper uncertainty model for steady-state data reconciliation-Application to mineral and metal processing industries. Miner. Eng. 2014, 65, 130–144.
36. Guo, S.S.; Liu, P.; Li, Z. Data reconciliation for the overall thermal system of a steam turbine power plant. Appl. Energy 2016, 165, 1037–1051.
37. Xie, S.; Yang, C.H.; Yuan, X.F.; Wang, X.L.; Xie, Y.F. A novel robust data reconciliation method for industrial processes. Contr. Eng. Pract. 2019, 83, 203–212.
38. Zhou, X.J.; Yang, C.H.; Gui, W.H. Nonlinear system identification and control using state transition algorithm. Appl. Math. Comput. 2014, 226, 169–179.
Figure 1. The sodium aluminate solution evaporation process in alumina production.
Figure 2. The structure of the new prediction modeling method.
Figure 3. The ARIMA-GARCH-SVR data-driven prediction process.
Figure 4. The comparison of data reconciliation: (a) the standard deviation of the reconciled data and the measured data; (b) the relative standard deviation of measured variables.
Figure 5. Comparison of actual concentration and predicted concentration for different models when the data fluctuate greatly.
Figure 6. Comparison of the relative errors of different prediction models when the data fluctuate greatly.
Figure 7. Error distribution of different prediction models when the data fluctuate greatly.
Figure 8. Error distribution of different prediction models when the data fluctuate smoothly.
Figure 9. Results for different prediction models, including RMSE and MAPE values when the data fluctuate greatly.
Figure 10. Results for different prediction models, including RMSE and MAPE values when the data fluctuate smoothly.
Table 1. The mechanistic models of the typical equipment.
| Equipment | Mechanistic Model |
| --- | --- |
| The sixth evaporator ($i = 6$) | $F_{01}\rho_0 = F_i\rho_i + V_i$ |
| | $F_{01}C_0 = F_iC_i$ |
| | $Q_i = k_iA_i(T_{Vi} - T_i) = F_{01}\rho_0c_{p0}T_0 + V_{i-1}H_{i-1} - F_i\rho_ic_{pi}T_i - V_iH_i - V_{i-1}T_{ni}c_{pw} - Q_{i,\mathrm{loss}}$ |
| The fifth evaporator ($i = 5$) | $F_{02}\rho_0 + F_{i+1}\rho_{i+1} + V_{i-1}^{s} = F_i\rho_i + V_i$ |
| | $F_iC_i = F_{02}C_0 + F_{i+1}C_{i+1}$ |
| | $Q_i = k_iA_i(T_{Vi} - T_i) = F_{02}\rho_0c_{p0}T_0 + F_{i+1}\rho_{i+1}c_{p,i+1}T_{i+1} + V_{i-1}H_{i-1} + V_{i-1}^{s}H_{i-1}^{s} - F_i\rho_ic_{pi}T_i - V_iH_i - V_{i-1}T_{ni}c_{pw} - Q_{i,\mathrm{loss}}$ |
| The first flash evaporator ($i = 1$, $j = 1$) | $F_i\rho_i = F_j^{s}\rho_j^{s} + V_j^{s}$ |
| | $F_j^{s}C_j^{s} = F_0C_0$ |
| | $F_i\rho_ic_{pi}T_i = F_j^{s}\rho_j^{s}c_{pj}^{s}T_j^{s} + V_j^{s}H_j^{s} + Q_{j,\mathrm{loss}}$ |
| | $T_j^{s} = T_{Vj}^{s} + \Delta_{Vj}^{s} + 1$ |
Table 2. Comparison of prediction performance indicators for different prediction models when the data fluctuate greatly.
| Prediction Model | RMSE | MAE | MAPE (%) |
| --- | --- | --- | --- |
| Proposed model | 2.8178 | 2.372 | 0.9889 |
| Mechanistic model | 2.9875 | 2.6651 | 1.1273 |
| BP model | 5.4397 | 4.6017 | 1.9189 |
| KELM model | 5.4948 | 4.6919 | 1.956 |
| LSTM model | 6.1879 | 5.0211 | 2.102 |
| SVR model | 6.0097 | 5.0211 | 2.0933 |
Table 3. Comparison of prediction performance indicators for different prediction models when the data fluctuate smoothly.
| Prediction Model | RMSE | MAE | MAPE (%) |
| --- | --- | --- | --- |
| Proposed model | 1.3068 | 1.0434 | 0.3608 |
| Mechanistic model | 2.6972 | 2.4276 | 1.0139 |
| BP model | 1.8418 | 1.5655 | 0.7122 |
| KELM model | 1.7138 | 1.4540 | 0.6652 |
| LSTM model | 2.0375 | 1.6795 | 0.7591 |
| SVR model | 1.8396 | 1.5157 | 0.6909 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Hua, Y.; Jin, X.; Xie, S. Outlet Liquid Material Concentration Prediction of an Evaporation Process Based on Knowledge and Data Information. Processes 2022, 10, 2525. https://doi.org/10.3390/pr10122525
