1. Introduction
China's energy endowment, often summarized as “rich in coal, poor in oil and short of gas”, makes coal the country's main energy resource [1]. Coal is also acknowledged worldwide as one of the most important energy resources and chemical raw materials for industry, the economy, medicine and daily life [2]. In coal production, a considerable amount of coal gangue is inevitably mixed into the raw coal, generally accounting for 15~20% of the raw coal output. Coal gangue is a solid waste rich in sulfur and heavy metals, including arsenic, cadmium, chromium, copper and mercury. It is mainly composed of Al₂O₃ and SiO₂ [3]. Coal gangue has a high density and a low calorific value. As a result, burning coal gangue together with coal reduces combustion efficiency and increases exhaust gas emissions. Hence, it is crucial to explore sound ways of making good use of coal gangue [4,5,6,7,8,9,10,11].
Owing to the development of carbon-based waste recycling technologies and the energy industry, coal gangue is no longer considered a useless byproduct. It can be used in many fields, for example as a building material, as a functional filler or for power generation. The suitable application is generally determined by its calorific value. Coal gangue with a high calorific value (6270~12550 kJ/kg) is used to generate electrical power in a fluidized-bed roaster or circulating fluidized-bed boiler. Coal gangue with a medium calorific value (2090~6270 kJ/kg) can be used for making brick and cement, while coal gangue with a low calorific value (less than 2090 kJ/kg) can be used for land rehabilitation and functional fillers. Therefore, the calorific value of coal gangue must be identified before deciding how it should be used [12,13,14].
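The calorific value bands above can be expressed as a simple lookup. The following Python sketch is only an illustration: the function name is hypothetical, and the treatment of the boundary values (2090, 6270 and 12550 kJ/kg) is an assumption, since the text does not state which band a boundary value belongs to.

```python
def classify_gangue(q_kj_per_kg: float) -> str:
    """Map a calorific value in kJ/kg to the usage band described in the text.

    Boundary handling is an assumption: a value equal to 2090 or 6270 kJ/kg
    is placed in the higher band.
    """
    if q_kj_per_kg < 0:
        raise ValueError("calorific value must be non-negative")
    if q_kj_per_kg < 2090:
        return "low: land rehabilitation, functional fillers"
    if q_kj_per_kg < 6270:
        return "medium: brick and cement"
    if q_kj_per_kg <= 12550:
        return "high: power generation"
    # The text gives no band above 12550 kJ/kg.
    return "above the stated bands"


print(classify_gangue(4500.0))
```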
Generally, the calorific value of coal or gangue is measured according to Standard GB/T 213-2008 in China, a procedure that is expensive, complicated and inefficient [15]. Hence, a reliable, convenient and efficient approach for obtaining the calorific value of coal gangue is needed. A survey of the published literature shows that previous research on coal gangue has mainly focused on identifying coal and separating coal from gangue, whereas little work addresses how to obtain the calorific value of coal gangue conveniently and rapidly. Thus, research on coal gangue in China has not kept pace with the carbon peaking and carbon neutrality goals [16,17].
Therefore, this paper proposes a novel calorific value forecasting model that combines support vector regression with a genetic algorithm. Considering the nonlinear mapping between the features of coal gangue and its calorific value, a hybrid kernel function is adopted to combine the strengths of local and global kernels, while a genetic algorithm is adopted to tune the critical parameters so as to obtain reliable and accurate results. This method can output the calorific value of coal gangue simply and accurately without a direct calorimetric measurement.
The remainder of the paper is organized as follows: the methodologies employed in this paper are briefly described in Section 2. Section 3 presents the calorific value forecasting procedure. Section 4 presents the experimental results and comparative analysis. Finally, Section 5 summarizes the overall contributions and discusses future research directions.
3. Procedure for Forecasting Using Proposed Regression
To establish an efficient and accurate forecasting model, the basic characteristics of coal gangue that are closely related to its calorific value must first be measured. A dataset composed of these raw measurements is then used to train and test the forecasting model. The procedure has two steps: (1) data preparation and preprocessing, and (2) training and verification of the forecasting model.
3.1. Data Preparation and Preprocessing
The characteristics of coal gangue vary greatly between coal mines and regions. Even coal gangue from the same mine but from different coal seams may differ considerably in its fundamental characteristics. Therefore, it is impossible to gather every kind of coal gangue and measure its basic characteristics. In this paper, more than 1000 coal gangue samples collected from several major coal mines in China are investigated, and the fundamental characteristics obtained are used to build forecasting models.
Basic characteristics of coal gangue, such as air-dried basis moisture (M_ad), air-dried basis ash (A_ad), air-dried basis volatile matter (V_ad) and air-dried basis fixed carbon (FC_ad), are recognized as crucial predictors of the air-dried basis bomb calorific value (Q_b,ad), so measurement experiments were carried out to obtain these parameters. The measuring process strictly followed Standard GB/T 213-2008 and Standard GB/T 212-2008. The main instruments used in this study were a balance, a drier, a muffle furnace and a calorimeter, which are shown in Figure 5. The balance (Figure 5a) was used to weigh 1 g of coal gangue for each measurement. The muffle furnace (Figure 5b) was used to measure air-dried basis ash, volatile matter and fixed carbon. The drier (Figure 5c) was used to obtain air-dried basis moisture. The calorimeter (Figure 5d) was used to measure the calorific value of coal gangue.
Because many of the measured samples were repeated or differed only slightly, 750 of the more than 1000 measured results, each composed of M_ad, A_ad, V_ad, FC_ad and Q_b,ad, were used to establish the coal gangue dataset. Part of the measured samples is shown in Table 1.
It can be seen from Table 1 that the features span very different ranges. Thus, the feature data are normalized to the range [0,1] to reduce estimation error, speed up computation and improve generalization. The raw data were normalized with Equation (11):
$$x_n = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{11}$$
where x_i and x_n represent the data before and after normalization, respectively, and x_max and x_min are the maximum and minimum of the raw testing results.
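As a minimal illustration of this min-max normalization, the following Python sketch rescales one feature column to [0,1]. The function name and the ash values are hypothetical placeholders, not data from Table 1, and the handling of a constant column is an assumption added to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale a list of raw feature values linearly to the range [0, 1]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    span = x_max - x_min
    return [(x - x_min) / span for x in values]


# Hypothetical ash contents in percent (placeholders, not measured data).
ash = [55.0, 62.5, 70.0, 85.0]
print(min_max_normalize(ash))  # [0.0, 0.25, 0.5, 1.0]
```

Each feature (M_ad, A_ad, V_ad, FC_ad) would be normalized separately in this way, since each has its own minimum and maximum.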
3.2. Training and Testing of the Forecasting Model
The normalized coal gangue samples were separated into a training set and a testing set. Hybrid kernel functions and support vector regression (SVR) were then used to establish the calorific value prediction model, and a genetic algorithm (GA) was used to tune the critical parameters of the SVR and its kernels to improve forecasting accuracy and generalization ability. The procedure for establishing the calorific value forecasting model with GA is briefly described below:
Critical parameters of GA, including the population size, crossover probability, mutation probability and maximum number of iterations, are predefined at the beginning. Then, the initial value of each chromosome, which is composed of the penalty factor C, kernel bandwidth σ, insensitive loss parameter ε, adjustable weight μ, etc., is set randomly. In addition, real-number encoding was adopted in this study to encode all parameters, as it is well suited to complicated problems and convenient for applying genetic operators to individuals. The ranges of all free parameters are shown in Table 2.
The fitness function is used to evaluate the performance of each individual. The 5-fold cross-validation method was adopted to assess the forecasting accuracy in this study. Mean absolute percentage error (MAPE) and squared correlation coefficient (r²) were employed as the fitness function to estimate the quality of each individual and assess prediction performance. In general, the smaller the MAPE, the higher the forecasting accuracy, while the greater the value of r², the better the prediction performance. In addition, absolute percentage error (APE) is also used to estimate the quality of forecasting results. APE, MAPE and r² can be computed as follows:
$$\mathrm{APE}_i = \left| \frac{y_i - f(x_i)}{y_i} \right|$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - f(x_i)}{y_i} \right|$$
$$r^2 = \frac{\left( n \sum_{i=1}^{n} f(x_i)\, y_i - \sum_{i=1}^{n} f(x_i) \sum_{i=1}^{n} y_i \right)^2}{\left( n \sum_{i=1}^{n} f(x_i)^2 - \left( \sum_{i=1}^{n} f(x_i) \right)^2 \right) \left( n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right)}$$
where x_i is the i-th input sample, y_i and f(x_i) represent the actual value and the forecasting value provided by the established model, respectively, and n stands for the size of the sample set.
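The three indexes can be computed directly from the actual and forecasted values. The Python sketch below assumes the standard definitions of APE and MAPE (expressed as fractions rather than percentages) and the squared Pearson correlation for r²; the function names and the sample numbers are hypothetical.

```python
def ape(actual, predicted):
    """Absolute percentage error of each sample, as a fraction.

    Assumes no actual value is zero, which holds for calorific values.
    """
    return [abs(y - f) / abs(y) for y, f in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error over all samples, as a fraction."""
    errors = ape(actual, predicted)
    return sum(errors) / len(errors)


def squared_correlation(actual, predicted):
    """Squared correlation coefficient between actual and predicted values."""
    n = len(actual)
    sum_y = sum(actual)
    sum_f = sum(predicted)
    sum_yf = sum(y * f for y, f in zip(actual, predicted))
    sum_yy = sum(y * y for y in actual)
    sum_ff = sum(f * f for f in predicted)
    numerator = (n * sum_yf - sum_f * sum_y) ** 2
    denominator = (n * sum_ff - sum_f ** 2) * (n * sum_yy - sum_y ** 2)
    return numerator / denominator


# Hypothetical calorific values in kJ/kg (placeholders, not measured data).
y_true = [3000.0, 4200.0, 5100.0, 6400.0]
y_pred = [3150.0, 4100.0, 5200.0, 6300.0]
print(mape(y_true, y_pred), squared_correlation(y_true, y_pred))
```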
A potential solution with better fitness has a higher probability of being chosen to reproduce offspring through the genetic operators. Roulette wheel selection, arithmetical crossover and uniform mutation were adopted to generate new offspring. A flowchart of the optimization process is shown in Figure 6. Finally, the optimal solution was used to build the optimal forecasting model, and the MAPE and r² indexes were used to estimate the forecasting performance of the proposed model.
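The three genetic operators named above can be sketched for a real-coded chromosome as follows. This is a generic illustration, not the authors' MATLAB implementation: the function names, the chromosome order [C, σ, ε, μ], the parameter bounds (Table 2 holds the actual ranges) and the convention that a larger fitness is better (for example, a decreasing function of MAPE) are all assumptions.

```python
import random


def roulette_wheel_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness.

    Assumes non-negative fitness values where larger means better.
    """
    pick = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def arithmetical_crossover(parent_a, parent_b, rng):
    """Create two children as complementary convex combinations of the parents."""
    lam = rng.random()
    child_a = [lam * a + (1.0 - lam) * b for a, b in zip(parent_a, parent_b)]
    child_b = [(1.0 - lam) * a + lam * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b


def uniform_mutation(individual, bounds, rate, rng):
    """Replace each gene, with probability `rate`, by a uniform draw from its range."""
    return [
        rng.uniform(low, high) if rng.random() < rate else gene
        for gene, (low, high) in zip(individual, bounds)
    ]


# Hypothetical ranges for [C, sigma, epsilon, mu]; not the values of Table 2.
BOUNDS = [(0.1, 100.0), (0.01, 10.0), (0.001, 0.1), (0.0, 1.0)]

rng = random.Random(0)
population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(4)]
fitness = [0.2, 0.5, 0.9, 0.4]  # placeholder fitness values
parent_a = roulette_wheel_select(population, fitness, rng)
parent_b = roulette_wheel_select(population, fitness, rng)
child_a, child_b = arithmetical_crossover(parent_a, parent_b, rng)
print(uniform_mutation(child_a, BOUNDS, 0.1, rng))
```

Because arithmetical crossover forms convex combinations, children always stay inside the parameter ranges whenever both parents do, which is one reason it pairs naturally with real-number encoding.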
4. Discussion
The 750 coal gangue samples described in Section 3.1 were used to create a forecasting model. The data distributions of the features are displayed in Figure 7. It can be observed from Figure 7 that the features of coal gangue vary within a large range; hence, normalization is necessary to improve generalization performance and decrease computational error. After normalization, 80% of the samples (600 samples) were used as the training set, and the remaining 150 samples were used to test the predictive performance of the established model. Specifically, the features M_ad, A_ad, V_ad and FC_ad were set as inputs, and the air-dried basis bomb calorific value (Q_b,ad) was the output. Then, the forecasting performances of SVR models built on single kernel functions were compared in order to select suitable kernel functions for the hybrid-kernel forecasting model. The proposed approaches were tested experimentally in the MATLAB (R2016) environment with the help of the LIBSVM toolbox.
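The 80/20 partition can be illustrated with a few lines of Python. The text does not state how samples were assigned to the two sets, so the random shuffle, the seed and the function name below are assumptions made only for illustration.

```python
import random


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into a training and a testing set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Sample indices stand in for the 750 measured records.
train, test = split_train_test(range(750))
print(len(train), len(test))  # 600 150
```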
Local kernel functions and global kernel functions each have pros and cons, so no particular type of kernel function is mandatory for establishing forecasting models. First, the forecasting performance of each kernel function was tested and compared. The default parameter values of each kernel were adopted in this experiment. The experimental results for the training set are shown in Table 3.
It can be found from Table 3 that, between the global kernels, the linear kernel function has better forecasting accuracy (MAPE) and depiction ability (r²) than the polynomial kernel function. Among the local kernels, the Gaussian kernel function offers better forecasting performance (MAPE and r²) than the Sigmoid kernel function. In addition, the forecasting models based on the linear kernel and the Gaussian kernel predict the calorific value of coal gangue better than the models based on the other kernels, as can be seen in Table 4, Figure 8 and Figure 9.
According to the results for the training set and testing set, which are shown in Table 4, Figure 8 and Figure 9, all indexes used for assessing the forecasting performance are better for the linear kernel and the Gaussian kernel than for the polynomial and Sigmoid kernel functions. Thus, the linear kernel function and the Gaussian kernel function were selected to build the hybrid kernel function to further improve forecasting accuracy. Afterward, GA was introduced to tune the critical parameters of ε-SVR, including the penalty factor C, kernel bandwidth σ, insensitive loss parameter ε and adjustable weight μ. The range of each parameter is shown in Table 2. In addition, five-fold cross-validation was adopted to assess the fitness when selecting the optimum among the candidate solutions. To reduce the randomness of the final results, the numerical experiments for each model were conducted 20 times. The optimal parameters were C = 16.98, σ = 1.35, ε = 0.02 and μ = 0.79; the actual and forecasted calorific values (Q_b,ad) for the testing set are shown in Figure 10.
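To make the role of σ and μ concrete, the following Python sketch evaluates a hybrid kernel built from the linear and Gaussian kernels with the optimal values reported above. The exact form is an assumption: it is taken here as the convex combination μ·K_Gaussian + (1 − μ)·K_linear with the Gaussian kernel written as exp(−‖x − z‖²/(2σ²)); which kernel μ weights and how σ enters may differ in the authors' implementation. The function names and the feature vectors are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Global kernel: inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, sigma):
    """Local kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def hybrid_kernel(x, z, sigma=1.35, mu=0.79):
    """Assumed form: mu weights the Gaussian part, (1 - mu) the linear part."""
    return mu * gaussian_kernel(x, z, sigma) + (1.0 - mu) * linear_kernel(x, z)


# Hypothetical normalized feature vectors [M_ad, A_ad, V_ad, FC_ad].
x = [0.2, 0.5, 0.1, 0.4]
z = [0.3, 0.4, 0.2, 0.3]
print(hybrid_kernel(x, z))
```

A convex combination of two valid kernels with weights in [0,1] is itself a valid kernel, so a Gram matrix built this way can be passed to an SVR solver that accepts precomputed kernels.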
According to Table 5, Figure 10 and Figure 11, the forecasting model based on the hybrid kernel function has higher accuracy on both the training set and the testing set than the models based on a single kernel function. Specifically, compared with the Gaussian kernel, the average MAPE decreased by 57.37% and 44.64% for the training set and testing set, respectively, when the hybrid kernel function was adopted, while the average squared correlation coefficient increased by 1.47% and 1.31%, respectively. It can be observed from Figure 11 that only 5.33% of the APE values (8 of the 150 testing samples) are higher than 0.1, and 30.67% (46 samples) are higher than 0.05, which suggests that the proposed method is capable of providing accurate forecasts with minor errors. In summary, the support vector regression model with a hybrid kernel function tuned by a genetic algorithm delivers desirable performance and impressive depiction ability in forecasting the calorific value of coal gangue.
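The quoted shares follow from counting the testing samples whose APE exceeds a threshold. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the APE list is a synthetic placeholder constructed only to match the counts stated in the text (8 samples above 0.1 and 46 above 0.05 out of 150), not the measured errors.

```python
def share_above(errors, threshold):
    """Fraction of error values that are strictly greater than the threshold."""
    return sum(1 for e in errors if e > threshold) / len(errors)


# Synthetic APE values matching the stated counts, not the real errors.
ape_values = [0.12] * 8 + [0.07] * 38 + [0.02] * 104

print(f"{share_above(ape_values, 0.1):.2%}")   # 5.33%
print(f"{share_above(ape_values, 0.05):.2%}")  # 30.67%
```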
Moreover, the forecasting performance of HKF-SVR is compared with that of other conventional methods, including the generalized regression neural network (GRNN) and the radial basis function neural network (RBFNN), to further demonstrate the superiority of the proposed approach. The calorific values (Q_b,ad) forecasted by RBFNN, GRNN and HKF-SVR are displayed in Figure 12 and Table 6.
It can be seen from Table 6 that, compared with RBFNN and GRNN, the proposed approach has the lowest MAPE and the highest r² on both the training set and the testing set, which verifies the validity and superiority of the hybrid kernel function for forecasting. In summary, the calorific value forecasting model based on HKF-SVR is more accurate and reliable than the other conventional methods, which is beneficial for recycling coal gangue and reducing environmental pollution.
5. Conclusions
To address the lack of methods for forecasting the calorific value of coal gangue conveniently and accurately, a novel approach based on hybrid kernel function support vector regression (HKF-SVR) and a genetic algorithm is presented in this paper. First, the key characteristics of coal gangue gathered from major coal mines were measured and used to build a sample set. Then, the forecasting performances of single kernel function-based models were compared, and the results suggest that the linear kernel and the Gaussian kernel provide better accuracy and trend description ability. Next, a hybrid kernel that linearly combines these two kernel functions was used to create a calorific value forecasting model, and a genetic algorithm was introduced to optimize the critical parameters of SVR and the adjustable weight. The experimental results indicate that the hybrid kernel function-based model yields a lower MAPE and a higher squared correlation coefficient, so the HKF-SVR model is more suitable for forecasting the calorific value of coal gangue than models based on a single kernel function. Moreover, the forecasting performance of the presented method is superior to that of other conventional forecasting methods.