Article

Early Warning of Gas Concentration in Coal Mines Production Based on Probability Density Machine

School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
* Author to whom correspondence should be addressed.
Sensors 2021, 21(17), 5730; https://doi.org/10.3390/s21175730
Submission received: 6 July 2021 / Revised: 22 August 2021 / Accepted: 24 August 2021 / Published: 25 August 2021
(This article belongs to the Section Intelligent Sensors)

Abstract

Gas explosions have always been an important factor restricting coal mine production safety. Applying machine learning techniques to coal mine gas concentration prediction and early warning can effectively prevent gas explosion accidents. Nearly all traditional prediction models use a regression technique to predict gas concentration. Considering that there are very few instances of high gas concentration, the instance distribution of gas concentration data is extremely imbalanced, so such regression models generally perform poorly in predicting high gas concentration instances. In this study, we consider early warning of gas concentration as a binary classification problem and divide the gas concentration data into a warning class and a non-warning class according to a concentration threshold. We propose the probability density machine (PDM) algorithm, which has excellent adaptability to imbalanced data distributions. We use original gas concentration data collected from several monitoring points in a coal mine in Datong city, Shanxi Province, China, to train the PDM model and to compare it with several class imbalance learning algorithms. The results show that the PDM algorithm is superior to the traditional and state-of-the-art class imbalance learning algorithms, and can produce more accurate early warning results for gas explosion.

1. Introduction

Coal is an important basic energy source in China. From 2015 to 2019, coal accounted for about 68% of China's total annual energy production on average [1]. However, coal mine safety accidents have caused serious economic losses and have seriously endangered the lives of miners. Gas explosions are an important cause of coal mine safety accidents; a single explosion can kill many miners or destroy an entire coal mine [2]. According to statistics, from 2007 to 2017, gas explosions accounted for 50% of coal mine safety accidents in China, and from 2004 to 2015, 10,298 people died in gas explosion accidents, accounting for 29.7% of deaths in all types of coal mine accidents [3,4]. Therefore, preventing gas explosion accidents should be the top priority of coal mine safety accident prevention in China, and reducing such hazards to achieve safe coal mining is a major problem. Strengthening the study of gas disaster prediction and early warning technology is thus of great significance for improving the ability to predict gas disasters.
In recent years, researchers have found that the gas concentration generally rises abnormally before a coal mine gas explosion [5]. Therefore, in order to predict gas concentration accurately and prevent gas accidents effectively, researchers have proposed several gas prediction methods. Models of gas concentration or gas outburst forecasting are largely based on BP neural networks [6,7,8], LSTM neural networks [9], the SVR algorithm [10,11], the ELM algorithm [12], the Gaussian process regression algorithm [13], and other mathematical or statistical methods [14,15,16]. These methods generally use the time-series data collected by gas sensors to establish a regression model of gas concentration. However, we note that actual sensor readings exceeding the warning threshold are scarce, so traditional warning models cannot learn the pattern of abnormally rising gas concentration, making it difficult to provide warnings in advance.
In this study, we consider gas early warning prediction as a binary classification problem. Specifically, the sensing data are divided into two classes, an early warning class and a non-early warning class, based on a pre-defined early warning threshold. Treating gas early warning as a binary classification problem is reasonable because, in practical coal mining production, it is not necessary to predict the exact gas concentration over a future period; it suffices to judge whether there is a high risk to safe production. Several previous studies support this formulation, e.g., Ruta and Cen [17] constructed a methane concentration warning model by combining multiple classification models with an optimization approach to provide relatively accurate warnings of methane emission over the next 3 min, and Zhang et al. [18] developed a gas outburst early warning system by adopting an entropy-weight Bayes inference model. These models seem to be effective; however, they all ignore an important characteristic of the data, namely, that such data are always extremely imbalanced.
To deal with class-imbalanced data, different solutions have been proposed, and they can be roughly divided into two groups: data-level and algorithm-level. Data-level approaches generally adopt resampling strategies to increase the number of instances in the minority class or decrease the number of instances in the majority class, thereby re-balancing the data distribution [19,20,21]. Popular resampling algorithms include random undersampling (RUS) [22], random oversampling (ROS) [22] and the synthetic minority oversampling technique (SMOTE) [23]. ROS tends to overfit, RUS is apt to lose key information related to classification, while SMOTE is inclined to propagate noise. In recent years, more advanced sampling algorithms have been proposed to address these problems. For example, Xie et al. [24] proposed the GL algorithm, which first uses a Gaussian mixture model to estimate the distribution of minority instances, and then oversamples the minority class based on the estimated distribution. Algorithm-level strategies mainly include cost-sensitive learning [25,26] and the threshold moving technique [27,28]. Cost-sensitive learning assigns different penalties to training errors on different classes, thereby balancing the training errors across classes. The threshold moving technique first trains a classification model, and then moves the classification hyperplane towards the majority class to repair the bias.
To improve the quality of early warning models built on skewed gas concentration data, in this study we propose a new class imbalance learning solution called the probability density machine (PDM). It adopts a K-nearest-neighbors probability density estimation (KNN-PDE)-alike algorithm [29] to approximately estimate the probability density of each instance, and then directly compares an instance's probability density under each class to decide which category the instance belongs to. The PDM algorithm adapts well to skewed data distributions and can achieve a good warning effect on gas concentration monitoring data without resampling any instances. The effectiveness and superiority of the PDM are verified on six gas concentration monitoring data sets collected from a coal mine in China.
The rest of this study is organized as follows. Section 2 reviews methods related to gas concentration prediction. Section 3 describes the procedures of data collection, data preprocessing and instance generation. In Section 4, we first analyze why imbalanced data distribution hurts the performance of predictive models in the context of Gaussian Naive Bayes (GNB), from which our proposed PDM algorithm inherits [30,31], and then, based on this analysis, we describe the proposed KNN-PDE-alike and PDM algorithms in detail. Section 5 analyzes and discusses the experimental results. Finally, Section 6 concludes this paper.

2. Related Works

In recent years, researchers have paid more and more attention to gas concentration and gas outburst prediction for safe coal mine production, and have presented several solutions.
Zhang et al. [6] noted that gas disasters are related to many factors, including crustal stress, coal structural performance, geological structure, and gas content. They combined the GM (1, 1) grey prediction model with a BP neural network to predict the gas emission value, and achieved better results than either single model alone. A similar method was proposed by Wang et al. [7], which also adopted an artificial neural network to handle the various factors and nonlinear relationships in geological conditions, and achieved excellent gas outburst prediction results. To avoid the BP neural network falling into local minima, Wu et al. [8] combined the genetic algorithm (GA) and the simulated annealing algorithm (SA) into a hybrid algorithm to improve the generalization of the neural network, and thereby further improved the prediction accuracy of gas outburst.
Lyu et al. [9] first fused the gas information of multiple sensors inside the coal mine, and then used an encoder-decoder LSTM model to construct a multivariate regression and predict short-term gas concentration. Wu et al. [10] first used the t-distributed stochastic neighbor embedding (t-SNE) algorithm to perform non-linear dimensionality reduction on multi-dimensional coal mine gas monitoring data, then extracted the spatial features of the monitoring data, and finally used the support vector regression (SVR) algorithm to predict the top corner gas concentration. Meng et al. [11] also adopted SVR to predict the mine gas emission rate and found that it outperforms artificial neural networks. Wu et al. [12] decomposed the gas concentration time series into time-frequency components using wavelet analysis, and then constructed a gas concentration prediction model with the extreme learning machine (ELM). Dong et al. [13] combined the Bayesian network method, chaotic phase space reconstruction technology, and a Gaussian process regression model to construct a gas concentration prediction model that produces competitive prediction results.
The above-mentioned studies considered sophisticated geological factors that have a strong association with coal mine exploration but a weak association with the coal mine production process; or they combined multiple different groups of sensing and monitoring data and adopted multivariate regression to construct the prediction model. In this paper, we focus on a single variable, i.e., the gas concentration monitoring data acquired from a single sensor. We wish to analyze each single time series and discover its regular temporal patterns. In addition, we note that in a gas concentration monitoring sequence, high concentrations correspond to significantly fewer points, so the data have an extremely imbalanced distribution. Almost all existing models ignore this problem, which poses a great risk to subsequent modeling; it is therefore an important consideration in our study.

3. Data

3.1. Data Acquisition

We collected data with gas concentration sensors from 363 monitoring points in a coal mine in Datong city, Shanxi Province, China. The data were collected in the form of time series during the period 2019.10.22 00:00:00~2020.03.24 00:00:00, a total of 5 months. Figure 1 presents the original gas concentration variation trend of one monitoring point during these 5 months, where each point on the horizontal axis represents the average gas concentration over 10 min. In Figure 1, no clear global law describes the gas concentration variation, but some local variation rules can be observed. In other words, the gas concentration at a time point is determined mainly by the nearest short-term fragment of the time series.

3.2. Data Preprocessing

The actual operation of the gas data collector is affected by various factors, such as changes in the downhole environment, so the actual monitoring data obtained are not a sequence with uniform time intervals. Therefore, we first normalize each time series, taking 10 min as the interval and adopting the average of the gas concentration readings received within each 10-min window as the representative gas concentration for that window.
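For concreteness, the following is a minimal sketch of this 10-min normalization step using pandas resampling; the timestamps and the column name "gas" are hypothetical, not taken from the actual data files.

```python
import pandas as pd

# A sketch of the 10-min normalization step, assuming the raw sensor
# readings arrive as an irregular time series.
raw = pd.DataFrame(
    {"gas": [0.21, 0.25, 0.22, 0.31]},
    index=pd.to_datetime([
        "2019-10-22 00:03:10", "2019-10-22 00:07:45",
        "2019-10-22 00:14:02", "2019-10-22 00:38:30",
    ]),
)

# Average all readings falling inside each 10-min bin; bins that received
# no readings become NaN, matching the null values described below.
normalized = raw.resample("10min").mean()
print(normalized)
```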
Specifically, if there were no readings within a given 10-min window in the original sequence, the corresponding position in the new normalized sequence is represented as a null value. A statistical analysis of the original gas concentration data shows that, for a large number of monitoring points, the missing rate exceeds 20% during the period 2019.10.22 00:00:00~2020.03.24 00:00:00, which could greatly affect downstream modeling and analysis. Therefore, in this study we only selected gas concentration data from six monitoring points with a missing rate of less than 20%. These six monitoring points are the return air gas face on surface 8301, the working face gas on surface 8301, the top corner gas on surface 8301, the top corner gas on surface 8222, the return air gas face on surface 8222, and the working face gas on surface 8222 (called D1, D2, D3, D4, D5 and D6 for brevity). Figure 1 shows the variation trend of gas concentration in data set D1, and the variation trends for data sets D2–D6 are shown in Figure A1, Figure A2, Figure A3, Figure A4 and Figure A5, respectively. The statistics of these data sets are shown in Table 1.
As seen in Table 1, these six data sets still have missing rates of 6.1~15.5%, which would harm the downstream modeling procedure. To address this problem, we adopted a linear interpolation-based missing value imputation strategy. Assuming there are c continuous missing values, the former non-missing value is a and the latter non-missing value is b, then the difference is d = b − a, the step is e = d/c, and, based on the concept of interpolation, the c imputed values are a + 1e, a + 2e, …, a + ce.
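The following is a small sketch of this interpolation rule (not the authors' code); it fills each run of missing values exactly as described above.

```python
import numpy as np

# Fill each run of c missing values between the former non-missing value a
# and the latter non-missing value b with a + 1e, a + 2e, ..., a + ce,
# where e = (b - a) / c, following the rule stated in the text.
def impute_linear(series):
    x = np.asarray(series, dtype=float)
    i = 0
    while i < len(x):
        if np.isnan(x[i]):
            j = i
            while j < len(x) and np.isnan(x[j]):
                j += 1                      # j is the first non-missing index
            if i > 0 and j < len(x):        # both a and b exist
                a, b, c = x[i - 1], x[j], j - i
                e = (b - a) / c
                x[i:j] = a + e * np.arange(1, c + 1)
            i = j
        else:
            i += 1
    return x

print(impute_linear([0.2, np.nan, np.nan, 0.5]))  # [0.2, 0.35, 0.5, 0.5]
```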
Figure 2a,b show the variation trend of gas concentration before and after imputing the missing data within a week. From Figure 2b, we can see that the gas concentration variation trend is more continuous and regular after data imputation, which verifies its rationality.

3.3. Instance Generation

Next, we need to consider how to take advantage of the imputed time series to generate instances for training the classification model. As noted above, the variation in gas concentration is local rather than global, which means a short-term future gas concentration variation relates only to a recent fragment of gas concentration variation. Therefore, we adopt the slide-time-window strategy to generate instances (see Figure 3).
In Figure 3, we can see that in the time series data, each slide time window corresponds to an instance, and each time the window slides forward one step, a new instance is acquired. Suppose a time series is represented as T = {x_1, x_2, x_3, …, x_{N−1}, x_N}, and the lengths of the attributes, the predicted values, and the slide time window are l, t, and s, respectively, where s = l + t; then N − s + 1 instances are generated. Here, l denotes how much recently monitored data is used to predict the future status, while t indicates how far into the future the status is predicted. Considering the high frequency of variation in gas concentration monitoring data, we empirically designate l = 24 and t = 6 in this paper. That is to say, we use the recent 4 h of experience to predict the status of the coming 1 h.
Then the input matrix of the generated data set can be expressed as:
X = \begin{bmatrix} x_1 & x_2 & \cdots & x_l \\ x_2 & x_3 & \cdots & x_{l+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-l-t+1} & x_{N-l-t+2} & \cdots & x_{N-t} \end{bmatrix}
and the output matrix as:
Y = \begin{bmatrix} x_{l+1} & x_{l+2} & \cdots & x_{l+t} \\ x_{l+2} & x_{l+3} & \cdots & x_{l+t+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-t+1} & x_{N-t+2} & \cdots & x_N \end{bmatrix}
We consider early warning as a classification task in this study, thus the readings in the expected output Y are divided into two categories according to a pre-defined early warning threshold of gas concentration. When a reading exceeds the threshold, the expected output is expressed as +1, indicating the early warning class; otherwise, it is expressed as −1, indicating the non-early warning class. The original output matrix Y is then transformed into the form below:
Y = \begin{bmatrix} -1 & +1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & +1 \end{bmatrix}
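As a concrete illustration, the sketch below generates instances with the slide-time-window strategy and binarizes the outputs; the parameter values follow the text (l = 24, t = 6, and the 0.40 threshold introduced in the next paragraph), while the series itself is synthetic.

```python
import numpy as np

# Slide-time-window instance generation plus output binarization.
def make_instances(series, l=24, t=6, threshold=0.40):
    T = np.asarray(series, dtype=float)
    n_inst = len(T) - (l + t) + 1            # N - s + 1 instances, s = l + t
    X = np.stack([T[i:i + l] for i in range(n_inst)])
    Y_raw = np.stack([T[i + l:i + l + t] for i in range(n_inst)])
    Y = np.where(Y_raw > threshold, 1, -1)   # +1: early warning, -1: otherwise
    return X, Y

X, Y = make_instances(np.random.rand(200) * 0.5)
print(X.shape, Y.shape)                      # (171, 24) (171, 6)
```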
In this study, we designate 0.40 as the threshold; in actual production, the early warning threshold may be much higher than 0.40. Since each of the t = 6 future 10-min points corresponds to its own prediction, the data set can be further divided into six different subsets. We provide statistics on the class imbalance rate (the number of instances in the majority class divided by that of the minority class) for data sets D1–D6 (see Figure 4). From Figure 4, it can be seen that although the data sets have different class imbalance rates, the imbalanced distribution is obvious in all of them, which may pose challenges for subsequent classifier modeling.

4. Methodology

4.1. Gaussian Naive Bayes Model and the Reason Why It Is Hurt by Imbalanced Data Distribution

In this section, we discuss why skewed data distribution hurts classification models based on the Bayes theorem. We select the Bayes theorem for this explanation because it is the theoretical basis of statistical machine learning [32,33], and our proposed PDM inherits from it.
As we know, the original Naive Bayes model can only deal with data with discrete attributes. If the attribute space of a classification task is continuous, a variant of Naive Bayes named Gaussian Naive Bayes [30,31] must be adopted to model the classifier. Specifically, GNB assumes that the conditional probability of each feature satisfies a Gaussian distribution; thus, for any instance x_i, its conditional probability can be calculated as below:
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left( -\frac{(x_i - \mu_y)^2}{2\sigma_y^2} \right)
where \mu_y and \sigma_y^2 denote the mean and variance, respectively. These two values can be directly estimated from the original data. Here, P(x_i|y) represents the conditional probability, i.e., the probability density of the instance x_i within class y. Then, if we know the number of instances in class y and the total number of instances, we can calculate the prior probability P(y). Further, based on the Bayes formula, the posterior probability P(y|x_i) can be calculated by:
P(y \mid x_i) = \frac{P(x_i \mid y)\,P(y)}{P(x_i \mid y)\,P(y) + P(x_i \mid \lnot y)\,P(\lnot y)}
where P(y|xi) denotes the probability of xi belonging to the class y.
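As a small worked example of Equations (4) and (5) for a single continuous feature, consider the sketch below; all numbers are illustrative, not estimated from the gas data.

```python
import numpy as np

# Equation (4): class-conditional Gaussian density of a single feature.
def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical class-conditional Gaussians and priors.
mu_pos, var_pos, prior_pos = 0.55, 0.01, 0.05   # minority (warning) class
mu_neg, var_neg, prior_neg = 0.25, 0.01, 0.95   # majority class

# Equation (5): posterior probability of the warning class at x.
x = 0.45
like_pos = gaussian_pdf(x, mu_pos, var_pos) * prior_pos
like_neg = gaussian_pdf(x, mu_neg, var_neg) * prior_neg
posterior_pos = like_pos / (like_pos + like_neg)
print(f"P(warning | x={x}) = {posterior_pos:.3f}")
```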
Without loss of generality, we suppose the classification task is binary. Let \Phi = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m, y_i \in \{Y_+, Y_-\}, 1 \le i \le q\} be the training data set, where Y_+ and Y_- denote the minority and majority class, respectively. Then \Phi can be divided into two groups, \Phi_+ = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m, y_i = Y_+, 1 \le i \le q_+\} and \Phi_- = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m, y_i = Y_-, 1 \le i \le q_-\}, where q = q_+ + q_- and q_- > q_+. To further simplify the analysis, we suppose the number of attributes is m = 1 and that, in this attribute, both classes satisfy the Gaussian distribution (see Figure 5). Then the prior probabilities P(Y_+), P(Y_-) and the conditional probabilities P(X|Y_+), P(X|Y_-) can be directly acquired. According to the Bayesian formula, the posterior probabilities of the two classes can be calculated as follows:
P(Y_+ \mid X) = \frac{P(X \mid Y_+)\,P(Y_+)}{P(X)}, \quad P(Y_- \mid X) = \frac{P(X \mid Y_-)\,P(Y_-)}{P(X)}
where
P(X) = P(X \mid Y_+)\,P(Y_+) + P(X \mid Y_-)\,P(Y_-)
When P(Y_+|X) = P(Y_-|X), i.e., P(X|Y_+)P(Y_+) = P(X|Y_-)P(Y_-), selecting the corresponding X as the separating point of the two classes guarantees that the training error rate is minimal.
As shown in Figure 5, when both classes hold the same prior probability, i.e., P(Y_+) = P(Y_-), the misclassification risk is borne equally by both classes. If P(Y_-) > P(Y_+), however, then to guarantee P(X|Y_+)P(Y_+) = P(X|Y_-)P(Y_-), the separating point moves towards the minority class, which means the minority class sacrifices more classification accuracy than the majority class. Specifically, the larger the difference between P(Y_+) and P(Y_-), the more accuracy loss the minority class bears.
This explains why skewed data distribution hurts the performance of GNB: the bias is related only to the prior probability, not to the conditional probability density. Therefore, we can instead use:
P(Y_+ \mid X) \propto P(X \mid Y_+), \quad P(Y_- \mid X) \propto P(X \mid Y_-)
which means that if we can estimate the conditional probability density of each instance within the class it belongs to, we can directly classify each instance. However, GNB assumes the instances of each class satisfy the Gaussian distribution, which may severely misestimate the conditional probability density and in turn cause low modeling quality.
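This boundary shift can be made concrete. For two equal-variance Gaussian classes, the point where P(X|Y_+)P(Y_+) = P(X|Y_-)P(Y_-) has a closed form, and the sketch below (with illustrative numbers only) shows how skewed priors push it towards the minority class:

```python
import numpy as np

# Decision point where P(X|Y+)P(Y+) = P(X|Y-)P(Y-) for two Gaussians with
# shared variance sigma2 (derived by equating the two log-likelihoods).
mu_pos, mu_neg, sigma2 = 0.55, 0.25, 0.01   # minority mean, majority mean, variance

def boundary(prior_pos):
    prior_neg = 1.0 - prior_pos
    return (mu_pos + mu_neg) / 2 + sigma2 * np.log(prior_neg / prior_pos) / (mu_pos - mu_neg)

print(boundary(0.50))  # 0.400: balanced priors put the boundary midway between the means
print(boundary(0.05))  # 0.498: skewed priors push it towards the minority mean (0.55)
```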

4.2. KNN-PDE-alike Probability Density Estimation Algorithm

To address the problem mentioned above, we borrow the idea of the K-nearest-neighbors probability density estimation (KNN-PDE) strategy [34,35] to propose a robust and universal algorithm named KNN-PDE-alike.
Suppose there are q instances; for each instance x_i, we can find its Kth nearest neighbor among the remaining q − 1 instances and record their distance as d_i^K, where K < q. Clearly, a smaller d_i^K denotes a higher probability density around the instance x_i, and vice versa. Since this relation is inverse, we transform each distance into its reciprocal, 1/d_i^K. Then, each instance's conditional probability density can be calculated by:
P(x_i) = \frac{1/d_i^K}{Z}
where Z is a normalized factor that is calculated as below:
Z = \sum_{i=1}^{q} 1/d_i^K
Actually, P(x_i) is not the real conditional probability density, but it reflects the same proportional relation as the real conditional probability density, thus we call it the relative conditional probability density. An instance with a larger P(x_i) value has a larger conditional probability density and lies in a denser region, and vice versa. In addition, from Equations (9) and (10), it is easy to observe that the relative conditional probability densities of all q instances sum to 1.
The procedure of the KNN-PDE-alike conditional probability density estimation algorithm (Algorithm 1) is described as follows.
Algorithm 1. KNN-PDE-alike algorithm
Input: a data set \Phi = \{x_i \mid x_i \in \mathbb{R}^m, 1 \le i \le q\}, the neighborhood parameter K.
Output: a 1 × q vector CPD recording the relative conditional probability density of all instances.
Procedure:
1. For each instance x_i, find its Kth nearest neighbor and record their distance d_i^K;
2. Calculate the normalizing factor Z by Equation (10);
3. Calculate the relative conditional probability density P(x_i) by Equation (9);
4. Record the relative conditional probability densities one by one into CPD and output it.
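A minimal sketch of Algorithm 1 in Python, assuming plain Euclidean distances (the paper does not specify the metric):

```python
import numpy as np

# KNN-PDE-alike: relative conditional probability density of each instance.
def knn_pde_alike(Phi, K):
    Phi = np.asarray(Phi, dtype=float)
    diff = Phi[:, None, :] - Phi[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # exclude each instance itself
    d_K = np.sort(dist, axis=1)[:, K - 1]     # distance to the K-th neighbor
    inv = 1.0 / d_K                           # numerator of Equation (9)
    Z = inv.sum()                             # Equation (10)
    return inv / Z, Z

cpd, Z = knn_pde_alike(np.random.rand(100, 24), K=10)
print(cpd.sum())   # 1.0: relative densities sum to one
```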

4.3. Probability Density Machine

Based on the KNN-PDE-alike algorithm, we can precisely estimate the relative conditional probability density of each instance regardless of the type of data distribution, and an unbiased prediction can then be provided. We call the proposed unbiased class imbalance learning (CIL) prediction algorithm the probability density machine (PDM). In imbalanced data, different classes hold different numbers of instances, which means the relative conditional probability densities of different classes are measured on different scales; hence, it is difficult to directly compare the relative conditional probability densities of two instances belonging to two different classes. To deal with this problem, a pre-normalization step is first conducted to unify the scales of the relative conditional probability densities across classes. Suppose the instance x_i is from the majority class; then its normalized conditional probability density can be calculated as below:
P(x_i) = \frac{1/d_i^K}{Z_- \times CIR}
where Z_- and CIR denote the normalizing factor of the majority class and the class imbalance ratio, respectively.
The other parameter that may influence the estimation accuracy of the relative conditional probability density is the neighborhood factor K. Obviously, it is inappropriate to designate a uniform value of K for both classes, as the numbers of instances belonging to different classes differ significantly. In addition, it is also inappropriate to assign too large or too small a value to K. If K is too large, the distinction between the relative conditional probability densities of different instances becomes ambiguous; if K is too small, the estimate reflects a narrow local probability density distribution rather than the global one. In this work, we suggest setting K to \sqrt{q} by default.
The procedure of the PDM algorithm (Algorithm 2) is described as follows.
Algorithm 2. PDM algorithm
Input: an imbalanced training set \Phi = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m, 1 \le i \le q, y_i \in \{Y_+, Y_-\}\}, a test instance x'.
Output: a predicted class label for x'.
Training Procedure:
1. For the majority class Y_- and the minority class Y_+, extract their corresponding instances from \Phi into \Phi_- and \Phi_+, count their numbers of instances, and record them as q_- and q_+, respectively;
2. Calculate CIR, which equals |\Phi_-|/|\Phi_+|;
3. For the two classes Y_- and Y_+, set the corresponding parameter K as \sqrt{q_-} and \sqrt{q_+}, respectively;
4. For the two classes, call the KNN-PDE-alike algorithm to obtain and record the corresponding normalizing factors Z_- and Z_+.
Testing Procedure:
1. For class Y_-, put x' and the corresponding \Phi_-, K_- and Z_- into the KNN-PDE-alike algorithm to obtain the relative conditional probability density P_-(x');
2. For class Y_+, put x' and the corresponding \Phi_+, K_+ and Z_+ into the KNN-PDE-alike algorithm to obtain the relative conditional probability density P_+(x');
3. For class Y_-, use CIR to adjust its relative conditional probability density and obtain the normalized probability density P_-(x') by Equation (11);
4. Compare P_-(x') and P_+(x'): if P_-(x') > P_+(x'), predict x' into Y_-; otherwise, predict x' into Y_+;
5. Output the class label of the test instance x'.
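A minimal, self-contained sketch of Algorithm 2, again assuming Euclidean distances and the default K = √q per class; this is an illustration of the procedure, not the authors' implementation:

```python
import numpy as np

def _kth_nn_dist(Phi, K, x=None):
    """Distance to the K-th nearest neighbor, within Phi or from x to Phi."""
    if x is None:
        diff = Phi[:, None, :] - Phi[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)
        return np.sort(dist, axis=1)[:, K - 1]
    return np.sort(np.sqrt(((Phi - x) ** 2).sum(axis=1)))[K - 1]

def pdm_predict(X_train, y_train, x_test):
    pos, neg = X_train[y_train == 1], X_train[y_train == -1]
    K_pos, K_neg = int(np.sqrt(len(pos))), int(np.sqrt(len(neg)))
    CIR = len(neg) / len(pos)                         # class imbalance ratio
    Z_pos = (1.0 / _kth_nn_dist(pos, K_pos)).sum()    # Equation (10), per class
    Z_neg = (1.0 / _kth_nn_dist(neg, K_neg)).sum()
    P_pos = (1.0 / _kth_nn_dist(pos, K_pos, x_test)) / Z_pos           # Eq. (9)
    P_neg = (1.0 / _kth_nn_dist(neg, K_neg, x_test)) / (Z_neg * CIR)   # Eq. (11)
    return 1 if P_pos >= P_neg else -1

# Toy usage: 200 majority vs. 20 minority instances in 24 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.05, (200, 24)), rng.normal(0.5, 0.05, (20, 24))])
y = np.array([-1] * 200 + [1] * 20)
print(pdm_predict(X, y, rng.normal(0.5, 0.05, 24)))   # expect +1
```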

5. Experiments

5.1. Experimental Settings

In the experiments, we designated l = 24, t = 6 and K = \sqrt{q} by default, as indicated in the sections above. Since t = 6, we need to train six different classification models on each data set, because our proposed PDM model and the other compared classification models can only deal with single-output problems. We compared the proposed PDM algorithm with GNB [30,31] and its combinations with several traditional sampling strategies, including RUS [22], ROS [22] and SMOTE [23]. We also compared it with several state-of-the-art class imbalance learning algorithms, including GL [24], FSVM-CIL [26] and ODOC-ELM [28]. All compared algorithms used the default parameter settings given in the corresponding references.
As for the performance evaluation metric, accuracy is well known to be a poor metric for evaluating class imbalance learning models, thus we adopted the popular F-measure in this paper. The F-measure is calculated as below:
\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
In fact, F-measure represents a tradeoff between precision and recall.
Finally, in order to compare the performance of the various algorithms impartially, we ran randomized external 5-fold cross validation 10 times and report the average performance as the final result.
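A sketch of this evaluation protocol, shown here with the GNB baseline on synthetic data since the gas data sets are not public (scikit-learn's f1_score computes the F-measure above):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB

# Synthetic imbalanced stand-in for one of the D1-D6 subsets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.05, (500, 24)), rng.normal(0.5, 0.05, (50, 24))])
y = np.array([-1] * 500 + [1] * 50)

# External 5-fold cross validation, repeated 10 times with reshuffling.
scores = []
for rep in range(10):
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=rep)
    for tr, te in skf.split(X, y):
        model = GaussianNB().fit(X[tr], y[tr])
        scores.append(f1_score(y[te], model.predict(X[te]), pos_label=1))
print(f"mean F-measure over 50 runs: {np.mean(scores):.3f}")
```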

5.2. Results and Discussions

Table 2 shows the experimental results of the 8 compared algorithms. From the results, we can safely draw the following conclusions:
(1)
On gas concentration data, the two traditional oversampling strategies, ROS and SMOTE, fail to improve the quality of the classification model, while RUS sometimes performs a little better than GNB. We believe this is associated with the structural and distributional complexity of gas concentration monitoring data: on this kind of data, ROS makes the model overfit severely, and SMOTE generates many synthetic instances in inappropriate positions. GL-GNB, which considers the distribution during sampling, alleviates the problems of ROS and SMOTE to some extent. However, the performance gain from GL-GNB appears limited, as it performs worse than GNB on D4 and D5.
(2)
The class imbalance rate influences the performance of all algorithms to some extent, including the proposed PDM. It can clearly be observed that the worst F-measure values occur on the two highly imbalanced data sets, D3 and D6, while on the other data sets the classification performance is obviously better. We believe this is related to the scarcity of minority training instances, which are not sufficient to precisely reconstruct the probability distribution of the minority class.
(3)
On most data sets, the two algorithm-level methods, FSVM-CIL and ODOC-ELM, perform significantly better than the sampling-based strategies. Of course, they are also more sophisticated than the sampling algorithms: FSVM-CIL needs to explore the data distribution and assign an individual cost weight to each instance, while ODOC-ELM needs to adopt a stochastic optimization algorithm to iteratively search for the best threshold. We also note that on the highly imbalanced data sets, e.g., D3 and D6, FSVM-CIL outperforms ODOC-ELM, while on the other data sets ODOC-ELM performs better.
(4)
The proposed PDM algorithm outperforms all other solutions. In fact, it produced the best result at most of the predicted time points on each data set. Specifically, compared with GNB, PDM improves performance by 13.1~24.7%, and compared with the other algorithms, by 0.4~40.7%. These results verify the effectiveness and superiority of the proposed PDM algorithm.
Figure 6 shows the multi-step prediction results of the PDM algorithm on each data set. In Figure 6, it is easy to observe that the prediction quality declines as the prediction point moves further into the future. This is consistent with our intuition, as recent experience has a stronger association with the nearer future status. Therefore, we think the early warning model should only be used for short-term prediction.

5.3. Significance Statistical Analysis

Next, we tested the statistical significance of the differences between PDM and the other compared algorithms. Specifically, the critical difference (CD) metric is used to show the differences among the various algorithms. Figure 7 shows the CD diagram at the standard significance level α = 0.05, where the average ranking of each algorithm is marked along the axis (higher rankings to the left). In the CD diagram, if a group of algorithms is not significantly different under the Nemenyi test [36,37], those algorithms are connected by a thick line.
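For reference, a short sketch of how the critical difference is computed under the Nemenyi test (the q_0.05 value for eight algorithms is taken from Demsar's published table [36]; the number of data sets N below is illustrative):

```python
import numpy as np

# Nemenyi critical difference: CD = q_alpha * sqrt(k(k+1) / (6N)),
# where k is the number of algorithms and N the number of data sets.
def critical_difference(k, N, q_alpha):
    return q_alpha * np.sqrt(k * (k + 1) / (6 * N))

print(critical_difference(k=8, N=36, q_alpha=3.031))  # ~1.75 with these values
```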
From the results shown in Figure 7, we observe that the PDM algorithm achieves statistically superior performance over all other algorithms except ODOC-ELM; although we cannot claim a significant difference from ODOC-ELM, PDM has a better average rank. In summary, the proposed PDM algorithm is a better choice than the compared algorithms.

5.4. Discussion about the Parameters

To accurately estimate the relative conditional probability density in PDM, the choice of the parameter K is very important. As indicated in Section 4.3, it is inappropriate to assign too large or too small a value to K: if K is too large, the distinction between the relative conditional probability densities of different instances becomes ambiguous, while if K is too small, the estimate reflects a narrow local probability density distribution rather than the global one. To verify this deduction, we varied the parameter K in the range \{\sqrt{q}/4, \sqrt{q}/2, \sqrt{q}, 2\sqrt{q}, 4\sqrt{q}\}. The variation of the F-measure with the parameter K is presented in Table 3. It is clear that when K is assigned a value between \sqrt{q}/2 and 2\sqrt{q}, the performance of PDM can be safely guaranteed.
Next, we also examined the impact of the other parameter, l, which defines how much recent experience is used for predicting the future status and determines the dimension of the attribute space. We varied l in the range {6, 12, 24, 48, 72}, i.e., taking advantage of the experience of the recent 1, 2, 4, 8 and 12 h, to observe the performance variation of the PDM. The results are presented in Table 4, and they show that adopting the experience of the recent 2~4 h is most appropriate for predicting the near-future gas concentration status. An oversized l may introduce irrelevant noise, while a too small l may lack significant information.

6. Conclusions

In this study, we focused on the early warning of gas concentration in coal mine production. Specifically, we consider it as an imbalanced binary classification problem and, in the context of Naive Bayes theory, propose a novel class imbalance learning algorithm called the probability density machine. On six real gas concentration monitoring data sets acquired from a coal mine in Datong city, Shanxi Province, China, the effectiveness and superiority of the proposed PDM algorithm were verified.
The contributions of this study can be summarized as follows:
(1)
We consider the early warning of gas concentration in coal mine production as a classification problem, and note its characteristics of class imbalance and sophisticated distribution;
(2)
In the context of Naive Bayes theory, we analyzed theoretically why imbalanced data distribution hurts predictive models;
(3)
A novel class imbalance learning algorithm called the probability density machine was proposed to improve the accuracy of early warning of gas concentration in coal mine production.
In future work, beyond the gas concentration data, other synchronously collected sensing or monitoring data will also be considered for inclusion in the prediction model to further improve prediction accuracy.

Author Contributions

Conceptualization, Y.C. and H.Y.; Data curation, S.W.; Formal analysis, Y.C. and H.Y.; Funding acquisition, H.Y.; Investigation, H.Y.; Methodology, Y.C., S.W. and H.Y.; Project administration, H.Y.; Resources, H.Y.; Software, Y.C., S.W. and M.Z.; Supervision, S.G. and H.Y.; Writing—original draft, Y.C. and S.W.; Writing—review and editing, S.G. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported in part by Natural Science Foundation of Jiangsu Province of China under grant No. BK20191457, Open Project of Artificial Intelligence Key Laboratory of Sichuan Province under grant No.2019RYJ02, National Natural Science Foundation of China under grants No. 62076111 and No. 62076215.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data were collected from a coal mine in Datong city, Shanxi Province, China, and are not being released due to privacy issues. If necessary, please contact the authors.

Acknowledgments

We want to show our gratitude to Jiafeng Yu in Jiangsu Maritime Institute for providing help to acquire gas concentration data and partial funding support in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Variation trend of gas concentration in the D2 data set.
Figure A2. Variation trend of gas concentration in the D3 data set.
Figure A3. Variation trend of gas concentration in the D4 data set.
Figure A4. Variation trend of gas concentration in the D5 data set.
Figure A5. Variation trend of gas concentration in the D6 data set.

References

1. National Bureau of Statistics of the People's Republic of China. Available online: https://data.stats.gov.cn/ (accessed on 8 May 2021).
2. Ma, Q.; Zhang, Q.; Li, D.; Chen, J.; Ren, S.; Shen, S. Effects of premixed methane concentration on distribution of flame region and hazard effects in a tube and a tunnel gas explosion. J. Loss Prev. Process. Ind. 2015, 34, 30–38.
3. Zhang, J.; Cliff, D.; Xu, K.; You, G. Focusing on the patterns and characteristics of extraordinarily severe gas explosion accidents in Chinese coal mines. Process. Saf. Environ. Prot. 2018, 117, 390–398.
4. Zhu, Y.; Wang, D.; Shao, Z.; Xu, C.; Zhu, X.; Qi, X.; Liu, F. A statistical analysis of coalmine fires and explosions in China. Process. Saf. Environ. Prot. 2019, 121, 357–366.
5. Song, Y.; Yang, S.; Hu, X.; Song, W.; Sang, N.; Cai, J.; Xu, Q. Prediction of gas and coal spontaneous combustion coexisting disaster through the chaotic characteristic analysis of gas indexes in goaf gas extraction. Process. Saf. Environ. Prot. 2019, 129, 8–16.
6. Zhang, S.; Wang, B.; Li, X.; Chen, H. Research and Application of Improved Gas Concentration Prediction Model Based on Grey Theory and BP Neural Network in Digital Mine. Procedia CIRP 2016, 56, 471–475.
7. Wang, F.; Liu, W. Prediction Strategy of Coal and Gas Outburst Based on Artificial Neural Network. J. Comput. 2013, 8, 240–247.
8. Wu, Y.; Gao, R.; Yang, J. Prediction of coal and gas outburst: A method based on the BP neural network optimized by GASA. Process. Saf. Environ. Prot. 2020, 133, 64–72.
9. Lyu, P.; Chen, N.; Mao, S.; Li, M. LSTM based encoder-decoder for short-term predictions of gas concentration using multi-sensor fusion. Process. Saf. Environ. Prot. 2020, 137, 93–105.
10. Wu, H.; Shi, S.; Lu, Y.; Liu, Y.; Huang, W. Top corner gas concentration prediction using t-distributed Stochastic Neighbor Embedding and Support Vector Regression algorithms. Concurr. Comput. Pract. Exp. 2020, 32, e5705.
11. Meng, Q.; Ma, X.; Zhou, Y. Prediction of Mine Gas Emission Rate using Support Vector Regression and Chaotic Particle Swarm Optimization Algorithm. J. Comput. 2013, 8, 2908–2915.
12. Wu, X.; Qian, J.-S.; Huang, C.-H.; Zhang, L. Short-Term Coalmine Gas Concentration Prediction Based on Wavelet Transform and Extreme Learning Machine. Math. Probl. Eng. 2014, 2014, 858620.
13. Dong, D.-W.; Li, S.-G.; Chang, X.-T.; Lin, H.-F. Prediction Model of Gas Concentration around Working Face Using Multivariate Time Series. J. Min. Saf. Eng. 2012, 29, 135–139.
14. Yuan, B. Study on gas emission prediction of working face based on GM (1, 1) model. J. Phys. Conf. Ser. 2020, 1549, 042031.
15. Li, D.; Cheng, Y.; Wang, L.; Wang, H.; Wang, L.; Zhou, H. Prediction method for risks of coal and gas outbursts based on spatial chaos theory using gas desorption index of drill cuttings. Min. Sci. Technol. 2011, 21, 439–443. (In Chinese)
16. Wang, T.; Cai, L.-Q.; Fu, Y.; Zhu, T.-C. A Wavelet-Based Robust Relevance Vector Machine Based on Sensor Data Scheduling Control for Modeling Mine Gas Gushing Forecasting on Virtual Environment. Math. Probl. Eng. 2013, 2013, 579693.
17. Ruta, D.; Cen, L. Self-Organized Predictor of Methane Concentration Warnings in Coal Mines. Lect. Notes Comput. Sci. 2015, 9437, 485–493.
18. Zhang, J.; Ai, Z.; Guo, L.; Cui, X. Research of Synergy Warning System for Gas Outburst Based on Entropy-Weight Bayesian. Int. J. Comput. Intell. Syst. 2021, 14, 376–385.
19. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
20. Guo, H.; Li, Y.; Shang, J.; Gu, M.; Huang, Y.; Gong, B. Learning from class-imbalance data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
21. Yu, H.; Ni, J.; Zhao, J. ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data. Neurocomputing 2013, 101, 309–318.
22. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
23. Chawla, N.; Bowyer, K.W.; Hall, L.O. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
24. Xie, Y.; Peng, L.; Chen, Z.; Yang, B.; Zhang, H.; Zhang, H. Generative learning for imbalanced data using the Gaussian mixed model. Appl. Soft Comput. 2019, 79, 439–451.
25. Imam, T.; Ting, K.M.; Kamruzzaman, J. z-SVM: An SVM for improved classification of imbalanced data. Proc. Aust. Conf. Artif. Intell. 2006, 264–273.
26. Batuwita, R.; Palade, V. FSVM-CIL: Fuzzy support vector machines for class imbalance learning. IEEE Trans. Fuzzy Syst. 2010, 18, 558–571.
27. Yu, H.; Mu, C.; Sun, C.; Yang, W.; Yang, X.; Zuo, X. Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data. Knowl.-Based Syst. 2015, 76, 67–78.
28. Yu, H.; Sun, C.; Yang, X.; Yang, W.; Shen, J.; Qi, Y. ODOC-ELM: Optimal decision outputs compensation-based extreme learning machine for classifying imbalanced data. Knowl.-Based Syst. 2016, 92, 55–70.
29. Yu, H.; Sun, C.; Yang, X.; Zheng, S.; Zou, H. Fuzzy Support Vector Machine with Relative Density Information for Classifying Imbalanced Data. IEEE Trans. Fuzzy Syst. 2019, 27, 2353–2367.
30. Zhang, H.; Zhang, H.; Pirbhulal, S.; Wu, W.; Victor, H.C.D.A. Active Balancing Mechanism for Imbalanced Medical Data in Deep Learning-Based Classification Models. ACM Trans. Multimed. Comput. Commun. Appl. 2020, 16, 1–15.
31. Griffis, J.C.; Allendorfer, J.B.; Szaflarski, J.P. Voxel-based Gaussian naive Bayes classification of ischemic stroke lesions in individual T1-weighted MRI scans. J. Neurosci. Methods 2016, 257, 97–108.
32. Berrar, D. Bayes' Theorem and Naive Bayes Classifier. In Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics; Elsevier: Amsterdam, The Netherlands, 2018; pp. 403–412.
33. Kupervasser, O. Quantitative Structure-Activity Relationship Modeling and Bayesian Networks: Optimality of Naive Bayes Model. In Bayesian Networks—Advances and Novel Applications; IntechOpen: London, UK, 2019.
34. Fukunaga, K.; Hostetler, L. Optimization of k nearest neighbor density estimates. IEEE Trans. Inf. Theory 1973, 19, 320–326.
35. Wang, Q.; Kulkarni, S.R.; Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Trans. Inf. Theory 2009, 55, 2392–2405.
36. Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
37. Garcia, S.; Fernandez, A.; Luengo, J.; Herrera, F. Advanced non-parametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci. 2010, 180, 2044–2064.
Figure 1. An example of the gas concentration variation trend over 5 months at a monitoring site.
Figure 2. The gas concentration variation trend of a monitoring point within a week before (a) and after (b) missing data imputation.
Figure 3. A graphical representation of how instances are generated, where the blue bar and red bar in the slide time window indicate the attributes and the corresponding predicted values, respectively.
Figure 4. Class imbalance rate on each data set.
Figure 5. An example explaining why imbalanced data distribution hurts classification models.
Figure 6. Performance comparison of the PDM algorithm at multiple future time points.
Figure 7. CD diagram.
Table 1. Statistics of gas concentration data at each monitoring point.

Data Set | Total  | Loss | Loss Rate
D1       | 22,032 | 1354 | 6.1%
D2       | 22,032 | 1845 | 8.4%
D3       | 22,032 | 2237 | 10.2%
D4       | 22,032 | 2532 | 11.5%
D5       | 22,032 | 3155 | 14.3%
D6       | 22,032 | 3423 | 15.5%
Table 2. F-measure of future one-hour early warning results of each algorithm on the six data sets.

Data Set | Algorithm | 10 min | 20 min | 30 min | 40 min | 50 min | 60 min | Average
D1 | PDM       | 0.738 | 0.690 | 0.659 | 0.636 | 0.618 | 0.599 | 0.657
D1 | GNB       | 0.560 | 0.537 | 0.522 | 0.511 | 0.503 | 0.494 | 0.521
D1 | RUS-GNB   | 0.601 | 0.569 | 0.548 | 0.534 | 0.524 | 0.511 | 0.548
D1 | ROS-GNB   | 0.550 | 0.529 | 0.513 | 0.502 | 0.492 | 0.484 | 0.512
D1 | SMOTE-GNB | 0.549 | 0.527 | 0.512 | 0.499 | 0.489 | 0.481 | 0.510
D1 | GL-GNB    | 0.629 | 0.581 | 0.563 | 0.549 | 0.537 | 0.531 | 0.565
D1 | FSVM-CIL  | 0.633 | 0.597 | 0.601 | 0.585 | 0.576 | 0.544 | 0.589
D1 | ODOC-ELM  | 0.675 | 0.630 | 0.599 | 0.572 | 0.596 | 0.563 | 0.606
D2 | PDM       | 0.747 | 0.715 | 0.687 | 0.665 | 0.641 | 0.618 | 0.679
D2 | GNB       | 0.520 | 0.497 | 0.481 | 0.468 | 0.453 | 0.441 | 0.477
D2 | RUS-GNB   | 0.434 | 0.417 | 0.403 | 0.391 | 0.381 | 0.372 | 0.400
D2 | ROS-GNB   | 0.503 | 0.480 | 0.464 | 0.448 | 0.436 | 0.427 | 0.460
D2 | SMOTE-GNB | 0.507 | 0.483 | 0.465 | 0.448 | 0.436 | 0.425 | 0.461
D2 | GL-GNB    | 0.528 | 0.525 | 0.510 | 0.533 | 0.479 | 0.456 | 0.505
D2 | FSVM-CIL  | 0.499 | 0.486 | 0.472 | 0.481 | 0.466 | 0.452 | 0.476
D2 | ODOC-ELM  | 0.691 | 0.652 | 0.639 | 0.601 | 0.598 | 0.573 | 0.626
D3 | PDM       | 0.526 | 0.454 | 0.412 | 0.372 | 0.342 | 0.310 | 0.403
D3 | GNB       | 0.237 | 0.217 | 0.206 | 0.197 | 0.191 | 0.183 | 0.205
D3 | RUS-GNB   | 0.368 | 0.287 | 0.225 | 0.173 | 0.138 | 0.108 | 0.217
D3 | ROS-GNB   | 0.222 | 0.203 | 0.192 | 0.182 | 0.174 | 0.167 | 0.190
D3 | SMOTE-GNB | 0.221 | 0.201 | 0.186 | 0.178 | 0.172 | 0.164 | 0.187
D3 | GL-GNB    | 0.307 | 0.251 | 0.204 | 0.187 | 0.192 | 0.175 | 0.219
D3 | FSVM-CIL  | 0.519 | 0.432 | 0.377 | 0.291 | 0.286 | 0.259 | 0.361
D3 | ODOC-ELM  | 0.428 | 0.295 | 0.356 | 0.272 | 0.261 | 0.253 | 0.311
D4 | PDM       | 0.617 | 0.549 | 0.505 | 0.473 | 0.452 | 0.435 | 0.505
D4 | GNB       | 0.416 | 0.389 | 0.372 | 0.363 | 0.355 | 0.349 | 0.374
D4 | RUS-GNB   | 0.215 | 0.209 | 0.204 | 0.201 | 0.198 | 0.196 | 0.204
D4 | ROS-GNB   | 0.398 | 0.373 | 0.358 | 0.346 | 0.337 | 0.331 | 0.357
D4 | SMOTE-GNB | 0.399 | 0.370 | 0.354 | 0.342 | 0.333 | 0.327 | 0.354
D4 | GL-GNB    | 0.419 | 0.381 | 0.374 | 0.350 | 0.353 | 0.332 | 0.368
D4 | FSVM-CIL  | 0.514 | 0.453 | 0.446 | 0.429 | 0.398 | 0.401 | 0.440
D4 | ODOC-ELM  | 0.555 | 0.507 | 0.481 | 0.486 | 0.440 | 0.439 | 0.485
D5 | PDM       | 0.661 | 0.601 | 0.560 | 0.527 | 0.487 | 0.465 | 0.550
D5 | GNB       | 0.360 | 0.330 | 0.307 | 0.292 | 0.276 | 0.263 | 0.305
D5 | RUS-GNB   | 0.575 | 0.519 | 0.482 | 0.455 | 0.418 | 0.352 | 0.467
D5 | ROS-GNB   | 0.333 | 0.306 | 0.286 | 0.271 | 0.258 | 0.247 | 0.284
D5 | SMOTE-GNB | 0.333 | 0.304 | 0.286 | 0.268 | 0.256 | 0.243 | 0.282
D5 | GL-GNB    | 0.349 | 0.321 | 0.304 | 0.288 | 0.265 | 0.258 | 0.298
D5 | FSVM-CIL  | 0.507 | 0.486 | 0.454 | 0.429 | 0.401 | 0.382 | 0.443
D5 | ODOC-ELM  | 0.492 | 0.471 | 0.460 | 0.443 | 0.399 | 0.385 | 0.442
D6 | PDM       | 0.588 | 0.548 | 0.527 | 0.510 | 0.496 | 0.478 | 0.525
D6 | GNB       | 0.320 | 0.294 | 0.277 | 0.266 | 0.258 | 0.251 | 0.278
D6 | RUS-GNB   | 0.139 | 0.130 | 0.125 | 0.120 | 0.116 | 0.113 | 0.124
D6 | ROS-GNB   | 0.305 | 0.276 | 0.262 | 0.250 | 0.241 | 0.232 | 0.261
D6 | SMOTE-GNB | 0.321 | 0.284 | 0.267 | 0.255 | 0.244 | 0.235 | 0.268
D6 | GL-GNB    | 0.356 | 0.332 | 0.309 | 0.311 | 0.289 | 0.273 | 0.312
D6 | FSVM-CIL  | 0.576 | 0.553 | 0.515 | 0.508 | 0.502 | 0.484 | 0.523
D6 | ODOC-ELM  | 0.389 | 0.304 | 0.327 | 0.299 | 0.271 | 0.254 | 0.307
Table 3. Variation of the F-measure performance with the parameter K.

Data Set | √q/4  | √q/2  | √q    | 2√q   | 4√q
D1       | 0.665 | 0.668 | 0.657 | 0.648 | 0.636
D2       | 0.683 | 0.685 | 0.679 | 0.669 | 0.657
D3       | 0.376 | 0.398 | 0.403 | 0.415 | 0.401
D4       | 0.479 | 0.497 | 0.505 | 0.502 | 0.495
D5       | 0.536 | 0.551 | 0.550 | 0.536 | 0.503
D6       | 0.501 | 0.524 | 0.525 | 0.516 | 0.503
Table 4. Variation of the F-measure performance with the parameter l.

Data Set | l = 6 | l = 12 | l = 24 | l = 48 | l = 72
D1       | 0.642 | 0.684  | 0.657  | 0.639  | 0.614
D2       | 0.659 | 0.671  | 0.679  | 0.664  | 0.663
D3       | 0.395 | 0.417  | 0.403  | 0.357  | 0.344
D4       | 0.488 | 0.538  | 0.505  | 0.470  | 0.444
D5       | 0.531 | 0.547  | 0.550  | 0.521  | 0.515
D6       | 0.511 | 0.520  | 0.525  | 0.506  | 0.488