Article

Rock Thin Section Image Identification Based on Convolutional Neural Networks of Adaptive and Second-Order Pooling Methods

School of Resources and Safety Engineering, Central South University, Changsha 410083, China
*
Authors to whom correspondence should be addressed.
Mathematics 2023, 11(5), 1245; https://doi.org/10.3390/math11051245
Submission received: 8 February 2023 / Revised: 25 February 2023 / Accepted: 1 March 2023 / Published: 4 March 2023

Abstract

To enhance the representation of rock feature information and thereby improve the rock identification performance of convolutional neural networks (CNN), a new pooling mode is proposed in this paper. Pooling layers are divided into the sampling pooling layer and the classification pooling layer according to whether the pooling object is the last convolution layer. The adaptive pooling method is used in the sampling pooling layer, where an adaptively adjusted pooling kernel is designed for each feature map. The second-order pooling method is used in the classification pooling layer, where second-order feature information based on outer products is extracted from feature pairs. The forward and back propagation of the two methods are derived. The methods are then embedded into a CNN to build a rock thin section image identification model (ASOPCNN). Experiments were conducted on an image set containing 5998 rock thin section images of six rock types. CNN models using max pooling, average pooling and stochastic pooling were set up for comparison. ASOPCNN achieved the highest identification accuracy, 89.08%, on the test set, and it outperformed the other three models in precision, recall, F1 score and AUC. The results suggest that the adaptive and second-order pooling methods are more suitable for the CNN model, and that a CNN based on them could be a reliable model for rock identification.

1. Introduction

Rock identification is a basic and prerequisite work of geological engineering [1]. For instance, geologists need lithological type information to infer the history of regional geological evolution, judge types of deep mineral resources, as well as oil and gas resources, and invert reserve information of various resources [2,3]. Engineers need lithological type information to guide and design the construction of geotechnical engineering, such as mining and tunnelling [4,5]. Insufficient rock type information may lead to a series of engineering disasters, including landslide, collapse, and settlement [6,7]. Hence, it is necessary to study how to identify rock types accurately and quickly.
Many scholars have conducted lots of research on rock identification and put forward many methods, which can be summarized into the following four types: microscopic observation, experimental tests, statistics and learning, and deep learning [8].
Microscopic observation and experimental tests are manual identification methods. Rock characteristics are observed through optical microscopes, and the compositions and structures of rocks and minerals are analyzed with the help of X-ray Diffraction (XRD), Electron Microprobe Analysis (EMPA), etc. Zhang [9] introduced the identification principle and process of rock microscopic observation in combination with cases. In order to identify acid volcanic rocks, Liu et al. [10] used X-ray Fluorescence Spectroscopy (XRF) to analyze the principal components of rocks and EMPA to check and show the minerals. Manual identification methods have achieved certain results; however, they are time-consuming, labor-intensive, costly and subjective, and they depend heavily on the expertise of observers and the quality of instruments. They are therefore not suitable for rock identification of large-scale strata in engineering.
Automatic identification methods based on rock features and machine learning have gradually gained the favor of scholars [11,12]. Chatterjee et al. [13] input color, shape and texture features extracted from rock images into SVM to identify rock types, and finally achieved an accuracy of 96.2%. Patel et al. [14] extracted nine color histogram features from rock images and input them into a probabilistic neural network. They successfully identified limestone types, with an error rate of less than 6%. Zhang et al. [15] used five machine learning models to identify rock and mineral images, and then selected the three models with the best performance to stack. The stacked model effectively improved performance. Machine learning methods can realize automatic identification of rock types. However, the features required for classification still need to be selected subjectively by professionals, and usually only a few feature types are selected [8]. In addition, different lithologies are best distinguished by different features, which causes many problems in practical application.
In recent years, artificial intelligence (AI) methods and technologies have developed rapidly [16,17,18,19,20]. As a core method in the field of target detection, target identification and target segmentation, deep learning has been gradually applied to geological and rock engineering [21,22,23,24,25,26].
As a visual model of deep learning, convolutional neural networks (CNN) can automatically select the most suitable features to distinguish different types of rocks. Rock type identification using CNN is usually based on rock thin section images. Rock thin sections are rock slices, which are observed and studied under the polarizing microscope. The rock slices are made from large rock samples through cutting, grinding and other operations, and they are about 60 mm × 60 mm in size and about 0.3 mm in thickness. Many scholars have carried out innovative research on rock thin section image identification. Polat et al. [27] used DenseNet121 and ResNet50 to identify six types of volcanic rock and tested the impact of four different optimizers on model accuracy. Alzubaidi et al. [28] used the architecture of ResNeXt-50 to identify the rock types of oil and gas reservoir logging core images, and the final accuracy reached 93.12%. In order to improve the identification accuracy, Liu et al. [29] built a mineral image identification model based on ResNet, embedding four visual attention blocks. Ma et al. [30] proposed the MaSE-ResNeXt model to enhance feature connectivity between different channels, and the identification accuracy on three kinds of rock thin section images finally reached 90.89%. Li et al. [8] researched the influence of three different optimization algorithms and two learning rate attenuation methods on identification performance. Dos Anjos et al. [31] argued that existing CNNs required a uniform input image size, which inevitably lost original image information. They proposed a CNN based on a pyramid pooling layer, in which the image was down sampled according to the pyramid layering mode so that input images of all sizes could be processed. Their results showed that this method can improve the identification accuracy to a certain extent.
Su et al. [32] argued that the imaging type of rock images had an impact on the final accuracy. They input three types of rock images into three identical CNNs, respectively: the plane polarized light image, the cross polarized light image and the image after principal component analysis. The final rock type was determined through the maximum likelihood method based on the results of the three CNNs. The final accuracy reached 89.97%. Seo et al. [33] researched the impact of local images on the identification accuracy. They argued that the features in local areas of the rock image were more representative and definite, and proposed a model based on image segmentation. The large image was divided into several small parts, which were input into the CNN in turn. The final lithology category was the one with the largest number of local identification results. Xu et al. [34] researched the impact of fusing image and data features on rock identification. They proposed a fusion identification method, which inputs image features and parts of lithological data into the full connection layer of the CNN. The final results showed that this method can improve the identification accuracy. Zhang et al. [15] researched the effect of different classifiers on the performance of rock identification models. They classified the rock image features extracted by a CNN using the following classifiers: logistic regression (LR), support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), multilayer perceptron (MLP) and Gaussian naive Bayes (GNB). The final result showed that the classification performance of logistic regression, support vector machine and multilayer perceptron was better than that of the other methods. However, the above research still has some deficiencies.
Current research mainly focuses on the function of the convolution layers, full connection layers and optimization algorithms of CNN, ignoring the effect of pooling layers on the performance of rock identification. Current pooling layers only play the role of down sampling. The pooling methods in current use are mainly max pooling and average pooling. They are static and can extract only limited information about rock features. Max pooling is sensitive to abrupt changes in pixel values, so it can retain texture features of rocks, as well as shape and size features of minerals, to the maximum extent [35]. Average pooling pays more attention to preserving the overall feature information of input images. It can better preserve color features, background features and global combination features of mineral composition in location and content [29]. However, the internal structures and mineral morphology of rocks contain dozens or even hundreds of features, and the features required to distinguish different rocks differ from rock to rock. Static pooling methods cannot select the best pooling method according to different lithological characteristics. At the same time, current pooling methods can only obtain first-order feature information from a single feature map, ignoring the relationship between feature pairs. Adequate exploration of feature distributions is important for realizing the full potential of CNN [36].
Therefore, the functions of pooling layers were divided into two types: down sampling and classification, and the adaptive and second-order pooling methods were designed and respectively applied in this paper. The adaptive pooling set pooling kernels for each feature map. The parameters in pooling kernels participated in the training process, and finally they were configured by error feedback. During the process of the second-order pooling, the second-order feature information based on outer products between feature pairs was extracted and finally input into a classifier. After deducing the changing process of forward and back propagation, the proposed pooling methods were embedded into CNN to build an identification model of rock thin section images. Another three models using traditional pooling methods were used for comparison. The performance was comprehensively evaluated with multiple indicators to provide a reliable model for rock identification.

2. Data Collection

The rock thin section images used in this paper were from the symposium on Microscopic Images of Rocks in the open-source database China Scientific Data [37,38,39,40,41,42]. A total of 3374 images were collected from the database. Six kinds of common rocks were selected for the experiment: wackestone, granite, schist, quartz sandstone, conglomerate and crystalline dolomite. The sites of the rock samples and the production process of the rock thin section images are shown in Figure 1a. As shown in Figure 1b, these rock samples came from geological drillings in different regions of China. Through the steps in Figure 1c, rock samples were cut, polished and mounted as rock thin sections, and then they were photographed under polarizing microscopes to obtain microscopic images. Wackestone is a kind of sedimentary rock. Its matrix filling is plaster, and its particles are mainly bioclasts and rock cuttings. Granite is a kind of igneous rock with holocrystalline structures. Its main mineral components are quartz, feldspar, and a small amount of mica. Schist is a kind of metamorphic rock with lamellar structures. Its common mineral components include quartz, quartzite, and mica. Quartz sandstone is a kind of sedimentary rock with clastic structures. Its mineral composition is mainly quartz, with a small amount of feldspar. The cement between minerals is calcareous. The diameter of mineral particles is mostly between 0.2 mm and 2 mm. Conglomerate is a kind of sedimentary rock with clastic structures. Its mineral composition is mainly quartz and feldspar, and it contains some rock cuttings. The cement between minerals is siliceous. The diameter of mineral particles is mostly between 2 mm and 5 mm. Crystalline dolomite is a kind of sedimentary rock with recrystallized structures. The crystalline minerals are mostly quartz, with diameters ranging from 0.005 mm to 0.03 mm. The cement between minerals is siliceous.
To prevent overfitting in the training process and increase the robustness of results, several commonly used image enhancement technologies were used to expand the image set, including image rotation, image flip, brightness change, noise addition and histogram equalization [30,33]. The image set contained 5998 images after enhancement. It was divided into the training set, validation set and test set with a ratio of roughly 7:2:1. The specific division details can be seen in Table 1. The images in the original set were not uniform in size, while the CNN model required input images of a consistent size, so the bilinear interpolation algorithm was used to unify the image size to 227 × 227 pixels. The process of bilinear interpolation is shown in Figure 2. According to the coordinates defined in Figure 2, the pixel values of the interpolation points can be calculated by Equations (1) and (2); the pixel value of the target point can be calculated by Equation (3).
$$f(x, y_0) = \frac{x_1 - x}{x_1 - x_0} f(x_0, y_0) + \frac{x - x_0}{x_1 - x_0} f(x_1, y_0) \tag{1}$$
$$f(x, y_1) = \frac{x_1 - x}{x_1 - x_0} f(x_0, y_1) + \frac{x - x_0}{x_1 - x_0} f(x_1, y_1) \tag{2}$$
$$f(x, y) = \frac{y_1 - y}{y_1 - y_0} f(x, y_0) + \frac{y - y_0}{y_1 - y_0} f(x, y_1) \tag{3}$$
where f(x,y) is the pixel value of the target point; f(x,y0) and f(x,y1) are the pixel values of the interpolation points; f(x0,y0), f(x1,y0), f(x0,y1) and f(x1,y1) are the pixel values of the original points.
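As an illustration of Equations (1)–(3), the interpolation of a single target pixel can be sketched as follows. This is a minimal Python sketch, not the code used in the experiments; the function name `bilinear_interpolate` and its argument order are assumptions made for this example.

```python
def bilinear_interpolate(f00, f10, f01, f11, x0, x1, y0, y1, x, y):
    """Interpolate f(x, y) from the four surrounding original points.

    f00 = f(x0, y0), f10 = f(x1, y0), f01 = f(x0, y1), f11 = f(x1, y1).
    The names are hypothetical; only the arithmetic follows Equations (1)-(3).
    """
    # Equation (1): interpolate along x on the row y = y0
    f_x_y0 = (x1 - x) / (x1 - x0) * f00 + (x - x0) / (x1 - x0) * f10
    # Equation (2): interpolate along x on the row y = y1
    f_x_y1 = (x1 - x) / (x1 - x0) * f01 + (x - x0) / (x1 - x0) * f11
    # Equation (3): interpolate along y between the two interpolation points
    return (y1 - y) / (y1 - y0) * f_x_y0 + (y - y0) / (y1 - y0) * f_x_y1


# Midpoint of a unit cell with corner values 10, 20, 30 and 40
value = bilinear_interpolate(10, 20, 30, 40, 0, 1, 0, 1, 0.5, 0.5)
```

At the midpoint of the cell the result is the mean of the four corner values, and at a corner it reproduces that corner's original pixel value.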

3. Proposed Method

3.1. Basic CNN Model

The structure of CNN can be divided into two modules: feature extraction and feature classification, and its overall workflow can be divided into two processes: forward propagation and back propagation (parameter updating). Its basic structure is shown in Figure 3.
Images to be identified are input into the feature extraction module, which is a stack of multiple convolution layers and pooling layers. The module extracts the underlying features of images. The extracted features are then input into the feature classification module, which is composed of full connection layers and a softmax layer and classifies the input features. The final output is a vector, in which each element represents the probability that the image belongs to the corresponding category. This completes one forward propagation. The sum of the errors between the calculated results and the real values is then taken as the total loss of the CNN. The partial derivatives of the total loss with respect to the parameters are taken as the error sensitivity, and the error sensitivity of each layer is calculated by back propagation. The parameter values are then updated by an optimizer, which completes one back propagation. The gradient descent method is usually used as the optimizer in CNN. Its calculation process is shown in Equation (4). The CNN model completes one iteration through forward and back propagation. The specific function of each layer is introduced below.
$$\mathrm{new}(w) = \mathrm{old}(w) - \eta \, \delta w \tag{4}$$
where δw was the error sensitivity of weight parameter w; η was the learning rate.
The function of convolution layers is to extract feature maps. Convolution kernels slide across the input images from left to right and top to bottom. At each sliding position, the sub-region elements and convolution kernels perform convolution operations as shown in Equation (5). The output results constitute feature maps.
$$X_j^l = f\left( \sum_{i \in M_j} X_i^{l-1} * k_{ij}^l + b_j^l \right) \tag{5}$$
where $X_j^l$ is the jth feature map of the lth layer, $k_{ij}^l$ is the convolution kernel of the lth layer, and $*$ is the convolution operation symbol, which represents the sum of the products of the elements at corresponding positions of two matrixes.
The function of activation layers is to increase the nonlinearity of the output and make multi-layer stacking meaningful. The commonly used activation functions include ReLU and Sigmoid.
The function of pooling layers is to reduce the size of the feature maps. Like convolution layers, pooling layers produce their output through a sliding window. At each position, the window outputs one representative value of the corresponding sub-region.
The function of the full connection layers is to classify the input feature data. They map the input features to the sample label space and obtain the score of the sample for each category. The calculation process is shown below.
$$X_j^l = f\left( \sum_{i} X_i^{l-1} \cdot W_i + b_j^l \right) \tag{6}$$
The function of the softmax layer is to convert the results from full connection layers into probabilities between 0 and 1. The calculation process is shown below.
$$p(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \tag{7}$$
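The softmax conversion can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `softmax` is chosen for this example, and subtracting the maximum score before exponentiation is a common numerical-stability step that is assumed here rather than taken from the paper. It does not change the result of the formula.

```python
import math


def softmax(scores):
    """Convert K full-connection outputs into probabilities that sum to 1."""
    # Assumption: shift by the maximum so that exp() cannot overflow.
    # The shift cancels in the ratio, so the probabilities are unchanged.
    shift = max(scores)
    exps = [math.exp(v - shift) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```

The largest input score always receives the largest probability, so the predicted rock category is the index of the maximum element of the output vector.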

3.2. Rock Thin Section Image Identification Model

In this section, the pooling layers were first divided into the sampling pooling layer and the classification pooling layer according to whether the pooling object was the last convolution layer. Then, the adaptive pooling method was designed for the sampling pooling layer, and the second-order pooling method was designed for the classification pooling layer. Based on the AlexNet [43] framework, these two pooling methods were embedded to design a rock thin section image identification model. AlexNet is a classic CNN model, which won the championship in the ImageNet 2012 competition. It has been widely used in rock image identification because of its small model and fast computation [44]. The adaptive pooling and second-order pooling methods were embedded to build a feature extractor, and then the feature extractor was combined with a classifier to construct a rock thin section image identification model (ASOPCNN). The overall structure of the model is shown in Figure 4. Rock thin section images with 227 × 227 pixels were input into the feature extractor, which included 5 convolution layers, 2 adaptive pooling layers and 1 second-order pooling layer. The detailed parameters of the convolution and pooling layers are shown in the bottom left of Figure 4. The parameter table of each convolution layer lists, in order, the kernel size and number, the layer name, the stride and the padding value. The parameter table of each pooling layer lists, in order, the kernel size and number, the layer name and the pooling stride. The second-order pooling layer consisted of five layers, as shown in the lower right corner of Figure 4. In order to speed up the training process, the 2nd, 4th and 5th convolution layers were split into upper and lower parts so that two GPUs could be used for parallel computing. ReLU was used as the activation function. The extracted rock features were input into the classifier, which contained 3 full connection layers and 1 softmax layer. The numbers of neurons in the full connection layers were 4096, 4096 and 6, respectively. The probabilities of the image belonging to each rock category were output as the results.

3.3. Adaptive Pooling Method

The purpose of sampling pooling layers is to reduce the size of feature maps [45]. The traditional pooling methods are mainly max pooling, average pooling and stochastic pooling. Their pooling processes are shown in Figure 5. Max pooling outputs the maximum value of each region, and average pooling outputs the average value of each region. With stochastic pooling, a probability is calculated for each element in the region, and the output is then selected randomly according to these probabilities [20]. However, pooling methods are selective in the way they represent rock features. Figure 6 shows the selectivity of two pooling methods. It can be seen in Figure 6a that texture features became more prominent after max pooling, whereas after average pooling the number of textures decreased and the structure became fuzzy. This shows that max pooling strengthened texture features, while average pooling suppressed them. Figure 6b shows the pooling process of wackestone. From the color histogram, it can be seen that the color values of the original image were concentrated in the range of 50 to 200 but were relatively dispersed. After average pooling, the value range remained unchanged and the values were more concentrated, which indicates that some disturbing color information was removed. After max pooling, however, the color range changed. This indicates that the original color features had been lost, which also shows that max pooling is not suited to color features.
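To make the three traditional pooling rules concrete, the sketch below applies them to a small feature map. It is a pure-Python illustration rather than the implementation used in the experiments. The helper names (`pool2d`, `max_pool`, `average_pool`, `stochastic_pool`) are hypothetical, and the stochastic rule assumes non-negative activations, with each element's probability taken as its share of the region sum.

```python
import random


def max_pool(region):
    """Output the maximum value of the region."""
    return max(region)


def average_pool(region):
    """Output the average value of the region."""
    return sum(region) / len(region)


def stochastic_pool(region, rng):
    """Sample one element with probability proportional to its magnitude.

    Assumes non-negative activations (for example, after ReLU).
    """
    if sum(region) == 0:
        return 0.0
    return rng.choices(region, weights=region, k=1)[0]


def pool2d(feature_map, size, stride, reduce_fn):
    """Slide a size x size window over a 2D list and reduce each region."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    output = []
    for i in range(rows):
        row = []
        for j in range(cols):
            region = [feature_map[i * stride + m][j * stride + n]
                      for m in range(size) for n in range(size)]
            row.append(reduce_fn(region))
        output.append(row)
    return output


fmap = [[1, 2, 5, 6],
        [3, 4, 7, 8],
        [9, 10, 13, 14],
        [11, 12, 15, 16]]
rng = random.Random(0)  # fixed seed so the stochastic example is repeatable

pooled_max = pool2d(fmap, 2, 2, max_pool)
pooled_avg = pool2d(fmap, 2, 2, average_pool)
pooled_sto = pool2d(fmap, 2, 2, lambda region: stochastic_pool(region, rng))
```

All three rules apply the same fixed operation to every feature map, which is the limitation that the adaptive pooling method is intended to address.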
No matter which traditional pooling method is used, all feature maps are sampled in a single way, which inevitably causes loss of feature information and affects the performance of CNN. An adaptive pooling method is proposed in this paper, and its pooling process is shown in Figure 7. First, a pooling kernel was deployed for each input feature map; the kernels were independent of each other. Then, the initial values of each kernel were set randomly between 0 and 1. Finally, the output for each pooling region was calculated from the feature map and its pooling kernel through Equation (8).
$$O = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij} \cdot W_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} W_{ij}} \tag{8}$$
where O is the output value; I is the input value; W is the parameter of the pooling kernel. The parameters were adaptively adjusted during training, and the process can be divided into the following four steps, as shown on the right side of Figure 7.
Step 1: Calculate total loss L. The total loss was the difference between real values and actual output values.
Step 2: Calculate partial derivatives. The partial derivatives of total loss to each parameter of pooling kernels were calculated layer-by-layer through back propagation.
Step 3: Update parameters. According to the partial derivatives and the learning rate, new parameters were calculated through the gradient descent method.
Step 4: Limit parameter boundary. If an updated parameter was greater than 1, it was set to 1; if it was smaller than 0, it was set to 0.
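The weighted pooling of Equation (8), together with Steps 3 and 4, can be sketched for a single pooling window as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the gradients are passed in rather than computed, and returning 0 when all weights are 0 is a guard added for this example.

```python
def adaptive_pool_window(inputs, weights):
    """Equation (8): weighted average of one pooling window (flattened)."""
    denominator = sum(weights)
    if denominator == 0:
        return 0.0  # assumption: guard against an all-zero kernel
    return sum(i * w for i, w in zip(inputs, weights)) / denominator


def update_kernel(weights, gradients, learning_rate):
    """Step 3 (gradient descent) followed by Step 4 (clip to [0, 1])."""
    updated = []
    for w, g in zip(weights, gradients):
        w_new = w - learning_rate * g            # Step 3
        updated.append(min(1.0, max(0.0, w_new)))  # Step 4
    return updated


window = [1, 2, 3, 4]
uniform = adaptive_pool_window(window, [1, 1, 1, 1])
one_hot = adaptive_pool_window(window, [0, 0, 0, 1])
clipped = update_kernel([0.9, 0.1, 0.5], [-2.0, 2.0, 0.0], 0.1)
```

With equal weights the window output equals the average-pooling result, and with all weight on the largest element it equals the max-pooling result, so a learned kernel can move between these two behaviours for each feature map.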
The adaptive pooling method affects the forward and back propagation of CNN. Its forward propagation is the convolution of the target feature maps with the pooling kernels. The result feature maps can be calculated by Equation (9).
$$y_{i,j} = \sum_{m=1}^{b} \sum_{n=1}^{b} w_{m,n} \cdot x_{s \cdot (i-1)+m,\; s \cdot (j-1)+n} \tag{9}$$
where $x_{i,j}$ and $y_{i,j}$, respectively, represent the values in row i and column j of the target feature maps and the result feature maps; $w_{m,n}$ represents the value in row m and column n of the pooling kernels; b represents the size of the pooling kernels; s represents the step size.
The back propagation process needs to calculate $\delta x_{i,j}$ and $\delta w_{i,j}$ through Equations (10) and (11). $\delta x_{i,j}$ is the error sensitivity of the target feature maps, which represents the partial derivative of the total loss with respect to each element of the target feature maps; it is used to calculate the error sensitivity of the preceding layers. In the same way, $\delta w_{i,j}$ is the error sensitivity of the pooling kernels, which is used to update the kernel parameters.
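$$\delta x_{i,j} = \sum_{m=1}^{c} \sum_{n=1}^{c} \delta y_{m,n} \cdot w_{i - s \cdot m + s,\; j - s \cdot n + s} \cdot f[i - s \cdot (m-1)] \cdot f[j - s \cdot (n-1)] \tag{10}$$
$$\delta w_{i,j} = \sum_{m=1}^{c} \sum_{n=1}^{c} \delta y_{m,n} \cdot x_{s \cdot (m-1)+i,\; s \cdot (n-1)+j} \tag{11}$$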
where $\delta y_{m,n}$ is the error sensitivity of the result feature map; c is the size of the result feature maps. f(x) is a judgement function, which equals 1 when 0 < x ≤ b (that is, when x is a valid row or column index of the pooling kernel) and equals 0 otherwise. The detailed derivation process of forward and back propagation can be seen in Appendix A.
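Equations (9)–(11) can be illustrated with a small pure-Python sketch. The function names `adaptive_forward` and `adaptive_backward` are hypothetical, indices are 0-based rather than 1-based, and feature maps are assumed to be square. The backward pass is written in "scatter" form: each output element distributes its error to the inputs and kernel weights it used, which should give the same sums as the indicator-based form of Equation (10).

```python
def adaptive_forward(x, w, s):
    """Equation (9): slide kernel w (b x b) over x with step size s."""
    b = len(w)
    c = (len(x) - b) // s + 1  # size of the result feature map
    return [[sum(w[m][n] * x[s * i + m][s * j + n]
                 for m in range(b) for n in range(b))
             for j in range(c)]
            for i in range(c)]


def adaptive_backward(x, w, s, dy):
    """Equations (10) and (11): error sensitivity of inputs and kernel."""
    b = len(w)
    c = len(dy)
    dx = [[0.0] * len(x[0]) for _ in x]
    dw = [[0.0] * b for _ in range(b)]
    for i in range(c):
        for j in range(c):
            for m in range(b):
                for n in range(b):
                    # input element used by output (i, j) at kernel slot (m, n)
                    dx[s * i + m][s * j + n] += dy[i][j] * w[m][n]
                    dw[m][n] += dy[i][j] * x[s * i + m][s * j + n]
    return dx, dw


x = [[1, 2, 5, 6],
     [3, 4, 7, 8],
     [9, 10, 13, 14],
     [11, 12, 15, 16]]
w = [[0.1, 0.2], [0.3, 0.4]]
y = adaptive_forward(x, w, 2)
dx, dw = adaptive_backward(x, w, 2, [[1, 1], [1, 1]])
```

With non-overlapping windows and a unit error on every output, each input simply receives the kernel weight that multiplied it, and each kernel weight receives the sum of the inputs it multiplied.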

3.4. Second-Order Pooling Method

The classification pooling layer is connected after the last convolution layer. Its purpose is to integrate feature maps into a compact global feature, which is then input into a full connection layer for classification. The process of integrating feature maps with traditional pooling methods is shown in Figure 8. Each feature map was expanded into a vector after down sampling, and then all vectors were connected in turn to form a global feature vector. However, this method only captures first-order feature information, because each feature map is treated independently in the integration process and the correlations between feature maps are ignored. As a result, the ability of feature representation is limited.
Carreira et al. [46] proposed a method based on outer products to extract second-order feature information. However, they only used it for image segmentation. Based on their work, the second-order pooling method is proposed in this paper by embedding the second-order information extraction process into CNN and deriving its forward and back propagation. The process of the second-order pooling is shown in Figure 9. The whole process can be divided into the following six steps.
Step 1: Combine and split. For feature maps of the last convolution layer, they were combined to form a feature block; then, the feature block was divided into local feature vectors in sequence from left to right and top to bottom according to the element position. The local feature vectors contained information of all feature maps at a single position.
Step 2: Calculate outer products. The outer product of each local feature vector with itself was calculated to obtain the second-order feature matrixes.
Step 3: Choose upper triangle part. All second-order feature matrixes were combined to form a second-order feature block, and then the block was divided into second-order feature vectors in sequence from left to right and top to bottom according to the element position. To avoid duplication, only the upper triangle part of the block was retained. Each second-order feature vector contained information of all positions in each original feature map.
Step 4: Convert vectors into matrixes. Each second-order feature vector was converted into a second-order feature map in sequence from left to right and top to bottom according to the size of the original feature map.
Step 5: Global average pooling. The second-order feature maps were down sampled to obtain the global feature vector through global average pooling.
Step 6: Feature selection. Because non-negative activation functions, such as ReLU and Sigmoid, were generally used in the process of feature extraction, the numerical difference between elements can reflect the importance of feature information. The second-order pooling can further expand this difference. In order to reduce the number of parameters and speed up the training process, a part of the feature information was selected in order from large to small according to its importance. The number of selected features is a hyperparameter that needs to be preset.
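The six steps above can be condensed into the following pure-Python sketch. It is an illustration under assumptions rather than the authors' implementation. The function names are hypothetical, the intermediate feature blocks are not materialised (the outer product and the global average are fused into one loop), and the selection step is shown as choosing the indices of the largest pooled values, which is one possible reading of Step 6.

```python
def second_order_pool(feature_maps):
    """Steps 1-5: average, over all positions, of the upper-triangle
    entries of the outer product of each local feature vector."""
    channels = len(feature_maps)
    size = len(feature_maps[0])
    positions = size * size
    pooled = []
    for i in range(channels):
        for j in range(i, channels):  # Step 3: upper triangle only
            total = 0.0
            for m in range(size):
                for n in range(size):
                    # Step 2: entry (i, j) of the outer product at (m, n)
                    total += feature_maps[i][m][n] * feature_maps[j][m][n]
            pooled.append(total / positions)  # Step 5: global average
    return pooled


def select_top(pooled, keep):
    """Step 6 (assumed form): indices of the `keep` largest features."""
    order = sorted(range(len(pooled)), key=lambda k: pooled[k], reverse=True)
    return order[:keep]


maps = [[[1, 2], [3, 4]],
        [[0, 1], [1, 0]]]
features = second_order_pool(maps)
kept = select_top(features, 2)
```

For C feature maps the pooled vector has C(C + 1)/2 elements, so the number of second-order features grows quadratically with the number of feature maps, which is why a selection step is useful before the full connection layers.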
The second-order pooling method affected forward and back propagation of CNN. In forward propagation process, the elements in the second-order feature matrixes, second-order feature maps and global feature vectors can be, respectively, obtained by Equations (12)–(14).
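$$y_{i,j,k} = \sum_{m=1}^{a} \sum_{n=1}^{a} x_{m,n,i} \cdot x_{m,n,j} \cdot I[a(m-1)+n = k] \tag{12}$$
$$z_{\alpha,\beta,\gamma} = \sum_{k=1}^{a^2} \sum_{i=1}^{b} \sum_{j=1}^{b} y_{i,j,k} \cdot I\left[b(i-1) + j - \frac{i^2 - i}{2} = \gamma\right] \cdot I[c(\alpha-1)+\beta = \gamma] \tag{13}$$
$$Z_p = \frac{\sum_{\alpha=1}^{c} \sum_{\beta=1}^{c} z_{\alpha,\beta,p}}{\alpha \cdot \beta} \tag{14}$$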
where $x_{m,n,l}$ represents the element in row m and column n of the lth original feature map; $y_{i,j,k}$ represents the element in row i and column j of the kth second-order feature matrix; $z_{\alpha,\beta,\gamma}$ represents the element in row α and column β of the γth second-order feature map; $Z_p$ represents the value of the pth element in the global feature vector; a, b and c, respectively, represent the sizes of the original feature maps, the second-order feature matrixes and the second-order feature maps; I(condition) is a judgment function: if the condition is met, then I = 1, otherwise I = 0.
In the back propagation process, since there are no parameters to be updated, only the error sensitivities need to be calculated. The error sensitivity of the second-order feature maps, the second-order feature matrixes and the original feature maps can be, respectively, obtained by Equations (15)–(17).
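$$\delta z_{\alpha,\beta,\gamma} = \frac{\delta Z_{\gamma}}{\alpha \cdot \beta} \tag{15}$$
$$\delta y_{i,j,k} = \sum_{\alpha=1}^{c} \sum_{\beta=1}^{c} \delta z_{\alpha,\beta,\; b \cdot (i-1) + j - (i^2 - i)/2} \cdot I[b(\alpha-1)+\beta = k] \tag{16}$$
$$\delta x_{m,n,l} = \sum_{i=1}^{b} \sum_{j=1}^{b} \delta y_{i,j,\; s_2 \cdot (m-1)+n} \cdot x_{m,n,j} \cdot I(i = l) \cdot [I(i = m) \cdot I(j = n) + 1] \tag{17}$$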
where δZγ represented the error sensitivity of the global feature vector; δzα,β,γ represented the error sensitivity of the second-order feature maps; δyi,j,k represented the error sensitivity of the second-order feature matrixes; δxm,n,l represented the error sensitivity of the original feature maps. The detailed derivation process of the forward and back propagation can be seen in Appendix B.

3.5. Evaluation Metrics

In order to effectively evaluate the classification performance of CNN models, several indicators were used, including accuracy rate (ACC), precision rate (PRE), recall rate (REC), F1 score (F1), confusion matrix and the receiver operating characteristic curve (ROC) [47,48,49]. The confusion matrix is shown in Figure 10, through which the identification effect of CNN models can be observed intuitively. In the confusion matrix, each row corresponds to the predicted labels of samples, and each column corresponds to the real labels of samples. P (Positive) means that the predicted label is positive; N (Negative) means that the predicted label is negative; T (True) means that the sample is predicted correctly; F (False) means that the sample is predicted wrongly.
The accuracy rate (ACC) represents the proportion of samples that are classified correctly, which can be calculated by Equation (18). The precision rate (PRE) represents the proportion of correct classifications among the samples predicted to be positive, which can be calculated by Equation (19). The recall rate (REC) represents the proportion of correct classifications among the samples that are actually positive, which can be calculated by Equation (20). The F1 score is the harmonic mean of PRE and REC, which can be calculated by Equation (21).
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$
$$PRE = \frac{TP}{TP + FP} \tag{19}$$
$$REC = \frac{TP}{TP + FN} \tag{20}$$
$$F1 = \frac{2TP}{2TP + FP + FN} \tag{21}$$
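Equations (18)–(21) translate directly into code. The sketch below is illustrative only; the function name and the example counts are hypothetical and are not results from this paper. It assumes that every denominator is non-zero. For the six-class problem, the counts for one rock type would be obtained in a one-vs-rest manner from the confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (ACC, PRE, REC, F1) from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)  # Equation (18)
    pre = tp / (tp + fp)                   # Equation (19)
    rec = tp / (tp + fn)                   # Equation (20)
    f1 = 2 * tp / (2 * tp + fp + fn)       # Equation (21)
    return acc, pre, rec, f1


# Hypothetical counts for one class, for illustration only
acc, pre, rec, f1 = classification_metrics(tp=8, tn=80, fp=2, fn=10)
```

Equation (21) is algebraically the same as the harmonic mean 2 · PRE · REC / (PRE + REC), which the sketch can be used to check numerically.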
The receiver operating characteristic curve (ROC) shows the performance of models under different classification thresholds. The ROC curve uses the false positive rate (FPR) as the abscissa and the true positive rate (TPR) as the ordinate, which can be calculated by Equations (22) and (23), respectively. AUC is the area under the curve, and a large AUC value indicates good model performance. When AUC ≤ 0.5, the prediction is no better than random guessing, and the model has no predictive value.
$$FPR = \frac{FP}{TN + FP} \tag{22}$$
$$TPR = \frac{TP}{TP + FN} \tag{23}$$
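The following sketch illustrates how an ROC curve and its AUC can be obtained from Equations (22) and (23) by sweeping the classification threshold over the predicted scores. The helper names `roc_points` and `auc`, and the six scores and labels, are hypothetical; the trapezoidal rule is one common way to compute the area and is an assumption here, not necessarily the procedure used in the paper.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))  # Equations (22) and (23)
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for one class (label 1 = sample belongs to the class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```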

4. Results and Discussions

The program of the proposed model was written in the Matlab language with the deep learning library in Matlab 2021b. It was 32 k in size. The experiments were carried out on a server with a 64-core CPU, 192 G of RAM and the Linux operating system, which belonged to the Super Cloud Computing Center in Beijing, China. The average training cost per image was 1.257 s. In order to verify the effectiveness of the adaptive and second-order pooling methods, the original Alexnet model, which uses max pooling, was used for comparison and was recorded as MAXCNN. On this basis, all pooling layers of the Alexnet were replaced with average pooling and stochastic pooling, respectively, to build another two models, which were recorded as MEACNN and STOCNN, respectively. The structures of the four models are shown in Table 2. The experiment mainly included the following three aspects: (1) the identification results on the training and validation sets; (2) the identification results on the test set; (3) the identification performance of the models in each rock category.
In order to speed up the training process, the initial parameters were set by means of transfer learning. The pre-trained parameters of the original Alexnet were applied to the convolution layers of the four models. The training set contained 4212 pictures, and the validation set contained 1200 pictures. The mini-batch method was used in the training process, and the mini-batch gradient descent method was used to update the parameters. Each batch contained 70 pictures, giving 60 iterations in each round. One validation operation was performed every 10 iterations. The learning rate was set to $5 \times 10^{-4}$. The feature selection number of the second-order pooling layer was set to 9216. A total of 30 rounds, or 1800 iterations, were conducted in the whole process.
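The iteration counts quoted above follow from simple arithmetic, sketched below. The variable names are illustrative, and the sketch assumes that only full mini-batches are counted in each round, so that the few images left over after 60 batches are not used in that round; the paper does not state how the remainder is handled.

```python
train_images = 4212      # size of the training set
batch_size = 70          # pictures per mini-batch
rounds = 30              # training rounds (epochs)
validate_every = 10      # iterations between validation operations

# Assumption: only full mini-batches are counted in each round.
iters_per_round = train_images // batch_size
leftover = train_images - iters_per_round * batch_size
total_iters = iters_per_round * rounds
validation_runs = total_iters // validate_every

print(iters_per_round, leftover, total_iters, validation_runs)
```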

4.1. Training and Validation Results

The total loss is a non-negative function used to measure the difference between predicted values and real values. The smaller the total loss is, the better the training effect and robustness are. Figure 11 shows the loss decline of the four models during training. To compare the different decline processes conveniently, the initial loss of ASOPCNN was taken as the benchmark, and the loss values of the other models were divided by this benchmark value for unified processing. It can be seen from Figure 11a that the total loss of ASOPCNN on the training and validation sets was the smallest, indicating that the training effect of ASOPCNN was the best. It can be seen from Figure 11b that ASOPCNN differed from the other three models in that its loss fluctuation was more obvious. This was because the pooling layers of ASOPCNN contained random initial parameters, and the quality of these parameters had a greater impact on the training performance. Figure 12 shows how the model accuracy changed during training. The accuracy of ASOPCNN changed more slowly than that of STOCNN at the initial stage. As the iteration number increased, the change rate of ASOPCNN exceeded that of the other three models and became the largest. To compare the final convergence accuracy quantitatively, the mean accuracy of the last round was taken as the final accuracy, and the results are shown in Table 3. The ASOPCNN model had the highest accuracy on both the training and validation sets, at 0.9286 and 0.8671, respectively. This shows that the adaptive and second-order pooling methods were helpful for optimizing the training process.

4.2. Testing Results

The performance of models on an unseen test set represents their actual application effect. There were 586 images of six rock types in the rock image test set. Figure 13 shows the accuracy of the four models on the test set. The accuracy of ASOPCNN was 0.8908, higher than that of MAXCNN (0.6911), MEACNN (0.6297) and STOCNN (0.7696), indicating that the adaptive and second-order pooling methods were more suitable for CNN than the traditional pooling methods. To visualize the identification results, we randomly selected one image of each rock type from the test set for display, as shown in Figure 14. The identification probability corresponding to each image is shown in Table 4. It can be seen that the identification accuracy of each type of rock image was higher than 0.8, which indicated that the ASOPCNN had high identification confidence and can be used as a reliable model for rock thin section image identification.
Confusion matrices show the identification performance for each specific category. The confusion matrices obtained by the four models are shown in Figure 15. The elements on the diagonal are the numbers of correctly identified images of each rock type. The numbers of rock images correctly identified by ASOPCNN for the six types were 99, 81, 76, 83, 90 and 93, which were higher than those of the other three models. From the color depths of the elements in the confusion matrix, it can be seen that the probability of misclassification between conglomerate and quartz sandstone was greater. As Figure 14d,e shows, conglomerate and quartz sandstone were very similar. Their main mineral compositions were quartz and feldspar, and both had clastic structures. The main difference was the mineral particle size. The number of misclassified samples in each category was smaller for the ASOPCNN than for the other three models, indicating that, to a certain extent, the ASOPCNN can distinguish some image samples that were difficult for the others.
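As a quick consistency check, the overall test accuracy can be recovered from the diagonal of the ASOPCNN confusion matrix, since accuracy is the number of correctly identified images divided by the number of test images. The variable names below are illustrative; the six counts and the test-set size are the values quoted in the text.

```python
# Diagonal of the ASOPCNN confusion matrix (correct identifications per rock type).
correct_per_class = [99, 81, 76, 83, 90, 93]
test_images = 586

total_correct = sum(correct_per_class)
accuracy = total_correct / test_images
print(total_correct, round(accuracy, 4))
```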
Figure 16 shows the comparison of the precision, recall and F1 score on the test set. The evaluation indicators of all models showed a similar trend of first falling and then rising, which indicated that wackestone and crystalline dolomite were easier to identify than the other four types. In Figure 16a, ASOPCNN showed better performance in the recall rate. From Figure 16b, we can see that the precision rates of ASOPCNN for wackestone, schist, sandstone and conglomerate were higher. Figure 16c showed that ASOPCNN was higher in F1 score. From the above results, it can be seen that ASOPCNN had superior classification performance on the whole. Figure 17 shows the ROC curves of the four models for the six rock categories. The AUC value of the ASOPCNN was the highest in the categories of wackestone, schist, quartz sandstone, conglomerate and crystalline dolomite, indicating that the ASOPCNN had the better performance. In Figure 17b, although the AUC value of the ASOPCNN was lower than that of the STOCNN for the granite category, its number of correct identifications was greater, as shown in Figure 15a,d. This indicated that the identification probability values of ASOPCNN were slightly lower than those of STOCNN.

4.3. Discussion and Analysis

According to the above results, the pooling layers of a CNN can be divided into sampling pooling layers and classification pooling layers, to which the adaptive pooling and second-order pooling methods can be applied, respectively. This provides better performance than using traditional pooling methods. This is what the experimental results show. In fact, theoretical analysis also indicates that the adaptive and second-order pooling methods generally perform no worse than traditional pooling methods. This is because the adaptive pooling method can be transformed into the traditional pooling methods through training, and the second-order pooling method completely contains all the information of the traditional pooling methods.
Adaptive pooling can be seen as a collection of multiple pooling methods. In the adaptive pooling method, a pooling kernel is configured for each feature map. The calculation process is expressed as Equation (8). The parameters in the pooling kernel can be adjusted according to the propagated errors. When the parameters are equal, Equation (8) reduces to Equation (24), which represents average pooling.
$$O = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} I_{ij}}{mn} \tag{24}$$
When there is only one non-zero value in the pooling kernel and it corresponds to the maximum value, Equation (8) reduces to Equation (25), which represents max pooling.
$$O = \max\{I_{ij}\}, \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{25}$$
When the non-zero value does not correspond to the maximum value, Equation (8) reduces to Equation (26), which represents stochastic pooling.
$$O = \mathrm{rand}\{I_{ij}\}, \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{26}$$
When the pooling kernel has more than one non-zero value, adaptive pooling can also be regarded as other, more general pooling methods. Therefore, the adaptive pooling method can be transformed into the traditional pooling methods through training.
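The special cases above can be illustrated with a small sketch in which one pooling window is combined with a kernel as a weighted sum, the form used in Equation (A1) of Appendix A. The function `adaptive_pool`, the 2 × 2 window and the three kernels are hypothetical examples; in particular, the equal-weight kernel is assumed to hold the value 1/(mn) so that the weighted sum matches Equation (24).

```python
def adaptive_pool(window, kernel):
    """Weighted sum of one pooling window with a kernel of the same size."""
    return sum(w * v
               for w_row, v_row in zip(kernel, window)
               for w, v in zip(w_row, v_row))


window = [[1.0, 5.0],
          [3.0, 2.0]]                 # hypothetical 2 x 2 pooling window

equal_kernel = [[0.25, 0.25],
                [0.25, 0.25]]         # all parameters equal to 1/(m*n)
max_kernel = [[0.0, 1.0],
              [0.0, 0.0]]             # single non-zero weight on the largest element
other_kernel = [[0.0, 0.0],
                [1.0, 0.0]]           # single non-zero weight on another element

avg_out = adaptive_pool(window, equal_kernel)     # behaves like average pooling
max_out = adaptive_pool(window, max_kernel)       # behaves like max pooling
pick_out = adaptive_pool(window, other_kernel)    # selects a non-maximal element
print(avg_out, max_out, pick_out)
```

A fixed one-hot kernel reproduces max pooling only for windows whose maximum sits at that position; the sketch shows the correspondence for a single window and makes no claim about how the trained kernels behave in the paper's model.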
The purpose of the classification pooling layer is to integrate feature map information to form a global vector. The integration process of the traditional pooling methods is shown in Figure 8. It only counts first-order feature information, and each feature map is treated as independent and uncorrelated. The integration process of the second-order pooling method is shown in Figure 18. Second-order pooling adds more feature information while preserving the original first-order feature information. Outer products between feature pairs are calculated to obtain second-order correlations. As shown in the yellow diagonal part of the second-order feature block, the diagonal elements are the squares of the original first-order feature elements, so the feature information carried by them remains unchanged. It can be seen that the first-order feature information extracted by the traditional methods accounts for only a small proportion of the information extracted by the second-order pooling method. The upper triangular part of the second-order feature block carries the second-order feature information of feature pairs, which has better feature representation ability than first-order feature information [50]. Therefore, the second-order pooling method completely contains all the information of the traditional pooling methods.
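A minimal sketch of the outer product at a single spatial position may help to make this structure concrete. The helper `outer` and the three feature values are hypothetical; the sketch only shows that the diagonal holds the squared first-order values, that the matrix is symmetric, and that the off-diagonal entries hold the pairwise products.

```python
def outer(v):
    """Outer product of a feature vector with itself."""
    return [[vi * vj for vj in v] for vi in v]


# Hypothetical values of three feature maps at one spatial position.
v = [2.0, -1.0, 3.0]
m = outer(v)

diagonal = [m[i][i] for i in range(len(v))]                    # squared first-order values
upper = [m[i][j] for i in range(len(v)) for j in range(i, len(v))]
is_symmetric = all(m[i][j] == m[j][i]
                   for i in range(len(v)) for j in range(len(v)))
print(diagonal, upper, is_symmetric)
```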

5. Conclusions

In this paper, the shortcomings of traditional pooling methods in rock identification were analyzed, and the adaptive pooling and second-order pooling methods were proposed. The theoretical advantages of applying these two pooling methods to the CNN were analyzed, and the forward and back propagation processes of the two methods were deduced. On the basis of a visual CNN framework, Alexnet, a rock identification model called ASOPCNN was constructed by using the two pooling methods. The image set contained 5998 images of 6 common rock categories. The rock images were collected from the symposium on Microscopic Images of Rocks in the open-source database, China Scientific Data. Three models were set for comparison, which used the max pooling, average pooling and stochastic pooling methods, respectively. The accuracy rate, precision rate, recall rate, F1 score and AUC values were used to evaluate the model performance. In the results, the total loss of the ASOPCNN is the smallest when the training process converges, and the accuracies of the ASOPCNN model on the training and validation sets are 92.86% and 86.71%, respectively. On the test set of 586 pictures, the identification accuracy of ASOPCNN reaches 89.08%, which is at least 12 percentage points higher than that of the models using traditional pooling methods. The precision rate, recall rate, F1 score and AUC values of the ASOPCNN are 89.33%, 89.07%, 0.8914 and 0.8934, respectively. The experimental results show that the performance of the ASOPCNN is better than that of the other three models using traditional pooling methods. They also show that the ASOPCNN could be a reliable model for the identification of rock thin section images. In addition, among the six rock types in this paper, wackestone and crystalline dolomite are easier to distinguish than granite, schist, conglomerate and quartz sandstone. Conglomerate and quartz sandstone are more likely to be misclassified because they are similar in mineral composition and structure. The adaptive and second-order pooling methods are generally not worse than the traditional pooling methods. This is because the adaptive pooling method can be transformed into the traditional pooling methods through training, and the second-order pooling method completely contains all the information of the traditional pooling methods.

Author Contributions

Conceptualization, Z.Z.; Methodology, H.Y.; Supervision, Z.Z. and X.C.; Data curation, H.Y.; Formal analysis, H.Y.; Funding acquisition, Z.Z. and X.C.; Validation, Z.Z. and X.C.; Writing—original draft, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (Grant No.: 2022YFC2903901) and the National Natural Science Foundation of China (Grant No.: 52274249).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this paper can be obtained from the symposium on Microscopic Images of Rocks in the open-source database, China Scientific Data.

Acknowledgments

We would like to acknowledge the editors and reviewers for their invaluable comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The specific derivation process of forward and back propagation of the adaptive pooling method is as follows. Since each feature map performs independent but identical pooling operations, only a single feature map is calculated. The feature map $x = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{a1} \\ x_{12} & x_{22} & \cdots & x_{a2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1a} & x_{2a} & \cdots & x_{aa} \end{bmatrix}$, called the target feature map, is obtained from the convolution layer. The size of the target feature map is $a$. The adaptive pooling method is used for the down-sampling operation. The pooling kernel is $w = \begin{bmatrix} w_{11} & \cdots & w_{b1} \\ \vdots & \ddots & \vdots \\ w_{1b} & \cdots & w_{bb} \end{bmatrix}$, where $b$ is the size of the pooling kernel. The result feature map obtained after the pooling operation is $y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1c} \\ y_{21} & y_{22} & \cdots & y_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ y_{c1} & y_{c2} & \cdots & y_{cc} \end{bmatrix}$, where $c$ is the size of the result feature map. The elements of $y$ can be calculated by Equation (A1).
$$\begin{aligned} y_{11} &= w_{11}x_{11} + w_{12}x_{12} + w_{13}x_{13} + \cdots + w_{33}x_{33} + \cdots + w_{bb}x_{bb} \\ y_{12} &= w_{11}x_{1,1+s} + w_{12}x_{1,2+s} + w_{13}x_{1,3+s} + \cdots + w_{33}x_{3,3+s} + \cdots + w_{bb}x_{b,b+s} \\ y_{21} &= w_{11}x_{1+s,1} + w_{12}x_{1+s,2} + w_{13}x_{1+s,3} + \cdots + w_{33}x_{3+s,3} + \cdots + w_{bb}x_{b+s,b} \\ y_{22} &= w_{11}x_{1+s,1+s} + w_{12}x_{1+s,2+s} + w_{13}x_{1+s,3+s} + \cdots + w_{33}x_{3+s,3+s} + \cdots + w_{bb}x_{b+s,b+s} \end{aligned} \tag{A1}$$
The above equations can be equivalently rewritten as a convolution calculation equation as follows.
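$$y_{i,j} = \sum_{m=1}^{b}\sum_{n=1}^{b} w_{m,n} \cdot x_{s\cdot(i-1)+m,\; s\cdot(j-1)+n} \tag{A2}$$
where $x$ is regarded as a convolution operator; $y_{i,j}$ represents the element in row $i$ and column $j$ of the result feature map. During the back propagation, the partial derivative matrix of the total loss $L$ with respect to the elements of the pooling kernel is $\delta w_{i,j} = \begin{bmatrix} \delta w_{11} & \cdots & \delta w_{b1} \\ \vdots & \ddots & \vdots \\ \delta w_{1b} & \cdots & \delta w_{bb} \end{bmatrix}$. The variables $\delta y_{i,j} = \begin{bmatrix} \delta y_{11} & \delta y_{12} & \cdots & \delta y_{1c} \\ \delta y_{21} & \delta y_{22} & \cdots & \delta y_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ \delta y_{c1} & \delta y_{c2} & \cdots & \delta y_{cc} \end{bmatrix}$ and $\delta x_{i,j} = \begin{bmatrix} \delta x_{11} & \delta x_{21} & \cdots & \delta x_{a1} \\ \delta x_{12} & \delta x_{22} & \cdots & \delta x_{a2} \\ \vdots & \vdots & \ddots & \vdots \\ \delta x_{1a} & \delta x_{2a} & \cdots & \delta x_{aa} \end{bmatrix}$ are defined similarly. Equation (A3) can be obtained from the chain rule.
$$\frac{\partial L}{\partial w_{i,j}} = \frac{\partial L}{\partial y_{i,j}} \cdot \frac{\partial y_{i,j}}{\partial w_{i,j}} \tag{A3}$$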
The relationship between yi,j and wi,j is shown in Equation (A1). After the derivation of both sides of Equation (A1), Equation (A4) can be obtained by combining Equation (A3).
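$$\begin{aligned} \delta w_{11} &= \delta y_{11}x_{11} + \delta y_{12}x_{1,1+s} + \delta y_{21}x_{1+s,1} + \delta y_{22}x_{1+s,1+s} + \cdots \\ \delta w_{12} &= \delta y_{11}x_{12} + \delta y_{12}x_{1,2+s} + \delta y_{21}x_{1+s,2} + \delta y_{22}x_{1+s,2+s} + \cdots \\ \delta w_{13} &= \delta y_{11}x_{13} + \delta y_{12}x_{1,3+s} + \delta y_{21}x_{1+s,3} + \delta y_{22}x_{1+s,3+s} + \cdots \\ &\;\;\vdots \\ \delta w_{33} &= \delta y_{11}x_{33} + \delta y_{12}x_{3,3+s} + \delta y_{21}x_{3+s,3} + \delta y_{22}x_{3+s,3+s} + \cdots \\ &\;\;\vdots \\ \delta w_{bb} &= \delta y_{11}x_{bb} + \delta y_{12}x_{b,b+s} + \delta y_{21}x_{b+s,b} + \delta y_{22}x_{b+s,b+s} + \cdots \end{aligned} \tag{A4}$$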
Equation (A4) can be equivalently written as the following matrix form.
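$$\begin{bmatrix} \delta w_{11} & \cdots & \delta w_{b1} \\ \vdots & \ddots & \vdots \\ \delta w_{1b} & \cdots & \delta w_{bb} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{a1} \\ x_{12} & x_{22} & \cdots & x_{a2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1a} & x_{2a} & \cdots & x_{aa} \end{bmatrix} \otimes \begin{bmatrix} \delta y_{11} & 0 & \cdots & 0 & \delta y_{21} & \cdots & \delta y_{c1} \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \delta y_{12} & 0 & \cdots & 0 & \delta y_{22} & \cdots & \delta y_{c2} \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ \delta y_{1c} & 0 & \cdots & 0 & \delta y_{2c} & \cdots & \delta y_{cc} \end{bmatrix} \tag{A5}$$
The above equation is recorded as $\delta w_{i,j} = x_{i,j} \otimes A$, where $\otimes$ is the convolution operation symbol. Matrix $A$ is a convolution operator. It is formed by inserting zero vectors of the same dimension between the rows and between the columns of the matrix $\delta y_{i,j}$. The number of zero vectors inserted at each position is $s - 1$. Through induction and summary, Equation (A5) is equivalent to the following equation.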
$$\delta w_{i,j} = \sum_{m=1}^{c}\sum_{n=1}^{c} x_{s\cdot(m-1)+i,\; s\cdot(n-1)+j} \cdot \delta y_{m,n} \tag{A6}$$
where δym,n is the convolution operator. Parameters of the pooling kernel can be updated by gradient descent algorithm as follows.
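$$w_{i,j} = w_{i,j} - \eta \cdot \delta w_{i,j} \tag{A7}$$
where $\eta$ is the learning rate. If $w_{i,j} > 1$, then $w_{i,j} = 1$; if $w_{i,j} < 0$, then $w_{i,j} = 0$. Equation (A8) can be obtained from the chain rule.
$$\frac{\partial L}{\partial x_{i,j}} = \frac{\partial L}{\partial y_{i,j}} \cdot \frac{\partial y_{i,j}}{\partial x_{i,j}} \tag{A8}$$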
To facilitate inductive solution, let s = 2. After the derivation of both sides of Equation (A1), Equation (A9) can be obtained by combining Equation (A8).
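$$\begin{aligned} \delta x_{11} &= \delta y_{11}w_{11} \\ \delta x_{12} &= \delta y_{11}w_{12} \\ \delta x_{13} &= \delta y_{11}w_{13} + \delta y_{12}w_{11} \\ \delta x_{21} &= \delta y_{11}w_{21} \\ \delta x_{22} &= \delta y_{11}w_{22} \\ \delta x_{23} &= \delta y_{11}w_{23} + \delta y_{12}w_{21} \\ \delta x_{31} &= \delta y_{11}w_{31} + \delta y_{21}w_{11} \\ \delta x_{32} &= \delta y_{11}w_{32} + \delta y_{21}w_{12} \\ \delta x_{33} &= \delta y_{11}w_{33} + \delta y_{12}w_{31} + \delta y_{21}w_{13} + \delta y_{22}w_{11} \end{aligned} \tag{A9}$$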
Through induction and summary, Equation (A9) can be equivalently rewritten as the following matrix form in the case of s = 2.
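$$\begin{bmatrix} \delta x_{11} & \delta x_{21} & \cdots & \delta x_{a1} \\ \delta x_{12} & \delta x_{22} & \cdots & \delta x_{a2} \\ \vdots & \vdots & \ddots & \vdots \\ \delta x_{1a} & \delta x_{2a} & \cdots & \delta x_{aa} \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & w_{11} & \cdots & w_{1b} & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & w_{b1} & \cdots & w_{bb} & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix} \otimes \begin{bmatrix} \delta y_{cc} & \cdots & \delta y_{2c} & 0 & \delta y_{1c} \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ \delta y_{c2} & \cdots & \delta y_{22} & 0 & \delta y_{12} \\ 0 & \cdots & 0 & 0 & 0 \\ \delta y_{c1} & \cdots & \delta y_{21} & 0 & \delta y_{11} \end{bmatrix} \tag{A10}$$
The above equation is recorded as $\delta x_{i,j} = B \otimes C$, where $\otimes$ is the convolution operation symbol. Matrix $B$ is composed of matrix $W$ padded with an appropriate number of zeros on the outside. Matrix $C$ is a convolution operator. It is formed by rotating the matrix elements by 180 degrees and then inserting an appropriate number of zero vectors between every two columns and between every two rows. The number of zeros padded around matrix $B$ and the number of zero vectors inserted in matrix $C$ depend on the pooling step $s$ and the size $c$ of the result feature map. The calculation formula is as follows.
$$\alpha = s(c - 1), \quad \beta = s - 1 \tag{A11}$$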
Through induction and summary, Equation (A10) can be equivalent to the Equation (A12).
$$\delta x_{i,j} = \sum_{m=1}^{c}\sum_{n=1}^{c} w_{i - s\cdot m + s,\; j - s\cdot n + s} \cdot \delta y_{m,n} \cdot f[i - s\cdot(m-1)] \cdot f[j - s\cdot(n-1)] \tag{A12}$$
where $f(x)$ is the judgment function, which equals 1 when $n < x < b$ and 0 otherwise.
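The derivation in this appendix can be sketched in a few lines of plain Python. The function names (`forward`, `grad_w`, `grad_x`, `update`) and the numerical values are hypothetical; indices are 0-based, `x[r][c]` is read as row `r` and column `c`, and the judgment function of Equation (A12) is assumed to select exactly the kernel indices that lie inside the kernel. This is an illustration of Equations (A2), (A6), (A7) and (A12), not the authors' Matlab implementation.

```python
def forward(x, w, s):
    """Equation (A2): weighted sum of each b x b window, taken with stride s."""
    a, b = len(x), len(w)
    c = (a - b) // s + 1
    return [[sum(w[m][n] * x[s * i + m][s * j + n]
                 for m in range(b) for n in range(b))
             for j in range(c)] for i in range(c)]


def grad_w(x, dy, b, s):
    """Equation (A6): gradient of the loss with respect to the kernel."""
    c = len(dy)
    return [[sum(x[s * m + i][s * n + j] * dy[m][n]
                 for m in range(c) for n in range(c))
             for j in range(b)] for i in range(b)]


def grad_x(w, dy, a, s):
    """Equation (A12): gradient of the loss with respect to the input map."""
    b, c = len(w), len(dy)
    dx = [[0.0] * a for _ in range(a)]
    for i in range(a):
        for j in range(a):
            for m in range(c):
                for n in range(c):
                    p, q = i - s * m, j - s * n      # kernel indices (0-based)
                    if 0 <= p < b and 0 <= q < b:    # assumed role of f(.)
                        dx[i][j] += w[p][q] * dy[m][n]
    return dx


def update(w, dw, eta):
    """Equation (A7), followed by clipping every weight to the range [0, 1]."""
    return [[min(1.0, max(0.0, wv - eta * gv)) for wv, gv in zip(wr, gr)]
            for wr, gr in zip(w, dw)]


# Hypothetical 5 x 5 input, 3 x 3 kernel, stride 2, and upstream errors dy.
s = 2
x = [[float((3 * r + 5 * c) % 7) for c in range(5)] for r in range(5)]
w = [[0.1 * (r + c) for c in range(3)] for r in range(3)]
dy = [[1.0, 2.0], [3.0, 4.0]]

y = forward(x, w, s)
dw = grad_w(x, dy, len(w), s)
dx = grad_x(w, dy, len(x), s)
w_new = update(w, dw, eta=5e-4)
```

With `s = 2`, `dx[2][2]` collects four terms, which matches the expression for $\delta x_{33}$ in Equation (A9).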

Appendix B

The specific derivation process of forward and back propagation of the second-order pooling method is as follows. The $l$th original feature map calculated from the last convolution is $x_l = \begin{bmatrix} x_{1,1,l} & x_{2,1,l} & \cdots & x_{a,1,l} \\ x_{1,2,l} & x_{2,2,l} & \cdots & x_{a,2,l} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1,a,l} & x_{2,a,l} & \cdots & x_{a,a,l} \end{bmatrix}, (l = 1, 2, 3, \ldots, s_1)$, where $a$ is the size of the original feature maps; $s_1$ is the number of feature maps contained. The original feature maps are split from left to right and top to bottom. Then the elements at the same position in all the original feature maps are integrated to form the local first-order feature vector, which is recorded as $A = (x_{i,j,1}, x_{i,j,2}, x_{i,j,3}, \ldots, x_{i,j,s_1}), (i, j = 1, 2, \ldots, a)$. The outer products of all local first-order feature vectors are calculated according to the following equation.
$$\begin{bmatrix} x_{i,j,1}^{2} & x_{i,j,2}x_{i,j,1} & x_{i,j,3}x_{i,j,1} & \cdots & x_{i,j,s_1}x_{i,j,1} \\ x_{i,j,1}x_{i,j,2} & x_{i,j,2}^{2} & x_{i,j,3}x_{i,j,2} & \cdots & x_{i,j,s_1}x_{i,j,2} \\ x_{i,j,1}x_{i,j,3} & x_{i,j,2}x_{i,j,3} & x_{i,j,3}^{2} & \cdots & x_{i,j,s_1}x_{i,j,3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{i,j,1}x_{i,j,s_1} & x_{i,j,2}x_{i,j,s_1} & x_{i,j,3}x_{i,j,s_1} & \cdots & x_{i,j,s_1}^{2} \end{bmatrix} = (x_{i,j,1}, \ldots, x_{i,j,s_1})^{T} \cdot (x_{i,j,1}, \ldots, x_{i,j,s_1}), \quad (i, j = 1, 2, \ldots, a) \tag{A13}$$
The above equation is recorded as $y = A^{T}A$, where $y$ is the outer product of the local first-order feature vector, called the second-order feature matrix. The $k$th second-order feature matrix can also be recorded as $y_k = \begin{bmatrix} y_{1,1,k} & y_{2,1,k} & y_{3,1,k} & \cdots & y_{b,1,k} \\ y_{1,2,k} & y_{2,2,k} & y_{3,2,k} & \cdots & y_{b,2,k} \\ y_{1,3,k} & y_{2,3,k} & y_{3,3,k} & \cdots & y_{b,3,k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{1,b,k} & y_{2,b,k} & y_{3,b,k} & \cdots & y_{b,b,k} \end{bmatrix}, (k = 1, 2, 3, \ldots, s_2)$, where $b$ is the size of the second-order feature matrix; $s_2$ is the number of feature maps contained. Combining Equation (A13), the relationship between the second-order feature matrixes and the original feature maps can be constructed as follows.
$$y_{i,j,k} = \sum_{m=1}^{a}\sum_{n=1}^{a} x_{m,n,i} \cdot x_{m,n,j} \cdot I[a(m-1)+n = k], \quad (i, j = 1, 2, \ldots, b;\ k = 1, 2, \ldots, s_2) \tag{A14}$$
where I(condition) is the judgment function. If the condition is met, then I = 1; otherwise I = 0. It is not difficult to see the following variable relations.
$$b = s_1, \quad s_2 = a^{2} \tag{A15}$$
The second-order feature matrixes are split from left to right and top to bottom. Then the elements at the same position in all second-order feature matrixes are integrated to form the second-order feature vectors, which can be recorded as $B = (y_{i,j,1}, y_{i,j,2}, y_{i,j,3}, \ldots, y_{i,j,s_2}), (i, j = 1, 2, 3, \ldots, b)$. According to Equation (A15), all second-order feature vectors can be rewritten into matrix form as follows. Because the second-order feature matrixes are symmetric, only the upper triangular part is converted into the corresponding second-order feature vectors.
$$(y_{i,j,1}, y_{i,j,2}, y_{i,j,3}, \ldots, y_{i,j,s_2}) \rightarrow \begin{bmatrix} y_{i,j,1} & y_{i,j,a+1} & y_{i,j,2a+1} & \cdots & y_{i,j,a(a-1)+1} \\ y_{i,j,2} & y_{i,j,a+2} & y_{i,j,2a+2} & \cdots & y_{i,j,a(a-1)+2} \\ y_{i,j,3} & y_{i,j,a+3} & y_{i,j,2a+3} & \cdots & y_{i,j,a(a-1)+3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{i,j,a} & y_{i,j,2a} & y_{i,j,3a} & \cdots & y_{i,j,a^{2}} \end{bmatrix} \tag{A16}$$
The matrix transformed from the second-order feature vector is called the second-order feature map, and the $\gamma$th second-order feature map can be recorded as $z_{\gamma} = \begin{bmatrix} z_{1,1,\gamma} & z_{2,1,\gamma} & z_{3,1,\gamma} & \cdots & z_{c,1,\gamma} \\ z_{1,2,\gamma} & z_{2,2,\gamma} & z_{3,2,\gamma} & \cdots & z_{c,2,\gamma} \\ z_{1,3,\gamma} & z_{2,3,\gamma} & z_{3,3,\gamma} & \cdots & z_{c,3,\gamma} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ z_{1,c,\gamma} & z_{2,c,\gamma} & z_{3,c,\gamma} & \cdots & z_{c,c,\gamma} \end{bmatrix}, (\gamma = 1, 2, 3, \ldots, s_3)$, where $c$ is the size of the second-order feature maps; $s_3$ is the number of feature maps contained. The relationship between the second-order feature maps and the second-order feature matrixes can be constructed by combining Equation (A16).
$$z_{\alpha,\beta,\gamma} = \sum_{k=1}^{s_2}\sum_{i=1}^{b}\sum_{j=1}^{b} y_{i,j,k} \cdot I\left[b(i-1) + j - \frac{i^{2} - i}{2} = \gamma\right] \cdot I[c(\alpha-1)+\beta = \gamma], \quad (\alpha, \beta = 1, 2, \ldots, c;\ \gamma = 1, 2, \ldots, s_3) \tag{A17}$$
It is not difficult to see the following variable relations.
$$c = a, \quad s_3 = \frac{b^{2} + b}{2} \tag{A18}$$
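The index expression $b(i-1) + j - (i^{2} - i)/2$ in Equation (A17) and the count $s_3 = (b^{2} + b)/2$ in Equation (A18) can be checked with a short sketch. The function name `upper_index` and the choice $b = 4$ are illustrative. Under the assumption that the upper triangle (including the diagonal) is read row by row with 1-based indices, the expression numbers its elements consecutively from 1 to $s_3$.

```python
def upper_index(i, j, b):
    """1-based position of element (i, j), with i <= j, in the upper triangle
    of a b x b matrix read row by row."""
    return b * (i - 1) + j - (i * i - i) // 2


b = 4                                   # illustrative matrix size
s3 = (b * b + b) // 2                   # Equation (A18)
indices = [upper_index(i, j, b)
           for i in range(1, b + 1)
           for j in range(i, b + 1)]
print(s3, indices)
```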
The second-order feature matrixes are down-sampled to obtain the global feature vector. It can be recorded as $Z = (Z_1, Z_2, Z_3, \ldots, Z_{s_4})$, where $s_4$ is the number of vectors contained. The calculation equation is as follows.
$$Z_p = \sum_{\alpha=1}^{c}\sum_{\beta=1}^{c} \frac{z_{\alpha,\beta,p}}{\alpha \cdot \beta}, \quad (p = 1, 2, 3, \ldots, s_4) \tag{A19}$$
In the back propagation process, since the entire pooling process does not contain parameters, only the derivative of the total loss with respect to each feature map is required. The derivative of the total loss with respect to the global feature vector is $\delta Z = (\delta Z_1, \delta Z_2, \delta Z_3, \ldots, \delta Z_{s_4})$, which is obtained through the fully connected layer. The error of the global feature vector is back propagated to the second-order feature matrixes, and the relationship between them is shown in Equation (A19). Then the error sensitivity in the second-order feature maps, recorded as $\delta z_{\gamma} = \begin{bmatrix} \delta z_{1,1,\gamma} & \delta z_{2,1,\gamma} & \delta z_{3,1,\gamma} & \cdots & \delta z_{c,1,\gamma} \\ \delta z_{1,2,\gamma} & \delta z_{2,2,\gamma} & \delta z_{3,2,\gamma} & \cdots & \delta z_{c,2,\gamma} \\ \delta z_{1,3,\gamma} & \delta z_{2,3,\gamma} & \delta z_{3,3,\gamma} & \cdots & \delta z_{c,3,\gamma} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \delta z_{1,c,\gamma} & \delta z_{2,c,\gamma} & \delta z_{3,c,\gamma} & \cdots & \delta z_{c,c,\gamma} \end{bmatrix}, (\gamma = 1, 2, 3, \ldots, s_3)$, can be calculated as follows.
$$\delta z_{\alpha,\beta,\gamma} = \frac{\delta Z_{\gamma}}{\alpha \cdot \beta}, \quad (\alpha, \beta = 1, 2, 3, \ldots, c;\ \gamma = 1, 2, 3, \ldots, s_3) \tag{A20}$$
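It can be seen from Equation (A16) that the second-order feature maps need to be converted into vectors when the error is transmitted from them to the second-order feature vectors. Therefore, the second-order feature maps are converted into the second-order feature vectors in the following way.
$$\begin{bmatrix} \delta z_{1,1,\gamma} & \delta z_{2,1,\gamma} & \delta z_{3,1,\gamma} & \cdots & \delta z_{c,1,\gamma} \\ \delta z_{1,2,\gamma} & \delta z_{2,2,\gamma} & \delta z_{3,2,\gamma} & \cdots & \delta z_{c,2,\gamma} \\ \delta z_{1,3,\gamma} & \delta z_{2,3,\gamma} & \delta z_{3,3,\gamma} & \cdots & \delta z_{c,3,\gamma} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \delta z_{1,c,\gamma} & \delta z_{2,c,\gamma} & \delta z_{3,c,\gamma} & \cdots & \delta z_{c,c,\gamma} \end{bmatrix} \rightarrow (\delta z_{1,1,\gamma}, \delta z_{1,2,\gamma}, \ldots, \delta z_{c,c,\gamma}), \quad (\gamma = 1, 2, 3, \ldots, s_3) \tag{A21}$$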
All the second-order feature vectors are combined into the second-order feature matrixes in order, and the error remains unchanged. Because the second-order feature matrixes are symmetric and the elements of the lower triangular part of the matrixes are not involved in the calculation, the error of the elements of the lower triangular part equals 0. The equation for calculating the error sensitivity of the second-order feature matrixes can be deduced as follows by replacing $\delta z$ with $\delta y$ in Equation (A22).
$$\delta y_{i,j,k} = \begin{cases} \sum\limits_{\alpha=1}^{c}\sum\limits_{\beta=1}^{c} \delta z_{\alpha,\beta,\,b(i-1)+j-(i^{2}-i)/2} \cdot I[b(\alpha-1)+\beta = k], & i < j \\ 0, & i > j \end{cases} \tag{A22}$$
The second-order feature matrixes back propagate the error to the local first-order feature vectors. The relationship between them is shown in Equation (A13). The error sensitivity of the local first-order feature vectors can be obtained by derivation of Equation (A13). In the process of transforming the local first-order feature vectors into the original feature maps, the element value does not change but the position changes. Finally, the relationship between the error sensitivity of the second-order feature matrixes and original feature maps can be established as follows.
$$\delta x_{m,n,l} = \sum_{i=1}^{b}\sum_{j=1}^{b} \delta y_{i,j,\,s_2\cdot(m-1)+n} \cdot x_{m,n,j} \cdot I(i = l) \cdot [I(i = m) \cdot I(j = n) + 1], \quad (m, n = 1, 2, \ldots, a;\ l = 1, 2, \ldots, s_1) \tag{A23}$$

References

  1. Xu, Z.; Ma, W.; Lin, P.; Hua, Y. Deep learning of rock microscopic images for intelligent lithology identification: Neural network comparison and selection. J. Rock. Mech. Geotech. Eng. 2022, 14, 1140–1152.
  2. Liu, N.; Huang, T.; Gao, J.; Xu, Z.; Wang, D.; Li, F. Quantum-Enhanced Deep Learning-Based Lithology Interpretation from Well Logs. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4503213.
  3. Pi, Z.; Zhou, Z.; Li, X.; Wang, S. Digital image processing method for characterization of fractures, fragments, and particles of soil/rock-like materials. Mathematics 2021, 9, 815.
  4. Zhou, Z.; Lu, J.; Cai, X.; Rui, Y.; Tan, L. Water saturation effects on mechanical performances and failure characteristics of rock-concrete disc with different interface dip angles. Constr. Build. Mater. 2022, 324, 126684.
  5. Yin, J.; Lu, J.; Tian, F.; Wang, S. Pollutant Migration Pattern during Open-Pit Rock Blasting Based on Digital Image Analysis Technology. Mathematics 2022, 10, 3205.
  6. Zhou, Z.; Zang, H.; Cao, W.; Du, X.; Chen, L.; Ke, C. Risk assessment for the cascading failure of underground pillar sections considering interaction between pillars. Int. J. Rock. Mech. Min. Sci. 2019, 124, 104142.
  7. Xue, Y.; Li, X.; Li, G.; Qiu, D.; Gong, H.; Kong, F. An analytical model for assessing soft rock tunnel collapse risk and its engineering application. Geomech. Eng. 2020, 23, 441–454.
  8. Li, D.; Zhao, J.; Ma, J. Experimental Studies on Rock Thin-Section Image Classification by Deep Learning-Based Approaches. Mathematics 2022, 10, 2317.
  9. Zhang, B. Application of thin section micro-image in identification of rock. Petrochemical. Ind. Technol. 2016, 23, 108.
  10. Xu, C.; Li, L.; Zhang, J.; He, L.; Zhang, G.; Wang, Y. Application of X-ray Fluorescence Spectrometry and Electron Microprobe in the Identification of Intermediate-Felsic Volcanic Rocks. Rock. Miner. Anal. 2016, 35, 626–633.
  11. Singh, N.; Singh, T.; Tiwary, A.; Sarkar, K. Textural identification of basaltic rock mass using image processing and neural network. Comput. Geosci. 2010, 14, 301–310.
  12. Mlynarczuk, M.; Gorszczyk, A.; Slipek, B. The application of pattern recognition in the automatic classification of microscopic rock images. Comput. Geosci. 2013, 60, 126–133.
  13. Chatterjee, S. Vision-based rock-type classification of limestone using multi-class support vector machine. Appl. Intell. 2013, 39, 14–27.
  14. Patel, A.; Chatterjee, S. Computer vision-based limestone rock-type classification using probabilistic neural network. Geosci. Front. 2016, 7, 53–60.
  15. Zhang, Y.; Li, M.; Han, S.; Ren, Q.; Shi, J. Intelligent Identification for Rock-Mineral Microscopic Images Using Ensemble Machine Learning Algorithms. Sensors 2019, 19, 3914.
  16. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  17. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  18. Esteva, A.; Kuprel, B.; Novoa, R. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118.
  19. Pan, S.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
  20. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
  21. Bamford, T.; Esmaeili, K.; Schoellig, A. A deep learning approach for rock fragmentation analysis. Int. J. Rock. Mech. Min. Sci. 2021, 145, 104839.
  22. Wang, P.; Wang, S.; Zhu, C.; Zhang, Z. Classification and extent determination of rock slope using deep learning. Geomech. Geophys. Geo. 2020, 6, 33.
  23. Li, H.; Hu, Q.; Mao, Y.; Niu, F.; Liu, C. Deep Learning-based Model for Automatic Salt Rock Segmentation. Rock. Mech. Rock. Eng. 2022, 55, 3735–3747.
  24. Zhao, J.; Wang, F.; Cai, J. 3D tight sandstone digital rock reconstruction with deep learning. J. Pet. Sci. Eng. 2021, 207, 109020.
  25. Cao, D.; Ji, S.; Cui, R.; Liu, Q. Multi-task learning for digital rock segmentation and characteristic parameters computation. J. Pet. Sci. Eng. 2022, 208, 109202.
  26. Li, D.; Zhao, J.; Liu, Z. A Novel Method of Multitype Hybrid Rock Lithology Classification Based on Convolutional Neural Networks. Sensors 2022, 22, 1574.
  27. Polat, O.; Polat, A.; Ekici, T. Automatic classification of volcanic rocks from thin section images using transfer learning networks. Neural Comput. Appl. 2021, 33, 11531–11540.
  28. Alzubaidi, F.; Mostaghimi, P.; Swietojanski, P.; Clark, S.; Armstrong, R. Automated lithology classification from drill core images using convolutional neural networks. J. Pet. Sci. Eng. 2021, 197, 107933.
  29. Liu, Y.; Zhang, Z.; Liu, X.; Wang, L.; Xia, X. Deep Learning Based Mineral Image Classification Combined with Visual Attention Mechanism. IEEE Access 2021, 9, 98091–98109.
  30. Ma, H.; Han, G.; Peng, L.; Zhu, L.; Shu, J. Rock thin sections identification based on improved squeeze-and-Excitation Networks model. Comput. Geosci. 2021, 152, 104780.
  31. dos Anjos, C.; Avila, M.; Vasconcelos, A.; Neta, A.; Medeiros, L.; Evsukoff, A.; Surmas, R.; Landau, L. Deep learning for lithological classification of carbonate rock micro-CT images. Comput. Geosci. 2021, 25, 971–983.
  32. Su, C.; Xu, S.; Zhu, K.; Zhang, X. Rock classification in petrographic thin section images based on concatenated convolutional neural networks. Earth Sci. Inform. 2020, 13, 1477–1484.
  33. Seo, W.; Kim, Y.; Sim, H.; Song, Y.; Yun, T. Classification of igneous rocks from petrographic thin section images using convolutional neural network. Earth Sci. Inform. 2022, 15, 1297–1307.
  34. Xu, Z.; Shi, H.; Lin, P.; Liu, T. Integrated lithology identification based on images and elemental data from rocks. J. Pet. Sci. Eng. 2021, 205, 108853.
  35. Saeedan, F.; Weber, N.; Goesele, M.; Roth, S. Detail-Preserving Pooling in Deep Networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
  36. Li, P.; Xie, J.; Wang, Q.; Zuo, W. Is Second-order Information Helpful for Large-scale Visual Recognition. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
  37. Ma, R.; Liu, C.; Yang, J.; Wang, Y.; Liu, J. A Carbonate Microscopic Image Dataset of the Permo-Carboniferous Taiyuan Formation in the Southern Margin of the North China Block. Science Data Bank. Available online: https://www.scidb.cn/en/detail?dataSetId=727517165267189760 (accessed on 15 October 2022).
  38. Liu, Y.; Hou, M.; Liu, X.; Qi, Z. A Micrograph Dataset of Buried Hills and Overlying Glutenite in Bozhong Sag, Bohai Bay Basin. Science Data Bank. Available online: https://www.scidb.cn/en/detail?dataSetId=752623639467130880 (accessed on 15 October 2022).
  39. Lai, W.; Jiang, J.; Qiu, J.; Yu, J.; Hu, X. Photomicrograph Dataset of Rocks for Petrology Teaching in Nanjing University. Science Data Bank. Available online: https://www.scidb.cn/en/detail?dataSetId=732953783604084736 (accessed on 15 October 2022).
  40. Qi, Z.; Hou, M.; Xu, S.; He, L.; Tang, Z.; Zhang, M. A Carbonate Microscopic Image Dataset of Sinian Dengying Period in Northwestern Margin of Sichuan Basin. Science Data Bank. Available online: https://www.scidb.cn/en/detail?dataSetId=733012342660399104 (accessed on 15 October 2022).
  41. Ma, Q.; Chai, R.; Yang, J.; Du, Y.; Dai, X. A microscopic Image Dataset of Mesozoic Metamorphic Grains Bearing Sandstones from Mid-Yangtze, China. Science Data Bank. Available online: https://www.scidb.cn/en/detail?dataSetId=727525043063488512 (accessed on 15 October 2022).
  42. Cai, W.; Hou, M.; Chen, H.; Liu, Y. A Micrograph Dataset of Terrigenous Clastic Rocks of Upper Devonian Lower Carboniferous Wutong Group in Southern Lower Yangtze. Science Data Bank. Available online: https://www.scidb.cn/en/detail?dataSetId=732987889075355648 (accessed on 15 October 2022).
  43. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  44. Yang, Z.; He, B.; Liu, Y.; Wang, D.; Zhu, G. Classification of rock fragments produced by tunnel boring machine using convolutional neural networks. Autom. Constr. 2021, 125, 103612. [Google Scholar] [CrossRef]
  45. Yu, Z.; Dai, S.; Xing, Y. Adaptive Salience Preserving Pooling for Deep Convolutional Neural Networks. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo Workshops, Shanghai, China, 8–12 July 2019. [Google Scholar]
  46. Carreira, J.; Caseiro, R.; Batista, J.; Sminchisescu, C. Free-form region description with second-order pooling. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1177–1189. [Google Scholar] [CrossRef]
  47. Zhou, J.; Huang, S.; Wang, M.; Qiu, Y. Performance evaluation of hybrid GA–SVM and GWO–SVM models to predict earthquake-induced liquefaction potential of soil: A multi-dataset investigation. Eng. Comput. 2021, 38, 4197–4215. [Google Scholar] [CrossRef]
  48. Zhou, J.; Huang, S.; Zhou, T.; Armaghani, D.; Qiu, Y. Employing a genetic algorithm and grey wolf optimizer for optimizing RF models to evaluate soil liquefaction potential. Artif. Intell. Rev. 2022, 55, 5673–5705. [Google Scholar] [CrossRef]
  49. Zhou, J.; Huang, S.; Qiu, Y. Optimization of random forest through the use of MVO, GWO and MFO in evaluating the stability of underground entry-type excavations. Tunn. Undergr. Space. Technol. 2022, 124, 104494. [Google Scholar] [CrossRef]
  50. Li, P.; Xie, J.; Wang, Q.; Gao, Z. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
Figure 1. The sites of the rock samples and the production process of the rock thin section images; (a) Rock sample acquisition area; (b) Lithological map of the different strata; (c) Manufacturing process of rock thin sections; (d) Rock thin section image set.
Figure 2. The calculation process of the bilinear interpolation algorithm.
Figure 3. The structure and workflow of the CNN model.
Figure 4. The structure of the rock thin section image identification model (ASOPCNN).
Figure 5. Process of traditional pooling methods, including max pooling, average pooling and stochastic pooling.
Figure 6. (a) The selectivity of textural features in the pooling process; (b) The selectivity of color features in the pooling process.
Figure 7. Process of the adaptive pooling method.
Figure 8. Process of integrating feature maps of traditional pooling methods.
Figure 9. The process of the second-order pooling method.
Figure 10. Schematic of the confusion matrix.
Figure 11. (a) The decline in loss of the four models on the training set; (b) the decline in loss of the four models on the validation set.
Figure 12. (a) The change in model accuracy on the training set; (b) the change in model accuracy on the validation set.
Figure 13. Accuracy of each model on the test set.
Figure 14. Prediction results of partial images on the test set; (a) wackestone; (b) granite; (c) schist; (d) quartz sandstone; (e) conglomerate; (f) crystalline dolomite.
Figure 15. Confusion matrices of each identification model on the test set. Meaning of digits: 1-wackestone, 2-granite, 3-schist, 4-quartz sandstone, 5-conglomerate, 6-crystalline dolomite. (a) ASOPCNN; (b) MAXCNN; (c) MEACNN; (d) STOCNN.
Figure 16. Performance on each rock type evaluated by different metrics. Meaning of digits: 1-wackestone, 2-granite, 3-schist, 4-quartz sandstone, 5-conglomerate, 6-crystalline dolomite. (a) Recall rate; (b) precision rate; (c) F1 score.
Figure 17. The ROC curves and AUC values of the four models for six rock categories. (a) wackestone; (b) granite; (c) schist; (d) quartz sandstone; (e) conglomerate; (f) crystalline dolomite.
Figure 18. Second-order pooling contains the feature information of traditional pooling methods.
Table 1. Rock thin section image set.

| Rock Type | Training Set | Validation Set | Test Set | Total |
| --- | --- | --- | --- | --- |
| Wackestone | 636 | 180 | 103 | 919 |
| Granite | 732 | 216 | 92 | 1040 |
| Schist | 672 | 192 | 89 | 953 |
| Quartz sandstone | 756 | 216 | 102 | 1074 |
| Conglomerate | 768 | 216 | 104 | 1088 |
| Crystalline dolomite | 648 | 180 | 96 | 924 |
| Total | 4212 | 1200 | 586 | 5998 |
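The split in Table 1 can be checked with a few lines of Python. The sketch below only re-encodes the counts from the table; the names `SPLITS`, `column_totals` and `split_fractions` are illustrative and do not come from the paper.

```python
# Image counts per rock type from Table 1: (training, validation, test).
SPLITS = {
    "Wackestone": (636, 180, 103),
    "Granite": (732, 216, 92),
    "Schist": (672, 192, 89),
    "Quartz sandstone": (756, 216, 102),
    "Conglomerate": (768, 216, 104),
    "Crystalline dolomite": (648, 180, 96),
}


def column_totals(splits):
    """Sum the training, validation and test counts over all rock types."""
    return tuple(sum(column) for column in zip(*splits.values()))


def split_fractions(splits):
    """Return each subset's share of the whole image set."""
    totals = column_totals(splits)
    grand_total = sum(totals)
    return tuple(total / grand_total for total in totals)


train, val, test = column_totals(SPLITS)
fractions = split_fractions(SPLITS)
print(train, val, test, train + val + test)
print([round(f, 3) for f in fractions])
```

The column sums reproduce the "Total" row (4212, 1200, 586 and 5998 images), which corresponds to a split of roughly 70% training, 20% validation and 10% test.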
Table 2. Model structure configuration.

| ASOPCNN | MAXCNN | MEACNN | STOCNN |
| --- | --- | --- | --- |
| Convolution 11 × 11-96 filters | Convolution 11 × 11-96 filters | Convolution 11 × 11-96 filters | Convolution 11 × 11-96 filters |
| Adaptive pooling | Max pooling | Mean pooling | Stochastic pooling |
| 2 Group convolution 5 × 5-128 filters | 2 Group convolution 5 × 5-128 filters | 2 Group convolution 5 × 5-128 filters | 2 Group convolution 5 × 5-128 filters |
| Adaptive pooling | Max pooling | Mean pooling | Stochastic pooling |
| Convolution 3 × 3-384 filters | Convolution 3 × 3-384 filters | Convolution 3 × 3-384 filters | Convolution 3 × 3-384 filters |
| 2 Group convolution 3 × 3-192 filters | 2 Group convolution 3 × 3-192 filters | 2 Group convolution 3 × 3-192 filters | 2 Group convolution 3 × 3-192 filters |
| 2 Group convolution 3 × 3-128 filters | 2 Group convolution 3 × 3-128 filters | 2 Group convolution 3 × 3-128 filters | 2 Group convolution 3 × 3-128 filters |
| Second-order pooling | Max pooling | Mean pooling | Stochastic pooling |
| 3 Fully Connected 4096 4096 6 | 3 Fully Connected 4096 4096 6 | 3 Fully Connected 4096 4096 6 | 3 Fully Connected 4096 4096 6 |
| Softmax | Softmax | Softmax | Softmax |
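As a rough illustration of the last pooling stage listed for ASOPCNN in Table 2, the sketch below computes a second-order descriptor as the average outer product of the per-position channel vectors. It is a minimal NumPy sketch and not the authors' implementation: the function name `second_order_pool`, the 6 × 6 spatial size and the random input are assumptions, and any mean-centering or normalization the paper may apply is left out.

```python
import numpy as np


def second_order_pool(features):
    """Pool a (C, H, W) stack of feature maps into a (C, C) matrix.

    Entry (i, j) is the average over all spatial positions of the product
    of channel i and channel j, i.e. the mean outer product of the
    per-position channel vectors.
    """
    channels, height, width = features.shape
    x = features.reshape(channels, height * width)  # one column per position
    return (x @ x.T) / (height * width)             # average of outer products


rng = np.random.default_rng(0)
# 128 channels as in the last convolution row of Table 2; 6 x 6 is an assumed size.
maps = rng.standard_normal((128, 6, 6))
pooled = second_order_pool(maps)
print(pooled.shape)  # (128, 128)
```

Unlike max or mean pooling, which keep one value per channel, the result holds one value per channel pair, so it also records how strongly the feature maps co-occur.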
Table 3. The final accuracy on the training and validation sets.

| Model | ASOPCNN | MAXCNN | MEACNN | STOCNN |
| --- | --- | --- | --- | --- |
| Training set | 0.9286 | 0.7086 | 0.6657 | 0.8412 |
| Validation set | 0.8671 | 0.6808 | 0.6163 | 0.8075 |
Table 4. Identification probability of the rock images in Figure 14. Meaning of digits: 1-wackestone, 2-granite, 3-schist, 4-quartz sandstone, 5-conglomerate, 6-crystalline dolomite.

| Rock Category | Wackestone | Granite | Schist | Quartz Sandstone | Conglomerate | Crystalline Dolomite |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.9241 | 0.0156 | 0.0326 | 0.0123 | 0.0062 | 0.0092 |
| 2 | 0.0658 | 0.8606 | 0.0369 | 0.0223 | 0.0056 | 0.0088 |
| 3 | 0.0033 | 0.0015 | 0.9341 | 0.0061 | 0.0089 | 0.0461 |
| 4 | 0.0031 | 0.0026 | 0.0011 | 0.9073 | 0.0030 | 0.0829 |
| 5 | 0.0023 | 0.1356 | 0.0011 | 0.0010 | 0.8483 | 0.0117 |
| 6 | 0.0033 | 0.0011 | 0.0010 | 0.0010 | 0.0918 | 0.9028 |
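The probabilities in Table 4 can be turned into predicted labels by taking the most probable class in each row. The sketch below does this in plain Python; it assumes that row k of the table is the image whose true category is class k, and the names `CLASSES`, `PROBS` and `ranked` are illustrative only.

```python
CLASSES = [
    "wackestone", "granite", "schist",
    "quartz sandstone", "conglomerate", "crystalline dolomite",
]

# Rows of Table 4: class probabilities for the six images of Figure 14.
# Assumption: row k belongs to the image whose true category is class k.
PROBS = [
    [0.9241, 0.0156, 0.0326, 0.0123, 0.0062, 0.0092],
    [0.0658, 0.8606, 0.0369, 0.0223, 0.0056, 0.0088],
    [0.0033, 0.0015, 0.9341, 0.0061, 0.0089, 0.0461],
    [0.0031, 0.0026, 0.0011, 0.9073, 0.0030, 0.0829],
    [0.0023, 0.1356, 0.0011, 0.0010, 0.8483, 0.0117],
    [0.0033, 0.0011, 0.0010, 0.0010, 0.0918, 0.9028],
]


def ranked(row):
    """Return class indices sorted from most to least probable."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)


for true_label, row in zip(CLASSES, PROBS):
    first, second = ranked(row)[:2]
    print(
        f"{true_label}: predicted {CLASSES[first]} ({row[first]:.4f}), "
        f"runner-up {CLASSES[second]} ({row[second]:.4f})"
    )
```

With the values of Table 4, every image is assigned to its own category. The lowest winning probability is 0.8483 for conglomerate, whose runner-up is granite at 0.1356.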
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, Z.; Yuan, H.; Cai, X. Rock Thin Section Image Identification Based on Convolutional Neural Networks of Adaptive and Second-Order Pooling Methods. Mathematics 2023, 11, 1245. https://doi.org/10.3390/math11051245

