Article

Prediction of Pr/Nd Element Content Based on One-Dimensional Convolution with Multi-Residual Attention Blocks

1 School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
2 State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure, East China Jiaotong University, Nanchang 330013, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 11086; https://doi.org/10.3390/app131911086
Submission received: 4 September 2023 / Revised: 27 September 2023 / Accepted: 29 September 2023 / Published: 9 October 2023

Abstract

Insufficient color feature extraction can lead to poor prediction performance in rare earth element composition estimation. To address this issue, we propose a one-dimensional convolutional method for predicting rare earth element composition. First, color space features (HSV and YUV) and spatial texture features are extracted from images of rare earth element solutions. Because the trend of rare earth element composition is closely related to the extraction stage, we select the extraction stage corresponding to each image as a key feature. A feature selection technique based on Random Forest Recursive Feature Elimination with Cross-Validation (RF-RFECV) is then applied to select the most relevant features, yielding a mixed feature set. Based on this, a one-dimensional convolutional neural network prediction model with multiple residual attention blocks (MRAB-CNN) is introduced. The proposed model incorporates the Residual Attention Block (RAB) structure, which mitigates the effect of noisy weights, enhancing both prediction accuracy and the rate of convergence. Experimental assessments on field images using the MRAB-CNN model with the mixed feature set indicate that our methodology surpasses alternative techniques in thorough image feature extraction and offers both speed and precision in predicting the composition of rare earth elements. Such a model holds potential for real-time monitoring of rare earth element composition in extraction production processes.

1. Introduction

Rare earth elements are strategic resources in China, finding widespread application in fields such as electronics, optoelectronics, and metallurgy. In rare earth separation enterprises, the primary objective of automating the production process is to obtain high-purity single rare earth elements. A crucial step in this process involves real-time monitoring of the changes in the content of rare earth element components during the extraction process [1]. Traditional detection methods, such as Inductively Coupled Plasma Atomic Emission Spectroscopy (ICP-AES), Inductively Coupled Plasma Mass Spectrometry (ICP-MS), X-ray Fluorescence (XRF), and Ultraviolet-Visible (UV-VIS) spectrophotometry [2,3,4,5], are commonly used both domestically and internationally for detecting rare earth element content. While these methods offer high accuracy and reliability, they often require extensive preparation and testing time, which does not meet the requirements of real-time online detection. Soft sensing methods, being data-driven modeling technologies, enable model training and prediction based on real-time data; thus, they can achieve real-time content prediction, facilitating online detection and quick response to anomalies. Consequently, soft sensing methods are increasingly becoming the preferred rapid detection approach.
Owing to the unique physical and chemical properties of certain rare earth elements, rare earth solutions of different concentrations exhibit distinct colors under visible light. Researchers have attempted to leverage these color characteristics for rapid detection of their component content using soft sensing methods. In [6], an association was formulated between the H component and the composition of Pr/Nd elements in a solution, leveraging the least squares technique within the HSI color model. In subsequent research [7,8], the H/S components of the HSI color model were employed as input variables for modeling. Least-square support vector machine (LSSVM) and multi-RBF models were subsequently developed for predicting component concentrations. Further, in the literature [9,10], the first moments of the H, S, and I components were extracted. These features were used as the input variables of the model, and weighted least-square support vector machine (WLSSVM) and genetic algorithm–improved extreme learning machine (GA-ELM) models were employed to establish a soft sensor model of component content. In another study [11], ten color features, including RGB, HIS, RR, RG, RB, and CVA of Pr/Nd extracted from solution images were used as inputs, then an improved GRA-real-time learning algorithm was utilized to predict component content. However, these methods only consider the HSI features of the image color space, neglecting other representative features. This single feature extraction method often struggles to fully encapsulate all the information of the image; when there are too many single features, feature redundancy can easily occur.
Additionally, different elements with varied concentrations in the solution can lead to changes in the color and brightness of the solution images. These alterations often appear as visual features, including variations in texture and the presence of spots. The Gray Level Co-occurrence Matrix (GLCM) [12] is a predominant tool in the realm of image texture feature extraction due to its simplicity of implementation, extensive applicability, and robust representational capability. For instance, in the literature [13], the GLCM has been employed to extract the building area of high-resolution SAR images. When combined with Bhattacharyya feature selection, it can achieve excellent results in detecting the boundaries of the building area. In another study [14], spectral features and spatial texture features extracted by the GLCM were combined to construct a random forest classifier, enabling the extraction of wolfberry planting information. In another work [15], GLCM was adopted for feature extraction from plant leaf images, leading to excellent classification results using deep convolutional neural networks. However, when the dimensionality of extracted features becomes high, feature selection techniques are needed to eliminate irrelevant or redundant features and improve model performance. In [16], a feature importance recursive elimination method combining support vector machines, random forests, and generalized boosted regression algorithms was used to select the optimal feature subset, yielding promising results. In [17], the authors applied XGBoost for denoising and dimensionality reduction of biological-related features, leading to the development of the StackPPI model, which improved the accuracy of protein–protein interaction prediction. In [18], an approach based on mutual information coupled with bare-bones particle swarm optimization (BBPSO) was utilized to achieve superior feature subset selection performance through effective population initialization, local search operations, and adaptive mutation operations.
To summarize, in this paper we introduce a one-dimensional convolutional approach that relies on feature fusion for predicting the content of rare earth elements. Initially, machine vision technology is used to extract the RGB components of the Pr/Nd solution image color space, which are then converted into the HSV and YUV color spaces. The primary components of these two color spaces are extracted to serve as the image's color features. Following this, the Gray Level Co-occurrence Matrix (GLCM) is used to extract eight features from the Pr/Nd solution image, namely, the energy, contrast, entropy, homogeneity, autocorrelation, mean, variance, and correlation, which serve as the image's spatial texture features. Considering that in real-world rare earth extraction processes the content of rare earth element components is often strongly correlated with the extraction stage, the extraction stage corresponding to each image is selected as a crucial feature. The composite features of the rare earth solution are derived using a feature recursive elimination method based on a random forest. For these mixed features, we propose a one-dimensional convolutional neural network prediction model with multiple residual attention blocks (MRAB-CNN). This model incorporates a residual attention block (RAB) structure, enabling it to diminish the weight proportion of noise through the attention mechanism while extracting deep features, facilitating rapid convergence. As validated experimentally using field Pr/Nd element data, the method presented in this paper can be applied to the real-time detection of rare earth element components at extraction production sites.
The structure of this paper is as follows. Section 2 describes the color feature extraction of rare earth solution and introduces several spatial texture feature extraction methods. In Section 3, we introduce the Random Forest model used to obtain the optimal feature set. Section 4 introduces the one-dimensional residual attention convolutional neural network model used to predict the component content of the rare earth elements. Section 5 reports the experimental results and provides analysis. Finally, the paper is concluded in Section 6.

2. Rare Earth Solution Image Feature Extraction

2.1. Color Feature Extraction

During the process of rare earth extraction production, the field environment is often harsh, which is not conducive to the direct acquisition of images. Therefore, an image acquisition device is utilized to first capture the image of the rare earth solution. Subsequently, operations such as image background segmentation and image filtering are performed. The central area of the image is selected for cropping, yielding a 128 × 128 sample map. The first-order components of RGB are then extracted. RGB is a color space commonly employed by hardware devices. However, the RGB color space is influenced by light intensity and color offset, and there is a correlation between its color channels. Consequently, it is not suitable for feature extraction from rare earth solution images. Other color spaces, such as HSV and YUV, align more closely with human visual perception and can more effectively describe the brightness, saturation, and other color characteristics. The HSV space primarily describes the brightness, saturation, and hue of the color, while the YUV space describes the brightness and chrominance. Both are nonlinear transformations based on the RGB space, and their combination can more comprehensively describe color information. Their conversion relationships to the RGB color space are as follows:
$$\mathrm{Max} = \max(R, G, B)$$

$$\mathrm{Min} = \min(R, G, B)$$

$$H = \begin{cases} 0, & \mathrm{Max} = \mathrm{Min} \\ 60^\circ \times \dfrac{G-B}{\mathrm{Max}-\mathrm{Min}} + 0^\circ, & \mathrm{Max} = R \text{ and } G \ge B \\ 60^\circ \times \dfrac{G-B}{\mathrm{Max}-\mathrm{Min}} + 360^\circ, & \mathrm{Max} = R \text{ and } G < B \\ 60^\circ \times \dfrac{B-R}{\mathrm{Max}-\mathrm{Min}} + 120^\circ, & \mathrm{Max} = G \\ 60^\circ \times \dfrac{R-G}{\mathrm{Max}-\mathrm{Min}} + 240^\circ, & \mathrm{Max} = B \end{cases}$$

$$S = \begin{cases} 0, & \text{if } \mathrm{Max} = 0 \\ \dfrac{\mathrm{Max}-\mathrm{Min}}{\mathrm{Max}}, & \text{otherwise} \end{cases}$$

$$V = \max(R, G, B)$$
In the above formulas, the values of R, G, and B are real numbers between 0 and 1, Max is the largest of the three, and Min is the smallest. If (R, G, B) is instead scaled to the range [0, 255], then RGB is converted to YUV using the following formula:
$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.5 \\ 0.5 & -0.419 & -0.081 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$
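As an illustrative sketch only (not the authors' released code), the conversions above can be implemented directly in NumPy; the function and variable names below are our own. The modulo form of the hue computation folds the two Max = R cases (+0° and +360°) into a single expression.

```python
import numpy as np

def rgb_to_hsv_yuv(rgb):
    """Minimal sketch of the Section 2.1 conversions.

    `rgb` is an (..., 3) float array in [0, 1]; the YUV formula is
    applied on the [0, 255] scale, as in the text.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    diff = np.where(mx == mn, 1.0, mx - mn)  # guard against division by zero

    h = np.zeros_like(mx)
    h = np.where(mx == r, (60.0 * (g - b) / diff) % 360.0, h)
    h = np.where(mx == g, 60.0 * (b - r) / diff + 120.0, h)
    h = np.where(mx == b, 60.0 * (r - g) / diff + 240.0, h)
    h = np.where(mx == mn, 0.0, h)           # achromatic pixels

    s = np.where(mx == 0, 0.0, (mx - mn) / np.where(mx == 0, 1.0, mx))
    v = mx

    m = np.array([[0.299, 0.587, 0.114],
                  [-0.169, -0.331, 0.500],
                  [0.500, -0.419, -0.081]])
    yuv = (rgb * 255.0) @ m.T + np.array([0.0, 128.0, 128.0])

    return np.stack([h, s, v], axis=-1), yuv
```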

2.2. Spatial Texture Feature Extraction

The gray level co-occurrence matrix (GLCM) is a method for describing image texture features [19]; it reflects the gray-level relationship between pairs of pixels in space and forms the basis for describing image texture [20]. It is defined as follows: for a given direction θ and distance d, the value at position (i, j) of the gray level co-occurrence matrix is the number of pixel pairs, separated by distance d along direction θ, in which one pixel has gray level i and the other has gray level j. In order to reduce computational complexity while retaining as much information as possible, in this paper we select eight statistical values as the original texture features, as follows:
$$\text{energy} = \sum_i \sum_j p(i, j)^2$$

$$\text{contrast} = \sum_i \sum_j (i - j)^2\, p(i, j)$$

$$\text{entropy} = -\sum_i \sum_j p(i, j) \log p(i, j)$$

$$\text{homogeneity} = \sum_i \sum_j \frac{p(i, j)}{1 + (i - j)^2}$$

$$\text{autocorrelation} = \sum_i \sum_j (i \cdot j)\, p(i, j)$$

$$\text{mean} = \sum_i \sum_j i \cdot p(i, j)$$

$$\text{var} = \sum_i \sum_j (i - \text{mean})^2\, p(i, j)$$

$$\text{correlation} = \sum_i \sum_j \frac{(i - \text{mean})(j - \text{mean})\, p(i, j)}{\text{var}}$$
In these formulas, p i , j represents the value of point i , j of the gray level co-occurrence matrix with the given parameters θ , d. The aforementioned eight features encapsulate the grayscale texture information of the image: Energy reflects the uniformity of the grayscale distribution in the image, with a larger energy value indicating a more dispersed image; Contrast reflects the intensity of the grayscale differences in the image, with a greater contrast indicating a stronger difference in the rare earth solution; Entropy reflects the uncertainty of an image, with a higher entropy value indicating a more complex texture in the image space; Homogeneity reflects the distribution of image texture, with a smaller homogeneity value indicating a more dispersed image texture; Autocorrelation reflects the degree of correlation between grayscale values in an image and the grayscale values of surrounding pixels, with a greater autocorrelation value indicating a smoother image texture; Mean and variance describe the average of the co-occurrence matrix and the degree of deviation from the mean, respectively, reflecting the uniformity of the image; and Correlation reflects the law of change of the grayscale value of the image, with a higher correlation value indicating a more regular image.
In the above feature extraction process, the distance d of the gray level co-occurrence matrix is set to 1, and each feature is averaged over the directions θ = 0°, 45°, 90°, and 135° to balance the influence of different directions on the feature values. In this way, eight features of the gray level co-occurrence matrix are extracted as the spatial texture features of the image.
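As a minimal sketch of this extraction step, assuming scikit-image is used (the paper does not name its implementation): graycomatrix builds the direction-resolved co-occurrence matrix, graycoprops supplies contrast, homogeneity, correlation, and ASM (the paper's energy, i.e., the sum of squared entries), and the remaining statistics are computed by hand from the direction-averaged matrix. The function name is ours.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_img):
    """Sketch: the eight Section 2.2 statistics, averaged over the four
    directions (0, 45, 90, 135 degrees) at distance d = 1.
    `gray_img` is a uint8 image, e.g. the 128 x 128 cropped sample."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(gray_img, distances=[1], angles=angles,
                        levels=256, symmetric=True, normed=True)

    # Three statistics come directly from scikit-image
    feats = {prop: graycoprops(glcm, prop).mean()
             for prop in ("contrast", "homogeneity", "correlation")}
    # The paper's "energy" (sum of squared entries) is skimage's ASM
    feats["energy"] = graycoprops(glcm, "ASM").mean()

    # The rest are computed from the direction-averaged matrix p(i, j)
    p = glcm.mean(axis=(2, 3))
    p /= p.sum()
    i, j = np.indices(p.shape)
    feats["entropy"] = -np.sum(p * np.log(p + 1e-12))
    feats["autocorrelation"] = np.sum(i * j * p)
    feats["mean"] = np.sum(i * p)
    feats["var"] = np.sum((i - feats["mean"]) ** 2 * p)
    return feats
```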

3. Optimal Feature Set Acquisition

The Recursive Feature Elimination (RFE) method based on Random Forest (RF-RFE) employs a Random Forest as the estimator. It evaluates feature importance using the Random Forest model and selects the optimal feature subset by recursively eliminating the features deemed least important.

3.1. Random Forest

A random forest is an ensemble learning algorithm based on decision trees, first introduced by Leo Breiman and Adele Cutler in 2001 [21]. It uses the bootstrap resampling technique to draw multiple sample sets from the original dataset, constructs a decision tree model on each of these sets, and finally aggregates the prediction outcomes of the multiple decision trees. The ultimate prediction result is determined by calculating the arithmetic mean of the decision trees' predictions. The architecture of the random forest is illustrated in Figure 1.

3.2. Recursive Feature Elimination Method

Recursive Feature Elimination (RFE) is a feature selection technique utilized to select the optimal subset from a given feature set. Its core concept involves progressively eliminating the least important features, repeatedly training the model, and assessing feature importance until the required number of features or performance indicators are met. The algorithm operates in a cyclical process encompassing three steps: (1) training the model based on the dataset and obtaining the corresponding weight of each feature; (2) calculating the evaluation scores of all features using predetermined evaluation indices; and (3) eliminating the feature with the lowest evaluation score in the current dataset. This looping process continues until only the final variable remains in the feature set. Ultimately, a list sorted by feature importance is generated. The RFE method can filter out the feature subset that significantly influences the target task and remove irrelevant features.

3.3. Feature Selection Based on RF-RFECV

Among various feature selection models, RF-RFECV has emerged as the preferred method for feature selection when dealing with rare earth element datasets owing to its unique advantages. Notably, the dataset in this study possesses a high-dimensional and complex feature space, making manual selection of the number of features exceedingly challenging. RF-RFECV excels in automatically determining the optimal number of features, eliminating the subjectivity associated with manual selection, and ensuring the stability and reliability of the chosen feature subset. Additionally, the feature selection outcomes of RF-RFECV remain unaffected by the randomness and noise in the data, making it particularly well-suited for situations such as rare earth element datasets, which may be susceptible to noise and instability.
Second, RF-RFECV integrates the characteristics of random forests, recursive feature elimination, and cross-validation, effectively enhancing model performance and efficiency. By estimating feature importance using random forests, RF-RFECV identifies the most informative features, thereby reducing the introduction of noise and complexity associated with unnecessary features. The incorporation of cross-validation comprehensively considers the randomness and noise in the data, significantly reducing the instability of feature selection results. This aspect is particularly crucial for datasets involving rare earth elements, where highly reliable and stable feature selection results are essential for predicting the content of rare earth elements accurately.
Considering the unique nature and requirements of rare earth element datasets, RF-RFECV stands out as the feature selection method of choice in this study. Its inherent qualities, including automation, robustness, and resilience, align well with the stringent standards and reliability requirements of feature selection, consequently greatly enhancing the performance and interpretability of the rare earth element content prediction model. Hence, we have chosen to employ RF-RFECV as the feature selection model in this study. The steps for obtaining the optimal feature subset based on the RF-RFECV method are depicted in Figure 2.
Initially, a dataset of rare earth image data is procured. Subsequently, machine vision technology is employed to extract the RGB color features of the image. The color space conversion formula is then utilized to convert these into HSV and YUV color features. Following this, the gray level co-occurrence matrix is used to extract the spatial texture features of the image. Finally, the RF-RFECV algorithm is applied to screen the color features, spatial texture features, and key features, ultimately yielding a mixed feature set.
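A minimal sketch of this selection step, assuming scikit-learn's RFECV with a random forest estimator; the variable names and the n_estimators value are our own placeholders, not taken from the paper.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV

# Hypothetical inputs: X_train is the (n_samples, n_features) matrix of
# color, texture, and extraction-stage features; y_train is the Nd content.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    step=1,       # eliminate one feature per iteration (Section 5.1.2)
    cv=5,         # the paper averages results over cv = 3, 5, 7, and 9
    scoring="r2",
)
selector.fit(X_train, y_train)
X_mixed = selector.transform(X_train)          # the mixed feature set
print("optimal feature count:", selector.n_features_)
print("selected feature mask:", selector.support_)
```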

4. One-Dimensional Residual Attention Convolutional Neural Network Model

4.1. One-Dimensional Convolutional Neural Network

Convolutional Neural Networks (CNNs) are deep feedforward networks primarily utilized in the field of computer vision for tasks such as image classification and target detection. The concept of convolution and downsampling layers can be traced back to 1980, when Kunihiko Fukushima [22] proposed it. In 1990, LeCun and colleagues [23,24] published a paper introducing the principles of the modern CNN framework. However, it was not until 2012, when AlexNet [25] achieved significant success on the ImageNet dataset, that CNNs began to receive widespread attention.
The one-dimensional Convolutional Neural Network (1D CNN) is a variant of the CNN primarily used to process one-dimensional sequence data. Its network structure is similar to that of two-dimensional CNNs, encompassing a convolution layer, pooling layer, and fully connected layer. Subsequently, the attention mechanism [26] and the residual network [27] are applied to the convolutional neural network.
In the convolution layer, the input feature map undergoes convolution with the convolution kernel to yield an output feature map, which is then subjected to nonlinear transformation via the activation function. The computational method for this process can be expressed by the following formula:
$$X_j^l = f\left(\sum_{i \in M_j} X_i^{l-1} * w_{ij}^l + b_j^l\right)$$
In this formula, $X_j^l$ is the $j$-th output feature map of layer $l$, $f$ is the nonlinear activation function, $M_j$ is the set of input feature maps of layer $l-1$ associated with that output feature map, $X_i^{l-1}$ denotes an input feature map from layer $l-1$, $*$ represents the convolution operation between the convolution kernel and the input, $w$ is the weight matrix corresponding to the convolution kernel, and $b$ is the bias matrix.
The pooling layer serves to decrease the size and quantity of parameters within the feature map. This reduces the risk of overfitting while retaining the essential features. The computational process of the pooling layer unfolds as follows:
$$X_j^l = f\left(\beta_j^l \cdot \mathrm{down}\left(X_j^{l-1}\right) + b_j^l\right)$$
In the above formula, β represents the weight matrix, while b denotes the bias matrix, and ‘down’ signifies the down-sampling function. The one-dimensional pooling layer is primarily categorized into two types, namely, maximum pooling and average pooling. Maximum pooling selects the maximum value within each window for output, whereas average pooling computes the average value within each window for output. Generally, maximum pooling reflects the most prominent features while average pooling smooths the region and selects the smoothing feature.
The fully connected layer is employed to classify or regress the outputs of the convolution layer and the pooling layer. The outputs of the fully connected layer are as follows:
$$\delta_i = f\left(w_i p_i + b_i\right)$$
In this formula, $i = 1, 2, \ldots, k$; there are $k$ outputs, with $\delta_i$ being the $i$-th output; $w_i$ and $b_i$ are the weight and threshold of the $i$-th neuron, respectively; and $f$ is the activation function.
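For illustration, a minimal PyTorch sketch of these three building blocks; the layer sizes are our own, chosen to match the 11-dimensional input used later in the paper, and are not the paper's architecture.

```python
import torch
import torch.nn as nn

# A length-11 feature vector passes through a convolution layer,
# a max-pooling layer ("down"), and a fully connected output layer.
block = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),                    # nonlinear activation f
    nn.MaxPool1d(kernel_size=2),  # down-sampling: length 11 -> 5
    nn.Flatten(),
    nn.Linear(8 * 5, 1),          # fully connected output
)

x = torch.randn(4, 1, 11)  # batch of 4 sequences, 1 channel, length 11
print(block(x).shape)      # torch.Size([4, 1])
```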

4.2. Residual Attention Block

The color features of the rare earth solution image data utilized in this study are susceptible to external environmental interference during the measurement process. This interference manifests as noise during model learning and training. As the number of layers in the model network increases, the weight corresponding to the noise continually increases as well. To mitigate this effect, an attention mechanism module based on the residual is introduced in this paper. The attention mechanism can adjust the weight of each channel, enabling the model to focus more on significant features, thereby reducing the impact of noise on the model.
The Residual Attention Block (RAB) primarily consists of two components, namely, the residual connection branch and the attention mechanism branch. The structure of the residual branch aligns with that of the residual network. In the residual branch, the network learns the residual map, not the underlying map of the input at that layer. The connection to the output through identity mapping is referred to as a skip connection [28].
In the residual branch, the network learns the difference between the input and the expected output, rather than directly learning the output. The implementation of the residual block is as follows:
$$y = f\left(H(x, W) + h(x)\right)$$
where y represents the output of the residual block, f is the activation function, W is the weight parameter, x is the input, H ( x , W ) is the residual map to be learned, and h ( x ) is the identity map. In neural networks, the residual mapping is often easier to optimize than the original mapping. Assuming that an identity mapping in a neural network is its optimal solution, it is much more difficult to fit the identity mapping by stacking multi-layer nonlinear networks than to reduce the residual to zero.
If the input is x and the underlying mapping to be learned is represented as H(x), then the residual mapping is defined as:
$$F(x) = H(x) - x$$
Using the residual attention block, the deep information of the data can be extracted based on retaining the residual structure, while the weight ratio of noise can be suppressed by the attention mechanism. The residual attention block structure is shown in Figure 3.
The fundamental structure of a Residual Attention Block (RAB) is primarily composed of two convolution layers, a global average pooling layer, and a fully connected layer. An RAB contains three branches: the residual connection branch, the convolution branch, and the attention branch. The function of the residual connection branch is to add skip connections to the network. The convolution branch performs convolution operations on the input data. The attention branch changes the weight ratio between the feature maps, assigning more weight to more important features. The operation process of an RAB is as follows.
First, the input features are convolved to obtain the output of the convolution branch h(x). Then, the channel information of the upper layer is globally average pooled to obtain the compressed global information, expressed in the following formula:
$$g(x) = \frac{1}{k}\sum_{i=1}^{k} h_i(x)$$
In the above formula, h ( x ) is the output of the second convolution layer and k represents the number of channels in the convolution layer.
The global channel information is transformed into its corresponding weight ratio through a fully connected layer and the softmax activation function, yielding the output of the attention branch. The formula for this process is as follows:
$$z(x) = \delta\left(F\left(g(x), W\right)\right)$$
In the above formula, δ is the softmax activation function and F(g(x), W) is the function implemented by the fully connected layer, which takes the global channel information g(x) as its input.
Ultimately, the output of the convolution branch is multiplied element-wise by the output of the attention branch, and this product is added to the output of the residual connection branch to obtain the overall output of the residual attention block:
$$y = f\left(h(x) \cdot z(x) + x\right)$$
In the above formula, y is the output of the residual attention block, f is the activation function, h(x) is the output of the convolution branch, z(x) is the output of the attention branch, and x is the output of the residual connection branch.
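A minimal PyTorch sketch of this block under our reading of Figure 3; the kernel sizes and the placement of the activations are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAttentionBlock(nn.Module):
    """Sketch of the RAB: two convolutions form h(x); global average
    pooling, a fully connected layer, and softmax form the attention
    branch z(x); a skip connection carries the input x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.fc = nn.Linear(channels, channels)

    def forward(self, x):                        # x: (batch, channels, length)
        h = self.conv2(F.relu(self.conv1(x)))    # convolution branch h(x)
        g = h.mean(dim=-1)                       # global average pooling g(x)
        z = F.softmax(self.fc(g), dim=-1)        # channel weights z(x)
        out = h * z.unsqueeze(-1)                # h(x) reweighted by z(x)
        return F.relu(out + x)                   # y = f(h(x) z(x) + x)
```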

4.3. MRAB-CNN Network Design

In this paper, we design a one-dimensional Convolutional Neural Network (RAB-CNN) based on residual attention blocks, effectively combining the residual network and attention mechanism. The model comprises multiple convolutional layers, fully connected layers, and residual attention blocks.
In order to enable the RAB-CNN model to extract richer feature information, the initial layer of the model utilizes two types of convolutional kernels, 1 × 1 and 3 × 1, for extracting image features. Subsequently, features extracted from different receptive fields are fused. Considering that the contributions of three feature categories vary in component content prediction, an attention mechanism is introduced after the feature fusion layer to create the fusion attention layer (Attention 1). This layer allows the model to dynamically assign weights to features based on their contributions. Furthermore, multiple residual attention units are added after the attention mechanism layer, with each residual attention unit consisting of a convolutional layer and a residual attention block. By incorporating convolutional layers before the residual attention blocks, channel consistency is ensured. After incorporating multiple residual attention units, global average pooling is performed to extract global feature information. Finally, a fully connected layer with 256 neurons is connected, followed by prediction.
Through experiments, it was observed that the RAB-CNN model exhibits fast convergence while extracting deep features. However, as the model does not fully adopt a residual network structure, the impact of the fusion attention layer on predictions gradually diminishes with an increase in model depth. To address the identified challenge, we undertook the refinement and optimization of the RAB-CNN model, bolstering the linkage between the fusion attention layer and the component content predictions. This enhancement entailed connecting the fusion attention layer’s output directly to a fully connected layer, thereby creating a comprehensive global residual structure. Importantly, while this structure delves into the extraction of deep features, it concurrently benefits from the feedback of the more superficial features. This augmented model, characterized by its multiple residual attention blocks, is designated as the Multi-Residual Attention-based Convolutional Neural Network (MRAB-CNN).
The network structure is illustrated in Figure 4, where the model takes as input the features selected through RF-RFECV screening. It undergoes feature extraction through 1 × 1 and 3 × 1 convolutional layers, followed by feature fusion. Subsequently, multiple residual attention units are incorporated, with global average pooling following the last residual attention unit. Simultaneously, the fusion attention layer is connected to a fully connected layer, creating a globally encompassing residual structure for the final prediction. Through these improvements to the RAB-CNN model, the MRAB-CNN model excels in capturing complex features, enhancing predictive performance, and leveraging both deep and shallow features to further bolster its generalization capability.
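Putting the pieces together, the sketch below reflects our reading of Figure 4 and the dimensions stated in Sections 4.3 and 5.2 (8 first-layer kernels, three residual attention units with 16/32/64 channels, a 256-neuron fully connected layer). It reuses the ResidualAttentionBlock sketched above; how the Attention 1 skip enters the head is our assumption, not the paper's stated wiring.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRABCNN(nn.Module):
    """Sketch of the MRAB-CNN: parallel 1x1 / 3x1 convolutions, a fusion
    attention layer (Attention 1), three residual attention units, global
    average pooling, and a global skip from Attention 1 into the head."""

    def __init__(self, n_features=11):
        super().__init__()
        self.conv1x1 = nn.Conv1d(1, 8, kernel_size=1)
        self.conv3x1 = nn.Conv1d(1, 8, kernel_size=3, padding=1)
        self.att1_fc = nn.Linear(16, 16)
        units, in_ch = [], 16
        for out_ch in (16, 32, 64):
            units.append(nn.Sequential(
                # a convolution before each RAB keeps channel counts consistent
                nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
                ResidualAttentionBlock(out_ch)))
            in_ch = out_ch
        self.units = nn.ModuleList(units)
        self.head = nn.Sequential(
            nn.Linear(64 + 16 * n_features, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x):                          # x: (batch, 1, n_features)
        fused = torch.cat([self.conv1x1(x), self.conv3x1(x)], dim=1)
        w = F.softmax(self.att1_fc(fused.mean(dim=-1)), dim=-1)
        att1 = fused * w.unsqueeze(-1)             # fusion attention layer
        h = att1
        for unit in self.units:
            h = unit(h)
        gap = h.mean(dim=-1)                       # global average pooling
        skip = att1.flatten(1)                     # global residual connection
        return self.head(torch.cat([gap, skip], dim=1))
```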

5. Experiment and Result Analysis

5.1. Feature Data Selection

5.1.1. Data Preparation

This study focuses on a Pr/Nd extraction and separation production line at a rare earth company in Jiangxi. A total of 78 samples were collected from different Pr/Nd solution extraction tanks. Each sample solution was divided into two parts for processing. One portion of each sample was used for laboratory testing to obtain the element component content of the sample solution, with the content of the Nd element ranging between 5% and 99.965%. The other portion was processed using an acquisition device developed by the laboratory to collect image information of the sample solution and extract the R, G, and B color features of the image color space.

5.1.2. Feature Screening

Upon obtaining the images of the rare earth element solutions, the original features of the images are categorized as color features, texture features, and key features. Using the original features as input and the component content of the Pr and Nd elements as output, the original features are standardized. The data are subsequently partitioned into a training set and a test set, with the test set comprising 19% of the overall data. The RF-RFECV algorithm is used to select the relevant features from the training set. The step size of recursive feature elimination in each iteration is set to 1, and the number of cross-validation folds is set to 3, 5, 7, and 9. The optimal feature dimension is determined based on the average of the feature dimensions corresponding to the multiple cross-validation results, as depicted in Figure 5.
As illustrated in Figure 5, the RF score of the model increases with the feature dimension. When the feature dimension increases from 1 to 9, the model score rises significantly. However, when the dimension exceeds 9, the score oscillates, and as the feature dimension continues to increase, the model's score decreases. It is speculated that as the feature dimension increases, the complexity of the model increases with it, potentially leading to overfitting and deteriorating performance.
By combining the results of multiple cross-validations and comparing the scores of the feature dimensions, it is evident that the average score of RF-RFECV is the highest when the feature dimension is 11. This allows for the screening out of less important features in order to reduce feature redundancy. Therefore, the number of feature dimensions is selected to be 11, which is used to represent the overall information of the sample. Simultaneously, the feature importance of each feature is obtained, as shown in Figure 6, and the top 11 features in the figure are selected. The features are ranked in terms of their importance in descending order, as follows: extraction order, h, U, the mean value of the gray level co-occurrence matrix, Y, autocorrelation, v, V, contrast, s, and finally homogeneity. After screening by RF-RFECV, these 11 features are selected as mixed features.

5.1.3. Validation of Feature Validity

Upon obtaining the mixed feature set, experiments were designed to validate its effectiveness. Using the Root Mean Square Error (RMSE), Mean Relative Error (MeanRE), Maximum Relative Error (MaxRE), and Coefficient of Determination (R2) as evaluation indices, experiments were designed to compare the different feature sets. The calculation formula for each evaluation index is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2}$$

$$\mathrm{MeanRE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \bar{y}_i}{y_i}\right| \times 100\%$$

$$\mathrm{MaxRE} = \max\left|\frac{y_i - \bar{y}_i}{y_i}\right| \times 100\%$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - y_{\mathrm{mean}}\right)^2}$$
In the above formulas, n represents the total number of samples, $y_i$ is the actual value of the sample, $\bar{y}_i$ is the predicted value of the model, and $y_{\mathrm{mean}}$ represents the average of the actual values. We hope that RMSE, MeanRE, and MaxRE are small and that $R^2$ is close to 1, as these values reflect better accuracy and predictive ability on the part of the model.
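A small NumPy sketch of these four indices (the function name is ours; y_mean is taken as the mean of the actual values, the conventional R² definition):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute RMSE, MeanRE(%), MaxRE(%), and R^2 for 1-D arrays."""
    rel = np.abs((y_true - y_pred) / y_true)   # per-sample relative error
    return {
        "RMSE": np.sqrt(np.mean((y_true - y_pred) ** 2)),
        "MeanRE(%)": 100.0 * rel.mean(),
        "MaxRE(%)": 100.0 * rel.max(),
        "R2": 1.0 - np.sum((y_true - y_pred) ** 2)
                    / np.sum((y_true - y_true.mean()) ** 2),
    }
```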
The BP neural network, Support Vector Machine (SVM), Gradient Boosting Regression Tree (GBDT), and Extreme Learning Machine (ELM) were established as the component content prediction models. The HSV feature set, the combination of HSV and YUV feature set, and the mixed feature set selected in this paper are divided into a training set and a prediction set. The parameters of these models were set and adjusted to their best for comparison and verification. The experimental results are shown in Table 1 and Figure 7.
As per Table 1, while the MeanRE and RMSE of the BP and GBDT models under mixed features are superior, MaxRE is inferior to the HSV feature set. It is speculated that this may be due to the fact that for simpler models the mixed feature set may contain too many features, making the model susceptible to the influence of low importance features. This can affect the prediction performance of the model, resulting in poorer MaxRE results. Comparing the four prediction models, in most cases the combined performance using the mixed features is better than that using either HSV alone or HSV+YUV features.
As depicted in Figure 7, the overall performance of mixed features surpasses that of single color features under the MeanRE and RMSE indicators. The coefficient of determination (R2) for each model is higher with mixed features than with other features. Upon comparing the HSV and HSV+YUV feature sets, it is evident that the average performance of the combined HSV+YUV features is superior to that of using HSV features alone. The speculated reason is that although the HSV space can reflect most of the component content information, its total information is less than that of the other feature sets, resulting in a suboptimal prediction effect.
In summary, while it is generally the case that more features can encompass more information, an excess of information can lead to feature redundancy. Simultaneously, spatial texture characteristics and extraction order can play a supportive role in predicting the component content of rare earth elements. Moreover, the overall performance of mixed features filtered by the RF-RFECV algorithm is superior to that of single color features, indicating that mixed features can provide more comprehensive and richer feature information.

5.2. Determination of Network Structure Parameters

In light of the dataset's size, the number of convolution kernels in the first layer was set to 8. Simultaneously, experiments were conducted to compare varying numbers of residual attention units in order to select the optimal number. Assuming that the first residual attention unit has 16 convolution kernels, the nth unit has $16 \times 2^{n-1}$ kernels. We assess the model's performance with the number of residual attention units ranging from 1 to 6; the experimental results are displayed in Figure 8.
In Figure 8, the horizontal axis represents the number of residual attention units, while the vertical axis represents the value of each evaluation index. As can be seen from the figure, when the number of residual attention units is 3 or 5, the MeanRE, MaxRE, and RMSE indicators are significantly smaller than for other values. Therefore, the optimal number of residual attention units should be either 3 or 5. Upon closer observation, when the number of residual attention units is 3, the MeanRE and RMSE values are smaller than when it is 5. Moreover, when model performance does not vary significantly, a smaller model is preferable, considering the constraints of model size and physical memory.
Ultimately, the structure of the MRAB-CNN model is determined as follows. The first layer consists of the mixed features after screening. After convolution and fusion of the two convolution kernels, the feature fusion layer is obtained. The attention mechanism is then added to the feature fusion layer to obtain the fusion attention layer (Attention 1), followed by the connection of three residual attention units, with the number of channels being 16, 32, and 64. Global average pooling is performed on the last layer of the residual attention unit, leading to the fully connected layer. Simultaneously, the fusion attention layer is connected to the fully connected layer to form a large residual structure. Finally, the model prediction is made.

5.3. Model Training Strategy

The MRAB-CNN model proposed in this paper takes 11 input variables, with the output being the predicted value of component content. After finalizing the model structure, the MRAB-CNN model underwent training. For comparison, we used the RAB-CNN and Deep Neural Network (DNN) models. The DNN model was configured with five layers, each containing 64 neurons, while the parameters for the RAB-CNN model were kept consistent with those of the MRAB-CNN model.
The training strategy is outlined as follows. Initially, the data were standardized. Subsequently, the Adam algorithm was employed as the model optimization algorithm and the Mean Square Error was used as the loss function. The initial learning rate was set at 0.0005 and the activation function was uniformly set to ReLU. Considering the small size of the dataset, the batch size was set to 2, and the model was trained over 350 iterations. The loss curve during the training process is depicted in Figure 9. As can be discerned from the figure, the MRAB-CNN model designed in this paper reaches in about 20 training rounds the convergence level that the DNN model reaches only after 200 rounds. This reflects the rapid convergence conferred by the residual attention block, which amplifies the weights of effective features and accelerates the model's convergence during training. The convergence rate of the MRAB-CNN model, with its large residual structure, is slightly higher than that of the RAB-CNN model during training. By assessing the convergence of each model in Figure 9, it can be seen that the MRAB-CNN model reaches a state of convergence within a limited number of training rounds. Therefore, it can be applied to the prediction of element component content in the rare earth extraction process.
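A sketch of this training configuration in PyTorch; the tensors X and y are hypothetical placeholders for the standardized mixed features and content labels, and MRABCNN is the class sketched in Section 4.3.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# X: (n_samples, 1, 11) mixed features; y: (n_samples, 1) content values
train_loader = DataLoader(TensorDataset(X, y), batch_size=2, shuffle=True)

model = MRABCNN(n_features=11)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # initial lr 0.0005
loss_fn = torch.nn.MSELoss()

for epoch in range(350):                # 350 training iterations
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```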

5.4. Model Comparison

To validate the efficacy of the MRAB-CNN model, we initially utilize the mixed features as the dataset. During the experimental process, we established a variety of prediction models and compared them using identical test and training data. By adjusting each model's parameters and configuration, it was brought to an optimal state. Subsequently, through comparative experiments, we evaluated and compared the performance of these models on the same dataset. The corresponding parameters of each model are as follows:
(1) Genetic Algorithm optimized Extreme Learning Machine (GA-ELM). The Genetic Algorithm was configured with a population size of 20 and ran for 100 iterations. The crossover probability was set to 0.01, while the mutation probability was set at 0.7. The Extreme Learning Machine was configured with 25 nodes.
(2) Least Squares Support Vector Machine (LSSVM). This model employs the radial basis kernel function (RBF). The penalty factor (C) was set at 70 and the kernel function coefficient (gamma) was 0.11.
(3) Xgboost model. The model was configured with 84 decision trees (n_estimators). The learning rate was set to 0.205, while the maximum depth of each decision tree (max_depth) was 15. The minimum sample weight of the child nodes (min_child_weight) was set to 8, and the parameter of the L2 regularization term (lambda) was set to 3.31.
(4) RAB-CNN. This model has an attention layer with eight channels and contains three residual attention units, with the number of channels set to 16, 32, and 64, respectively. The fully connected layer of the model was configured with 256 neurons.
The aforementioned models were individually trained and used for predictions. Evaluation indices such as MeanRE, MaxRE, RMSE, and R2 were employed to obtain a comparison of each model, as depicted in Table 2. Simultaneously, the relative error percentage results of each model’s predicted values were graphed, as illustrated in Figure 10.
As per the data presented in Table 2, the GA-ELM model exhibits commendable performance among the machine learning models discussed above, with an RMSE of merely 0.7978. However, the prediction accuracy of the MRAB-CNN model surpasses that of the GA-ELM model by 44.3%, indicating its superior performance.
As evidenced by Table 2 and Figure 10, the MRAB-CNN model proposed in this paper outperforms the other models in terms of the MeanRE, MaxRE, RMSE, and R2 evaluation indicators, and its relative error percentage curve exhibits minimal fluctuation. The overall performance of the improved RAB-CNN model (MRAB-CNN) surpasses that of the other machine learning models. This demonstrates the effectiveness of the proposed one-dimensional convolution model based on multiple residual attention blocks for predicting the content of rare earth elements.
Simultaneously, a comparison of the optimal values of this model with those of the other models reveals that the average relative error of the MRAB-CNN model is 0.4465% lower than that of the GA-ELM model, the maximum relative error is 1.3741 lower than that of the Xgboost model, the root mean square error is 0.2882 lower than that of the RAB-CNN model, and the coefficient of determination is 0.00041 higher than that of the RAB-CNN model. Furthermore, the maximum relative error of this method is within 3%, which is less than the 5% component content detection error allowed by rare earth extraction and separation enterprises, thereby meeting their detection requirements. In conclusion, the proposed one-dimensional convolution prediction model based on residual attention blocks exhibits strong comprehensive performance and can be applied to the detection of component content at rare earth extraction production sites.

5.5. Contributions

In our endeavor to predict the content of rare earth elements, a meticulous comparison was carried out relative to extant literature. To illustrate, in [9,10] the authors solely utilized the color features of rare earth solution images, focusing on the first-order moments of the H, S, and I components. Conversely, in [11] the authors incorporated a slightly broader scope, extracting ten color features from the images, which included RGB, HIS, RR, RG, RB, and CVA, among others. Nevertheless, these singular feature extraction methodologies exhibit pronounced limitations, often inadequately representing the complete information spectrum present in the images. This is evident in their Root Mean Square Error (RMSE) values of 1.7062, 0.6358, and 1.1480 respectively. In stark contrast, our proposed method yielded a significantly reduced RMSE of 0.4441, underscoring its superiority.
To overcome this limitation, our study proposes a multi-feature extraction method that includes color features, spatial texture features, and key features. This approach enables us to capture more diverse information, significantly improving the predictive performance of the model. Additionally, we introduce a model combining one-dimensional convolutional neural networks with residual attention blocks (MRAB-CNN) to better utilize the extracted multi-feature information. Our experimental results demonstrate that the MRAB-CNN model exhibits outstanding performance in predicting rare earth element content, and the model size is just 3.3 MB. This research provides robust support for real-time and rapid content analysis in rare earth extraction processes, offering a more accurate and efficient solution built upon the foundation of the existing literature.

6. Conclusions

In the prediction of rare earth component content, a single color feature extracted from the image often fails to encapsulate all of the image's information, resulting in suboptimal prediction outcomes from soft sensor models. Therefore, in this paper we propose a one-dimensional convolution method for Pr/Nd component content prediction based on multiple residual attention blocks. Initially, machine vision technology is employed to extract the color space information of the image. Subsequently, the gray level co-occurrence matrix is utilized to extract the spatial texture feature information of the image. Simultaneously, to reflect the actual field situation, the corresponding extraction-order information is added as a feature for the rare earth solution whose component content is to be predicted. The RF-RFECV algorithm is then used to filter the aforementioned features to obtain the mixed features. Finally, the BP, SVM, GBDT, and ELM models were validated on different feature sets. Our experimental results indicate that the multi-feature extraction method provides more information and enhances the prediction accuracy of the model.
In response to the above mixed features, this paper introduces a one-dimensional convolutional network (MRAB-CNN) based on residual attention blocks. The model integrates the residual network with the attention mechanism, enabling it to extract deeper data information while suppressing the weight ratio of noise through the attention mechanism. When compared with GA-ELM, LSSVM, Xgboost, and RAB-CNN models, the root mean square error of the model in predicting rare earth element component content is the best. This model exhibits exceptional performance in the prediction task of rare earth component content and can meet the real-time and rapid detection requirements of rare earth component content at the extraction site.

Author Contributions

Conceptualization, F.X. and J.C.; methodology, F.X.; software, J.C.; validation, J.Z.; formal analysis, F.X.; resources, J.C.; writing—original draft preparation, F.X.; writing—review and editing, F.X., J.C. and J.Z.; supervision, J.C.; funding acquisition, F.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Key Research and Development Program of China under Grant 2020YFB1713700, in part by the Regional Projects of the NSFC under Grants 61963015 and 62363010, in part by the Science and Technology Research Project of the Education Department of Jiangxi Province under Grant GJJ190314, and in part by the Graduate Innovation Fund Project under Grants YC2021-B144 and YC2022-S526.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chai, T.; Yang, H. Situation and Developing Trend of Rare-Earth Countercurrent Extraction Processes Control. J. Rare Earths 2004, 22, 604–610. [Google Scholar]
  2. Pathak, S.; Jayabun, S.; Rajeswari, B.; Pathak, N.; Mohapatra, M.; Sengupta, A.; Kadam, R. Determination of trace metallic constituents in nuclear-grade BeO matrix by DC arc carrier distillation and ICP-AES: A comparative evaluation. At. Spectrosc. 2020, 40, 215–220. [Google Scholar] [CrossRef]
  3. Zhang, Y.; Sun, Y.; Zhou, J.; Yang, J.; Deng, J.; Shao, J.; Zheng, T.; Ke, Y.; Long, T. Preparation of REE-doped NaY(WO4)2 single crystals for quantitative determination of rare earth elements in REE:NaY(WO4)2 laser crystals by LA-ICP-MS. Anal. Methods 2022, 14, 4085–4094. [Google Scholar] [CrossRef] [PubMed]
  4. Silachyov, I.Y. Combination of instrumental neutron activation analysis with x-ray fluorescence spectrometry for the determination of rare-earth elements in geological samples. J. Anal. Chem. 2020, 75, 878–889. [Google Scholar] [CrossRef]
  5. Xia, J. Recent status of La and Ce quantitative analysis by UV-VIS spectrophotometry. J. Yulin Norm. Univ. (Nat. Sci. Ed.) 2001, 22, 77–79. [Google Scholar]
  6. Yang, H.; Gao, Z.; Lu, R. Component content detection method based on color feature recognition of rare earth ions. Chin. J. Rare Earth 2012, 30, 108–112. [Google Scholar]
  7. Lu, R.; Yang, H. Soft measurement for component content based on adaptive model of Pr/Nd color features. Chin. J. Chem. Eng. 2015, 23, 1981–1986. [Google Scholar] [CrossRef]
  8. Lu, R.; Ye, Z.; Yang, H.; He, F. Prediction of component content in praseodymium/neodymium extraction process by multiple RBF models. J. Chem. Eng. 2016, 67, 8. [Google Scholar]
  9. Zhu, J.; Zhang, X.; Yang, H.; Lu, R. Soft measurement of praseodymium/neodymium component content under single light condition. J. Chem. Eng. 2019, 70, 9. [Google Scholar]
  10. Lu, R.; He, Q.; Yang, H.; Zhu, J. Multi-component content prediction of rare earth mixed solution based on GA-ELM. Comput. Eng. 2021, 47, 8. [Google Scholar]
  11. Lu, R.; Deng, B.; Yang, H.; Zhu, J.; Yang, G.; Dai, W. Prediction of praseodymium/neodymium component content based on improved GRA-real-time learning algorithm. Control Decis. 2022. Available online: http://www.cnki.com.cn/Article/CJFDTotal-KZYC20221109006.htm (accessed on 10 November 2022).
  12. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern 1973, 3, 610–621. [Google Scholar] [CrossRef]
  13. Zhao, L.; Qin, Y.; Gao, G.; Kuang, G. High resolution SAR image construction area detection using GLCM texture analysis. J. Remote Sens. 2009, 13, 475–490. [Google Scholar]
  14. Wang, C.; Shi, Y.; Hou, C. Identification of Lycium barbarum planting area based on Sentinel-2A image. J. Ecol. 2022, 41, 1033. [Google Scholar]
  15. Yogeshwari, M.; Thailambal, G. Automatic feature extraction and detection of plant leaf disease using GLCM features and convolutional neural networks. Mater. Today Proc. 2021, 81, 530–536. [Google Scholar] [CrossRef]
  16. Jeon, H.; Oh, S. Hybrid-recursive feature elimination for efficient feature selection. Appl. Sci. 2020, 10, 3211. [Google Scholar] [CrossRef]
  17. Chen, C.; Zhang, Q.; Yu, B.; Yu, Z.; Lawrence, P.J.; Ma, Q.; Zhang, Y. Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput. Biol. Med. 2020, 123, 103899. [Google Scholar]
  18. Song, X.F.; Zhang, Y.; Gong, D.W.; Sun, X.Y. Feature selection using bare-bones particle swarm optimization with mutual information. Pattern Recognit. 2021, 112, 107804. [Google Scholar] [CrossRef]
  19. Liu, L.; Kuang, G. Summary of image texture feature extraction methods. Chin. J. Image Graph. 2009, 14, 14. [Google Scholar]
  20. Zhang, Y.; Zhang, Y.; Zhang, B. Desktop dust detection algorithm based on gray gradient co-occurrence matrix. Comput. Appl. 2019, 39, 6. [Google Scholar]
  21. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  22. Fukushima, K.; Miyake, S. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognit. 1982, 15, 455–469. [Google Scholar] [CrossRef]
  23. Kamilaris, A.; Prenafeta-Boldú, F.X. A review of the use of convolutional neural networks in agriculture. J. Agric. Sci. 2018, 156, 312–322. [Google Scholar] [CrossRef]
  24. Cun, Y.L.; Boser, B.; Denker, J.; Henderson, D.; Jackel, L. Handwritten digit recognition with a back-propagation network. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 27–30 November 1989. [Google Scholar]
  25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  26. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks; Springer International Publishing: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
Figure 1. Random forest structure.
Figure 2. Flow chart showing the process of obtaining the optimal feature set.
Figure 3. Residual attention block composition diagram.
Figure 4. MRAB-CNN model structure diagram.
Figure 5. RF-RFECV feature dimension selection diagram.
Figure 6. Feature importance score plot.
Figure 7. Comparison diagram of different feature sets. (a) MeanRE; (b) MaxRE; (c) RMSE; (d) R2.
Figure 8. Residual attention unit comparison diagram.
Figure 9. Model training loss curve change diagram.
Figure 10. Relative error comparison diagram of the predicted value of the model.
Table 1. Comparison of different feature sets ("mixed" denotes the mixed feature set selected by RF-RFECV).

MODEL   FEATURE    MeanRE   MaxRE   RMSE    R2
BP      HSV        2.275    2.275   1.583   0.997
BP      HSV+YUV    1.761    4.831   1.570   0.997
BP      mixed      1.366    3.859   1.476   0.997
SVM     HSV        2.274    9.358   1.520   0.997
SVM     HSV+YUV    2.273    9.893   1.534   0.997
SVM     mixed      1.324    4.254   0.887   0.999
GBDT    HSV        3.608    15.90   3.701   0.983
GBDT    HSV+YUV    3.507    14.21   2.639   0.991
GBDT    mixed      2.924    17.32   2.347   0.993
ELM     HSV        2.010    8.252   1.531   0.997
ELM     HSV+YUV    1.863    8.409   1.364   0.997
ELM     mixed      1.942    9.905   1.182   0.998
Table 2. Comparison table of each model.

MODEL      MeanRE   MaxRE    RMSE     R2
GA-ELM     1.1554   5.2558   0.7978   0.99925
LSSVM      1.2999   5.285    1.0547   0.99868
Xgboost    1.8448   3.9963   1.474    0.99743
RAB-CNN    1.189    4.9172   0.7323   0.99936
MRAB-CNN   0.7089   2.6222   0.4441   0.99977
