Article

A Novel Named Entity Recognition Algorithm for Hot Strip Rolling Based on BERT-Imseq2seq-CRF Model

National Engineering Research Center for Advanced Rolling Technology and Intelligent Manufacturing, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11418; https://doi.org/10.3390/app122211418
Submission received: 11 October 2022 / Revised: 7 November 2022 / Accepted: 8 November 2022 / Published: 10 November 2022

Abstract

Named entity recognition is not only the first step of text information extraction but also a key process in constructing domain knowledge graphs. In view of the large amount of text data, the complex process flow, and the urgent application needs of hot strip rolling, a novel named entity recognition algorithm based on the BERT-Imseq2seq-CRF model is proposed in this paper. First, the algorithm uses the BERT pre-trained language model to mine the dependencies in domain text and obtain the corresponding representation vectors. Then, the representation vectors are sent to the encoder layer, and the encoder output is also fed into the decoder, whereas the original model only passes on the semantic vector. A Teacher-Forcing mechanism is integrated into the decoder layer to randomly correct the labeling result of the previous unit, so that error accumulation is avoided and the sequence recognition effect is guaranteed. Finally, the validity of the labeling results is checked against conditional random field constraints, which improves the overall labeling quality of the algorithm. The experimental results show that the model can efficiently and accurately predict entity labels for hot strip rolling text, and its performance indexes are better than those of the other models, with an F1-Score of 91.47%. The model further provides technical support for information extraction and domain knowledge graph construction in hot strip rolling.

1. Introduction

The steel industry is one of the pillar industries supporting economic development and has achieved rapid growth [1]. However, with the advent of the era of intelligence and data, steel manufacturing faces many difficulties in on-site data management, storage, and application [2,3,4]. Hot strip rolling is an important scenario in the field of steel manufacturing, and actual production generates a large number of work logs and technical documents. These field texts contain professional information on the rolling process, equipment accuracy, product quality, and more. How to accurately and effectively extract domain information with application value from such unstructured data has gradually become a hot spot in the industry [5,6]. In 2012, Google proposed the concept of the knowledge graph, which has spread rapidly in recent years, becoming a hot topic of discussion in many fields and pointing out a path for the transformation and upgrading of the steel industry.
The construction of a knowledge graph mainly includes knowledge extraction, knowledge update, knowledge processing, and knowledge application. Knowledge extraction, also called information extraction, aims to select relevant content from unstructured data, determine the domain entities worthy of attention, and complete type annotation and relationship binding. Within this step, Named Entity Recognition (NER) is mainly used to locate important text entities and predict the corresponding entity type labels. Early NER algorithms were mainly based on rules or statistical machine learning [7]. Rule-based entity recognition relies on a manually constructed rule base; its recognition accuracy is high, but its execution efficiency is low and its portability is poor. Later, statistical machine learning gradually replaced rule-based methods. For example, the Conditional Random Field (CRF) improved prediction accuracy by training on massive data and largely removed the dependence on manual work. However, the low generalization ability of such models was still not fundamentally solved.
In recent years, with the development of computer technology, NER based on deep learning has gradually become a research hotspot. Compared with statistical machine learning methods, recognition algorithms based on deep learning offer high execution efficiency, high speed, and strong generalization ability [8]. Taking advantage of parallel computing, Convolutional Neural Networks (CNN) can quickly complete entity recognition tasks; however, due to the limitation of the model structure, they cannot exploit long-distance text information to improve the labeling effect. The sequential structure of Recurrent Neural Networks (RNN) matches text representation and can effectively capture textual context information, but it suffers from gradient explosion or gradient vanishing. Bi-directional Long Short-Term Memory networks (BiLSTM) and Bi-directional Gated Recurrent Units (BiGRU) were then introduced to improve and optimize RNN. Through careful gating design, they effectively avoid gradient vanishing and explosion and mine text dependencies in both directions, becoming the preferred methods for information extraction in the era of big data.
In addition, NER also integrates the advantages of unsupervised pre-training, combining Bidirectional Encoder Representations from Transformers [9] (BERT) or A Lite BERT [10] (ALBERT) to enhance the semantic understanding of domain texts from a new perspective. The self-attention mechanism is used to mine the associated information in unstructured data in both directions, which greatly improves recognition accuracy and application feasibility.
In this study, NER is regarded as a 'translation task' between equal-length sequences, and the seq2seq model is improved and optimized: while retaining the original compressed information of the encoder, the preliminary prediction of the encoder is transmitted to the decoding unit, and the prediction result of the previous unit is randomly corrected with the support of Teacher-Forcing before being passed to the next sequential unit. Finally, the BERT-Imseq2seq-CRF model is constructed by combining BERT and CRF to deeply mine the related information in the text. It is applied to the process information entity extraction task of hot strip rolling and achieves a significant domain entity extraction effect. The improved model can provide reference value and technical support for the management of complex hot strip rolling professional information.

2. NER Application Status

Although research on NER algorithms is in full swing, their application in industrial fields is not yet satisfactory. Due to the background characteristics and historical accumulation of some fields, NER applications are mainly concentrated in industries with uniform statistical information, such as agriculture [11,12], medicine [13,14,15], news [16], and electricity [17,18].
In 2012, Google officially proposed the knowledge graph to improve user search experience and query efficiency [8]. Subsequently, many industries recognized the necessity of knowledge management and application and joined the ranks of domain graph construction. Ref. [12] proposed an ME+R+BIOES entity labeling method based on domain-specific information about crop diseases and insect pests, which labels and classifies domain entities and entity relationships at the same time, improves labeling efficiency, and addresses one-to-many relationship extraction. Cross-validation was then used to train and test 1619 pieces of crop pest data, and the experimental F1-Score was 91.34%. In view of the text form and storage characteristics of power information, Ref. [19] added an attention mechanism to the ALBERT-BiGRU-CRF model; the resulting model fully captures the dependencies and context between sentences and focuses on key information, and its experimental training effect is 7.3% higher than the original model on average. In view of the low efficiency of some entity recognition models and the inconsistent recording style of coal mine accident cases, Ref. [20] presented an entity recognition model for the field of coal mine accidents. The ALBERT pre-trained model is used to obtain the word vector representation of the text sequence, and four Iterated Dilated Convolutional Neural Networks (IDCNN) with the same structure are used to process and label the obtained word vectors. Finally, a conditional random field model constrains the predicted labels, which improves the model's ability to capture long text information and achieves good results in the entity labeling task of coal mine accident cases.
At present, the progress of NER-based information extraction in some fields is evident, but there is still a gap in the field of steel manufacturing. Aiming at the application requirements and research gaps in steel manufacturing, this paper proposes an NER algorithm for hot strip rolling. The model is preliminarily verified on third-party datasets (a Weibo dataset and a news dataset), and the validated model is then applied to the entity recognition task of hot strip rolling, providing a cutting-edge technical solution for professional text information processing.

3. Model Design

In this study, NER on the process accuracy texts of hot strip rolling equipment is regarded as a kind of 'translation task' between equal-length sequences. Considering the task requirements of NER, the original seq2seq model is improved and optimized: the original compressed information of the encoder is retained, the preliminary prediction of the encoder is transmitted to the decoding unit, and the prediction result of the previous unit is randomly corrected with the help of Teacher-Forcing before being passed to the next unit, which improves the predictive efficiency of the model. Finally, the BERT-Imseq2seq-CRF model is constructed by combining BERT and CRF to deeply mine the related information in the text. It is applied to the process information entity extraction task of hot strip rolling and obtains a good domain entity extraction effect. The model also provides reference value and technical support for the management of complex hot rolling professional information.

3.1. Seq2seq Model Improved

The seq2seq model belongs to a common Encoder–Decoder structure. The model successfully applies deep learning networks to tasks such as translation or question answering, and achieves impressive performance in conversational robots, machine translation, and speech recognition applications [21,22].
The main idea of the seq2seq model is to use two neural networks as an encoder and a decoder, respectively. The encoder converts the input sequence into a vector of a specified length, which can be regarded as the semantic information of the input sequence; this process is called encoding. Generally, the hidden state of the last unit is taken as the semantic vector C. The decoder generates an output sequence of a specified length based on the semantic vector C; this process is called decoding. The semantic vector can enter the decoder in two ways. In Figure 1a, the semantic vector C obtained by the encoder is fed into the decoder network only as the initial state, and the hidden state of each unit then propagates downward as the hidden state input of the next unit. In Figure 1b, the semantic vector C participates in the operation of the decoder at every step, while the hidden state of the previous unit is still the hidden state input of the current unit.
Inspired by the seq2seq model, NER can be regarded as an equal-length sequence-to-sequence 'translation' task. However, the information contained in the semantic vector C is too simple and general; relying on the semantic vector C alone is far from enough to accurately predict sequence labels. In order to make full use of the information obtained by the encoder and the characteristics of the decoder itself, this paper improves the original seq2seq model, recorded as Imseq2seq; the model structure is shown in Figure 2.
The length of the input sequence in Figure 2 is 4. The sequence is vectorized and sent to the encoding layer, and the semantic vector and preliminary prediction results are obtained through the encoder. To address the problem that the semantic vector alone cannot support entity label prediction, this model feeds the encoder prediction results into the decoder and combines a Teacher-Forcing mechanism to randomly correct the labeling result of the previous unit before sending it to the current decoding unit. Compared with the original model, the improved seq2seq makes full use of the output information of the encoder and adds a Teacher-Forcing mechanism to the decoder, which helps to improve training speed and labeling accuracy. In the actual model design, in order to keep the decoder input consistent, a 'START' vector is inserted at the beginning of each sentence. A BiLSTM is used as the encoder to perform preliminary prediction and vector compression on the input information, and the decoder uses a unidirectional LSTM to process and label the input information sequentially.
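The following PyTorch sketch illustrates one possible reading of this improved encoder–decoder. It is a minimal illustration under stated assumptions, not the authors' implementation: the class and parameter names (ImSeq2seqTagger, teacher_forcing_ratio) and the exact way the encoder's preliminary prediction is concatenated into the decoder input are assumptions made for clarity.

```python
import random
import torch
import torch.nn as nn

class ImSeq2seqTagger(nn.Module):
    """Minimal sketch of the improved seq2seq tagger of Section 3.1.
    Names and the exact decoder input wiring are illustrative assumptions."""

    def __init__(self, input_dim, hidden_dim, num_tags, start_tag_id, teacher_forcing_ratio=0.5):
        super().__init__()
        # BiLSTM encoder: compresses the input and makes a preliminary prediction
        self.encoder = nn.LSTM(input_dim, hidden_dim // 2, batch_first=True, bidirectional=True)
        self.enc_out = nn.Linear(hidden_dim, num_tags)
        # unidirectional LSTM decoder: re-labels each position in order
        self.tag_embed = nn.Embedding(num_tags, hidden_dim)
        self.decoder = nn.LSTMCell(hidden_dim * 2 + num_tags, hidden_dim)
        self.dec_out = nn.Linear(hidden_dim, num_tags)
        self.start_tag_id = start_tag_id              # id of the inserted 'START' tag (assumed in the tag set)
        self.teacher_forcing_ratio = teacher_forcing_ratio

    def forward(self, char_vectors, gold_tags=None):
        # char_vectors: (batch, seq_len, input_dim) character representations (e.g., from BERT)
        enc_states, _ = self.encoder(char_vectors)    # (batch, seq_len, hidden_dim)
        prelim = self.enc_out(enc_states)             # encoder's preliminary labeling result Y

        batch, seq_len, _ = enc_states.shape
        h = enc_states.new_zeros(batch, self.decoder.hidden_size)
        c = enc_states.new_zeros(batch, self.decoder.hidden_size)
        prev_tag = torch.full((batch,), self.start_tag_id, dtype=torch.long,
                              device=char_vectors.device)
        logits = []
        for t in range(seq_len):
            # decoder input: encoder state, preliminary prediction, and the previous tag
            dec_in = torch.cat([enc_states[:, t], prelim[:, t].softmax(dim=-1),
                                self.tag_embed(prev_tag)], dim=-1)
            h, c = self.decoder(dec_in, (h, c))
            step_logits = self.dec_out(h)
            logits.append(step_logits)
            # Teacher-Forcing: randomly replace the predicted previous tag with the gold tag
            if gold_tags is not None and random.random() < self.teacher_forcing_ratio:
                prev_tag = gold_tags[:, t]
            else:
                prev_tag = step_logits.argmax(dim=-1)
        return prelim, torch.stack(logits, dim=1)     # emission scores for the CRF layer
```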

3.2. Model Design

The structure of the overall NER model is shown in Figure 3, which mainly includes the input layer, preprocessing layer, encoding layer, decoding layer, and CRF layer.
When text is fed into the model in batches, the input is first processed by the pre-trained BERT model, whose Transformer modules fully mine the context dependencies in the text. The representation vectors of the Chinese characters are then obtained and input into the improved seq2seq model for sequence labeling. In the encoding layer of the improved seq2seq model, BiLSTM is used to compress the input vectors and preliminarily annotate the input data, producing the semantic vector C and the preliminary labeling result Y of the text; the semantic vector C and the labeling result Y are then sent to the decoder together. The decoder uses a unidirectional LSTM to label the input sequence step by step while considering the output of the previous unit; the Teacher-Forcing mechanism is used to randomly correct that output so that errors do not accumulate and degrade the labeling effect. Finally, the prediction results of the decoding layer are sent to the CRF layer, where the validity of the labeling results is checked against label constraints to obtain the best recognition effect. Next, the key links of the model are introduced step by step.
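Under the assumption that the preprocessing layer is a Hugging Face BertModel and the CRF layer is the third-party pytorch-crf package (neither library is specified in the paper), the overall forward pass could be sketched as follows, reusing the ImSeq2seqTagger sketch from Section 3.1:

```python
import torch.nn as nn
from transformers import BertModel           # assumed implementation of the preprocessing layer
from torchcrf import CRF                     # pip install pytorch-crf; assumed CRF layer

class BertImSeq2seqCRF(nn.Module):
    """Sketch of the overall model: BERT -> improved seq2seq -> CRF."""

    def __init__(self, num_tags, start_tag_id, bert_name="bert-base-chinese", hidden_dim=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)                       # preprocessing layer
        self.tagger = ImSeq2seqTagger(self.bert.config.hidden_size, hidden_dim,
                                      num_tags, start_tag_id)                  # encoding + decoding layers
        self.crf = CRF(num_tags, batch_first=True)                             # CRF layer

    def forward(self, input_ids, attention_mask, tags=None):
        # 1. BERT produces a representation vector for every Chinese character
        char_vectors = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        # 2. the improved seq2seq produces emission scores for every position
        _, emissions = self.tagger(char_vectors, gold_tags=tags)
        mask = attention_mask.bool()
        if tags is not None:
            # training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # inference: CRF (Viterbi) decoding returns the best legal tag sequence
        return self.crf.decode(emissions, mask=mask)
```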

3.3. Preprocessing Layer

The preprocessing layer uses the BERT pre-trained language model to obtain the text representation vectors. Different from traditional unidirectional language models, BERT innovatively uses a Masked Language Model (MLM) to build a bidirectional deep network, which fully captures sentence-level features of the text and performs extremely well in many Natural Language Processing (NLP) tasks [23,24].
The bidirectional Transformer is the key module of BERT. The Transformer mainly includes an encoder and a decoder with similar structures; Figure 4 shows the encoder module on the left side of the Transformer, in which the multi-head self-attention mechanism plays a crucial role.
Multi-head attention is mainly used to mine the dependencies between words and to adjust the weight coefficient matrix, obtaining word representations based on the degree of association between words:
$$\mathrm{Attention}(Q,K,V)=\mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$$ (1)
where $Q$, $K$, and $V$ denote the Query, Key, and Value vectors generated from the word vectors, and $d_k$ is the dimension of the input word vectors. In order to capture dependencies in multiple subspaces, the Transformer adopts the multi-head attention mechanism: it obtains $h$ groups of Query, Key, and Value through $h$ linear transformations, performs the attention calculation on each group separately, and finally concatenates the $h$ results to obtain the multi-head attention result, calculated as follows:
$$head_{i}=\mathrm{Attention}(Q_{i},K_{i},V_{i})$$ (2)
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(head_{1},\ldots,head_{h})W^{O}$$ (3)
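As a concrete illustration of Equations (1)–(3), the sketch below implements scaled dot-product attention and a head-splitting multi-head variant in PyTorch; the projection matrices W_q, W_k, W_v, W_o are passed in explicitly, and the function names are only illustrative.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = Softmax(Q K^T / sqrt(d_k)) V, as in Equation (1)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V

def multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads):
    # Project x into Q, K, V, split each into `num_heads` heads, attend per head,
    # then concatenate and project with W_o, as in Equations (2) and (3).
    batch, seq_len, d_model = x.shape

    def split(t):
        return t.view(batch, seq_len, num_heads, d_model // num_heads).transpose(1, 2)

    Q, K, V = split(x @ W_q), split(x @ W_k), split(x @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)             # (batch, heads, seq_len, d_head)
    concat = heads.transpose(1, 2).reshape(batch, seq_len, d_model)
    return concat @ W_o
```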
Due to the characteristics of this mechanism, the model is not sensitive to word order information. In order to distinguish the semantics of the same word at different positions, the BERT model also considers a text position vector and a segment vector, so the model input consists of the word vector, position vector, and segment vector. Through the multi-layer Transformer structure, the text context information is fully mined to obtain the text representation vectors, which are applied to different downstream tasks through parameter fine-tuning.
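A minimal sketch of this input representation is given below, assuming the usual BERT-style sizes (default values such as vocab_size=21128 for the Chinese vocabulary and hidden_size=768 are illustrative, not taken from the paper); the three embeddings are simply summed element-wise before entering the Transformer layers.

```python
import torch
import torch.nn as nn

class BertInputEmbedding(nn.Module):
    """Sketch of the BERT input: word + position + segment embeddings (sizes illustrative)."""

    def __init__(self, vocab_size=21128, hidden_size=768, max_len=512, type_vocab_size=2):
        super().__init__()
        self.word = nn.Embedding(vocab_size, hidden_size)
        self.position = nn.Embedding(max_len, hidden_size)
        self.segment = nn.Embedding(type_vocab_size, hidden_size)

    def forward(self, input_ids, segment_ids):
        # position ids 0..seq_len-1, broadcast over the batch dimension
        positions = torch.arange(input_ids.size(1), device=input_ids.device).unsqueeze(0)
        # the three vectors are summed element-wise to form the Transformer input
        return self.word(input_ids) + self.position(positions) + self.segment(segment_ids)
```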

3.4. Encoder–Decoder Layer

The encoder and decoder use the LSTM [25] network, which is an improved version of the RNN. As shown in Figure 5, a forget gate $f_t$, a memory gate $i_t$, and an output gate $o_t$ are added to the RNN unit, which can effectively capture the context information of a sentence. It is a commonly used network for handling information dependence in longer sequences.
The forget gate decides whether to discard information in the unit. The gate reads the output of the previous unit $h_{t-1}$ and the input of the current unit $x_t$, and outputs a value between 0 and 1 that is transmitted to the cell state, where '1' means all information is retained and '0' means all information is discarded.
$$f_{t}=\sigma\left(W_{f}\cdot[h_{t-1},x_{t}]+b_{f}\right)$$ (4)
The memory gate determines whether new information is stored in the cell. Its inputs are again the output of the previous unit $h_{t-1}$ and the input of the current unit $x_t$. A sigmoid activation generates a value between 0 and 1 that determines the degree of information retention, while a tanh layer generates a candidate cell state $\tilde{C}_{t}$ used to update the original cell state $C_{t}$:
$$i_{t}=\sigma\left(W_{i}\cdot[h_{t-1},x_{t}]+b_{i}\right)$$ (5)
$$\tilde{C}_{t}=\tanh\left(W_{c}\cdot[h_{t-1},x_{t}]+b_{c}\right)$$ (6)
$$C_{t}=f_{t}\times C_{t-1}+i_{t}\times\tilde{C}_{t}$$ (7)
The output gate determines the final output. Based on the output of the previous unit $h_{t-1}$, the current input $x_t$, and the cell state $C_t$, the final result $h_t$ is obtained with the help of the sigmoid and tanh activation functions:
$$o_{t}=\sigma\left(W_{o}\cdot[h_{t-1},x_{t}]+b_{o}\right)$$ (8)
$$h_{t}=o_{t}\times\tanh\left(C_{t}\right)$$ (9)
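Equations (4)–(9) translate almost directly into code. The sketch below applies a single LSTM step; the weight matrices act on the concatenation [h_{t-1}, x_t], matching the formulas, and the parameter names and shapes are illustrative.

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    # One LSTM step following Equations (4)-(9); each W_* maps [h_{t-1}, x_t] to the hidden size.
    hx = torch.cat([h_prev, x_t], dim=-1)
    f_t = torch.sigmoid(hx @ W_f + b_f)          # forget gate, Eq. (4)
    i_t = torch.sigmoid(hx @ W_i + b_i)          # memory (input) gate, Eq. (5)
    c_tilde = torch.tanh(hx @ W_c + b_c)         # candidate cell state, Eq. (6)
    c_t = f_t * c_prev + i_t * c_tilde           # cell state update, Eq. (7)
    o_t = torch.sigmoid(hx @ W_o + b_o)          # output gate, Eq. (8)
    h_t = o_t * torch.tanh(c_t)                  # hidden state output, Eq. (9)
    return h_t, c_t
```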

3.5. CRF Layer

CRF is a Markov random field widely used for sequence labeling tasks such as word segmentation or part-of-speech tagging. The model is characterized by global normalization of the sequence probability and freely defined feature functions, which avoids the conditional independence assumption of hidden Markov models and solves the label bias problem of directed graph models [26,27]. Given the random variable X, the CRF models the random output variable Y, and the conditional probability model satisfies the Markov property, that is:
$$\Pr\left(Y_{v}\mid X,Y_{w},w\neq v\right)=\Pr\left(Y_{v}\mid X,Y_{w},w\sim v\right)$$ (10)
where $w \sim v$ represents all nodes that have edge connections with node v in the undirected graph, $w \neq v$ represents all nodes other than node v, and $Y_v$, $Y_u$, and $Y_w$ represent the random variables corresponding to nodes v, u, and w.
Because the input and output data of the NER task are linear sequences, the linear chain conditional random field is selected as the learning model. Given an input sequence $X=(X_1, X_2, \ldots, X_T)$ of length T with the corresponding labeling sequence $Y=(Y_1, Y_2, \ldots, Y_T)$, the conditional probability distribution of Y satisfies Equation (11), and for the linear chain conditional random field the conditional probability distribution $\Pr(Y \mid X)$ is calculated by Equation (12):
$$\Pr\left(Y_{i}\mid X,Y_{1},Y_{2},\ldots,Y_{i-1},Y_{i+1},\ldots,Y_{T}\right)=\Pr\left(Y_{i}\mid X,Y_{i-1},Y_{i+1}\right)$$ (11)
$$\Pr\left(Y\mid X\right)=\frac{1}{Z(X)}\exp\left(\sum_{t=1}^{T}\sum_{k=1}^{K}w_{k}f_{k}\left(t,Y_{t},Y_{t-1},X\right)\right)$$ (12)
where $t = 1, 2, \ldots, T$; $f_{k}(t, Y_{t}, Y_{t-1}, X)$ denotes the k-th feature function at position t of the input sequence X, with $Y_t$ and $Y_{t-1}$ being the labels at the current and previous positions; $w_k$ denotes the corresponding feature weight; and $Z(X)$ denotes the normalization factor:
$$Z(X)=\sum_{Y}\exp\left(\sum_{t=1}^{T}\sum_{k=1}^{K}w_{k}f_{k}\left(t,Y_{t},Y_{t-1},X\right)\right)$$ (13)
The conditional random field model calculates the conditional probabilities and feature expectations at different positions through the forward–backward algorithm and uses the Viterbi algorithm for dynamic programming decoding. This study uses the CRF model to learn the dependencies between text labels and to add constraints on the predicted labels, which avoids illegal label sequences and helps obtain better labeling results.
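As an illustration of the decoding step, the following sketch performs Viterbi decoding for a single sentence given the per-position emission scores and a learned tag-transition matrix; this is a generic linear-chain Viterbi routine, not the exact implementation used in the paper.

```python
import torch

def viterbi_decode(emissions, transitions):
    # emissions: (seq_len, num_tags) per-position tag scores from the decoding layer
    # transitions: (num_tags, num_tags) learned scores for tag_{t-1} -> tag_t
    seq_len, num_tags = emissions.shape
    score = emissions[0]                                   # best score ending in each tag at t = 0
    backpointers = []
    for t in range(1, seq_len):
        # score of extending every previous tag with every current tag
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)                # keep the best predecessor for each tag
        backpointers.append(best_prev)
    # follow the back-pointers from the best final tag
    best_tag = int(score.argmax())
    path = [best_tag]
    for best_prev in reversed(backpointers):
        best_tag = int(best_prev[best_tag])
        path.append(best_tag)
    return list(reversed(path))
```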

4. Experiment Analysis

4.1. Experimental Dataset

The experimental dataset of this research mainly comes from project plans, work logs, and other text materials concerning the process accuracy of hot strip rolling, which contain a large amount of professional information related to the process accuracy of the equipment. The processed dataset is expressed in Chinese, with each Chinese character used as the smallest unit for text processing and entity labeling, and each complete statement ends with a full stop to facilitate unified processing by the model. In addition, the Weibo dataset and news dataset used for model validation are available at https://github.com/hspuppy/hugbert/tree/master/ner_dataset (accessed on 9 April 2022); they were processed into the same form as the process accuracy dataset of hot strip rolling.
The hot strip rolling text includes 1513 process descriptions, annotated at the character level with B (head), I (middle), E (end), and O (else) tags. The labeled data are randomly shuffled with each sentence as the smallest unit and divided into a test set and a training set at a ratio of 3:7. The entity types include Position, Index, Parameter, Area, Object, Defect Shape, Evaluation Aspect, and Defect. The marking form is shown in Table 1.
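For illustration, a small helper like the one below maps labeled entity spans onto the per-character tags of Table 1; the span format (start, end, type) and the handling of single-character entities are assumptions made for this sketch, not details given in the paper.

```python
def tag_sentence(chars, spans):
    """Map labeled entity spans onto per-character B-/I-/E- tags (O elsewhere), as in Table 1.

    chars: list of Chinese characters; spans: list of (start, end, ent_type) with end exclusive,
    e.g. ent_type 'OBJ' for Object. This span format is an assumption for illustration.
    """
    tags = ["O"] * len(chars)
    for start, end, ent_type in spans:
        if end - start == 1:
            tags[start] = "B-" + ent_type          # single-character entity: head tag only (assumed)
            continue
        tags[start] = "B-" + ent_type              # head
        for i in range(start + 1, end - 1):
            tags[i] = "I-" + ent_type              # middle
        tags[end - 1] = "E-" + ent_type            # end
    return tags
```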

4.2. Evaluation Metrics and Experimental Configuration

In order to objectively and accurately reflect the performance of the model and to avoid the influence of the data's own characteristics, this study uses Precision (%), Recall (%), and F1-Score (%), which are widely used in information extraction, as the model performance evaluation indexes [12]. Each index is calculated as in Equations (14)–(16):
$$\mathrm{Precision}=\frac{TP}{TP+FP}\times 100\%$$ (14)
$$\mathrm{Recall}=\frac{TP}{TP+FN}\times 100\%$$ (15)
$$\mathrm{F1\text{-}Score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\times 100\%$$ (16)
where TP represents positive samples that are correctly labeled, FP represents samples that are incorrectly labeled as positive, FN represents positive samples that are incorrectly labeled as negative, and the F1-Score is the harmonic mean of Precision and Recall, serving as a comprehensive evaluation index.
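The entity-level computation of these metrics can be sketched as follows; counting an entity as correct only when both its span and its type match exactly is a common convention and an assumption here, since the paper does not state its matching rule.

```python
def precision_recall_f1(true_entities, pred_entities):
    # Entity-level metrics following Equations (14)-(16); an entity counts as correct
    # only if both its span and its type match exactly (an assumed convention).
    true_set, pred_set = set(true_entities), set(pred_entities)
    tp = len(true_set & pred_set)
    precision = tp / len(pred_set) * 100 if pred_set else 0.0
    recall = tp / len(true_set) * 100 if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```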
The experimental environment and parameter configuration of this study are shown in Table 2.

4.3. Analysis of Experimental Results

In order to verify the feasibility of the model, this paper first uses third-party domain datasets (a Weibo dataset and a news dataset) to conduct preliminary experiments. Each dataset is divided into a training set and a test set at a ratio of 7:3, and several related models are selected as controls; the improved seq2seq model is recorded as Imseq2seq. The experimental results are shown in Table 3. The original seq2seq model only sends the semantic vector C to the decoder, which covers relatively little text information, making it difficult to predict labels accurately; its experimental effect is far worse even than the CRF model, with an F1-Score of only about 15%. The improved seq2seq model makes full use of the output information and semantic vector of the encoder, and combined with the CRF model its overall performance exceeds that of BiLSTM-CRF. After adding the BERT preprocessing model, the F1-Score increases to about 89.5%.
Because the entity types contained in the third-party datasets are only person name (PER), location (LOC), and organization (ORG), the entity types are relatively few and the sentence structures are simple, so the recognition difficulty is low and it is hard to reflect the advantages of the model. Next, the model is applied to the task of entity recognition in hot strip rolling. The text of hot strip rolling is highly specialized and includes many domain entity types and entity names. At the same time, there are entities with long names or nested entities, such as 'spindle main motor torque', 'steel shifting impact fluctuation', 'outlet anti-arching roll set pressure', and 'measured flow deviation of rack cooling water', which undoubtedly increase the difficulty of entity identification.
Similarly, in order to verify the feasibility and advantages of the model, relevant models are selected for the control experiment. The identification results on the hot strip rolling dataset are shown in Table 4. Combining the Weibo dataset, the news dataset, and the hot strip rolling process accuracy dataset, the following conclusions can be drawn. CRF is a classic machine learning model that realizes information labeling by adding constraints; it processes information from a relatively single angle, and its F1-Score is only 77.30%. The seq2seq model only transmits the compressed paragraph information obtained by the encoder to the decoding layer; the amount of information is greatly reduced, which is far from sufficient for labeling individual Chinese characters, so its recognition effect is extremely poor. In the Imseq2seq model, the encoder adopts BiLSTM, while the decoding layer adopts a unidirectional LSTM combined with the Teacher-Forcing mechanism to randomly correct the prediction results, performing a secondary prediction based on multiple sources of information. Compared with the BiLSTM model, the model in this paper considers information from multiple angles; its overall recognition effect is also better than other models of the same level, and the F1-Score reaches 91.47%.
The specific recognition results of this model are shown in Table 5. The mixed expression of Chinese characters, letters, and numbers in some sentences does not affect the prediction performance of the model, and almost all entity tags are predicted accurately. At the same time, the prediction results of this model for the specific entity types of the hot strip rolling text are shown in Figure 6. The prediction results of individual entity types are lower than the average value; inspection of the verification text shows that the number of samples of the 'Defect (DEF)' entity type is relatively small and that different sentences express this entity type differently, which affects the prediction level of the model to a certain extent.
Inspired by the seq2seq model, this paper constructs an NER model for the hot strip rolling field. Except for a few entity types, the F1-Score exceeds 85%, and some experimental indexes are close to 100%. The model improves the efficiency and quality of entity recognition and labeling, and provides a solution and technical support for information extraction and knowledge graph construction in hot strip rolling.

5. Conclusions

Aiming at the research gap in information extraction for hot strip rolling and the needs of industry development, this paper proposes a text NER algorithm for hot strip rolling based on the BERT-Imseq2seq-CRF model:
(1) Based on the seq2seq model concept, this paper regards NER as a 'translation' task between equal-length sequences and improves and optimizes the existing model: the encoder is fully used to achieve text information compression and preliminary prediction, and the Teacher-Forcing mechanism is integrated into the decoder unit to improve recognition efficiency. Compared with the existing models, Imseq2seq performs better on the NER task. Combined with the CRF model, the F1-Score increased from 13.72% to 87.38%; after adding the pre-trained model BERT, the F1-Score increased to 91.47%, the best recognition performance among all models.
(2) This study considers the text form and process development requirements of the technological accuracy of hot strip rolling and provides a professional domain NER model for managing massive domain information, which can efficiently and accurately locate and label the domain information in the text. To a certain extent, it fills the application gap of knowledge extraction in the field of hot strip rolling and improves the information management and application level of the strip manufacturing industry.
(3) Although this paper has made some progress in text recognition for hot strip rolling, there is still room for improvement in recognition performance and scale. In the future, we will continue to experiment with model applications, improve the generalization ability of the model, and pay attention to knowledge fusion, knowledge update, and knowledge reasoning, so as to accelerate the construction of the steel manufacturing knowledge graph.

Author Contributions

Conceptualization, J.W.; Investigation, G.X.; Validation, J.L.; Writing—original draft, F.J.; Writing—review and editing, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by Fundamental Research Funds for the Central Universities (FRF-AT-20-06).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, C.M.; Zhou, D.D.; Xu, K.; Zhang, H.N. Review of intelligent manufacturing technology in steel industry. China Metall. 2018, 28, 1–7. [Google Scholar]
  2. Wang, G.; Jiao, L.J.; Wang, H.M.; Yang, Y.J.; Wang, H.; Zhu, X.H. Development status of intelligent manufacturing in iron and steel industry in China. Environ. Eng. 2020, 38, 173. [Google Scholar]
  3. Liu, W.Z. Current situation and thinking of intelligent manufacturing in China’s iron and steel industry. China Metall. 2020, 30, 1–7. [Google Scholar]
  4. Zhao, X.D.; Chen, Y.R.; Guo, J.; Zhao, D.B. A Spatial-Temporal Attention Model for Human Trajectory Prediction. IEEE/CAA J. Autom. Sin. 2020, 7, 965–974. [Google Scholar] [CrossRef]
  5. Zhang, D.; Liu, Z.; Jia, W.; Liu, H.; Tan, J. A review on knowledge graph and its application prospects to intelligent manufacturing. J. Mech. Eng. 2021, 57, 90. [Google Scholar]
  6. Zhang, J.; Zhang, X.; Wu, C.; Zhao, Z. Survey of knowledge graph construction techniques. Comput. Eng. 2022, 48, 23. [Google Scholar]
  7. Rawat, R.; Yadav, R. Big data: Big data analysis, issues and challenges and technologies. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1022, 012014. [Google Scholar] [CrossRef]
  8. Ma, Z.G.; Ni, R.Y.; Yu, K.H. Recent advances, key techniques and future challenges of knowledge graph. Chin. J. Eng. 2020, 42, 1254. [Google Scholar]
  9. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, p. 4171. [Google Scholar]
  10. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. Int. Conf. Learn. Represent. 2019, 11942. [Google Scholar] [CrossRef]
  11. Chen, Y.; Kuang, J.; Cheng, D.; Zheng, J.; Gao, M.; Zhou, A. AgriKG: An agricultural knowledge graph and its applications. In International Conference on Database Systems for Advanced Applications; Springer: Cham, Switzerland, 2019; p. 533. [Google Scholar]
  12. Wu, S.S.; Zhou, A.L.; Xie, N.F. Construction of visualization domain-specific knowledge graph of crop diseases and pests based on deep learning. Trans. Chin. Soc. Agric. Eng. 2020, 36, 177. [Google Scholar]
  13. Xu, L.; Li, J.H. Biomedical named entity recognition based on BERT and BiLSTM-CRF. Comput. Eng. Sci. 2021, 43, 1873. [Google Scholar]
  14. Hou, M.; Wei, R.; Lu, L.; Lan, X.; Cai, H. Research review of knowledge graph and its application in medical domain. J. Comput. Res. Dev. 2018, 55, 2587. [Google Scholar]
  15. Zhang, D.Z.; Xie, Y.H.; Li, M.; Shi, C. Construction of knowledge graph of traditional Chinese medicine based on the ontology. Technol. Intell. Eng. 2017, 3, 35. [Google Scholar]
  16. Hu, T.T.; Dan, Y.B.; Hu, J.; Li, X.; Li, S. News named entity recognition and sentiment classification based on attention-based bi-directional long short-term memory neural network and conditional random field. J. Comput. Appl. 2020, 40, 1879. [Google Scholar]
  17. Pu, T.J.; Tan, Y.P.; Peng, G.Z.; Xu, H.; Zhang, Z. Construction and application of knowledge graph in the electric power field. Power Syst. Technol. 2021, 45, 2080. [Google Scholar]
  18. Wang, C.; Wang, Z.F.; Li, L.G.; Xu, Z.Y.; Tian, S.M.; Pan, M.M. Research on the improvement of the power customer service knowledge graph and model construction. Distrib. Util. 2020, 37, 3–8. [Google Scholar]
  19. Guo, Y.; Li, Y.N.; Liu, A.L.; Fang, X.K. Entity recognition method for power safety operation based on deep learning. Video Eng. 2022, 46, 67. [Google Scholar]
  20. Pan, L.H.; Zhao, P.P.; Gong, D.L.; Yan, H.M.; Zhang, Y.J. Combined ALBERT for named entity recognition in coal mine accident cases. Comput. Technol. Dev. 2022, 32, 154. [Google Scholar]
  21. Xiao, X.F.; Li, S.J.; Yu, W.; Liu, J.; Liu, B. English-Chinese translation based on an improved seq2seq model. Comput. Eng. Sci. 2019, 41, 1257. [Google Scholar]
  22. Ma, Z.Q.; Du, B.X.; Shen, J.; Yang, R.; Wan, J. An Encoding Mechanism for Seq2Seq based Multi-Turn Sentimental Dialogue Generation Model. Procedia Comput. Sci. 2020, 174, 412–418. [Google Scholar] [CrossRef]
  23. AbdelSalam, S.; Rafea, A. Performance Study on Extractive Text Summarization Using BERT Models. Information 2022, 13, 67. [Google Scholar] [CrossRef]
  24. Li, H.C.; Ma, Y.; Ma, Z.S.; Zhu, H. Weibo Text Sentiment Analysis Based on BERT and Deep Learning. Appl. Sci. 2021, 11, 10774. [Google Scholar] [CrossRef]
  25. Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; p. 207. [Google Scholar]
  26. Yan, R.G.; Jiang, X.; Dang, D.P. Named Entity Recognition by Using XLNet-BiLSTM-CRF. Neural Process. Lett. 2021, 53, 3339–3356. [Google Scholar] [CrossRef]
  27. Zhai, Y.J.; Tian, J.W.; Zhao, Y. Algorithm Term Extraction and Innovation Evolution Path Construction Based on BERT-BiLSTM-CRF Model. Inf. Sci. 2022, 40, 71. [Google Scholar]
Figure 1. Typical Seq2seq structure.
Figure 2. Improved seq2seq structure.
Figure 3. Model structure.
Figure 4. Encoder structure.
Figure 5. LSTM structure.
Figure 6. Identification results of entities in hot strip rolling.
Table 1. Marking form of hot strip rolling.

Tag                 Head     Middle   End
Position            B-POS    I-POS    E-POS
Index               B-IND    I-IND    E-IND
Parameter           B-PAR    I-PAR    E-PAR
Area                B-ARE    I-ARE    E-ARE
Object              B-OBJ    I-OBJ    E-OBJ
Defect Shape        B-SHA    I-SHA    E-SHA
Evaluation Aspect   B-EVA    I-EVA    E-EVA
Defect              B-DEF    I-DEF    E-DEF
Else                O        O        O
Table 2. Experimental environment and parameter configuration.

Configuration    Version/Value
Python           3.6.1
Pytorch          1.5.1
Batch size       16
Seq_max_len      512
Dropout rate     0.5
Learning Rate    0.001
LSTM_dim         256
Table 3. Identification results of the third party dataset.

Model             WeiBo Dataset                           News Dataset
                  Precision   Recall     F1-Score         Precision   Recall     F1-Score
CRF               84.31%      69.54%     76.22%           82.51%      71.83%     76.80%
seq2seq-CRF       42.50%      8.63%      14.35%           45.12%      9.39%      15.55%
BiLSTM-CRF        87.47%      84.48%     85.95%           89.06%      85.58%     87.29%
Imseq2seq-CRF     85.14%      87.32%     86.21%           90.32%      87.32%     88.73%
Model I a         88.51%      87.76%     88.13%           91.70%      87.84%     89.73%
Model II b        90.02%      88.58%     89.30%           93.05%      88.01%     90.46%

a Model I is BERT-BiLSTM-CRF. b Model II is BERT-Imseq2seq-CRF.
Table 4. Identification results of hot strip rolling.

Model                   Precision   Recall    F1-Score
CRF                     82.66%      72.59%    77.30%
seq2seq-CRF             37.93%      8.38%     13.72%
BiLSTM-CRF              84.17%      89.06%    86.56%
Imseq2seq-CRF           88.72%      86.08%    87.38%
BERT-BiLSTM-CRF         90.22%      87.04%    88.60%
BERT-Imseq2seq-CRF      93.16%      89.85%    91.47%
Table 5. Example of recognition results.

Words   Real Tag   Predicted Tag   Words   Real Tag   Predicted Tag   Words   Real Tag   Predicted Tag
Zha     B-DIR      B-DIR           Cu      B-OBJ      B-OBJ           Zu      I-OBJ      I-OBJ
Ji      I-DIR      I-DIR           Zha     I-OBJ      I-OBJ           ‵ a     O          O
Gang    I-DIR      I-DIR           R       I-OBJ      I-OBJ           Juan    B-OBJ      B-OBJ
Du      I-DIR      I-DIR           1       I-OBJ      I-OBJ           Qu      I-OBJ      I-OBJ
Ping    O          O               R       I-OBJ      I-OBJ           Ji      I-OBJ      I-OBJ
Jia     O          O               2       I-OBJ      I-OBJ           Deng    O          O
Dui     O          O               ‵ a     O          O               She     O          O
Xiang   O          O               Jing    B-OBJ      B-OBJ           Bei     O          O
Bao     O          O               Zha     I-OBJ      I-OBJ           ⚬ b     O          O
Kuo     O          O               Ji      I-OBJ      I-OBJ
a ‵ represents a mark in Chinese punctuation used to set off items in a series. b ⚬ represents a full stop.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
