Anomaly Detection of Liquid Level in Mold during Continuous Casting by Using Forecasting and Error Generation

Wu, Xiaojun; Kang, Hongjia; Yuan, Sheng; Jiang, Wenze; Gao, Qi; Mi, Jinzhou

doi:10.3390/app13137457

Open AccessArticle

Anomaly Detection of Liquid Level in Mold during Continuous Casting by Using Forecasting and Error Generation

by

Xiaojun Wu

^1,*

,

Hongjia Kang

¹,

Sheng Yuan

¹,

Wenze Jiang

¹

,

Qi Gao

^2,* and

Jinzhou Mi

²

¹

School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China

²

China National Heavy Machinery Research Institute Co., Ltd., Xi’an 710016, China

^*

Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(13), 7457; https://doi.org/10.3390/app13137457

Submission received: 1 May 2023 / Revised: 13 June 2023 / Accepted: 19 June 2023 / Published: 23 June 2023

(This article belongs to the Section Applied Industrial Technologies)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Anomaly detection of liquid levels in molds is an important task in continuous casting. Data that consists of a series of liquid levels in mold during a continuous casting process can be viewed as a time series, on which Time Series Anomaly Detection (TSAD) methods can be applied. However, the abnormal and normal data in the liquid data in the mold sequence share similar features. And due to manual control limitations, the anomaly sequence in liquid level in mold data lasts longer. Therefore, using existing TSAD methods based on AutoEncoders (AEs) often results in high false positive rates. In this paper, a novel framework is proposed for anomaly detection of liquid level in mold by using unsupervised deep-learning-based TSAD. The framework decomposes a time series into normal and error sequences. A forecasting network reconstructs the normal sequence to solve the first issue, which allows the proposed method to consider the context. An error extraction network generates errors from the original sequence to solve the second issue. It removes anomalies from the original sequence during training to prevent anomaly pollution and allows the forecasting network’s training to be free from anomaly pollution. A new dynamic threshold method is proposed to identify anomalies. The proposed method is evaluated on the actual casting dataset by comparing it with baseline methods. The experiment results indicate that the proposed framework outperforms some of the best anomaly detection methods in terms of accuracy, precision, and F1 score.

Keywords:

anomaly detection of liquid level in mold; time series anomaly detection; continuous casting; error generation; autoencoder; dynamic threshold

1. Introduction

Continuous casting is one of the most common methods of producing metal products. This technique involves pouring liquid metal into a mold and continuously withdrawing the solidified product from the other end. The product can be a billet, a bloom, or a slab, depending on its shape and size. Continuous casting reduces the need for intermediate steps and saves energy and material costs [1]. It is the most frequently used process to cast steel, aluminum, and copper alloys.

Continuous casting involves several stages [2]: ladle treatments, tundish operation, mold filling and solidification, secondary cooling, strand support, cutting, and straightening. Each step requires careful monitoring and control of various parameters, such as temperature [3], pressure, flow rate, composition, liquid level, drawing speed, etc. Moreover, continuous casting is influenced by many factors that are difficult to measure or model accurately, for example, turbulence and mixing phenomena in the molten metal, heat transfer across different interfaces (metal-mold-water-air), phase transformations, microstructural evolution during solidification, thermal stresses, strains induced by temperature gradients, and metallurgical reactions between metal and slag or refractory materials, etc. The current continuous casting process also involves two phases, i.e., manual control and automatic control. Once the molten steel in the mold reaches a certain level, the casting machine will be activated, and the continuous casting goes into the automatic control phase from the manual control phase. But during the manual control phase, the casting operator must adjust the stopper position by manually observing the liquid level in the mold.

The anomalous detection of liquid level in molds is a crucial technique for improving the quality of steel products in continuous casting. The fluctuation of the liquid level in the mold is closely related to the casting speed, depth, condition of the Submerged Entry Nozzle (SEN), and argon gas injection [4]. Excessive fluctuations in the liquid level in the mold can cause slag entrainment, which leads to inclusions and surface defects on the cast billet. Therefore, monitoring and controlling the liquid level in the mold during casting is vital to ensuring a stable and uniform solidification process [5]. However, conventional methods such as eddy current sensors have limitations in detecting local fluctuations or capturing dynamic changes of the liquid level in mold, and current deep-learning-based techniques are heavily focused on the automatic control phase.

The temporal features of the liquid level in the manual and automatic control phases are different. The steel liquid enters the mold during the manual control phase and does not leave. Once the continuous casting machine is activated, the drawing machine starts drawing solidified steel out of the mold. During the manual control phase, if an anomaly occurs, such as an open error stopper, it may cause the liquid level in the mold to be higher or lower than it is supposed to be. To eliminate the anomaly, the casting operator must manually identify the liquid level anomaly and operate the stopper to eliminate the error. Due to the limitations of manual control, the error in liquid level during manual control lasts longer. Due to different casting requirements, the abnormal stopper sequence can be correctly applied under another requirement or in different casting periods. Therefore, the temporal features between anomaly and normal sequences can share similar features that AEs can mistake.

Therefore, advanced techniques based on deep learning are proposed to detect various types of anomalies in time-dependent process parameters and provide timely feedback for quality control. Figure 1 shows the required components in a continuous casting process. The liquid level in the mold is controlled by both the stopper and the withdrawal unit.

2. Related Work

Research using neural networks to predict and improve the properties and structure of steel has been conducted. Sarda et al. [6] proposed a multi-step anomaly detection strategy based on robust distances for predictive maintenance in steel-making industries. The proposed method achieved good results in detecting anomalies in the steel-making process. Acernese et al. [7] reported the outcome of an industrial research project on data-based anomaly detection in a steel-making production process. The study assesses a fault detection strategy for rotating machines in the hot rolling mill line. Chen et al. [8] discussed the dynamic bulging model, which captured the behavior of the 2-D longitudinal domain through interpolation of multiple 1-D moving slices. The model calculates the fluctuations of liquid level in the mold caused by unsteady bulging of the solidifying shell, which affect the quality of the steel and the stable operation of the continuous steel casting process. Yoon et al. [9] analyzed the Mold Level Hunching (MLH) phenomenon during a thin slab casting process. The mold’s liquid level variation and the strand’s bulging were measured and analyzed using Fast Fourier Transform (FFT) spectrum analysis. Mold-level hunching and bulging had the same frequency, and the specific frequency was 0.5 Hz. Zhou et al. [10] proposed a liquid level in mold anomaly detection method called Multi-scale Convolution Neural Network-Long Short-Term Memory (Multi-scale CNN-LSTM) to detect the anomalies in liquid level in mold multi-dimension time series actual casting dataset. Khalaj et al. [11] used an Artificial Neural Network (ANN) to predict the passivation current density and potential of microalloyed steels based on the experimental data from the potentiodynamic polarization of High-Strength Low-Alloy (HSLA) steels. The developed model showed a good capacity for modeling complex corrosion behavior and could accurately track the experimental data in a wide range of steel chemical compositions, microstructures, temperature ranges, and corrosion cell characteristics.

TSAD aims to identify unusual patterns or behaviors in sequential data [12,13,14,15,16,17]. TSAD has applications in various domains, such as smart grids, network security, finance, health care, and social media. However, TSAD is also challenging, as anomalies can have different types, scales, and contexts, and time series data can be noisy, high-dimensional, and non-stationary.

Neural-network-based TSAD demonstrates that it can achieve strong results on various datasets. These methods learn long-term, nonlinear temporal relationships in the data, outperforming existing non-deep methods based on similarity search [18] and density-based clustering [19].

The most popular TSAD framework is the AutoEncoder (AE). Recurrent Neural Network (RNN) AE, Long Short-Term Memory (LSTM) AE, and Gated Recurrent Unit (GRU) AE are three types of AE for TSAD. RNN-AE uses a simple recurrent unit as the encoder and decoder. LSTM-AE uses the LSTM unit as the encoder and decoder. GRU-AE uses GRUs as the encoder and decoder. These three models have different advantages and disadvantages regarding computational efficiency, memory capacity, and gradient flow. AEs can be used for TSAD by training them on normal data and measuring the reconstruction error on new data. A high reconstruction error indicates an anomaly, while a low reconstruction error indicates a normal data point. AE can also be extended to variational AE (VAE) [11], which imposes a probabilistic distribution on the latent space and can generate realistic data samples. Robust AE (RAE) [20], a method inspired by Robust Principal Component Analysis (RPCA) [21], used an AE and an error matrix to separate the error sequence from the original sequence. The LSTM-AE-ADVanced (LSTM-AE-ADV) method proposed by Kieu T et al. suggested that using static methods to enrich datasets and feed them into AE could achieve better results [22]. Geiger et al. [23] proposed the TadGAN method, an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs). It uses LSTM as a base model for generators and critics to capture the temporal correlations of time series distributions.

In this paper, one TSAD method for anomaly detection of liquid level based on error generation and forecasting, called Forecasting and Error Generation AutoEncoder (FEG-AE), is proposed, which integrates the advantages of RAE and the sliding window technique to accelerate the training process. The experiment results show that FEG-AE can achieve superior performance and robustness in TSAD.

The key contributions of this paper are as follows:

Propose a new TSAD architecture for anomaly detection of liquid level in mold that separates the time series into a normal sequence and an anomaly sequence to address the anomaly feature in the liquid level in mold during the manual control phase. Such architecture is easy to train;
Introduce a new dynamic threshold method to score the TSAD based on the proposed method;
Evaluation is conducted on the production dataset and demonstrates that the proposed method outperforms four other baselines on the tested dataset.

3. Preliminary

3.1. RNN-Based AE

AE learns to reconstruct the input data from a compressed representation in the hidden layers. Using RNN-AE allows time series data to pass through an RNN encoder and then output a compressed representation of the data, representing the encoded time series data. Then the encoded data is passed into another RNN decoder layer to reconstruct the encoded data from the original data. In a normal dataset, a time series that contains anomalies is rare and has a different pattern from the clean sequence. Therefore, an RNN-AE is more likely to reconstruct a normal sequence correctly and cannot reconstruct an abnormal one.

3.2. RPCA

PCA is a dimensionality reduction method. Given a matrix

X

, PCA can find a low-rank matrix representing the original data. During the dimensionality reduction, PCA uses Singular Value Decomposition (SVD) to find a low-rank matrix that can be significantly interfered with by anomalies. Thus, a more robust method is needed. RPCA [21] is a modification of the widely used statistical procedure of PCA, which works well for grossly corrupted observations. RPCA decomposes a matrix

X

into a low-rank matrix

L

that represents clean data and

S

that represents error data.

\underset{L, S}{arg min} r a n k (L) + λ {‖S‖}_{0} s . t . X = L + S

(1)

Item

r a n k (L)

in Equation (1) is the rank of the matrix

L

.

{‖S‖}_{0}

is the

l_{0}

norm of matrix

S

. RPCA can be used on a dataset with few anomalies, but it cannot be applied to TSAD directly and cannot be updated if a dataset is being updated. Because any update in the dataset requires reinitializing the error data and refitting the model.

4. Methodology

The overall process of the proposed method, FEG-AE, is shown in Figure 2.

Liquid level in mold sequence data is preprocessed to differential sequence, then a clean series forecasting network is used to reconstruct the normal data, and an error extraction network is used to extract the abnormal data in the series. Anomalies are determined by comparing reconstructed normal and original data using a dynamic threshold method. Both networks are trained using a joint training method. Such a method allows the error extraction network to extract anomalies from the differential sequence in the early training stage, preventing the clean series forecasting network’s training process from being affected by abnormal data.

4.1. Original Issues

The continuous casting machine is not yet activated during the manual control phase. An error in stopper operation can cause anomalies in the liquid level in the mold that remain until the solidified metal starts being withdrawn. The anomaly sequence is relatively longer than other regular time series data. And the anomaly area shares similar features as normal areas because the abnormal liquid level can be viewed as normal at other times of the same casting process or under casting conditions. Thus, applying an unsupervised learning method like traditional RNN-AE results in a high false-positive rate and fails to capture errors in the casting dataset. Therefore, a more robust approach is needed to detect anomalies in the liquid level in mold.

The RAE framework proposed by Tung Kieu et al. [20] separated the anomaly sequence from a normal sequence. Although it does make some improvements in time series detection, it does not solve the overfitting issue. The method also requires a set of normal time series data, which requires artificial classification of the training data. While RAE is trained on normal data, the issue still occurs due to anomaly data that has a similar pattern to the anomaly-free data displayed in Figure 3. Unfortunately, the similarity in pattern allows overtrained AE to reconstruct anomaly data correctly.

These issues can be solved by applying the following methods:

Introduce an error-generating model to generate an error sequence instead of directly initializing an error time series as a time series filled with 0, then update it during training. The method can make the model more flexible compared to RAE and RDAE models. Sliding windows and mini-batched training can be used in such a process;
Use a forecasting-based sequence generation model to generate the time sequence, which can avoid the overfitting problem shown in Figure 3;
Combine the forecasting network with the error extraction network. Such a method allows the detector to consider evaluating sequences’ previous sequences while avoiding the overfitting problem.

4.2. Preprocess

To highlight the anomaly in the data, we use a liquid level differential sequence that represents level changes to replace the liquid level sequence. Such action can significantly shorten the abnormal interval and prevent unsupervised learning methods from overfitting. Figure 4 shows the liquid level simulation results. A stopper operation mistake causes the liquid level to go above its’ normal level for a long duration. Converting the liquid level sequence to a differential sequence shortens the duration of the anomaly sequence. However, the abnormal part still has the same trend as the normal part, and it is still easy to be mistaken for normal data when using AE.

4.3. Architecture

The whole architecture is shown in Figure 5. The architecture detects anomalous sequences by using information gathered from its current under-detection and previous sequences.

A sliding window

W

with size

s_{w}

contains data from

i

th data to

i + s_{w}

th data in a time series. A sliding window

W_{p}

contains first

s_{w_{p}}

data in

W

. Sliding window

W_{l}

with size

s_{w}

contains last

s_{w}

data in

W

.

s_{w_{p}}

, and

s_{w_{l}}

are defined as follows:

\{\begin{matrix} s_{w} = s_{w_{p}} + s_{w_{l}} \\ 0 < s_{w_{p}} < s_{w} \\ 0 < s_{w_{l}} < s_{w} \end{matrix}

(2)

Sliding window

W

contains a time subsequence

T_{w}

. The forecasting network takes the first

n

data in

T_{w}

as input and forecasts the remaining time series

T_{f w_{l}} \in R^{B \times s_{w_{l}} \times D}

, where

B

is the dataset size, and

D

is the feature size. The error extraction network takes the entire

T_{w}

sequence as input and then outputs the error series

T_{e w_{l}} \in R^{B \times s_{w_{l}} \times D}

and an anomaly-free time sequence

T_{f w_{l}}

.

T_{w}

,

T_{f w_{l}}

,

T_{e w_{l}}

, and a reconstructed version of the original time series

{\hat{T}}_{w_{l}}

are defined as follows:

T_{w} = \{x_{i}, x_{i + 1}, \dots, x_{{i + s}_{w} - 1}\}

(3)

T_{f w_{l}} = \{x_{i + s_{w_{p}}}, x_{i + s_{w_{p}} + 1}, \dots, x_{i + s_{w} - 1}\}

(4)

T_{e w_{l}} = \{e_{i + s_{w_{p}}}, e_{i + s_{w_{p}} + 1}, \dots, e_{{i + s}_{w} - 1}\}

(5)

{\hat{T}}_{w_{l}} = T_{e w_{l}} + T_{f w_{l}}

(6)

The forecasting network and error extraction network can be implemented by using RNN. The two networks can be defined as follows:

F_{θ_{F}} (T_{w_{p}}) = T_{f w_{l}}

(7)

E_{θ_{E}} (T_{w}) = T_{e w_{l}}

(8)

The losses

L

,

L_{e}

and

L_{f}

of the proposed framework are designed as follows.

L

is defined by DIstortion Loss including shApe and TimE (DILATE) [24] and MSE loss. DILATE is a loss function design for time series data. It uses Soft Dynamic Time Warping (SoftDTW) to define shape loss and Time Distortion Index (TDI) to define temporal losses. MSE is also used to accelerate the fitting process.

L = D I L A T E ({\hat{T}}_{w_{l}}, T_{w_{l}}) + \frac{{‖(F_{θ_{F}} (T_{w_{p}}) + E_{θ_{E}} (T_{w})) - T_{w_{l}}‖}_{2}^{2} + ‖E_{θ_{E}} (T_{w})‖}{s_{w_{l}}}

(9)

L_{e} = \frac{{‖T_{e w_{l}} - (T_{w_{l}} - T_{f w_{l}})‖}_{2}^{2}}{s_{w_{l}}} = \frac{{‖T_{w e} - (T_{w} - F_{θ_{F}} (T_{w_{p}}))‖}_{2}^{2}}{s_{w_{l}}}

(10)

L_{f} = \frac{{‖T_{f w_{l}} - (T_{w_{l}} - T_{e w_{l}})‖}_{2}^{2}}{s_{w_{l}}} = \frac{{‖T_{f w_{l}} - (T_{w_{l}} - E_{θ_{E}} (T_{w}))‖}_{2}^{2}}{s_{w_{l}}}

(11)

Equation

(9)

defines the total loss for the entire framework. This equation makes sure that generated

T_{f w_{l}}

and

T_{e w_{l}}

are decomposed from

T_{w}

. Minimizing

‖E_{θ_{E}} (T_{w})‖

minimize the error sequence

T_{e w_{l}}

to make sure that the error sequence stays sparse. An appropriate value

λ_{2}

makes sure that item

{‖{\hat{T}}_{w_{l}} - T_{w_{l}}‖}_{2}^{2}

decrease if there is no error in a sliding window sequence. Equation (11) defines the forecasting loss, and Equation (10) defines the error extraction loss.

The optimization target for the framework is defined in Equation (12). Parameter

θ_{E}

and

θ_{F}

need to be updated during the training process to update the forecasting and error extraction networks.

\underset{θ_{E}, θ_{F}}{argmin} L + L_{f} + L_{e} s . t . T_{w_{l}} = T_{f w_{l}} + T_{e w_{l}}

(12)

The proposed method uses three static features: target level, steel width, and thickness. A vector combines these three static features with FEG-AE output and is fed into a fully connected layer.

One issue before detecting the entire time series is detecting the first sequence captured by the sliding window. A backward-directional forecasting network and an error extraction network are trained to resolve the issue. Therefore, another sliding window setup is not needed. By using the current sliding window methods, it is able to use the first

s_{w_{p}}

time series data to forecast the following

s_{w_{t}}

data and use the last

s_{w_{t}}

data to generate the previous

s_{w_{p}}

data.

4.4. Train the Model

The training algorithm of FEG-AE is shown in Algorithm 1.

Algorithm 1 FEG-AE.

Input: Time series captured by sliding window

T_{w}

Output:

{\hat{T}}_{w_{l}}

repeat

For every window

W

T_{e w_{l}} = E (T_{w})

T_{f w_{l}} = F (T_{w_{p}})

{\hat{T}}_{w_{l}} = T_{e w_{l}} + T_{f w_{l}}

L = D I L A T E ({\hat{T}}_{w_{l}}, T_{w_{l}}) + \frac{{‖(F_{θ_{F}} (T_{w_{p}}) + E_{θ_{E}} (T_{w})) - T_{w_{l}}‖}_{2}^{2} + ‖E_{θ_{E}} (T_{w})‖}{s_{w_{l}}}

Update both

θ_{e}

and

θ_{F}

by minimizing

L

.

//Update the forecasting network.

T_{e w_{l}} = E (T_{w})

T_{f w_{l}} = F (T_{w_{p}})

L_{f} = \frac{{‖T_{f w_{l}} - (T_{w_{l}} - T_{e w_{l}})‖}_{2}^{2}}{s_{w_{l}}} = \frac{{‖T_{f w_{l}} - (T_{w_{l}} - E_{θ_{E}} (T_{w}))‖}_{2}^{2}}{s_{w_{l}}}

Update

θ_{F}

by minimizing

L_{f}

.

//Update the error extraction network.

T_{e w_{l}} = E (T_{w})

T_{f w_{l}} = F (T_{w_{p}})

L_{e} = \frac{{‖T_{e w_{l}} - (T_{w_{l}} - T_{f w_{l}})‖}_{2}^{2}}{s_{w_{l}}} = \frac{{‖T_{w e} - (T_{w} - F_{θ_{F}} (T_{w_{p}}))‖}_{2}^{2}}{s_{w_{l}}}

Update

θ_{e}

by minimizing

L_{e}

.

Until

(L + L_{f} + L_{e}) < ϵ

The algorithm first updates the parameters of the forecasting network and the error extraction network together. Then each model is updated separately.

4.5. Dynamic Threshold

Using a fixed threshold to identify anomalies in the forecasting outputs usually results in many false positives in the anomaly-free sequence. The forecasting network can capture the general trend of the time series data but still has a high deviation value in the anomaly-free part if

{‖T_{f w_{l}} - T_{f w_{l}}‖}_{2}

is used to calculate deviation. The issue is that the original data is increasing too fast, and the forecasted data is not sharp enough to catch up. Due to the high slope in those areas, a slight differentiation results in a high deviation value, as shown in Figure 6b. Hence, a new way to calculate the threshold and identify the error is introduced to achieve higher precision in the anomaly detection of liquid level in mold.

The standard deviation and mean are used to calculate the threshold for every sliding window, marked as

σ (w)

and

μ (w)

. Dynamic thresholds

t h (T_{w_{l}})

are defined as follows:

t h (T_{w_{l}}) = β_{1} σ (T_{w_{l}}) + β_{2} μ (T_{w_{l}}) + ϵ

(13)

The first item

β_{1} σ (w)

is the threshold that controls the deviation. The second item controls the offset threshold. Figure 6b shows that the threshold changes according to the window. Compared to the fixed threshold RAE uses, a dynamic threshold guarantees a lower false-positive rate without using post-processing.

The Euclidean distance is used to calculate the differences between the forecasted sequence and the actual sequence in a sliding window.

d_{w} = {‖T_{f w_{l}} - T_{w_{l}}‖}_{2}

An anomaly is found if

d_{w}

is higher than the compared window’s threshold, meaning the distance deviation between the forecasted and original values is larger than the threshold. The method can identify the anomaly more precisely. A visual comparison between the fixed and dynamic thresholds is shown in Figure 6.

5. Experiment Results

5.1. Experiment Setup

Dataset. The experiment uses a dataset collected from the casting process to evaluate the proposed framework and compare experiment results with other methods. Part of the data in this dataset is shown in Figure 7. The dataset contains information about liquid levels in mold (measured in cm) captured by the sensor.

Architecture. In the experiment, the forecasting network is a GRU-AE model. The encoder part is a 20-dimensional latent space GRU with two hidden layers, followed by a linear function as a Fully Connected (FC) layer. The decoder part is a one-to-many GRU, followed by an FC layer that combines static and temporal features. No dropout is applied. The error extraction network uses a Seq2Seq model with a 20-dimensional latent space and two hidden layers. The sliding window size

s_{w}

is set to 10 and

s_{w_{p}}

is set to

4

.

Baselines. The experiment compares the proposed framework with several popular and state-of-the-art methods as the baseline.

GRU-AE [25]: GRU trains faster and performs better than LSTM on a smaller dataset. [26]. The decoder’s and encoder’s GRU layers both have two hidden layers. Encoder GRU outputs 10-D-encoded data and then passes it through an FC layer. The decoding process takes the encoded 10-D data and passes it through the decoder GRU and an FC layer. Drop-out is applied for both encoder and decoder GRU;
RAE [20]: Use LSTM-AE as an anomaly detector. The hyperparameter λ is set to $0.1,$ which has the best experiment result in their practice. A sliding window of size 10 is used to evaluate the anomaly score;
TadGAN [23]: A sliding window of size 10 is used to calculate the area difference for reconstruction error;
LSTM-AE-Advanced (LSTM-AE-ADV) [22]: use a sliding window size of 4 to perform the enrichment time series process.

All methods above use a fixed threshold group {0.3, 0.2, 0.1, 0.09, 0.07, 0.05, 0.03, and 0.01}. For every baseline method, a score is calculated for each threshold. The highest score is recorded in the process. A sliding window of size 10 is used to calculate F1 scores. Except for RAE, all other methods are implemented with sliding windows of size 10. The evaluation is also using a sliding window of the same size.

All baseline methods are implemented in Python 3.7 with the Pytorch 1.7.1 library [27] and CUDA 11.0. The Adam optimizer [28] is used to update the frameworks.

5.2. Score Metric

The conventional metrics of precision, recall, and F1-Score are used to assess the performance of various methods. The preferred outcome for end-users is to obtain prompt and precise alarms with few false positives (FP), which may consume time and resources. The following window-based rules are implemented to discourage excessive FPs and encourage prompt and precise alarms: A true positive (TP) is recorded if a predicted window overlaps with a labeled anomalous window. A false negative (FN) is recorded if a forecasted window does not overlap with a labeled anomalous window. An FP is recorded if a labeled anomalous window does not overlap with a forecasted window. This method is also adopted in Hundman’s method [29] and Alexander Geiger’s method [23].

5.3. Experiment Results

5.3.1. Comparison Experiment

The experiment results are displayed in Table 1. The results show that the F1 score is significantly higher when the FEG-AE method is used on actual production datasets. The cause is the proposed model’s ability to identify error sequences with lower false negatives and higher precision. In other AE-based methods, the lower score is mainly caused by overfitting. The experiment also shows that the proposed method has an overall slightly lower recall rate compared to the other methods. The dynamic threshold method can greatly improve the precision, but in some extreme cases, it will produce an extremely high threshold in a few abnormal areas, on which FEG-AE could produce more FNs.

5.3.2. Ablation Experiment

The ablation experiment separates three key components of the FEG-AE methods and then uses them separately. The result is shown in Table 2. Using a forecasting network to reconstruct the liquid level in mold sequence can greatly improve precision. But because anomalies last a long time, anomaly data occupies a relatively large portion of all data. During unsupervised training, the forecasting network does not know which data is normal. Therefore, if an error repeatedly appears in the training set, it will still mistake it for normal data. The error extraction network acts as a regular LSTM-AE. It achieves a super low precision of 0.111 by mistaking almost every abnormal data point for normal data. Combining the forecasting and error extraction networks and training them using the joint training algorithm in Algorithm 1 significantly improves the precision and F1 score. An error extraction network can extract anomaly data from the forecasting network’s training set in the early training stage, therefore keeping the forecasting network free from anomaly data pollution. The dynamic threshold further improves precision and causes high thresholds in some areas. Therefore, the dynamic threshold increases the FN count slightly and thus results in a slightly lower recall.

Figure 8 demonstrates the training speed when using single-batch and mini-batch methods. The results show that applying sliding windows and mini-batch training can fully utilize the GPU’s power and achieve better training speeds.

5.3.3. Parameter Experiment

The experiment results in Table 3 show the F1 score when different dynamic thresholding settings are used. In the experiment,

β_{1} = 0.3

and

β_{2} = 0.4

can achieve the highest F1 score on the actual production dataset.

Figure 9 shows how

S_{w_{p}}

affects the forecasting result and the error extraction output results. Figure 10 shows the F1 result using different

S_{w_{p}}

.

The experiment shows that a smaller value of

S_{w_{p}}

results in underfitting. Because the smaller

S_{w_{p}}

is, less data is needed to reconstruct the normal sequence. But meanwhile, the error extraction network uses the entire

S_{w}

data to generate

S_{w_{l}}

error sequences. The error extraction network will be trained faster than the forecasting network during training. Therefore, it is more likely to mistake the entire series as an anomaly to reduce the loss in the early stages. By contrast, a larger

S_{w_{p}}

value results in error extraction network underfitting. In the experiment, setting

S_{w_{l}} = 4

and

5

achieves the best F1 result on the tested dataset.

6. Conclusions

The traditional AE methods failed to identify anomalies in the liquid level in mold time series data. The proposed method, FEG-AE, is inspired by RAE and decomposes a time series of liquid level in a mold into a normal time sequence and an anomaly sequence. A forecasting network to reconstruct the normal sequence using the previous sequence’s features. An error extraction network to extract error from the free forecasting network from anomalies’ pollution improves precision compared to the traditional AE. A dynamic threshold method is proposed to identify the anomaly with higher precision. Compared to a fixed threshold, the proposed method significantly increases precision but slightly reduces recall. FEG-AE achieves the highest precision of 0.895 and an F1 score of 0.872 compared to baseline methods. The dynamic threshold method can significantly improve precision and F1 but has a minor recall reduction due to higher thresholds in some windows, resulting in more FNs.

Future work will investigate a more robust way to balance the forecasting network and error extraction network to reduce the effects of the hyperparameter

s_{W_{p}}

. Also, a more robust dynamic threshold method is under investigation to improve recall.

Author Contributions

Conceptualization, X.W.; data curation, X.W., H.K., Q.G. and J.M.; formal analysis, X.W. and S.Y.; funding acquisition, Q.G.; investigation, W.J.; methodology, H.K.; project administration, S.Y; resources, Q.G. and J.M.; software, H.K. and W.J.; validation, X.W., H.K. and S.Y.; visualization, X.W.; writing-original draft, H.K.; writing-review and editing, X.W., S.Y., Q.G. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Key Research and Development Project of Shaanxi Province under Grant 2021ZDLGY10-01 and the Self-Supporting Science and Technology Research and Development Project of SINOMACH-HE under Grant 2021ZLKY-02.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors gratefully acknowledge the financial support offered by the Key Research and Development Project of Shaanxi Province under Grant 2021ZDLGY10-01 and the Self-Supporting Science and Technology Research and Development Project of SINOMACH-HE under Grant 2021ZLKY-02.

Conflicts of Interest

The authors declare no conflict of interest.

References

Vynnycky, M. Continuous casting. Metals 2019, 9, 643. [Google Scholar] [CrossRef] [Green Version]
Cemernek, D.; Cemernek, S.; Gursch, H.; Pandeshwar, A.; Leitner, T.; Berger, M.; Klösch, G.; Kern, R. Machine learning in continuous casting of steel: A state-of-the-art survey. J. Intell. Manuf. 2021, 33, 1561–1579. [Google Scholar] [CrossRef]
Khalaj, G.; Pouraliakbar, H.; Mamaghani, K.R.; Khalaj, M.-J. Modeling the correlation between heat treatment, chemical composition and bainite fraction of pipeline steels by means of artificial neural networks. Neural Netw. World 2013, 4, 351–367. [Google Scholar] [CrossRef] [Green Version]
Lei, H.; Liu, J.; Tang, G.; Zhang, H.; Jiang, Z.; Lv, P. Deep Insight into Mold Level Fluctuation During Casting Different Steel Grades. JOM 2023, 75, 914–919. [Google Scholar] [CrossRef]
Yang, J.; Chen, D.; Long, M.; Duan, H. Transient flow and mold flux behavior during ultra-high speed continuous casting of billet. J. Mater. Res. Technol. 2020, 9, 3984–3993. [Google Scholar] [CrossRef]
Sarda, K.; Acernese, A.; Nolè, V.; Manfredi, L.; Greco, L.; Glielmo, L.; Del Vecchio, C. A Multi-Step Anomaly Detection Strategy Based on Robust Distances for the Steel Industry. IEEE Access 2021, 9, 53827–53837. [Google Scholar] [CrossRef]
Acernese, A.; Sarda, K.; Nole, V.; Manfredi, L.; Greco, L.; Glielmo, L.; Del Vecchio, C. Robust Statistics-based Anomaly Detection in a Steel Industry. In Proceedings of the 2021 29th Mediterranean Conference on Control and Automation (MED), Puglia, Italy, 22–25 June 2021; pp. 1058–1063. [Google Scholar] [CrossRef]
Chen, Z.L.; Olia, H.; Petrus, B.; Rembold, M.; Bentsman, J.; Thomas, B.G. Dynamic Modeling of Unsteady Bulging in Continuous Casting of Steel. In Materials Processing Fundamentals 2019; Minerals, Metals & Materials Series; Springer: Cham, Switzerland, 2019; pp. 23–35. [Google Scholar] [CrossRef]
Yoon, U.-S.; Bang, I.-W.; JH, R.; Kim, S.-Y.; Lee, J.-D.; Oh, K.H. Analysis of mold level hunching by unsteady bulging during thin slab casting. ISIJ Int. 2002, 42, 1103–1111. [Google Scholar] [CrossRef] [Green Version]
Zhou, Y.J.; Xu, K.; He, F.; Zhang, Z.Y. Application of Time Series Data Anomaly Detection Based on Deep Learning in Continuous Casting Process. ISIJ Int. 2022, 62, 689–698. [Google Scholar] [CrossRef]
Khalaj, G.; Pouraliakbar, H.; Arab, N.; Nazerfakhari, M. Correlation of passivation current density and potential by using chemical composition and corrosion cell characteristics in HSLA steels. Measurement 2015, 75, 5–11. [Google Scholar] [CrossRef]
Crépey, S.; Lehdili, N.; Madhar, N.; Thomas, M. Anomaly Detection in Financial Time Series by Principal Component Analysis and Neural Networks. Algorithms 2022, 15, 385. [Google Scholar] [CrossRef]
Wu, R.; Keogh, E. Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE Trans. Knowl. Data Eng. 2021, 35, 2421–2429. [Google Scholar] [CrossRef]
Zhao, Z.; Zhang, Y.; Zhu, X.; Zuo, J. Research on time series anomaly detection algorithm and application. In Proceedings of the 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chengdu, China, 20–22 December 2019. [Google Scholar]
Zhang, C.; Zhou, T.; Wen, Q.; Sun, L. TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022. [Google Scholar]
Kim, S.; Choi, K.; Choi, H.S.; Lee, B.; Yoon, S. Towards a Rigorous Evaluation of Time-Series Anomaly Detection. In Proceedings of the Thirty-Sixth Aaai Conference on Artificial Intelligence/Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence/Twelveth Symposium on Educational Advances in Artificial Intelligence, Virtual Event, 22 February–1 March 2022; pp. 7194–7201. [Google Scholar]
Zhang, Y.X.; Chen, Y.Q.; Wang, J.D.; Pan, Z.W. Unsupervised Deep Anomaly Detection for Multi-Sensor Time-Series Signals. IEEE Trans. Knowl. Data Eng. 2023, 35, 2118–2132. [Google Scholar] [CrossRef]
Yeh, C.-C.M.; Zhu, Y.; Ulanova, L.; Begum, N.; Ding, Y.; Dau, H.A.; Silva, D.F.; Mueen, A.; Keogh, E. Matrix profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016. [Google Scholar]
Breunig, M.M.; Kriegel, H.-P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000. [Google Scholar]
Kieu, T.; Yang, B.; Guo, C.; Jensen, C.S.; Zhao, Y.; Huang, F.; Zheng, K. Robust and explainable autoencoders for unsupervised time series outlier detection. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022. [Google Scholar]
Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM (JACM) 2011, 58, 1–37. [Google Scholar] [CrossRef]
Kieu, T.; Yang, B.; Jensen, C.S. Outlier detection for multidimensional time series using deep neural networks. In Proceedings of the 2018 19th IEEE International Conference on Mobile Data Management (MDM), Aalborg, Denmark, 25–28 June 2018. [Google Scholar]
Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. Tadgan: Time series anomaly detection using generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020. [Google Scholar]
Le Guen, V.; Thome, N. Shape and time distortion loss for training deep time series forecasting models. Adv. Neural Inf. Process. Syst. 2019, 32, 1–13. [Google Scholar]
Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
Yang, S.; Yu, X.; Zhou, Y. Lstm and gru neural network performance comparison study: Taking yelp review dataset as an example. In Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), Shanghai, China, 12–14 June 2020. [Google Scholar]
Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 1–12. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018. [Google Scholar]

Figure 1. Continuous casting process. 1: Ladle. 2: Stopper. 3: Tundish. 4: Shroud. 5: Mold. 6: Roll support. 7: Turning zone. 8: Shroud. 9: Bath level. 10: Meniscus. 11: Withdrawal unit. 12: Slab. A: Liquid metal. B: Solidified metal. C: Slag. D: Water-cooled copper plates. E: Refractory material.

Figure 2. The proposed method’s process.

Figure 3. The overfitting issue is caused by AE−based anomaly detection. Red rectangles highlight the abnormal areas in the liquid level in mold data. Traditional AE is unable to reconstruct the anomaly part in (a) but is able to reconstruct the anomaly sequence in (b), which is identified as a false negative.

Figure 4. Liquid level in mold sequence (a) vs. liquid level differential sequence (b). In (b), each value represents the liquid level change since the previous sample. The anomaly sequence duration is significantly shorter in (b) and more suitable for unsupervised learning.

Figure 5. An example of sliding window techniques used in the framework with

s_{w} = 8

and

s_{w_{p}} = 4

. The first four data points in the sliding window are being used to forecast the next four. The error extraction network uses all the data in the window sequence to generate the error sequence of the last four data points.

Figure 5. An example of sliding window techniques used in the framework with

s_{w} = 8

and

s_{w_{p}} = 4

. The first four data points in the sliding window are being used to forecast the next four. The error extraction network uses all the data in the window sequence to generate the error sequence of the last four data points.

Figure 6. Dynamic threshold vs. fixed threshold. The red rectangle marks a normal area. The green rectangle marks an abnormal area. The dynamic threshold can identify anomalies more precisely than the fixed threshold. (a) shows the FEG−AE’s forecasting network output, and (b) shows the Euclidean distance between the forecasted result and the actual data. (c) shows the L1 data error between the original and forecasting data. A fixed threshold is more likely to identify a false negative result in the red rectangle and a false negative result simultaneously in the green rectangle area.

Figure 7. An example of actual data of the liquid level in the mold during the manual control phase.

Figure 8. Convergence speed comparison between sliding windows with the mini-batch and single-batch methods.

Figure 9. Output results using different

S_{w_{p}}

. (a,b) uses

S_{w_{p}} = 2

. (c,d) uses

S_{w_{p}} = 4

. (e,f) uses

S_{w_{p}} = 6

.

Figure 9. Output results using different

S_{w_{p}}

. (a,b) uses

S_{w_{p}} = 2

. (c,d) uses

S_{w_{p}} = 4

. (e,f) uses

S_{w_{p}} = 6

.

Figure 10. F1 results using different

S_{w_{p}}

.

S_{w_{l}} = 4

and

5

achieves the best F1 result.

Figure 10. F1 results using different

S_{w_{p}}

.

S_{w_{l}} = 4

and

5

achieves the best F1 result.

Table 1. Experiment results by using different baseline methods.

Methods	Precision	Recall	F1
GRU-AE	0.111	1.000	0.230
RAE	0.500	1.000	0.667
TadGAN	0.75	0.727	0.738
LSTM-AE-ADV	0.205	1.000	0.340
FEG-AE	0.895	0.850	0.872

Table 2. Ablation experiment result.

Forecasting Network	Error Extraction Network	Dynamic Threshold	Precision	Recall	F1
√			0.5	0.85	0.615
	√		0.111	1	0.230
√	√		0.75	1	0.857
√	√	√	0.895	0.85	0.872

Table 3. Experiment results by using different sets of

β_{1}

and

β_{2}

. The highest F1 score is highlighted in bold in the table.

Table 3. Experiment results by using different sets of

β_{1}

and

β_{2}

. The highest F1 score is highlighted in bold in the table.

$β_{1} / β_{2}$	0.4	0.3	0.2
0.5	0.500	0.667	0.4
0.4	0.333	0.872	0.550
0.3	0.286	0.250	0.220
0.2	0.250	0.250	0.133
0.1	0.167	0.142	0.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wu, X.; Kang, H.; Yuan, S.; Jiang, W.; Gao, Q.; Mi, J. Anomaly Detection of Liquid Level in Mold during Continuous Casting by Using Forecasting and Error Generation. Appl. Sci. 2023, 13, 7457. https://doi.org/10.3390/app13137457

AMA Style

Wu X, Kang H, Yuan S, Jiang W, Gao Q, Mi J. Anomaly Detection of Liquid Level in Mold during Continuous Casting by Using Forecasting and Error Generation. Applied Sciences. 2023; 13(13):7457. https://doi.org/10.3390/app13137457

Chicago/Turabian Style

Wu, Xiaojun, Hongjia Kang, Sheng Yuan, Wenze Jiang, Qi Gao, and Jinzhou Mi. 2023. "Anomaly Detection of Liquid Level in Mold during Continuous Casting by Using Forecasting and Error Generation" Applied Sciences 13, no. 13: 7457. https://doi.org/10.3390/app13137457

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Anomaly Detection of Liquid Level in Mold during Continuous Casting by Using Forecasting and Error Generation

Abstract

1. Introduction

2. Related Work

3. Preliminary

3.1. RNN-Based AE

3.2. RPCA

4. Methodology

4.1. Original Issues

4.2. Preprocess

4.3. Architecture

4.4. Train the Model

4.5. Dynamic Threshold

5. Experiment Results

5.1. Experiment Setup

5.2. Score Metric

5.3. Experiment Results

5.3.1. Comparison Experiment

5.3.2. Ablation Experiment

5.3.3. Parameter Experiment

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI