Article

Information Extraction from Industrial Sensor Data Using Time Series Meta-Features

1 Institute for Automation/Computer Science, University of Wuppertal, Rainer-Gruenter-Str. 21, 42119 Wuppertal, Germany
2 WSW Wuppertaler Stadtwerke GmbH, Bromberger Str. 39-41, 42281 Wuppertal, Germany
3 TRIMET Aluminium SE, Aluminiumallee 1, 45356 Essen, Germany
4 Alcoa Nederland Holding B.V., Weena 798, 3014 DA Rotterdam, The Netherlands
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(12), 7065; https://doi.org/10.3390/app13127065
Submission received: 4 May 2023 / Revised: 26 May 2023 / Accepted: 27 May 2023 / Published: 12 June 2023
(This article belongs to the Special Issue New Challenges in Machine Learning for Industrial Applications)

Abstract

In the smart manufacturing sector, analyzing time series data is essential for monitoring plants and machinery to prevent costly failures or shutdowns. In order to gain new insights and make better control decisions, new methods are needed for extracting information and interpreting sensor data from hundreds of systems. In this paper, we present an approach for visualizing and interpreting sensor data from TRIMET Aluminium SE Essen (TAE) using time series meta-features and principal component analysis (PCA). We describe our general approach of generating multiple two-dimensional feature spaces to identify salient and implausible sensor data. Using a set of 20 time series meta-features, we applied our approach to sensor data from TAE which were generated by thermocouples. Each step of the approach was integrated into a dashboard to ensure a user-friendly and approachable interaction in finding salient and implausible sensor data.

1. Introduction

Due to advances in digitalization and the growing market in the smart manufacturing sector, most plants and machinery are now equipped with new sensor technology. The possible use cases of sensors vary greatly: in the aluminum-production industry, sensors are used to monitor the state of reduction cells [1] or to measure individual anode currents [2], while in the food-refrigeration industry sensor data are analyzed to predict over-temperature disturbances [3]. To gain new insights about a system, the sensor data have to be preprocessed, analyzed and visualized in a compact way. However, manually analyzing the data from each sensor is a time-consuming process. To tackle this problem, new methods have to be developed that are capable of quickly extracting information and interpreting the sensor data from hundreds of systems.
Calculating features is a promising way to quantify the characteristics of a time series [4]. It reduces the data to a fixed number of meta-features and can be understood as a dimensionality reduction technique [5]. According to [4], a representation of time series together with a similarity metric can be used in different fields, i.e., query by content, anomaly detection, motif discovery, clustering and classification. Furthermore, calculating meta-features is an essential part of meta-learning, which improves learning systems by incorporating already collected knowledge [6].
Meta-features should describe the “global picture” of a time series [5]. Specific time series meta-features, such as the trend (or trend-cycle) and seasonal strength, can be computed just like the mean or median. Examples of these meta-features can be seen in Figure 1, which shows different time series from the Makridakis Competitions, i.e., M1- and M3-Competition [7]. Strength values close to 1 indicate a strong trend or seasonal component in the corresponding time series. A strength value close to 0 indicates a weak trend or seasonal component in the corresponding time series.
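To make the two strength values concrete, the following is a minimal sketch under stated assumptions. It uses the widely used definition based on an additive decomposition y = T + S + R into trend, seasonal and remainder components: trend strength is max(0, 1 − Var(R)/Var(T + R)) and seasonal strength is max(0, 1 − Var(R)/Var(S + R)). The feasts package obtains the components from an STL decomposition; the sketch swaps in a simple classical moving-average decomposition so that it needs only the standard library, so its values will not match feasts exactly. The function names and the synthetic series are illustrative and not taken from the paper.

```python
import math
import random
from statistics import mean, pvariance

def decompose(y, period):
    """Classical additive decomposition y = trend + seasonal + remainder.
    Stand-in for STL; entries without a full moving-average window are None."""
    n, half = len(y), period // 2
    # Centered moving average; even periods use the usual 2 x m weights
    weights = [0.5] + [1.0] * (period - 1) + [0.5] if period % 2 == 0 else [1.0] * period
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window)) / period
    detrended = [None if tr is None else v - tr for v, tr in zip(y, trend)]
    # Seasonal component: mean detrended value at each position of the cycle
    cycle = [mean(d for d in detrended[k::period] if d is not None) for k in range(period)]
    offset = mean(cycle)
    seasonal = [cycle[t % period] - offset for t in range(n)]
    remainder = [None if tr is None else v - tr - s for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder

def strength(component, remainder):
    """max(0, 1 - Var(R) / Var(component + R)) over points where both are defined."""
    pairs = [(c, r) for c, r in zip(component, remainder) if c is not None and r is not None]
    var_r = pvariance([r for _, r in pairs])
    var_cr = pvariance([c + r for c, r in pairs])
    return max(0.0, 1.0 - var_r / var_cr)

def trend_and_seasonal_strength(y, period):
    trend, seasonal, remainder = decompose(y, period)
    return strength(trend, remainder), strength(seasonal, remainder)

# Synthetic examples: one series with a clear trend and season, one pure noise
random.seed(0)
n, period = 120, 12
structured = [0.5 * t + 10 * math.sin(2 * math.pi * t / period) + random.gauss(0, 1)
              for t in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]

structured_strengths = trend_and_seasonal_strength(structured, period)
noise_strengths = trend_and_seasonal_strength(noise, period)
print("structured:", structured_strengths)
print("noise:     ", noise_strengths)
```

The structured series should yield both strengths close to 1, while the noise series should yield values close to 0, mirroring the interpretation given above.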
Besides the trend and seasonal strength, further time series meta-features, e.g., the spectral entropy, seasonal period or autocorrelation coefficients, can be computed. Time series meta-features should be selected according to the type of the time series and the problem in question [9,10]. Expert knowledge about a specific domain, in our case aluminum electrolysis, can help to find suitable meta-features that describe the different characteristics of time series.
The core business of TRIMET Aluminium SE is the development and production of aluminum products. Figure 2 shows one of the three pot rooms containing 120 reduction cells in an end-to-end configuration at the Essen site in Germany. In total, TRIMET Aluminium SE Essen (TAE) operates 360 reduction cells that produce liquid aluminum. The industrial production of aluminum is based on the Hall–Héroult process, in which alumina (Al₂O₃) is dissolved in liquid cryolite (Na₃AlF₆) with an excess of aluminum fluoride (AlF₃) [11,12]. The Hall–Héroult process is performed with an almost constant energy supply to ensure stable aluminum production [12,13].
Due to the energy transition, the energy supply in Germany is increasingly reliant on renewable energy sources, which means that the energy supply for industrial aluminum production is no longer constant but variable [12]. In order to ensure stable operation of the reduction cells with a variable energy supply, TAE has started to equip the reduction cells with magnetic field compensation and shell heat exchangers [14]. The magnetic field compensation is used to prevent bulging of the liquid aluminum at high current levels, while the shell heat exchangers are designed to control the heat loss from the side walls of the cells, maintaining a protective side ledge. Additional thermocouples are mounted on some cells to monitor the heat balance. Due to the corrosive environment, loose or faulty thermocouples are unavoidable and can result in incorrect heat balance calculations. Abnormal thermocouples are identified by manual evaluation of the recorded temperature signals (time series). However, since hundreds of temperature signals have to be checked, this procedure is time-consuming.
In order to reduce the time required for the inspection of salient temperature signals, we present an approach to visualize and analyze a sensor data set from TAE using time series meta-features. We used the work of Hyndman et al. [15] as a basis and extended their approach in three ways: First, we transferred the approach to the aluminum electrolysis process to analyze TAE sensor data generated by thermocouples. Second, we generated two-dimensional feature spaces in multiple iterations to identify salient and implausible sensor data. Third, we picked up the idea of a dashboard, which enables a user to visualize and analyze the characteristics of the sensor data in a simple and approachable way.
Section 2 summarizes the use of time series meta-features in different settings, e.g., anomaly detection or meta-learning. The approach is presented in Section 3 and then applied to real sensor data from TAE in Section 4. Section 5 describes the main features of the dashboard. Section 6 concludes the paper and outlines future work using time series meta-features.

2. Literature Overview

We reviewed some of the time series meta-features literature and divided it into six domains, i.e., regression, classification, clustering, anomaly detection, visualization, and meta-learning, which are listed in Table 1. Grabowski et al. [16] calculated meta-features, i.e., the mean for numeric data and the sum for binary data, of several process variables in a rolling window to predict the bath temperature of reduction cells using a random forest. Kremser et al. [17] used several time series meta-features from [18,19] in a rolling window to predict anode effects in reduction cells using logistic regression, linear support vector machine, random forest and eXtreme Gradient Boosting.
Nanopoulos et al. [20] used a multilayer perceptron (MLP) and eight time series meta-features to classify patterns in control charts. The authors compared the performance of the MLP with another MLP that is solely based on the values of the time series. In further experiments, the performance of both MLPs was analyzed with noise corrupted time series and time series at varying length. Horvath et al. [21] used hierarchical clustering and seven parameters (meta-features) to divide reduction cells with similar behavior into control groups.
Hyndman et al. [15] computed 18 meta-features from time series representing the performance of servers at the internet company Yahoo. With the help of a principal component analysis (PCA), α-hulls and highest density regions, the authors used the calculated meta-features to identify entire time series that differed from other time series in the data set. In [22], extreme value theory was used to identify anomalous streaming time series data. The presented framework is based on the calculation of 14 time series meta-features, which are used in an offline and an online phase. The offline phase is used to train a model on the typical behavior of the system. Afterward, the model is deployed in the online phase to identify anomalous time series using a rolling window. Furthermore, the authors present an algorithm that updates the model if a significant change in the distribution of the typical behavior of the system is detected. In a practical example, the authors used the framework to identify anomalous sensor data.
Kang et al. [23] computed six time series meta-features to visualize and analyze the M3-Competition data set [24]. The authors performed a PCA using the calculated meta-features, then visualized the first and second principal components (PCs) to obtain an overview of the time series characteristics of the M3-Competition data set. Furthermore, the two-dimensional feature space and a genetic algorithm were used to generate new time series that extend the original data set. The authors then compared the feature space with the performance of selected forecasting methods. In [9], Talagala et al. present the FFORMS framework, which trains a classifier on time series meta-features to predict which algorithm might be appropriate for forecasting a given time series.
The authors in [25] applied the meta-learning approach to automate the process of selecting forecasting models for individual time series using meta-features. In their first case study, decision trees with 10 meta-features calculated from stationary time series were trained to choose between two forecasting models. The second case study considers NOEMON [26] and five meta-features, which were calculated for each time series in the yearly M3-Competition data set [24], to rank forecasting models, i.e., random walk, Holt’s linear exponential smoothing, and auto-regressive model.
Table 1. Literature addressing the calculation of time series meta-features in six different domains.
Domain               Literature
Regression           [16]
Classification       [17,20]
Clustering           [21]
Anomaly detection    [15,22]
Visualization        [23]
Meta-learning        [9,25]
The presented literature shows that time series meta-features are widely used in different domains. Our approach presented in this paper contributes to the fields of visualization and anomaly detection. We classify our paper in the category of visualization, since we show two-dimensional representations of meta-feature instances in Section 4 and describe a dashboard that simplifies working with time series meta-features in Section 5. It differs from the presented literature by applying the combination of time series meta-features and PCA to the aluminum electrolysis process to identify data outliers indicating faulty thermocouples.

3. Approach

The iterative approach visualized in Figure 3 is not specifically tailored to the aluminum electrolysis process and can be applied to different fields of the smart manufacturing sector. With our approach, we wanted to assess the suitability of PCA and time series meta-features in the aluminum industry. This gives us a baseline that we can use for a comparison with future methods. First, several time series meta-features are calculated for each time series in a data set and are saved as a meta-feature data set. Although it is presented in the context of meta-learning, the guideline in [25] is also helpful for finding appropriate meta-features. The guideline points out that a manageable number of meta-features already used in the literature should be considered in the analysis. Additionally, the type of each time series in the data set should be taken into account when selecting meta-features.
In the next step, a PCA is conducted using the z-transformed meta-features. PCA has already been successfully applied in [15,23] to identify data patterns by projecting time series meta-features into a lower-dimensional feature space. Therefore, we decided to apply PCA to the aluminum electrolysis process. Afterward, the first two principal components are used to generate a two-dimensional feature space that visualizes the characteristics of the time series data set. We chose a two-dimensional representation because it is easier to inspect for salient time series than three- or four-dimensional feature spaces. Higher dimensions lead to increased complexity, for which suitable representations would have to be reviewed. Each point in the two-dimensional feature space represents a time series. The feature space helps to find time series with abnormal characteristics, e.g., implausible ranges of values, which compromise the feature space. Hence, the corresponding instances (rows) should be removed from the meta-feature data set.
It is not necessary to identify all abnormal time series and to remove the corresponding meta-feature instances from the data set in the first iteration because further iterations are conducted (see Figure 3). The iteration process can be stopped once one is convinced that all compromising meta-feature instances have been removed from the data set. This constitutes a soft stopping criterion: the more compromising instances are removed, the more accurate the resulting insights can be expected to be. Conversely, stopping too early only affects the details of the insights, not the overall results. The identification of abnormal time series in the two-dimensional plot depends on the human operator using our approach. We are aware that this step is not optimal and should be addressed in further research.
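The projection step described above can be sketched as follows. This is a minimal illustration, assuming a meta-feature table with one row per time series and one column per meta-feature; the table is randomly generated, and the function name `pca_feature_space` is hypothetical rather than taken from the published source code. PCA is computed here with a plain NumPy singular value decomposition of the z-transformed table.

```python
import numpy as np

def pca_feature_space(meta_features, n_components=2):
    """Project z-transformed meta-features onto the leading principal components."""
    X = np.asarray(meta_features, dtype=float)
    # z-transform: every meta-feature (column) gets zero mean and unit variance.
    # Constant columns would divide by zero and must be dropped beforehand.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via singular value decomposition of the standardized table
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # one row per time series
    explained_ratio = S**2 / np.sum(S**2)            # variance share per component
    return scores, explained_ratio[:n_components]

# Hypothetical meta-feature table: 60 time series x 5 meta-features
rng = np.random.default_rng(42)
base = rng.normal(size=(60, 2))
table = np.column_stack([
    500 + 20 * base[:, 0],                         # e.g. a mean temperature in K
    0.8 * base[:, 0] + 0.2 * rng.normal(size=60),  # correlated with the first column
    base[:, 1],
    0.5 * base[:, 1] + rng.normal(size=60),
    rng.normal(size=60),
])

scores, explained_ratio = pca_feature_space(table)
print(scores.shape, explained_ratio.round(2))
```

Each row of `scores` is one point in the two-dimensional feature space. The z-transform matters because meta-features live on very different scales; without it, a column such as a mean temperature in Kelvin would dominate the first component.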
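The loop in Figure 3 can be sketched as below. In the approach described here, a human operator picks the abnormal points in each feature space; since that cannot be coded, the sketch substitutes a simple distance rule (`far_from_bulk`) purely as a hypothetical stand-in. All function names, the threshold factor and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def first_two_pcs(X):
    """Scores on the first two principal components of the z-transformed columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :2] * S[:2]

def far_from_bulk(scores, factor=6.0):
    """Hypothetical stand-in for the operator: flag points whose distance from
    the median point exceeds `factor` times the median distance."""
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    return np.flatnonzero(dist > factor * np.median(dist))

def iterate_feature_spaces(X, names, select_abnormal, max_iter=10):
    """Recompute the feature space and remove flagged rows until nothing is flagged."""
    keep = np.ones(len(names), dtype=bool)
    removed_per_iteration = []
    for _ in range(max_iter):
        idx = np.flatnonzero(keep)           # rows still in the meta-feature data set
        scores = first_two_pcs(X[idx])       # new PCA on the remaining rows only
        flagged = select_abnormal(scores)    # positions within idx
        if flagged.size == 0:
            break                            # stopping criterion: nothing left to remove
        removed_per_iteration.append([names[i] for i in idx[flagged]])
        keep[idx[flagged]] = False
    return keep, removed_per_iteration

# Synthetic meta-feature table: 40 plausible rows plus 2 rows with extreme values
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(40, 4)), np.full((2, 4), 25.0)])
names = [f"sensor-{i}" for i in range(40)] + ["faulty-1", "faulty-2"]

keep, removed = iterate_feature_spaces(X, names, far_from_bulk)
print(removed, int(keep.sum()))
```

Recomputing the PCA after each removal is the important part: extreme rows stretch the first components, so the remaining points only spread out and become comparable once those rows are gone.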
Generating multiple feature spaces has several advantages: Implausible time series indicate detached or faulty sensors, which can be quickly identified by analyzing the feature spaces. This can help process engineers save time because they will not have to check hundreds of time series to find salient sensors manually. Moreover, the feature space can be used to compare the different characteristics of the sensor data, which can be useful to monitor the behavior of several systems.
PCA and extended versions of it are already used in aluminum electrolysis to monitor the conditions of reduction cells [27,28,29,30]. By considering several time series meta-features in the PCA, the monitoring of reduction cells could be further enhanced. For example, TAE has started to modulate the power input of reduction cells in their first pot room because the growing share of renewable energy results in a varying electrical energy supply [12,14]. While the power input is modulated, the behavior of each cell could be visualized and monitored with the help of time series meta-features and feature spaces. In the following Section 4, we exclusively focus on the first part, i.e., finding salient sensor data from TAE that indicate detached or faulty sensors. Monitoring the cells' behavior using meta-features will be the subject of future research.

4. Experiment and Results

For the experiment, we used Python 3.9.12, R 4.0.5, scikit-learn 0.24.1, pandas 1.2.1, Matplotlib 3.3.4, and feasts 0.1.6. We applied the presented approach to real sensor data from TAE to find salient sensor data that indicate detached or faulty sensors. The original sensor data set under analysis contained 748 time series recorded between 8 October 2020 07:00 and 9 October 2020 07:00, generated by 22 thermocouples mounted on 34 reduction cells in the first pot room. All 120 cells in the first pot room are fitted with shell heat exchangers to maintain the ledge of each cell during current modulation [14]. The mounting positions of the thermocouples were not considered in the analysis, as we were only interested in finding salient sensor data. The data were resampled to a resolution of 1 min to create evenly spaced time series. For the calculation of time series meta-features, the temperature data were converted to Kelvin. We conducted three iterations, following each step in Figure 3. To create a meta-feature data set, we calculated several time series meta-features taken from [8,15,22,31], which are listed in Table 2.
A total of 432 time series had at least one meta-feature for which no value could be calculated. A visual inspection showed that each of these time series was constant, so the corresponding meta-feature instances were excluded from further analyses. In the next step, a PCA was conducted using the z-transformed meta-feature instances from the remaining 316 time series. The first and second principal components were used to generate a two-dimensional feature space, which is visualized in Figure 4. Each point represents a time series, some of which are plotted as a black curve next to their corresponding point in the figure. Several points, e.g., TCWT2-1068, TCWT5-1118, TCWT3-1118, TCWT6-1118 and TCWT2-1118, lie apart from the left-hand cluster; the corresponding time series show an implausible range of temperature values. The corresponding meta-feature instances were removed from the meta-feature data set.
After removing these meta-feature instances in the first iteration, an additional PCA was conducted. The resulting first two principal components are visualized in Figure 5. The time series TCWT5-1012, TCWT4-1033 and TCWT6-1116 showed implausible characteristics. The corresponding meta-feature instances were not considered in the third iteration. The first two principal components from the third iteration are shown in Figure 6. Some time series displayed in Figure 6 exhibit a curved shape, which in some cases can be explained by the influence of the shell heat exchangers' airflow, due to an increase in current that took place over four hours of the analysis period. Other time series, e.g., TCWT3-1061, TCWT3-1001, TCWT6-1012, and TCWT2-1115, are salient in their characteristics and should be further analyzed. In summary, Figure 6 gives an overview of the different time series characteristics of the sensor data set. By generating a two-dimensional feature space in multiple iterations, salient and implausible time series could be identified. In Section 5, a dashboard is introduced which focuses on a user-friendly handling of the individual steps in Figure 3. To reproduce our experimental results, the source code and data set were published on GitHub at: https://github.com/ngrabows/ts-information-extraction (accessed on 18 April 2023).
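The two preprocessing steps mentioned above (resampling to 1 min and conversion to Kelvin) can be sketched as follows. The text does not state the resampling rule or the original unit, so the sketch assumes raw readings in degrees Celsius, averages all readings within each minute and forward-fills minutes without a reading. The function name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)

def resample_1min_kelvin(samples, start, end):
    """Put irregular (timestamp, degrees Celsius) readings on a regular 1-min grid.

    Assumptions: readings within the same minute are averaged, minutes without
    a reading repeat the last known value, and Kelvin = Celsius + 273.15.
    Grid values stay None until the first reading arrives.
    """
    n_bins = (end - start) // ONE_MINUTE
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ts, celsius in samples:
        i = (ts - start) // ONE_MINUTE  # floor division yields the minute index
        if 0 <= i < n_bins:
            sums[i] += celsius
            counts[i] += 1
    grid, last = [], None
    for i in range(n_bins):
        if counts[i]:
            last = sums[i] / counts[i] + 273.15
        grid.append((start + i * ONE_MINUTE, last))
    return grid

# Analysis window from the text: 8 October 2020 07:00 to 9 October 2020 07:00
start = datetime(2020, 10, 8, 7, 0)
end = datetime(2020, 10, 9, 7, 0)
samples = [  # hypothetical raw thermocouple readings
    (datetime(2020, 10, 8, 7, 0, 10), 100.0),
    (datetime(2020, 10, 8, 7, 0, 40), 102.0),
    (datetime(2020, 10, 8, 7, 2, 5), 104.0),
]
grid = resample_1min_kelvin(samples, start, end)
print(len(grid), grid[:3])
```

A 24 h window at a 1 min resolution gives 1440 evenly spaced values per time series. With pandas, which is part of the tool chain listed above, `Series.resample("1min").mean()` followed by a fill step would express the same idea more compactly.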
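Why a constant time series leaves some meta-features without a value can be seen in a small example. The sketch below uses the lag-1 autocorrelation, whose denominator is the sum of squared deviations from the mean and is therefore zero for a constant signal. Which of the meta-features in Table 2 were actually undefined is not stated here, so the choice of this feature, the sensor names and the values are illustrative assumptions.

```python
import math

def acf1(y):
    """Lag-1 autocorrelation; undefined (NaN) when the series has zero variance."""
    m = sum(y) / len(y)
    denom = sum((v - m) ** 2 for v in y)
    if denom == 0:
        return math.nan
    num = sum((a - m) * (b - m) for a, b in zip(y[:-1], y[1:]))
    return num / denom

def meta_features(y):
    """Tiny illustrative meta-feature set (not the 20 features used in the paper)."""
    return {"mean": sum(y) / len(y), "acf1": acf1(y)}

# Hypothetical temperature signals in Kelvin
series = {
    "TC-A": [450.0, 451.0, 453.0, 452.0, 454.0],
    "TC-B": [300.0, 300.0, 300.0, 300.0, 300.0],  # constant signal
}

table = {name: meta_features(y) for name, y in series.items()}

# Keep only rows in which every meta-feature could be calculated
complete = {
    name: feats
    for name, feats in table.items()
    if not any(math.isnan(v) for v in feats.values())
}
print(sorted(complete))
```

Dropping such rows before the PCA is also a practical necessity, since the z-transform and the decomposition cannot handle missing values.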

5. Dashboard

Picking up the idea from [15], we conceptualized and developed a dashboard to interact with the steps in Figure 3 in a user-friendly way without having to run actual code. The dashboard was developed with Dash [32] and is connected to a database that is hosted at the Institute for Automation/Computer Science, University of Wuppertal. It receives the sensor data for a specific period one wants to analyze, such as a current modulation experiment. As shown in Figure 7, the user can choose between several time series meta-features, which can be selected and deselected in a multi-value drop-down menu. This menu helps to check different combinations of time series meta-features quickly. Moving the cursor over a point in the two-dimensional feature space triggers a popup with additional information about the corresponding time series, while the time series itself is plotted in an additional figure below the feature space. Clicking on a point in the feature space adds it to a checklist that can be used to ignore the corresponding meta-feature instances in further iterations (see Figure 3). Accidentally added points can be restored by removing the corresponding check. Furthermore, zooming in and out of the feature space helps to analyze the sensor data regarding their characteristics.

6. Conclusions

In this paper, we present an approach based on the work of Hyndman et al. [15] using time series meta-features in conjunction with a PCA to extract and visualize the characteristics of a sensor data set from TAE in multiple two-dimensional feature spaces. The first part of the paper provides an overview of the beneficial use of time series meta-features in different settings, i.e., regression, classification, clustering, anomaly detection, visualization and meta-learning. We introduce a flow chart of an iterative procedure that generates multiple two-dimensional feature spaces to find salient sensor data. The feature spaces can be used to detect detached or faulty sensors easily. Furthermore, we present a dashboard to interact with the steps in Figure 3. The dashboard enables employees at TAE to find sensor data outliers, which makes our research applicable in practice.
The work described in this paper opens up the potential for the following extensions. Currently, the identification of implausible time series is based on the visual assessment of a human operator, which could be improved with the help of additional methods, e.g., α-hulls or highest density regions [15]. Another idea is to enhance the manual selection of meta-features by using an algorithm to recommend several time series meta-features for an individual data set [5]. For further research, we plan to investigate strategies that use time series meta-features to enhance the monitoring of reduction cells.
In Section 3, we state that our approach can be applied to different fields of the smart manufacturing sector. However, we cannot guarantee that the results will always be of practical use. Further research should be conducted to validate the performance of our approach on other data sets, including the TAE data. The choice of suitable time series meta-features plays an essential role and should be made with the help of expert knowledge from the relevant field in order to obtain meaningful and interpretable insights into the sensor data. A different choice of time series meta-features may therefore be needed if our approach is transferred to another sector of smart manufacturing. Furthermore, a survey should be conducted with TAE employees to verify the usability of our developed dashboard. In summary, the presented work is an essential step for the smart manufacturing sector. The approach enables process engineers at TAE to extract and visualize information from reduction cells in order to detect salient sensor data.

Author Contributions

Conceptualization, N.G. and R.K.; Data curation, N.G. and A.M.; Formal analysis, N.G.; Funding acquisition, N.G., R.K. and D.T.; Investigation, N.G.; Methodology, N.G.; Project administration, D.T.; Resources, N.G., R.D. and A.M.; Software, N.G.; Supervision, D.T.; Validation, N.G.; Visualization, N.G.; Writing—original draft, N.G.; Writing—review & editing, N.G., R.K., R.D., A.M. and D.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union and the European Regional Development Fund, grant number EFRE-0200490. This publication was funded by the Open Access Publication Fund of the University of Wuppertal.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The TAE data presented in this paper are openly available at: https://github.com/ngrabows/ts-information-extraction (accessed on 18 April 2023).

Acknowledgments

N.G. thanks Philipp Kaiser and Jan-Philipp Wiese for developing the presented dashboard and Raphaele Bartels for manuscript editing.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TAE    TRIMET Aluminium SE Essen
PCA    principal component analysis
MLP    multilayer perceptron
PC     principal component

References

  1. Majid, N.A.A.; Taylor, M.P.; Chen, J.J.J.; Stam, M.A.; Mulder, A.; Young, B.R. Aluminium process fault detection by Multiway Principal Component Analysis. Control Eng. Pract. 2011, 19, 367–379.
  2. Kremser, R.; Grabowski, N.; Düssel, R.; Kessel, K.; Tutsch, D. Investigation of Different Measurement Techniques for Individual Anode Currents in Hall-Héroult Cells. In Proceedings of the 12th Australasian Aluminium Smelting Technology Conference, Queenstown, New Zealand, 2–7 December 2018.
  3. Pursche, T.; Grabowski, N.; Nowitzki, J.; Claub, R.; Patryarcha, L.; Dreisbach, H.; Tibken, B. Identification of Overtemperature Disturbances in Industrial Food Refrigeration Processes. In Proceedings of the 2018 57th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), Nara, Japan, 11–14 September 2018; IEEE: New York, NY, USA, 2018.
  4. Fulcher, B.D. Feature-Based Time-Series Analysis. In Feature Engineering for Machine Learning and Data Analytics; CRC Press/Taylor & Francis Group: Boca Raton, FL, USA, 2018; pp. 87–116.
  5. Wang, X.; Smith, K.; Hyndman, R. Characteristic-Based Clustering for Time Series Data. Data Min. Knowl. Discov. 2006, 13, 335–364.
  6. Vilalta, R.; Drissi, Y. A Perspective View and Survey of Meta-Learning. Artif. Intell. Rev. 2002, 18, 77–95.
  7. Hyndman, R. Mcomp: Data from the M-Competitions, R package Version 2.8. 2018. Available online: https://cran.r-project.org/package=Mcomp (accessed on 21 April 2023).
  8. O’Hara-Wild, M.; Hyndman, R.; Wang, E. Feasts: Feature Extraction and Statistics for Time Series. R package Version 0.1.6 (conda-forge). 2020. Available online: https://cran.r-project.org/package=feasts (accessed on 21 April 2023).
  9. Talagala, T.S.; Hyndman, R.J.; Athanasopoulos, G. Meta-learning how to forecast time series. J. Forecast. 2023.
  10. Kang, Y.; Hyndman, R.J.; Li, F. GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Stat. Anal. Data Mining ASA Data Sci. J. 2020, 13, 354–376.
  11. Grjotheim, K.; Kvande, H. Introduction to Aluminium Electrolysis: Understanding the Hall-Héroult Process, 2nd ed.; Alu Media GmbH: Düsseldorf, Germany, 1993.
  12. Düssel, R. Entwicklung eines Regelungskonzepts für Aluminium-Elektrolysezellen unter Berücksichtigung einer Variablen Stromstärke und eines Regelbaren Wärmeverlusts. Ph.D. Thesis, Bergische Universität Wuppertal, Wuppertal, Germany, 2016.
  13. Depree, N.; Düssel, R.; Patel, P.; Reek, T. The ‘Virtual Battery’—Operating an Aluminium Smelter with Flexible Energy Input. In Light Metals 2016; Springer: Cham, Switzerland, 2016; pp. 571–576.
  14. Düssel, R.; Mulder, A.; Bugnion, L. Transformation of a Potline from Conventional to a Full Flexible Production Unit. In Light Metals 2019; Chesonis, C., Ed.; Springer: Cham, Switzerland, 2019; pp. 533–541.
  15. Hyndman, R.J.; Wang, E.; Laptev, N. Large-Scale Unusual Time Series Detection. In Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic, NJ, USA, 14–17 November 2015; IEEE: New York, NY, USA, 2015.
  16. Grabowski, N.; Kremser, R.; Düssel, R.; Mulder, A.; Tutsch, D. Using Random Forest Regression for Predicting and Analysing Reduction Cell Behaviour. In Proceedings of the 12th Australasian Aluminium Smelting Technology Conference, Queenstown, New Zealand, 2–7 December 2018.
  17. Kremser, R.; Grabowski, N.; Düssel, R.; Mulder, A.; Tutsch, D. Anode Effect Prediction in Hall-Héroult Cells Using Time Series Characteristics. Appl. Sci. 2020, 10, 9050.
  18. Lubba, C.H.; Sethi, S.S.; Knaute, P.; Schultz, S.R.; Fulcher, B.D.; Jones, N.S. catch22: CAnonical Time-series CHaracteristics. Data Min. Knowl. Discov. 2019, 33, 1821–1852.
  19. Garza, F.; Gutierrez, K.; Challu, C.; Moralez, J.; Olivares, R.; Mergenthaler, M. tsfeatures. Python Package. Available online: https://github.com/Nixtla/tsfeatures (accessed on 21 April 2023).
  20. Nanopoulos, A.; Alcock, R.; Manolopoulos, Y. Feature-Based Classification of Time-Series Data. In Information Processing and Technology; Nova Science Publishers, Inc.: Hauppauge, NY, USA, 2001; pp. 49–61.
  21. Horvath, M.; Vircikova, E. Data Mining For Quality Control of Primary Aluminium Production Process. In Management and Production Engineering Review; Production Engineering Committee of the Polish Academy of Sciences, Polish Association for Production Management: Warsaw, Poland, 2012; Volume 3, pp. 47–53.
  22. Talagala, P.D.; Hyndman, R.J.; Smith-Miles, K.; Kandanaarachchi, S.; Muñoz, M.A. Anomaly Detection in Streaming Nonstationary Temporal Data. J. Comput. Graph. Stat. 2020, 29, 13–27.
  23. Kang, Y.; Hyndman, R.J.; Smith-Miles, K. Visualising forecasting algorithm performance using time series instance spaces. Int. J. Forecast. 2017, 33, 345–358.
  24. Makridakis, S.; Hibon, M. The M3-Competition: Results, conclusions and implications. Int. J. Forecast. 2000, 16, 451–476.
  25. Prudêncio, R.B.C.; Ludermir, T.B. Meta-learning approaches to selecting time series models. Neurocomputing 2004, 61, 121–137.
  26. Kalousis, A.; Theoharis, T. NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intell. Data Anal. 1999, 3, 319–337.
  27. Manolescu, P.; Duchesne, C.; Tessier, J.; Saevarsdottir, G. On the Use of Multivariate Statistical Methods to Detect, Diagnose and Mitigate Abnormal Events in Aluminium Smelters. In Light Metals; Springer: Cham, Switzerland, 2018; pp. 475–483.
  28. LaJambe, D.; Poulin, É.; Duchesne, C.; Tessier, J. Anodic Incident Detection through Multivariate Analysis of Individual Anode Current Signals. In Light Metals; Springer: Cham, Switzerland, 2020; pp. 535–542.
  29. Yao, Y.; Cheung, C.Y.; Bao, J.; Skyllas-Kazacos, M.; Welch, B.J.; Akhmetov, S. Detection of Local Cell Conditions Based on Individual Anode Current Measurements. In Light Metals; Springer: Cham, Switzerland, 2016; pp. 595–600.
  30. Majid, N.A.A. Cascade Fault Detection and Diagnosis for the Aluminium Smelting Process using Multivariate Statistical Techniques. Ph.D. Thesis, University of Auckland, Auckland, New Zealand, 2011.
  31. Fulcher, B.D.; Jones, N.S. hctsa: A Computational Framework for Automated Time-Series Phenotyping Using Massive Feature Extraction. Cell Syst. 2017, 5, 527–531.e3.
  32. The Dash Development Team. Dash; Python package Version 1.20.0 (conda-forge); The Dash Development Team: Milwaukee, WI, USA, 2021; Available online: https://github.com/plotly/dash (accessed on 21 April 2023).
Figure 1. Several raw time series from the M1- and M3-Competition [7], which have different characteristics regarding the trend and seasonal strength. Values were rounded to two decimal places. Trend strength and seasonal strength were computed using feasts 0.1.6 [8]. Further examples can be found in [9].
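For orientation, trend and seasonal strength are commonly defined from an STL decomposition y_t = T_t + S_t + R_t (trend, seasonal, and remainder components). The caption does not spell out the formula; assuming feasts follows this standard definition, the two values read:

```latex
F_T = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(T_t + R_t)}\right),
\qquad
F_S = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(S_t + R_t)}\right)
```

Both values lie in [0, 1]; values close to 1 indicate a strong trend or seasonal component, and values close to 0 indicate that the remainder dominates.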
Figure 2. One of the three pot rooms operated at the Essen site in Germany by TRIMET Aluminium SE. One pot room contains 120 reduction cells in an end-to-end configuration.
Figure 3. The flow chart shows each step of the approach. A set of time series meta-features is calculated, which should describe the characteristics of the time series data set. Using the calculated meta-features, a principal component analysis (PCA) is conducted to create a two-dimensional feature space, which is used to identify implausible time series. The corresponding instances (rows) are removed from the meta-feature data set and further iterations are performed until all implausible time series have been identified.
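The iterative procedure in Figure 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the meta-features are already available as a numeric matrix with one row per time series, standardizes them, and uses NumPy's SVD to perform the PCA. In the approach described here, implausible time series are identified by inspecting the feature space; the median-distance rule in `flag_outliers`, its `threshold`, and all function names are hypothetical stand-ins for that manual step.

```python
import numpy as np


def pca_2d(features):
    """Project standardized meta-features onto the first two principal components."""
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information; avoid division by zero
    z = (x - x.mean(axis=0)) / std
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:2].T


def flag_outliers(scores, threshold=10.0):
    """Assumed rule replacing visual inspection: flag points far from the median point."""
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    mad = np.median(np.abs(dist - np.median(dist)))
    return dist > np.median(dist) + threshold * max(mad, 1e-12)


def iterate_feature_space(features, names, max_iter=10):
    """Repeat the PCA, dropping flagged rows, until no further row is flagged."""
    features = np.asarray(features, dtype=float)
    keep = np.ones(len(features), dtype=bool)
    removed = []
    for _ in range(max_iter):
        flagged = flag_outliers(pca_2d(features[keep]))
        if not flagged.any():
            break
        idx = np.flatnonzero(keep)[flagged]  # map back to original row indices
        removed.extend(names[i] for i in idx)
        keep[idx] = False
    # Final feature space, computed on the remaining rows only.
    return removed, pca_2d(features[keep])
```

Calling `iterate_feature_space(meta_feature_matrix, sensor_names)` returns the names of the removed series together with the two-dimensional coordinates of the remaining ones. Recomputing the PCA after each removal matters because a few extreme rows otherwise dominate the principal components and compress all other points into a single cluster.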
Figure 4. The visualization of the two-dimensional feature space created in the first iteration. Each point represents a time series. Some time series, e.g., TCWT2-1068, TCWT5-1118, TCWT3-1118, TCWT6-1118 and TCWT2-1118, lie far from the cluster on the left-hand side and exhibit an implausible range of values. The corresponding meta-feature instances were not considered in the next iteration.
Figure 5. The visualization of the two-dimensional feature space created in the second iteration. The time series TCWT5-1012, TCWT4-1033 and TCWT6-1116 exhibit implausible characteristics. The corresponding meta-feature instances were not considered in the final iteration.
Figure 6. The visualization of the two-dimensional feature space created in the final iteration gives an overview of the characteristics of the sensor data set. Some time series, e.g., TCWT3-1061, TCWT3-1001, TCWT6-1012, and TCWT2-1115, are salient and should be further analyzed and monitored.
Figure 7. A screenshot of the dashboard portraying the feature space that is also displayed in Figure 4. The time series TCRL5-1026, which exhibits an implausible range of values, is plotted below the feature space. Several time series meta-features can be selected and deselected using the multi-value drop-down menu. The checklist on the left side is used to exclude sensor data in further iterations.
Table 2. 20 time series meta-features were calculated for the experiment.
ACF1                  Burst
Curvature             Linearity
Longest flat spot     Max
Mean                  Min
Moment                Ratio high low mean
Ratio mean iqmean     Shift level index
Shift level max       Shift var index
Shift var max         Spectral entropy
Spikiness             Var tiled mean
Var tiled var         Variance
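To make the notion of a meta-feature concrete, the following sketch computes six of the entries in Table 2 for a single time series using only the Python standard library. The definitions are assumptions based on common conventions and may differ from those used in the experiment: ACF1 is taken as the lag-1 autocorrelation, and the longest flat spot as the longest run of consecutive values that fall into the same of ten equal-width intervals. The function names and the `bins` parameter are hypothetical.

```python
from itertools import groupby
from statistics import fmean, pvariance


def acf1(series):
    """Lag-1 autocorrelation; defined here as 0.0 for a constant series."""
    mean = fmean(series)
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        return 0.0
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return num / denom


def longest_flat_spot(series, bins=10):
    """Longest run of consecutive values in the same equal-width interval (assumed definition)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return len(series)  # a constant series is one single flat spot
    width = (hi - lo) / bins
    # Assign each value to an interval index; the maximum goes into the last interval.
    labels = [min(int((v - lo) / width), bins - 1) for v in series]
    return max(len(list(run)) for _, run in groupby(labels))


def meta_features(series):
    """Return one row of the meta-feature data set for a single time series."""
    return {
        "mean": fmean(series),
        "variance": pvariance(series),  # population variance
        "min": min(series),
        "max": max(series),
        "acf1": acf1(series),
        "longest_flat_spot": longest_flat_spot(series),
    }
```

Applying `meta_features` to every sensor series yields one row per time series; stacking these rows gives the meta-feature matrix on which the PCA operates.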
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Grabowski, N.; Kremser, R.; Düssel, R.; Mulder, A.; Tutsch, D. Information Extraction from Industrial Sensor Data Using Time Series Meta-Features. Appl. Sci. 2023, 13, 7065. https://doi.org/10.3390/app13127065