Algorithms for Automatic Data Validation and Performance Assessment of MOX Gas Sensor Data Using Time Series Analysis

Hammer, Christof; Sporrer, Sebastian; Warmer, Johannes; Kaul, Peter; Thoelen, Ronald; Jung, Norbert

doi:10.3390/a15100360


by Christof Hammer ¹,²,*, Sebastian Sporrer ², Johannes Warmer ¹, Peter Kaul ¹, Ronald Thoelen ³ and Norbert Jung ¹

¹ Institute of Safety and Security Research ISF, University of Applied Sciences Bonn-Rhine-Sieg, Grantham Allee 20, 53757 Sankt Augustin, Germany

² Institute for the Protection of Terrestrial Infrastructures, German Aerospace Center, Rathaus Allee 12, 53757 Sankt Augustin, Germany

³ Institute for Materials Research, Hasselt University, Wetenschapspark 1, B-3590 Diepenbeek, Belgium

* Author to whom correspondence should be addressed.

Algorithms 2022, 15(10), 360; https://doi.org/10.3390/a15100360

Submission received: 29 August 2022 / Revised: 23 September 2022 / Accepted: 26 September 2022 / Published: 28 September 2022

(This article belongs to the Section Algorithms for Multidisciplinary Applications)


Abstract:

The following work presents algorithms for semi-automatic validation, feature extraction and ranking of time series measurements acquired from MOX gas sensors. Semi-automatic measurement validation is accomplished by extending established curve similarity algorithms with a slope-based signature calculation. Furthermore, a feature-based ranking metric is introduced. It allows for individual prioritization of each feature and can be used to find the best performing sensors regarding multiple research questions. Finally, the functionality of the algorithms, as well as the developed software suite, are demonstrated with an exemplary scenario, illustrating how to find the most power-efficient MOX gas sensor in a data set collected during an extensive screening consisting of 16,320 measurements, all taken with different sensors at various temperatures and analytes.

Keywords:

time series analysis; MOX gas sensors; slope based signature; automatic measurement validation; prioritizable ranking; feature extraction

1. Introduction

Since metal-oxide (MOX) gas sensors are cheap, easy to acquire and available in large quantities, they have become popular in different measurement scenarios such as leakage detection in chemical factories or air quality measurements in central venting systems [1,2]. The sensors can detect gas concentrations down to the ppb level, but suffer from the disadvantage of not being selective enough. Hence, researchers continuously create and test new material combinations with the goal of building sensors that are very selective and sensitive to a specific target [3]. In addition to the actual composition of the sensitive layer, the sintering parameters used when applying the metal oxide onto the empty sensor carrier have a strong impact on the sensor’s performance. Therefore, custom-made sensors are manufactured in batches with the same metal oxide composition, but individual sinter parameters. In order to test the achieved individual sensitivity and selectivity of the sensors in a batch, all sensors are exposed simultaneously to the same gas, one gas after another, whilst being operated at different substrate temperatures. This procedure is called a sensor screening [4,5].

Depending on the granularity, a screening can be a very time-consuming task (e.g., several days) and should ideally be highly automated. In our previous work, we presented hardware solutions for automated batch sintering [6] and a sensor readout system to carry out highly automated sensor screenings [7]. Since the acquired data has to be analyzed and interpreted to achieve the final goal of finding the best fitting sensor and its optimal operating temperature for a given target, automated measurement hardware for sensor screenings is only half of the solution. Due to the large amount of raw data captured during an automated screening with many parameter combinations, a manual interpretation can also be very time-consuming and will therefore benefit greatly from a high degree of automation itself.

Two main challenges for the automated processing and analysis of the data acquired during an automated screening have to be addressed algorithmically:

Validation
Since manufacturing and operating parameters are still under research, some sensors may show a malformed or no response at all. The occurrence of such invalid measurements in a screening is therefore very likely. These measurements need to be sorted out, to only include measurements from proper sensors for the final assessment.
Ranking
To identify the best sensor for a given application, a performance metric is required. It should be based on quantifiable and individually prioritizable features extracted from the time series measurements. The ability to tune the metric through feature-wise prioritization will help to model the scenario, for which the ranking is performed, in greater detail.

In the following work, we present the method combinations and algorithms needed to address the identified challenges. To find invalid or unusual measurements, we propose an automatic validation method based on curve similarity that compares new measurements against a well-known reference to determine how similar they are. An algorithm calculates a numeric similarity value between the given reference and the curves under test. A threshold can then be used to automatically sort out measurements that are too dissimilar from the reference. Furthermore, we propose a signature extraction algorithm that significantly enhances the performance of the well-established curve metrics, directly improving the numeric similarity results. The solution for the sensor ranking is split into feature extraction and the ranking algorithm itself. Sensor-expert interviews led to the identification of several MOX sensor-specific features. A relevant set of features, extractable from the sensors’ time series, was mathematically formalized. Finally, the resulting feature vectors serve as input for our proposed ranking algorithm, which is based on multiplicative arithmetic. The ranking can individually prioritize a freely selectable combination of features from the vector in order to adapt the ranking process as closely as possible to the target application for the sensor.

We conclude by demonstrating the developed algorithms and software with an exemplary ranking performed on a data set obtained during an extensive sensor screening, to automatically find the most sensitive sensor for a given analyte, while consuming as little power for its heater element as possible.

2. Related Work and Data Origin

Since this work’s primary contributions are algorithms and methods for automatic data validation and ranking of newly manufactured sensors for specific detection tasks, we looked at similar work in this field.

Many research groups, like Leo et al. [8], describe their data processing as using several heterogeneous commercial software tools such as LabVIEW or MATLAB, or individual scripts. Often, the sensor features and statistical methods used are presented informally as incomplete textual expressions or entirely as black boxes [9]. This makes reproducing the algorithms difficult. The software Dave3 [10] is a toolbox with a graphical user interface that comes close to the idea we want to convey. The tool is, however, specialized for the evaluation of data obtained during temperature-cycled operation of gas sensors and is not applicable to the validation and performance ranking of sensors according to their screening data. Another, unnamed software for the evaluation of data obtained from an electronic nose could be found. The software is limited to a specific subset of sensors and is also not suitable for ranking or validating data [11]. Both tools have the major shortcoming of being based on commercial software like MATLAB or LabVIEW, which requires additional licenses.

Our goal is to present the algorithms for validation and ranking as well as the needed features in a mathematically formalized way so that they can be implemented in a variety of (open source) languages of choice, such as Python or R.

2.1. Sensor Screening Method

The data used in this work is the result of a detailed screening of 64 sensors exposed to nine different analytes. The sensors under test differ in their substrate composition as well as in the sinter times and sinter temperatures used during their production. As the operating temperature greatly impacts the sensor performance, a single physical sensor operated at different temperatures can be regarded as multiple virtual sensors with very different sensitivity and selectivity [12]. The operating temperatures were therefore varied during the screening, to record the resulting impact on the sensor performance. Sensor resistance, heater voltage and heater current were sampled at 1 Hz during the entire screening. A single measurement for a sensor and an analyte at a given temperature is repeated at least three times before the temperature is changed and the cycle starts over. The resulting time series for each measurement in the described data set always consists of three segments with durations defined by the screening procedure (baseline, analyte exposure and clearing; see Table 1).

Figure 1 is an exemplary depiction of the result from a single measurement. The dotted vertical lines indicate the analyte exposure to the sensor, while the toned down color of the curve, left and right of the dotted lines, is used to visualize the baseline and clearing segments as described in Table 1.

2.2. Preprocessing

An out of range (OOR) detection algorithm checks that each sample lies within a range defined by a fixed lower boundary of 0 Ω and a customizable upper boundary $\tau_{r}$, where $\tau_{r}$ ideally coincides with the maximal measurable resistance value of the measurement equipment. If more than 10 consecutive samples are outside these boundaries, the measurement is flagged as invalid.
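The OOR check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_out_of_range` and its parameters are hypothetical, and treating the boundary values themselves as in range is an assumption, since the text does not say whether the limits are inclusive.

```python
def is_out_of_range(samples, upper, lower=0.0, max_run=10):
    """Return True if more than `max_run` consecutive samples lie outside [lower, upper].

    Assumption: the boundary values themselves count as in range.
    """
    run = 0  # length of the current streak of out-of-range samples
    for value in samples:
        if value < lower or value > upper:
            run += 1
            if run > max_run:
                return True
        else:
            run = 0  # streak interrupted by a valid sample
    return False


# Eleven consecutive negative resistance values trigger the flag, ten do not.
print(is_out_of_range([1.0] * 5 + [-1.0] * 11 + [1.0] * 5, upper=4e9))
print(is_out_of_range([1.0] * 5 + [-1.0] * 10 + [1.0] * 5, upper=4e9))
```

Resetting the counter on every valid sample means that scattered single outliers do not accumulate; only an uninterrupted run invalidates the measurement.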

If optional information for heater voltage and current is available, it can be used to detect and remove measurements that were performed with sensors that presumably have broken or malfunctioning heater elements. These extended checks are:

Regulation Deviation
The recorded heater voltage is compared to the targeted voltage. The algorithm counts the occurrences of deviations of $\pm 5\%$ from the target voltage. If this occurs more than 10 times, the measurement is flagged as erroneous.
Continuous Current Flow
Checks that the current is actually flowing through the heater element throughout the entirety of the measurement.
Heater-Characteristic
Using the parameters from the technical information bulletin provided by UST Umweltsensortechnik GmbH [13], the resistance characteristic of the integrated platinum heater element of the sensors at different temperatures can be validated. Sensors that were exposed to long sintering times at high temperatures are especially prone to damage to their heater element. Such sensors can be flagged with a warning.
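As an illustration of the regulation deviation check from the list above, the following Python sketch counts samples whose heater voltage deviates from the target. The function name and parameters are hypothetical, and counting a sample only when its deviation exceeds 5% of the target (rather than reaching it) is an assumption.

```python
def has_regulation_deviation(voltages, target, tolerance=0.05, max_count=10):
    """Return True if more than `max_count` samples deviate from `target`
    by more than `tolerance` (relative to the target voltage).

    Assumption: a deviation of exactly 5% is still tolerated.
    """
    limit = tolerance * abs(target)
    count = sum(1 for v in voltages if abs(v - target) > limit)
    return count > max_count


# 11 samples at 5.5 V against a 5.0 V target (10% off) flag the measurement.
print(has_regulation_deviation([5.0] * 100 + [5.5] * 11, target=5.0))
```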

The system presented in our previous work [7] provides this additional data and was used for all screenings. Therefore, the extended preprocessing is applied to all measurements in the available data set. According to the upper limit of the used measurement system, the upper boundary for the range checks is $\tau_{r} = 4\,\mathrm{G\Omega}$. Finally, an outlier detection was performed to correct for single-sample signal anomalies.

3. Algorithms for Validation and Ranking

Based on the challenges described in Section 1, the following solutions are proposed:

Slope-based signature calculation as additional curve similarity metric to enhance a distance-based, semi-automatic measurement validation process;
Feature-based and prioritizable ranking metric to sort the sensors according to their performance towards a given analyte.

Before getting started, the formal conventions are introduced. In this work, a vector is denoted with $x \in \mathbb{R}^{|x|}$, where $|x|$ is defined as the number of the vector’s elements. The vector element at index $i$ is referenced by $x[i]$ with $1 \leq i \leq |x|$. A window with size $w \in \mathbb{N}$ around an index $i$ can be interpreted as a vector itself, containing only values from the original vector $x$ with the limits $l \leq i \leq r$.

$$ x \langle i, w \rangle \quad \text{with} \quad l = \begin{cases} i - w & \text{if } i - w > 1, \\ 1 & \text{else} \end{cases} \quad \text{and} \quad r = \begin{cases} i + w & \text{if } i + w < |x|, \\ |x| & \text{else} \end{cases} \tag{1} $$

Let $v = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix}$ be a vector with five elements. An exemplary window could then be $v \langle 3, 1 \rangle = \begin{pmatrix} 2 & 3 & 4 \end{pmatrix}$. Furthermore, the last $n$ elements of the vector could also be addressed via the window $v \langle |v|, n - 1 \rangle$. For $n = 3$, this gives $v \langle 5, 2 \rangle = \begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$.
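The window notation of Equation (1) maps directly onto list slicing. The following Python sketch is one possible reading; the function name `window` is an assumption, and the only subtlety is the shift from the 1-based indices used in the text to Python's 0-based indexing.

```python
def window(x, i, w):
    """Return the window x<i, w> of Equation (1) for a 1-based index i."""
    left = i - w if i - w > 1 else 1            # lower limit l
    right = i + w if i + w < len(x) else len(x)  # upper limit r
    return x[left - 1:right]                     # convert to 0-based slicing


v = [1, 2, 3, 4, 5]
print(window(v, 3, 1))       # [2, 3, 4]
print(window(v, len(v), 2))  # last three elements: [3, 4, 5]
```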

A measurement is a vector $m \in M$ with $M$ being the set of all resistance measurements of one or multiple sensor screenings, as introduced in Section 2.1. The elements of this vector with their corresponding indices represent the sampled values and the time base of the measurement. Following this notation, the different segments of a measurement are defined as $m_{a}$ for the analyte exposure, $m_{b}$ for the baseline and $m_{c}$ for the clearing phase.

The residual standard deviation used in this work is defined as [14]:

$$ s(x, r) = \sqrt{\frac{\sum_{i = 1}^{|x|} (x[i] - r[i])^{2}}{|x| - 2}}, \quad |x| > |r| \tag{2} $$

All notation conventions, including those introduced above, are summarized in Table 2 as a compact overview.

3.1. Slope-Based Curve Signature

As mentioned in the motivation, one challenge to be solved algorithmically is to provide support for validating the screening measurements. To be as efficient as possible, without having detailed knowledge about the behavior of the sensor itself, the fastest way is to search for similar curves to a given reference time series.

Since the proposed signature is based on the curve’s slope, the algorithm works on the first derivative of the Savitzky–Golay [15] filtered measurement time series. It assigns each measurement a sequence comprised of the symbols +, − and ∗, representing its slope characteristic.

In preparation for the signature, a threshold $t$ for the measurement’s noise is needed. It is calculated using the residual standard deviation $s$ of the baseline’s last 200 s before analyte exposure and multiplying it with a customizable tolerance factor $\tau_{v}$ according to:

$$ t = s(x, \hat{x}) \cdot \tau_{v}, \quad \text{with } x = \dot{\tilde{m}}_{b} \langle |\dot{\tilde{m}}_{b}|, 199 \rangle \tag{3} $$

Each sample from the first derivative is then assigned a symbol as follows:

$$ \operatorname{sgn}(m, i) = \begin{cases} + & \text{if } \dot{\tilde{m}}_{a}[i] > t, \\ - & \text{if } \dot{\tilde{m}}_{a}[i] < -t, \\ * & \text{else} \end{cases} \tag{4} $$

If the absolute value of $\dot{\tilde{m}}_{a}[i]$ does not exceed the noise threshold $t$, the sample is assigned the ∗ symbol, indicating that the slope is caused by noise. Otherwise, the value of the sgn function is coded as either + or −. The values assigned the ∗ symbol are not important to the signature itself, but are needed for correct hysteresis filtering. The resulting symbols from the sgn function are concatenated into a sequence, resulting in the measurement slope signature. The final signature is created, hysteresis filtered and simplified as follows:

Build $sig$ by concatenating results of $sgn (m, i)$ for each sample.
Delete all leading ∗ from $sig$
Replace each remaining ∗ with the immediately preceding + or − symbol
Delete all symbols that are not part of an at least $ω_{s}$ long sub-string of the same symbol
Reduce all identical consecutive occurrences of the same symbol to one occurrence
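The five steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the first derivative of the already Savitzky–Golay filtered analyte segment and the noise threshold $t$ are supplied by the caller, the names `slope_signature`, `derivative`, `threshold` and `min_run` (standing in for $ω_{s}$) are hypothetical, and the ASCII character `-` stands for the − symbol.

```python
from itertools import groupby


def slope_signature(derivative, threshold, min_run):
    """Build the slope signature of a pre-filtered first derivative.

    `min_run` plays the role of omega_s, the minimum run length that
    survives the hysteresis filter.
    """
    # Step 1: assign +, - or * to every sample (Equation (4)).
    symbols = ["+" if d > threshold else "-" if d < -threshold else "*"
               for d in derivative]
    # Step 2: delete all leading * symbols.
    while symbols and symbols[0] == "*":
        symbols.pop(0)
    # Step 3: replace each remaining * with the preceding + or -.
    for k in range(1, len(symbols)):
        if symbols[k] == "*":
            symbols[k] = symbols[k - 1]
    # Step 4: keep only runs of identical symbols of length >= min_run.
    runs = [(symbol, len(list(group))) for symbol, group in groupby(symbols)]
    kept = [symbol for symbol, length in runs if length >= min_run]
    # Step 5: collapse consecutive identical symbols to one occurrence.
    return "".join(symbol for symbol, _ in groupby(kept))


# A drop followed by a rise; the two single-sample spikes at the end are filtered out.
d = [0.0, 0.0, -2.0, -2.0, -2.0, 0.0, 0.1, 3.0, 3.0, 3.0, 3.0, -2.0, 3.0]
print(slope_signature(d, threshold=1.0, min_run=3))  # -+
```

Because step 4 drops short runs before step 5 collapses repetitions, a rise that is briefly interrupted by a spike still yields a single `+`.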

Following is a non-exhaustive list of well-known curve similarity measures that can be extended by the proposed signature.

Area Method [16]
Discrete Fréchet Distance [17]
Partial Curve Mapping (PCM) [17]

We decided to use a simple difference-based approach for our application example, since it is fast and sufficient. This calculation is performed on the analyte segment of the reference curve $r \in M$ and the curve under test $m \in M$, where $m_{a}'$ represents a min-max normalized analyte segment of a measurement.

$$ d(m, r) = \frac{\sum_{i = 1}^{|m_{a}|} \left| m_{a}'[i] - r_{a}'[i] \right|}{|m_{a}|} \tag{5} $$

Because all calculations are performed on the min-max normalized curves and the resulting sum is divided by $|m_{a}|$, identical curves have a distance of 0, whereas the maximum distance is limited to 1.
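A small Python sketch of the distance in Equation (5) is given below. The function names are hypothetical; requiring both analyte segments to have the same length and mapping a constant segment to all zeros are assumptions made so that the example is well defined.

```python
def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant segment is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def curve_distance(m_a, r_a):
    """Mean absolute difference of two min-max normalized analyte segments."""
    if len(m_a) != len(r_a):
        raise ValueError("segments must have the same length")
    m_norm = min_max_normalize(m_a)
    r_norm = min_max_normalize(r_a)
    return sum(abs(a - b) for a, b in zip(m_norm, r_norm)) / len(m_a)


print(curve_distance([1, 2, 3], [10, 20, 30]))  # same shape: 0.0
print(curve_distance([0, 1], [1, 0]))           # mirrored shape: 1.0
```

The first call illustrates why the normalization matters: two curves with identical shape but different resistance levels have a distance of 0.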

3.2. Feature Extraction

Before detailed definitions of the actual features are given, the helper function $u$ is introduced. It yields the smallest index of a measurement $m \in M$ at which the average of a surrounding window reaches a relative amount $\tau_{u}$ of the reaction’s peak. The threshold $\tau_{u}$ and the window size $\omega_{u}$ can be chosen as needed. The arithmetic mean of all values in a measurement is denoted with $\bar{m}$. The definition of $u$ is based on a case differentiation regarding the main direction of the reaction’s slope:

$$ u(m, \tau_{u}) = \begin{cases} \min \{ i \mid z \geq \tau_{u} \} & \text{if } m \text{ has a positive-slope reaction}, \\ \min \{ i \mid z \leq 1 - \tau_{u} \} & \text{else}, \end{cases} \tag{6} $$

with $z = \overline{m_{a}' \langle i, \omega_{u} \rangle}$, $i \in \mathbb{N}$, $1 \leq i \leq |m_{a}|$ and $\tau_{u} \in \mathbb{R}$, $0 < \tau_{u} < 1$.
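The helper $u$ from Equation (6) can be sketched in Python as shown below. The sketch assumes that the analyte segment is already min-max normalized and that the direction of the reaction is passed in by the caller, since the text does not specify how it is determined; the function name `reach_index` and the `None` return value for a threshold that is never reached are likewise assumptions.

```python
def reach_index(m_a_norm, tau_u, omega_u, positive):
    """Smallest 1-based index whose window mean reaches the threshold.

    m_a_norm : min-max normalized analyte segment
    tau_u    : relative threshold, 0 < tau_u < 1
    omega_u  : window size as used in x<i, omega_u>
    positive : True for a positive-slope reaction (assumed to be known)
    """
    n = len(m_a_norm)
    for i in range(1, n + 1):
        left = max(i - omega_u, 1)    # window limits as in Equation (1)
        right = min(i + omega_u, n)
        win = m_a_norm[left - 1:right]
        z = sum(win) / len(win)       # window mean
        if positive and z >= tau_u:
            return i
        if not positive and z <= 1 - tau_u:
            return i
    return None  # assumption: threshold never reached


rising = [0.0, 0.0, 0.2, 0.6, 1.0, 1.0]
print(reach_index(rising, 0.5, 1, positive=True))  # 4
print(reach_index(rising, 0.9, 1, positive=True))  # 6
```

With a sampling rate of 1 Hz, the returned index can be read directly as the number of seconds since the start of the analyte exposure.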

Moreover, some of the features depend on the slope $m_{\hat{x}}$ of a linear regression model, defined in the following sample-wise definition of $\hat{x}$.

$$ \hat{x}[i] = m_{\hat{x}} \cdot i + b_{\hat{x}} \tag{7} $$
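To make the regression model of Equation (7) and its use in the residual standard deviation $s(x, \hat{x})$ of Equation (2) concrete, the following Python sketch fits a line by ordinary least squares over the 1-based sample indices. Ordinary least squares is an assumption here, as the text does not name the fitting method, and the function names are hypothetical.

```python
import math


def linear_fit(x):
    """Least-squares line x_hat[i] = slope * i + intercept for i = 1..len(x)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three samples")
    idx = range(1, n + 1)
    mean_i = sum(idx) / n
    mean_x = sum(x) / n
    sxx = sum((i - mean_i) ** 2 for i in idx)
    sxy = sum((i - mean_i) * (v - mean_x) for i, v in zip(idx, x))
    slope = sxy / sxx
    intercept = mean_x - slope * mean_i
    return slope, intercept


def residual_std_about_fit(x):
    """Residual standard deviation s(x, x_hat) with x_hat being the fitted line."""
    slope, intercept = linear_fit(x)
    fitted = [slope * i + intercept for i in range(1, len(x) + 1)]
    ss = sum((v - f) ** 2 for v, f in zip(x, fitted))
    return math.sqrt(ss / (len(x) - 2))


slope, intercept = linear_fit([1.0, 3.0, 2.0, 4.0])
print(round(slope, 6), round(intercept, 6))  # 0.8 0.5
```

The slope returned by `linear_fit` corresponds to $m_{\hat{x}}$, and the second function evaluates the fitted line at the same indices as the samples before applying the $|x| - 2$ denominator.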

The overall performance indicator for each measurement is calculated based on quantifiable features, which are each defined as a function $f_{j} : M \to \mathbb{R}$. All features are part of the feature set $F$ and can be referenced using an index $j \in \mathbb{N}$ with $1 \leq j \leq |F|$.

The following initial set of features was identified after interviewing a domain expert in the field of MOX gas sensors. The features were then formalized in the following list. All features marked with ⁻¹ need to be inverted after normalization because a higher value will always be considered better for the performance metric introduced later in this section.

1.: Sensitivity
The sensitivity quantifies how strongly a sensor reacts to the analyte it is exposed to [18,19].
It is calculated by subtracting the mean value of a window $a$ containing the samples of the last 120 s of analyte exposure from the mean value of a window $b$ containing the samples of the last 120 s before gas exposure (baseline), and dividing the difference by the latter.

$$ f_{1}(m) = \frac{b - a}{b} \quad \text{with } b = \overline{m_{b} \langle |m_{b}|, 119 \rangle} \text{ and } a = \overline{m_{a} \langle |m_{a}|, 119 \rangle} \tag{8} $$
2.: Reaction Speed I ⁻¹
This measure is an indicator of how fast the sensor reacts to the analyte it is exposed to. It covers the time from the start of the exposure to the analyte until the reaction reaches 50% of its overall strength.

$$ f_{2}(m) = u(m, 0.5) \tag{9} $$
3.: Reaction Speed II ⁻¹
The time between reaching 50% and 90% of the maximum reaction is used as a second measure for the reactivity of the sensor.

$$ f_{3}(m) = u(m, 0.9) - u(m, 0.5) \tag{10} $$
4.: Plateau Quality ⁻¹
Ideally, after a transient response, the sensor signal will reach a plateau. Therefore, the slope of a linear regression curve between the point where 90% of the maximum signal is reached and the end of the analyte segment can be used to quantify the quality of this plateau.

$$ f_{4}(m) = \left| m_{\hat{m_{a}(l, r)}} \right| \quad \text{with } l = u(m, 0.9) \text{ and } r = |m_{a}| \tag{11} $$
5.: Drift ⁻¹
The slope of a linear regression curve fitted through the baseline segment shows a possible drift of the sensor resistance. While a small slope might be acceptable, higher drift can lead to unstable sensor behaviour in the field.

$$ f_{5}(m) = \left| m_{\hat{m}_{b}} \right| \tag{12} $$
6.: Repeatability ⁻¹
The similarity between all measurements of the same sensor/analyte combination is an indicator of the repeatability. The average of the curve distances $d$, introduced with Equation (5), is calculated for all possible combinations. The following equation is an example, defining the feature for three valid measurements $m, n, o \in M$ per sensor/analyte pair.

$$ f_{6}(m) = f_{6}(n) = f_{6}(o) = \frac{d(m, n) + d(m, o) + d(n, o)}{3} \tag{13} $$
7.: Dynamic Range ⁻¹
It is beneficial for the later integration of the read-out electronics that the sensor works in a low dynamic range. Therefore, the span of the analyte segment of the measurement can be extracted as a feature.

$$ f_{7}(m) = \max m_{a} - \min m_{a} \tag{14} $$
8.: Power Consumption ⁻¹
The MOX sensors contain a heating element that needs to be heated up to a specific temperature. As mentioned before, the operating temperature has a big influence on the response and the power consumption of the sensor. A goal could be to minimize the power consumption while still maintaining a feasible response. The feature is the average of the heater voltage during the entire measurement. Let $v_{m}$ be the heater voltage values for measurement $m$, if available.

$$ f_{8}(m) = \overline{v_{m}} \tag{15} $$
9.: Signal to Noise Ratio (SNR)
To compare different sensors to each other, the ratio of signal strength to baseline noise is a good indicator. To obtain the signal strength, the reaction phase of the measurement is segmented into rolling mean-valued windows with a size of 41 samples. Depending on the reaction type, the signal strength is then calculated with:

$$ p_{s}(m) = \begin{cases} \max a - z & \text{if } m_{a} \text{ has a positive-slope reaction}, \\ z - \min a & \text{else}, \end{cases} \tag{16} $$

with $z = \overline{m_{b} \langle |m_{b}|, 40 \rangle}$ and $a[i] = \overline{m_{a} \langle i, 20 \rangle}$, $i \in \mathbb{N}$, $1 \leq i \leq |m_{a}|$.

Finally, the SNR is calculated as:

$$ f_{9}(m) = \frac{p_{s}(m)}{s(m_{b}, \hat{m_{b}})} \tag{17} $$
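Several of the features listed above reduce to a few lines of code. The following Python sketch covers Sensitivity (Equation (8)), Dynamic Range (Equation (14)), Power Consumption (Equation (15)) and a generalization of Repeatability (Equation (13)) to any number of repetitions. All function names are hypothetical, a sampling rate of 1 Hz is assumed so that 120 s correspond to 120 samples, and the distance function is passed in by the caller.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def sensitivity(m_b, m_a, n=120):
    """Relative change between the last n baseline and the last n analyte samples."""
    b = mean(m_b[-n:])
    a = mean(m_a[-n:])
    return (b - a) / b


def dynamic_range(m_a):
    """Span of the analyte segment."""
    return max(m_a) - min(m_a)


def power_feature(heater_voltage):
    """Average heater voltage over the entire measurement."""
    return mean(heater_voltage)


def repeatability(measurements, distance):
    """Average pairwise distance over all combinations of repeated measurements."""
    pairs = list(combinations(measurements, 2))
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


baseline = [100.0] * 300
analyte = [100.0] * 100 + [30.0] * 200
print(round(sensitivity(baseline, analyte), 3))  # 0.7
```

For three repetitions, `repeatability` evaluates exactly the three pairs of Equation (13); the features that depend on $u$ or on a regression slope are omitted here to keep the sketch short.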

Let $m_{k} \in M$ be the measurement with the corresponding index $k \in \mathbb{N}$, for which $1 \leq k \leq |M|$ holds. With the features defined in this section, a feature vector $g_{j}$ for each feature is calculated.

$$ g_{j}[k] = f_{j}(m_{k}) \tag{18} $$

For further use, the features are min-max normalized and inverted, if needed. The final feature vector $f_{j}$ for each feature is defined as:

$$ f_{j}[k] = \begin{cases} g_{j}'[k] & \text{if feature } j \text{ does not need to be inverted}, \\ 1 - g_{j}'[k] & \text{else}. \end{cases} \tag{19} $$
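Equations (18) and (19) amount to a normalization followed by an optional inversion. A possible Python sketch is shown below; the function name is hypothetical, and returning the neutral value 1 for a feature that is identical across all measurements is an assumption, because min-max normalization is undefined in that case.

```python
def final_feature_vector(g, invert):
    """Min-max normalize the raw feature values g and invert them if requested.

    Assumption: a feature that is constant over all measurements gets the
    neutral value 1.0, so it cannot demote any measurement later on.
    """
    lo, hi = min(g), max(g)
    if hi == lo:
        return [1.0 for _ in g]
    normalized = [(v - lo) / (hi - lo) for v in g]
    if invert:
        return [1.0 - v for v in normalized]
    return normalized


# Sensitivity: higher is better, so no inversion.
print(final_feature_vector([2.0, 4.0, 6.0], invert=False))  # [0.0, 0.5, 1.0]
# Reaction time: lower is better, so the normalized values are inverted.
print(final_feature_vector([2.0, 4.0, 6.0], invert=True))   # [1.0, 0.5, 0.0]
```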

3.3. Quantifiable and Individually Prioritizable Ranking Metric

To rank the sensors according to the selected features, an overall performance value for each measurement is calculated with

$$ p[k] = \prod_{j = 1}^{|F|} p_{j}(f_{j}[k]) \tag{20} $$

and the linear feature-specific priority function $p_{j}(x)$

$$ p_{j}(x) = \phi_{j} \cdot x - \phi_{j} + 1 \quad \text{with } \phi_{j} \in \mathbb{R},\ 0 \leq \phi_{j} \leq 1 \tag{21} $$

where the priority value $\phi_{j}$ can be chosen by the user for each feature to prioritize it individually during the calculation. To simplify things, we specified a set of five priority values, resembling the following priority levels:

Lowest: $\phi_{j} = 0.1$
Lower: $\phi_{j} = 0.3$
Normal: $\phi_{j} = 0.5$
Higher: $\phi_{j} = 0.7$
Highest: $\phi_{j} = 0.9$

Figure 2 shows the influence of $\phi_{j}$ for these predefined levels. It is important to understand how the priority value steers the influence of a feature within the performance indicator. With each feature value $f_{j}[k] < 1$ involved in the product, the performance indicator $p[k]$ for measurement $m_{k}$ will decrease. This demotion capability is restricted by $p_{j}$ to $1 - \phi_{j} \leq p_{j}(f_{j}[k]) \leq 1$. Hence, a lower $\phi_{j}$ will give the feature a lower priority compared to features with a higher $\phi_{j}$ and vice versa.

Consider, for example, the worst measurement $m_{w}$ for feature $j$ with $f_{j}[w] = 0$; then, setting $\phi_{j} = 1$ demonstrates the feature’s full demotion influence on the performance indicator by annihilating $p[w]$ completely:

$$ p_{j}(f_{j}[w]) = 0 \Rightarrow p[w] = 0 \tag{22} $$

By selecting $\phi_{j} = 0.5$ instead, $f_{j}[w]$ is only capable of scaling the performance indicator for $m_{w}$ by a factor of $p_{j}(f_{j}[w]) = 0.5$.

The min-max normalized performance indicator $p'$ now holds the respective performance value for each measurement, where $p'[x] = 1$ applies to the best overall performing measurement $m_{x}$ for the selected feature set. The final ranking of the measurements can be achieved by sorting $p'$.
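The ranking metric of Equations (20) and (21) can be sketched in Python as follows. The sketch assumes the feature vectors are already normalized and inverted as in Equation (19); the function names, the dictionary-based data layout and the feature names in the example are illustrative assumptions.

```python
def priority(x, phi):
    """Linear priority function of Equation (21)."""
    return phi * x - phi + 1


def performance(features, priorities):
    """Product of the prioritized feature values for each measurement (Equation (20)).

    features   : dict mapping feature name -> list of normalized values
    priorities : dict mapping feature name -> priority value phi
    """
    n = len(next(iter(features.values())))
    p = [1.0] * n
    for name, values in features.items():
        for k in range(n):
            p[k] *= priority(values[k], priorities[name])
    return p


def rank(p):
    """Min-max normalize p and return (indices sorted best first, normalized p).

    Assumes that not all performance values are identical.
    """
    lo, hi = min(p), max(p)
    p_norm = [(v - lo) / (hi - lo) for v in p]
    order = sorted(range(len(p)), key=lambda k: p_norm[k], reverse=True)
    return order, p_norm


features = {"sensitivity": [1.0, 0.5, 0.0], "power": [0.0, 1.0, 1.0]}
priorities = {"sensitivity": 0.9, "power": 0.5}  # "Highest" and "Normal"
order, p_norm = rank(performance(features, priorities))
print(order)  # [1, 0, 2]
```

In this toy example, measurement 0 has the best sensitivity but the worst power value and is therefore demoted by a factor of 0.5, which places it behind measurement 1 despite the lower priority of the power feature.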

4. Application Example, Results and Discussion

An important design target for mobile applications is to minimize power consumption. Because MOX gas sensors utilize a significant amount of power for heating their sensitive layer to a suitable working temperature, researchers are continuously trying to optimize substrate compositions that do not require high operating temperatures while still performing adequately for a specific application. In the following, we will therefore illustrate the suitability of the proposed algorithms for finding the most energy-efficient sensor for Acetone detection based on the data of the sensor screening described in Section 2.1. Initially, the software which was developed for this work will be briefly introduced as the platform used for the application example.

4.1. Software

To support a user in all tasks related to data processing and evaluation, graphical user interface (GUI) software, depicted in Figure 3, was developed. To display and navigate through the data, the GUI implements a tree-based navigation with filtering functionality that is always visible on the left side of the software. To sort and structure the data, the measurements are arranged hierarchically, starting at the top with the analytes, followed by the virtual sensors, which group the associated measurements for the specific combination.

The software is divided into tabs, according to the introduced algorithms: View + Manual Validation, Auto Validation and Sensor Ranking. Each tab encapsulates the controls and views needed for the respective use case. Depending on the active tab, the navigation is either used to browse through all measurements, choose a reference curve for the similarity algorithms or select the combinations of sensors and analytes for the ranking.

The filter enables the user to specify the following parameters:

Analyte
Sensor Substrate
Sinter Temperature
Sinter Time
Sensor Operating Temperature
Validation Status

Furthermore, the user can add measurements to a list of favorites or use the reference checkbox to obtain a list of all measurements that have been used as references in the curve similarity algorithm.

Visual inspection is realized with four interactive graph views divided into two subgroups. The upper graphs are used to display all measurements for the selected combination of virtual sensor and analyte, whereas the lower ones show the specific measurement selected in the navigation tree. The user is able to inspect the data by applying several filters and standardizations (e.g., first derivative, baseline normalized resistance, etc.).

After an in-depth inspection, a validity status can be assigned to the measurement manually by the user. A measurement can have three validation states:

Valid
Invalid
Not Validated

All measurements are initially in the Not Validated state. Manual validation and annotation are realized with four numbered radio buttons, a commentary field and two buttons. Remarks and textual annotations can be added to the Comment text field. In addition to the mentioned states, Reevaluate marks the measurement for later inspection, whereas Skip/Reset either resets its validation state back to Not Validated or skips the measurement if it is Not Validated. Care was taken to minimize the number of clicks by adding keyboard shortcuts and effective tabbing. Using the shortcuts, the validation and textual comments are saved and the software automatically navigates to the next curve for inspection without requiring any mouse interaction.

4.2. Automatic Measurement Validation

The first step before the measurements can be ranked is to remove those without useful information. This is done automatically as mentioned before by using established curve similarity metrics in conjunction with the presented slope-based signature calculation algorithm. The software implements several similarity methods, all of which compare two time series to each other. The complete functionality is encapsulated in a separate software tab and depicted in Figure 4.

All metrics calculate and assign a score to each measurement and afterwards rank them with descending similarity in the middle list (yellow rectangle in Figure 4). The user can inspect candidate curves and the reference together in an interactive graph view (purple) and afterwards apply the final validation with the buttons and the following list selection: the measurements moved to the upper list (red) are set to invalid, the status of those in the middle list is not changed, and the measurements in the lower list (green) are marked as valid. The option Remove low SNR (blue) automatically proposes measurements as invalid that do not show enough signal amplitude by calculating the measurement’s signal-to-noise ratio (SNR) and comparing it to the threshold given in the spin box.

For validation of the proposed signature algorithm, we created a test subset including the runs 52, 261, 267, 343 and 374 and calculated the distance with respect to the reference run 270 for all available curve distance methods. The proposed slope-based signature algorithm yields the same signature + for the reference and all runs of the subset except for run 52, which was assigned the signature − +. In Figure 5, run 52 shows a significant drop and therefore a different slope characteristic compared to the other runs, which is represented by the signature value. Referring to Table 3, the calculated curve similarities based on the four curve distance methods listed in Section 3.1 reveal that, for each method, the distance of run 52 to the reference is very similar to that of at least one of the other runs. If the signature were not used to sort out run 52, it would be on the same similarity level as the other curves, ignoring the significant difference in slope characteristics.

4.3. Features and Ranking

After the measurements are validated, the scenario’s question of finding the most power-efficient sensor requires the following features from the set introduced in Section 3.2: Power, Sensitivity, Reaction Speed I, Reaction Speed II and Repeatability. The priorities were set as listed in Table 4.

The resulting list in Table 5 shows the most power-efficient sensor for the task of measuring Acetone and is depicted in Figure 6. The second and third best sensors are shown in Figure 7. Furthermore an exemplary midfield sensor and the worst sensor from the ranking can be found in Figure 8.

The first three sensors are very similar concerning their sensitivity (approximately 0.7) and reaction speed toward the analyte, as shown in the baseline-normalized depiction in Figure 6 and Figure 7. Yet the sensor with the lowest power consumption of these three achieves the highest performance value and is ranked first because of the selected feature prioritization. To put the best sensor into perspective, a sensor from the midfield and the worst performing sensor of the ranking are depicted in Figure 8. While the midfield sensor is operated at the same temperature as the best sensor, it is demoted due to its lower sensitivity towards Acetone of only 0.48. The last and therefore worst sensor in the ranking delivers a much lower sensitivity of just 0.18 whilst consuming more power to operate at the higher temperature of $550\,^{\circ}$C. It is therefore the least favorable choice for this particular scenario.
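The interplay of feature priorities and ranking can be made concrete with a small sketch. The ranking metric and its boosting function $p_{j}$ are defined in Section 3.3 and are not reproduced here; the code below swaps in a plain priority-weighted sum of min-max normalized features, which is an assumption. It uses only two of the five features, with the operating temperature from Table 5 standing in for power (Table 4: cooler sensors need less power) and the values of $ϕ_{8}$ and $ϕ_{1}$ reused as weights. The function names are hypothetical.

```python
def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def rank_sensors(sensors, priorities):
    """Rank sensors by a priority-weighted sum of normalized features.

    priorities maps a feature name to (weight, higher_is_better).
    The weighted sum is a stand-in, NOT the paper's boosting function p_j.
    """
    names = list(sensors)
    totals = [0.0] * len(names)
    for feature, (weight, higher_is_better) in priorities.items():
        scaled = min_max([sensors[name][feature] for name in names])
        if not higher_is_better:
            scaled = [1.0 - s for s in scaled]
        totals = [t + weight * s for t, s in zip(totals, scaled)]
    # Normalize the totals so the best sensor gets 1 and the worst 0.
    scores = dict(zip(names, min_max(totals)))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Operating temperature (deg C) and sensitivity copied from Table 5.
sensors = {
    "rank 1": {"op_temp": 350, "sensitivity": 0.77},
    "rank 2": {"op_temp": 400, "sensitivity": 0.79},
    "rank 3": {"op_temp": 400, "sensitivity": 0.66},
    "rank 240": {"op_temp": 350, "sensitivity": 0.48},
    "rank 480": {"op_temp": 550, "sensitivity": 0.18},
}
# Weights reuse the values of phi_8 and phi_1 from Table 4.
priorities = {"op_temp": (0.9, False), "sensitivity": (0.7, True)}

ranking = rank_sensors(sensors, priorities)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because the sketch uses a different weighting and omits reaction speed and repeatability, it is not expected to reproduce the $p^{'}$ values or the order of the middle entries in Table 5. It does show why the coldest sensor with high sensitivity ends up first and the hottest, least sensitive sensor last.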

5. Conclusions

In this work, algorithms for validating measurements and for a feature-based sensor ranking have been presented. To address the challenge of automatically validating the extensive screening data, a slope-based signature calculation has been proposed as an addition to established curve similarity metrics. With the signature-extraction algorithm, curves that differ in slope (shape) are separated much more clearly, which directly shortens the post-processing time needed for measurement validation. For the other major challenge, a sensor performance ranking, a set of features and a ranking metric have been introduced. The features, obtained through interviews with experts in the domain of gas sensor screening, were first formalized mathematically; afterwards, algorithms were implemented to extract and optionally normalize quantifiable information from the time series. The performance metric offers individual prioritization of features and allows the measurements to be ranked according to their overall performance on all features used.

Finally, the proposed algorithms were used to validate and rank various sensors in a large data set obtained during an extensive screening. It was shown that the additional use of the proposed slope-based signature delivers better results compared to the established curve distance methods that do not take slope characteristics into account. This new algorithm combination can help validate many measurements more efficiently. The ranking and feature extraction algorithms were tested by taking on the question of which sensor has the highest sensitivity towards a specific analyte under low-power constraints. A prioritization method for the quantifiable features was developed and implemented to be able to adapt the ranking to multiple scenarios of interest.

The software suite implemented for this work can be used as a solid foundation for future measurement campaigns, as it not only provides an extensible feature extraction but also offers a structured storage model and can be used as a general management platform for screening data. Future goals are improving the outlier detection, extending and refining the current feature set, and integrating the control and acquisition protocols for the automated sensor screening into the software suite.

Author Contributions

C.H. is the lead author and was responsible for the data processing, software conception, the semi-automatic curve similarity implementation and writing of the final paper. S.S. is co-author of this work. He implemented and conceptualized the software and co-wrote the paper. J.W. designed and performed the measurements and was interviewed as an expert for the implemented feature set. P.K. and N.J. are the directors of the Institute of Safety and Security Research (ISF). P.K. was also interviewed as an expert for the features. R.T. and N.J. are the referees in the PhD proceedings of C.H. All authors contributed by providing advice, experimental guidance, project coordination and iterations of paper review. All authors have read and agreed to the published version of the manuscript.

Funding

This research received internal funding from the Institute of Safety and Security Research (ISF) at BRS-U.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Mandal, D.; Banerjee, S. Surface Acoustic Wave (SAW) Sensors: Physics, Materials and Applications. Sensors 2022, 22, 820.
2. Binions, R.; Naik, A. 13-Metal oxide semiconductor gas sensors in environmental monitoring. In Semiconductor Gas Sensors; Jaaniso, R., Tan, O.K., Eds.; Woodhead Publishing Series in Electronic and Optical Materials; Woodhead Publishing: Cambridge, UK, 2013; pp. 433–466.
3. Yaqoob, U.; Younis, M. Chemical Gas Sensors: Recent Developments, Challenges and the Potential of Machine Learning – A Review. Sensors 2021, 21, 2877.
4. Hu, W.; Wan, L.; Jian, Y.; Ren, C.; Jin, K.; Su, X.; Bai, X.; Haick, H.; Yao, M.; Wu, W. Electronic Noses: From Advanced Materials to Sensors Aided with Data Processing. Adv. Mater. Technol. 2018, 4, 1800488.
5. Zhang, J.; Qin, Z.; Zeng, D.; Xie, C. Metal-oxide-semiconductor based gas sensors: Screening, preparation, and integration. Phys. Chem. Chem. Phys. 2017, 19, 6313–6329.
6. Hammer, C.; Warmer, J.; Sporrer, S.; Kaul, P.; Thoelen, R.; Jung, N. A Compact, Reliable and Efficient 16 Channel Power Supply for the Automated Screening of Semiconducting Metal Oxide Gas Sensors. Electronics 2019, 8, 882.
7. Hammer, C.; Warmer, J.; Maurer, S.; Kaul, P.; Thoelen, R.; Jung, N. A Compact 16 Channel Embedded System with High Dynamic Range Readout and Heater Management for Semiconducting Metal Oxide Gas Sensors. Electronics 2020, 9, 1855.
8. Leo, M.; Distante, C.; Bernabei, M.; Persaud, K. An Efficient Approach for Preprocessing Data from a Large-Scale Chemical Sensor Array. Sensors 2014, 14, 17786–17806.
9. Morati, N.; Contaret, T.; Seguin, J.; Bendahan, M.; Djedidi, O.; Djeziri, M. Data Analysis-Based Gas Identification with a Single Metal Oxide Sensor Operating in Dynamic Temperature Regime. In Proceedings of the ALLSENSORS 2020, the Fifth International Conference on Advances in Sensors, Actuators, Metering and Sensing, Valencia, Spain, 21–25 November 2020; pp. 20–23. Available online: https://hal-amu.archives-ouvertes.fr/hal-02575436 (accessed on 23 September 2022).
10. Bastuck, M.; Baur, T.; Schütze, A. DAV3E – A MATLAB toolbox for multivariate sensor data evaluation. J. Sens. Sens. Syst. 2018, 7, 489–506.
11. Djelouat, H.; Ait Si Ali, A.; Amira, A.; Bensaali, F. An interactive software tool for gas identification. J. Nat. Gas Sci. Eng. 2018, 55, 6129–6624.
12. Zhang, G.; Xie, C.; Zhang, S.; Zhao, J.; Lei, T.; Zeng, D. Temperature-Programmed Technique Accompanied with High-Throughput Methodology for Rapidly Searching the Optimal Operating Temperature of MOX Gas Sensors. ACS Comb. Sci. 2014, 16, 459–465.
13. Umweltsensortechnik GmbH: Technical Information for Calculating the Sensor Temperature. Available online: https://www.umweltsensortechnik.de/fileadmin/assets/downloads/gassensoren/single/TechInfo_MOX-gas-sensors_Calculation_of_the_operating_temperature_Rev2204.pdf (accessed on 25 May 2022).
14. Guthrie, W. NIST/SEMATECH e-Handbook of Statistical Methods (NIST Handbook 151). National Institute of Standards; 2020; Chapter 5.5.9.9. Available online: https://www.itl.nist.gov/div898/handbook/pri/section5/pri599.htm (accessed on 21 September 2022).
15. Savitzky, A.; Golay, M. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639.
16. Wannesm, K.; Yurtman, A.; Robberechts, P.; Vohl, D.; Ma, E.; Verbruggen, G.; Rossi, M.; Shaikh, M.; Yasirroni, M.; Todd, Z.W.; et al. Wannesm/Dtaidistance: v2.3.5; Zenodo: Genève, Switzerland, 2022.
17. Jekel, C.; Venter, G.; Venter, M.; Stander, N.; Haftka, R. Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. Int. J. Mater. Form. 2019, 12, 355–378.
18. Hubble, L.; Chow, E.; Cooper, J.; Webster, M.; Müller, K.; Wieczorek, L.; Raguse, B. Gold nanoparticle chemiresistors operating in biological fluids. Lab Chip 2012, 12, 3040–3048.
19. Wang, C.; Yin, L.; Zhang, L.; Xiang, D.; Gao, R. Metal Oxide Gas Sensors: Sensitivity and Influencing Factors. Sensors 2010, 10, 2088–2106.

Figure 1. Measurement of a Cr₂O₃ sensor, sintered at $800\,^{\circ}$C for 720 min, operated at $450\,^{\circ}$C and exposed to 5 ppm Acetone. The section between the dotted lines is the analyte exposure. The slightly toned-down sections before and after the analyte exposure are the baseline and clearing parts of the measurement.

Figure 2. Graphs of the boosting function $p_{j}$ for different $ϕ_{j}$.

Figure 3. The software showing the View + Manual Validation tab.

Figure 4. Exemplary use of the auto validation function: validation of all CO measurements performed with In₂O₃-based sensors. The blue curve is the user-supplied reference; the green curves are the candidates as selected in the middle list (yellow rectangle) and the lower list (green rectangle). The upper list (red rectangle) holds all measurements whose distance to the reference is greater than the selected threshold (light blue rectangle) and that are therefore sorted out.

Figure 5. Curve similarity of a subset of test curves with respect to the reference run 270 (blue). The curve of run 52 (red) yields a different signature ($- +$) than the other curves (green, signature $+$) and is therefore filtered out, although its distances to the reference are very similar to those of the remaining curves (see Table 3).

Figure 6. The final ranking, showing the measurement belonging to the best sensor for the given problem.

Figure 7. Second and third place of the final ranking.

Figure 8. Midfield sensor and worst sensor in the ranking.

Table 1. Basic structure of an individual measurement.

Segment	Duration	Action
(B) Baseline	5 min	Get sensor value in synthetic air before analyte
(A) Analyte	20 min	Get sensor value during gas exposure
(C) Clearing	120 min	Flush sensor and piping with synthetic air
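The segment structure of Table 1 translates directly into index ranges of the recorded time series. The following minimal sketch splits one measurement into its three segments; the sampling interval is not given in this section and is therefore an assumed parameter, and the name `split_measurement` is hypothetical.

```python
# Segment durations in minutes, in recording order (taken from Table 1).
SEGMENT_MINUTES = {"baseline": 5, "analyte": 20, "clearing": 120}


def split_measurement(samples, seconds_per_sample):
    """Split a complete measurement into baseline, analyte and clearing.

    seconds_per_sample is an assumed, constant sampling interval that
    divides 60 evenly.
    """
    samples_per_minute = 60 // seconds_per_sample
    expected = sum(SEGMENT_MINUTES.values()) * samples_per_minute
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")

    segments, start = {}, 0
    for name, minutes in SEGMENT_MINUTES.items():
        length = minutes * samples_per_minute
        segments[name] = samples[start:start + length]
        start += length
    return segments


# Dummy measurement with one sample per minute (145 minutes in total).
measurement = list(range(145))
segments = split_measurement(measurement, seconds_per_sample=60)
print({name: len(part) for name, part in segments.items()})
```

In the notation of Table 2, the three returned lists correspond to $m_{b}$, $m_{a}$ and $m_{c}$ of a complete measurement $m$.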

Table 2. Summary of notation conventions for this work.

Notation	Meaning
$x$	Vector
$\| x \|$	Number of elements in vector $x$
$x [i]$	Element with index $i$
$x \langle i, w \rangle$	Vector defined by a window of size $w$ around index $i$
$x^{'}$	Min-max normalized vector elements
$\dot{x}$	First derivative
$\bar{x}$	Arithmetic mean of all vector elements
$\tilde{x}$	Savitzky–Golay [15] filtered vector elements
$\hat{x}$	Linear regression model, based on index and elements of $x$
$s (x, r)$	Residual standard deviation
$m$	Complete measurement with all segments
$m_{a}, m_{b}, m_{c}$	Analyte, baseline and clearing segment
$M$	Set of all measurements of a sensor screening

Table 3. Distance values for the four implemented distance metrics for the runs in the selected test set, compared to the reference run 270.

Run	Method	Distance of Run to 270	Distance of Run 52 to 270
261	PCM	68.82963	68.62570
267	Area	268.05069	264.02023
343	Fréchet	0.62149	0.62126
374	Point	0.23268	0.22901
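How close the distances in Table 3 are can be quantified with a short calculation. The sketch below copies the table values and computes, for each method, the relative difference between the distance of the listed run and the distance of run 52. The relative-difference measure is added here for illustration and is not part of the paper.

```python
# (method, run, distance of run to 270, distance of run 52 to 270), from Table 3.
table_3 = [
    ("PCM", 261, 68.82963, 68.62570),
    ("Area", 267, 268.05069, 264.02023),
    ("Fréchet", 343, 0.62149, 0.62126),
    ("Point", 374, 0.23268, 0.22901),
]

relative_difference = {
    method: abs(d_run - d_52) / d_run * 100.0
    for method, _run, d_run, d_52 in table_3
}

for method, percent in relative_difference.items():
    print(f"{method}: {percent:.2f} %")
```

For all four methods the difference stays below 2 %, so a distance threshold alone cannot separate run 52 from the valid runs, which is why the slope-based signature is needed as an additional criterion.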

Table 4. Priorities of the features used for the exemplary ranking.

Feature		Priority		Comment
Power	( $f_{8}$ )	Highest	( $ϕ_{8} = 0.9$ )	Cooler sensors need less power.
Sensitivity	( $f_{1}$ )	High	( $ϕ_{1} = 0.7$ )	Better for small amounts of the gas.
Reaction Speed I	( $f_{2}$ )	Normal	( $ϕ_{2} = 0.5$ )	Hot sensors have higher speeds.
Reaction Speed II	( $f_{3}$ )	Normal	( $ϕ_{3} = 0.5$ )	$ϕ_{2} = ϕ_{3} = 0.5$ is a good trade-off.
Repeatability	( $f_{5}$ )	Normal	( $ϕ_{5} = 0.5$ )	Consider sensor stability over time.

Table 5. The top 3, midfield and worst sensors from the available validated dataset ranked according to the most power-efficient (coldest sensor operation) detection of the analyte Acetone.

Rank	$p^{'}$	Substrate	Sinter Temp. ( $^{\circ} C$ )	Sinter Time (minutes)	Op. Temp. ( $^{\circ} C$ )	Sensitivity (Arb. Units)
1	1	Cr₂O₃	1000	1140	350	0.77
2	0.95	Cr₂O₃	900	10	400	0.79
3	0.86	Cr₂O₃	1000	1440	400	0.66
⋯	⋯	⋯	⋯	⋯	⋯	⋯
240	0.4	Cr₂O₃	700	720	350	0.48
⋯	⋯	⋯	⋯	⋯	⋯	⋯
480	0	Cr₂O₃	1000	10	550	0.18

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
