# Unsupervised Transformer-Based Anomaly Detection in ECG Signals


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Materials and Methods

#### 3.1. ECG Time Series Data

#### 3.1.1. The ECG5000 Dataset

#### 3.1.2. The MIT-BIH Arrhythmia Database

#### Preprocessing

- Median filter: We first applied a median filter with a 200-ms sliding window to the raw signal, and then applied a second median filter with a 600-ms window to its output. The output of the second filter approximates the baseline of the raw signal, so subtracting it from the unprocessed ECG removes the baseline wander (see Figure 2). This step improved the baseline correction and removed some artifacts [34].
- Heartbeat extraction: Each heartbeat was segmented by selecting a neighborhood around it. The window was determined from the R-peak annotations, with a 50-ms margin before and after the beat.
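The two preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The helper names (`median_filter`, `remove_baseline`, `extract_beats`), the edge padding, and the rounding of each window up to an odd number of samples are assumptions. So is the choice to centre each window on the annotated R peak. The 360-Hz rate in the toy check is the MIT-BIH sampling frequency.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(x, window):
    """Running median with edge padding; the window is forced to be odd (assumption)."""
    if window % 2 == 0:
        window += 1
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.median(sliding_window_view(padded, window), axis=-1)


def remove_baseline(ecg, fs):
    """Cascade a 200-ms and a 600-ms median filter, then subtract the estimated baseline."""
    first = median_filter(ecg, int(round(0.2 * fs)))
    baseline = median_filter(first, int(round(0.6 * fs)))
    return ecg - baseline


def extract_beats(ecg, r_peaks, fs, pre_ms=50, post_ms=50):
    """Cut a fixed window around each annotated R peak (centring is an assumption)."""
    pre = int(round(pre_ms * fs / 1000))
    post = int(round(post_ms * fs / 1000))
    beats = [ecg[r - pre : r + post + 1]
             for r in r_peaks
             if r - pre >= 0 and r + post < len(ecg)]  # skip windows that leave the record
    if not beats:
        return np.empty((0, pre + post + 1))
    return np.stack(beats)


# Toy check: a constant baseline of 2.0 plus one narrow spike at sample 500.
fs = 360  # MIT-BIH sampling rate in Hz
signal = np.full(1000, 2.0)
signal[500] = 5.0
clean = remove_baseline(signal, fs)
beats = extract_beats(clean, r_peaks=[10, 500, 995], fs=fs)
print(beats.shape)  # (1, 37)
```

With a constant baseline, the cascaded filters recover it exactly, so only the narrow spike (amplitude 3.0 after subtraction) survives. The beats at samples 10 and 995 are skipped because their windows would run past the ends of the record.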

#### 3.2. Proposed Unsupervised Transformer Architecture

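Scaled dot-product attention computes $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(QK^{\top}/\sqrt{d_k}\right)V$, where $d_k$ is the dimension of the keys. Before computing attention, multi-head attention projects $Q$, $K$ and $V$ onto several lower-dimensional feature subspaces using separate linear dense layers. The outputs of the $h$ heads are then concatenated and projected by an additional dense layer onto a final hidden representation:

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,{\mathrm{W}}^{O},\qquad \mathrm{head}_i=\mathrm{Attention}\left(Q{\mathrm{W}}_i^{Q},\,K{\mathrm{W}}_i^{K},\,V{\mathrm{W}}_i^{V}\right)$$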
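The projection, per-head attention and concatenation steps can be sketched as a NumPy forward pass with random weights. This is an illustration, not the trained model. It assumes, as in the original Transformer, that the model dimension is split evenly across the heads. By contrast, Keras' `tf.keras.layers.MultiHeadAttention` takes a separate per-head `key_dim`. The function names and toy sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, batched over any leading (head) axes."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model); all weights are (d_model, d_model)."""
    seq_len, d_model = x.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head): one subspace per head
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))
    heads, weights = scaled_dot_product_attention(q, k, v)
    # Concatenate the heads back to (seq_len, d_model), then apply the output projection W^O.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 140, 32, 4  # toy sizes; 140 is the ECG5000 beat length
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
out, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (140, 32) (4, 140, 140)
```

Each row of every head's attention matrix sums to one, because it is a softmax over key positions.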

Each hidden state produced by the multi-head attention layer is then passed through a position-wise feed-forward network of two dense layers, where ${\mathrm{W}}_{1}$ and ${\mathrm{W}}_{2}$ are the weight matrices and $b_{1}$ and $b_{2}$ are the bias terms of the inner and output dense layers, respectively.
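Below is a minimal NumPy sketch of this two-layer, position-wise feed-forward step. The text does not state the inner ReLU activation or the inner width `d_ff`, so both are assumptions. `position_wise_ffn` is a hypothetical helper name.

```python
import numpy as np


def position_wise_ffn(h, w1, b1, w2, b2):
    """Apply the same two dense layers to every hidden state (row) of h.

    h: (seq_len, d_model), w1: (d_model, d_ff), b1: (d_ff,),
    w2: (d_ff, d_model), b2: (d_model,). ReLU is an assumed inner activation.
    """
    inner = np.maximum(0.0, h @ w1 + b1)  # inner dense layer: W1, b1, then ReLU
    return inner @ w2 + b2                 # output dense layer: W2, b2


rng = np.random.default_rng(1)
seq_len, d_model, d_ff = 140, 32, 64  # toy sizes (hypothetical)
h = rng.standard_normal((seq_len, d_model))
w1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)
out = position_wise_ffn(h, w1, b1, w2, b2)
print(out.shape)  # (140, 32)
```

The same weights are applied to every hidden state independently. As a result, passing a single position through the function gives the same result as the corresponding row of the full output.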

#### 3.3. Anomaly Score and Threshold

## 4. Results and Discussion

#### 4.1. Experimental Setup

#### 4.2. Performance Metrics

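- AUC: The area under the receiver operating characteristic (ROC) curve. The ROC curve plots the true positive rate against the false positive rate as the decision threshold on the anomaly score varies. The AUC summarizes this curve as a single value between 0 and 1.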
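The ROC curve and its area can be computed directly from anomaly scores and ground-truth labels. The sketch below is a hypothetical helper, not the evaluation code used in this work. It assumes that anomalous beats are the positive class and that a higher score (for example, a larger error between a heartbeat and its predicted heartbeat) means "more anomalous". It also assumes both classes are present. The helper treats every distinct score as a threshold and integrates the resulting curve with the trapezoidal rule.

```python
import numpy as np


def roc_auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule.

    labels: 1 = positive (anomalous), 0 = negative (normal).
    scores: higher means more likely positive.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # sort by descending score
    sorted_scores, sorted_labels = scores[order], labels[order]
    # Last index of each block of tied scores gives one ROC operating point.
    cut = np.r_[np.flatnonzero(np.diff(sorted_scores)), sorted_labels.size - 1]
    tp = np.cumsum(sorted_labels)[cut]
    fp = (cut + 1) - tp
    tpr = np.r_[0.0, tp] / tp[-1]  # true positive rate
    fpr = np.r_[0.0, fp] / fp[-1]  # false positive rate
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Tied scores are grouped into a single operating point. A tie between a positive and a negative therefore contributes half a unit of area, which matches the rank-based definition of AUC.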

#### 4.3. ECG 5000 Dataset Results

#### 4.4. MIT-BIH Arrhythmia Dataset Results

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Chatterjee, A.; Ahmed, B.S. IoT anomaly detection methods and applications: A survey. Internet Things **2022**, 19, 100568.
2. Li, H.; Boulanger, P. Structural Anomalies Detection from Electrocardiogram (ECG) with Spectrogram and Handcrafted Features. Sensors **2022**, 22, 2467.
3. Schmidl, S.; Wenig, P.; Papenbrock, T. Anomaly Detection in Time Series: A Comprehensive Evaluation. Proc. VLDB Endow. **2022**, 15, 1779–1797.
4. Mehrotra, K.G.; Mohan, C.K.; Huang, H. Introduction. Anomaly Detection Principles and Algorithms; TSC; Springer: Cham, Switzerland, 2017; pp. 3–19.
5. Karim, F.; Majumdar, S.; Darabi, H.; Chen, S. LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access **2017**, 6, 1662–1669.
6. Ariyaluran Habeeb, R.A. Clustering-based real-time anomaly detection—A breakthrough in big data technologies. Trans. Emerg. Telecommun. Technol. **2022**, 33, 8.
7. Thudumu, S.; Branch, P.; Jin, J.; Jack Singh, J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J. Big Data **2020**, 7, 42.
8. Contreras, J.; Espinola, R.; Nogales, F.J.; Conejo, A.J. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. **2003**, 18, 1014–1020.
9. Gao, J.; Liang, F.; Fan, W.; Wang, C.; Sun, Y.; Han, J. On community outliers and their efficient detection in information networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 813–822.
10. Barnett, V.; Lewis, T. Outliers in Statistical Data. Wiley Series in Probability and Mathematical Statistics. In Applied Probability and Statistics, 2nd ed.; Wiley: Chichester, UK, 1984.
11. Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. USAD: UnSupervised Anomaly Detection on Multivariate Time Series. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Virtual, 6–10 July 2020; pp. 3395–3404.
12. Kaur, H.; Singh, G.; Minhas, J. A review of machine learning based anomaly detection techniques. arXiv **2013**, arXiv:1307.7286.
13. Laxhammar, R. Anomaly Detection. Conform. Predict. Reliab. Mach. Learn. Theory Adapt. Appl. **2014**, 14, 71–97.
14. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. **2009**, 41, 1–58.
15. Ejay, N.; Oluwarotimi, W.S.; Mojisola, G.A.; Guanglin, L. Intelligence Combiner: A Combination of Deep Learning and Handcrafted Features for an Adolescent Psychosis Prediction using EEG Signals. In Proceedings of the 2022 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT), Trento, Italy, 7–9 June 2022.
16. Chen, Z.; Yeo, C.K.; Lee, B.S.; Lau, C.T. Autoencoder-based network anomaly detection. In Proceedings of the Wireless Telecommunications Symposium, Phoenix, AZ, USA, 17–20 April 2018; pp. 1–5.
17. Nanduri, A.; Sherry, L. Anomaly detection in aircraft data using Recurrent Neural Networks (RNN). In Proceedings of the ICNS 2016: Securing an Integrated CNS System to Meet Future Challenges, Herndon, VA, USA, 19–21 April 2016; pp. 1–8.
18. Malhotra, P.; Ramakrishnan, A.; Anand, G.; Vig, L.; Agarwal, P.; Shroff, G. LSTM-based Encoder-Decoder for Multi-Sensor Anomaly Detection. arXiv **2016**, arXiv:1607.00148.
19. Lu, L.; Krause, B.; Murray, I.; Renals, S. Multiplicative LSTM for sequence modelling. arXiv **2017**, arXiv:1609.07959.
20. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5999–6009.
21. Chauhan, S.; Vig, L. Anomaly detection in ECG time signals via deep long short-term memory networks. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, 19–21 October 2015.
22. Sugimoto, K.; Lee, S.; Okada, Y. Deep learning-based detection of periodic abnormal waves in ECG data. Lect. Notes Eng. Comput. Sci. **2018**, 2233, 35–39.
23. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long Short Term Memory networks for anomaly detection in time series. In Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2015, Bruges, Belgium, 22–23 April 2015; pp. 89–94.
24. Zhu, G.; Zhao, H.; Liu, H.; Sun, H. A Novel LSTM-GAN Algorithm for Time Series Anomaly Detection. In Proceedings of the 2019 Prognostics and System Health Management Conference, PHM-Qingdao 2019, Qingdao, China, 25–27 October 2019.
25. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery; Springer: Cham, Switzerland, 2017.
26. Thill, M.; Däubener, S.; Konen, W.; Bäck, T. Anomaly Detection in Electrocardiogram Readings with Stacked LSTM Networks. In Proceedings of the 19th Conference Information Technologies—Applications and Theory (ITAT 2019), Donovaly, Slovakia, 20–24 September 2019.
27. Xu, L. TGAN-AD: Transformer-Based GAN for Anomaly Detection of Time Series Data. Appl. Sci. **2022**, 12, 8085.
28. Chen, Z.; Chen, D.; Zhang, X.; Yuan, Z.; Cheng, X. Learning Graph Structures with Transformer for Multivariate Time Series Anomaly Detection in IoT. IEEE Internet Things J. **2021**, 9, 9179–9189.
29. Rui, H.; Jie, C.; Li, Z. A transformer-based deep neural network for arrhythmia detection using continuous ECG signals. Comput. Biol. Med. **2022**, 144, 105325.
30. PhysioBank PhysioToolkit. PhysioNet: Components of a new research resource for complex physiologic signals. Circulation **2000**, 101, 215–220.
31. Moody, G.B.; Mark, R.G. The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. **2001**, 20, 45–50.
32. Chen, Y.; Hao, Y.; Rakthanmanon, T.; Zakaria, J.; Hu, B.; Keogh, E. A general framework for never-ending learning from time series streams. Data Min. Knowl. Discov. **2015**, 29, 1622–1664.
33. Luz, E.J.D.S.; Schwartz, W.R.; Cámara-Chávez, G.; Menotti, D. ECG-based heartbeat classification for arrhythmia detection: A survey. Comput. Methods Programs Biomed. **2016**, 127, 144–164.
34. Lee, M.; Song, T.-G.; Lee, J.-H. Heartbeat classification using local transform pattern feature and hybrid neural fuzzy-logic system based on self-organizing map. Biomed. Signal Process. Control. **2020**, 57, 101690.
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 630–645.
36. Lei Ba, J.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv **2016**, arXiv:1607.06450.
37. Wang, Y.; Lu, W. Analysis of the mean absolute error (MAE) and the root mean square error (RMSE) in assessing rounding model. IOP Conf. Ser. Mater. Sci. Eng. **2018**, 324, 012049.
38. Van Rossum, G.; Drake, F.L. Python 3 Reference Manual (Python Documentation Manual Part 2); CreateSpace: Scotts Valley, CA, USA, 2009.
39. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv **2016**, arXiv:1603.04467.
40. Chollet, F. Keras. Available online: https://github.com/fchollet/keras (accessed on 12 January 2015).
41. Pereira, J.; Silveira, M. Learning Representations from Healthcare Time Series Data for Unsupervised Anomaly Detection. In Proceedings of the 2019 IEEE International Conference on Big Data and Smart Computing, BigComp 2019, Kyoto, Japan, 27 February–2 March 2019.
42. Matias, P.; Folgado, D.; Gamboa, H.; Carreiro, A.V. Robust anomaly detection in time series through variational AutoEncoders and a local similarity score. In Proceedings of the BIOSIGNALS 2021—14th International Conference on Bio-Inspired Systems and Signal Processing; Part of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2021, Virtual, 11–13 February 2021; Volume 4, pp. 91–102.
43. Oluwasanmi, A.; Aftab, M.U.; Baagyere, E.; Qin, Z.; Ahmad, M.; Mazzara, M. Attention Autoencoder for Generative Latent Representational Learning in Anomaly Detection. Sensors **2021**, 22, 123.
44. Khandual, A.; Dutta, K.; Lenka, R.; Nayak, S.R.; Bhoi, A.K. MED-NET: A novel approach to ECG anomaly detection using LSTM auto-encoders. Int. J. Comput. Appl. Technol. **2021**, 65, 343.
45. Sivapalan, G.; Nundy, K.K.; Dev, S.; Cardiff, B.; John, D. ANNet: A Lightweight Neural Network for ECG Anomaly Detection in IoT Edge Sensors. IEEE Trans. Biomed. Circuits Syst. **2022**, 16, 24–35.

**Figure 4.** Illustration of the scaled dot-product attention (**left**) and multi-head attention consisting of several attention layers running in parallel (**right**).

**Figure 6.** (**a**) Anomalous heartbeat (blue) with corresponding predicted heartbeat (orange); (**b**) normal heartbeat (blue) with corresponding predicted heartbeat (orange).

**Figure 8.** (**a**) Anomalous heartbeat (blue) with corresponding predicted heartbeat (orange); (**b**) normal heartbeat (blue) with corresponding predicted heartbeat (orange).

Dataset | Normal | Anomalous | Total |
---|---|---|---|
Train data | 2335 | 0 | 2335 |
Validation data | 292 | 0 | 292 |
Test data | 266 | 234 | 500 |

Dataset | Normal | Anomalous | Total |
---|---|---|---|
Training data | 12,045 | 0 | 12,045 |
Validation data | 3012 | 0 | 3012 |
Test data | 3767 | 2115 | 5882 |

No. Encoder Blocks | No. Heads | Hidden Size | F1 | Accuracy | Recall | Precision |
---|---|---|---|---|---|---|
1 | 16 | 32 | 96.7% | 96.8% | 97.7% | 96.2% |
1 | 16 | 64 | 97.6% | 97.6% | 96.9% | 98.4% |
1 | 16 | 128 | 98% | 98% | 96.9% | 99.2% |
1 | 16 | 256 | 98.2% | 98.2% | 96.6% | 100% |
1 | 32 | 32 | 98.4% | 98.4% | 96.9% | 100% |
1 | 32 | 64 | 97% | 97% | 96.6% | 97.7% |
1 | 32 | 128 | 98.4% | 98.4% | 97.7% | 99.2% |
1 | 32 | 256 | 98.8% | 98.8% | 97.7% | 100% |
2 | 16 | 32 | 98.2% | 98.2% | 97.3% | 99.2% |
2 | 16 | 64 | 98.4% | 98.4% | 97.3% | 96.6% |
2 | 16 | 128 | 98.6% | 98.6% | 97.3% | 100% |
2 | 16 | 256 | 98.4% | 98.4% | 97.3% | 99.6% |
2 | 32 | 32 | 98.2% | 98.2% | 96.9% | 99.6% |
2 | 32 | 64 | 98.6% | 98.6% | 97.3% | 100% |
2 | 32 | 128 | 99% | 99% | 98.1% | 100% |
2 | 32 | 256 | 98.2% | 98.2% | 96.9% | 99.6% |

Model | S/U | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|---|
Hierarchical [41] | U | 95.5% | 94.6% | 95.8% | 94.6% |
Spectral [41] | U | 95.8% | 95.1% | 94.7% | 94.7% |
VRAE + Wasserstein [41] | U | 95.1% | 94.6% | 94.6% | 94.6% |
VRAE + k-Means [41] | U | 95.9% | 95.3% | 95.4% | 95.2% |
VAE [42] | U + S | 96.8% | — | — | 95.7% |
VAE [43] | S | 95.2% | 92.5% | 98.4% | 95.4% |
AE-Without-Attention [43] | S | 97% | 95.5% | 98.8% | 97.1% |
CAT-AE [43] | S | 97.2% | 95.6% | 99.2% | 97.4% |
LSTM AE [44] | U | 97.93% | — | — | — |
This work | U | 99% | 98.1% | 100% | 99% |

No. Encoder Blocks | No. Heads | Hidden Size | F1 | Accuracy | Recall | Precision |
---|---|---|---|---|---|---|
1 | 16 | 32 | 91.6% | 88.6% | 97.5% | 86.3% |
1 | 16 | 64 | 91.9% | 89% | 98.3% | 86.3% |
1 | 16 | 128 | 91.8% | 88.7% | 98.1% | 86.2% |
1 | 16 | 256 | 91.6% | 88.4% | 98.4% | 85.6% |
1 | 32 | 32 | 92.1% | 89.2% | 98.3% | 86.6% |
1 | 32 | 64 | 92.1% | 89.2% | 98.3% | 86.6% |
1 | 32 | 128 | 92% | 89.1% | 98.2% | 86.6% |
1 | 32 | 256 | 91.73% | 88.6% | 98.4% | 85.9% |
2 | 16 | 32 | 92.1% | 89.2% | 98.2% | 86.69% |
2 | 16 | 64 | 91.8% | 88.8% | 98% | 86.1% |
2 | 16 | 128 | 91.8% | 88.8% | 98.5% | 86% |
2 | 16 | 256 | 91.71% | 88.6% | 98.4% | 85.8% |
2 | 32 | 32 | 92.2% | 89.4% | 98% | 87.1% |
2 | 32 | 64 | 92.31% | 89.5% | 98.2% | 87.1% |
2 | 32 | 128 | 91.8% | 88.7% | 98.4% | 86% |
2 | 32 | 256 | 91.8% | 88.8% | 98.6% | 85.99% |

Model | S/U | Dataset Splitting | F1 | Accuracy | Recall (Sensitivity) | Precision |
---|---|---|---|---|---|---|
Stacked LSTM [26] | U | 80% training, 20% testing | 81% | - | 87% | 82% |
LSTM with MLP [45] | S | 70% training, 30% testing | 87% | 95% | 75% | - |
VAE [42] | U | AAMI dataset splitting | 76.55% | 87.77% | - | - |
This work | U | 80% training, 20% testing | 92.3% | 89.5% | 98.2% | 87.1% |
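As a consistency check on the reported numbers: the F1 score is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. Recomputing it from the precision and recall in the "This work" rows of the two comparison tables reproduces the reported F1 values of 99% (ECG5000) and 92.3% (MIT-BIH). The helper name is illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) from the "This work" rows of the comparison tables.
ecg5000_f1 = f1_from_precision_recall(precision=100.0, recall=98.1)
mitbih_f1 = f1_from_precision_recall(precision=87.1, recall=98.2)
print(f"ECG5000: {ecg5000_f1:.1f}%")  # ECG5000: 99.0%
print(f"MIT-BIH: {mitbih_f1:.1f}%")   # MIT-BIH: 92.3%
```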

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Alamr, A.; Artoli, A.
Unsupervised Transformer-Based Anomaly Detection in ECG Signals. *Algorithms* **2023**, *16*, 152.
https://doi.org/10.3390/a16030152

**Chicago/Turabian Style**

Alamr, Abrar, and Abdelmonim Artoli.
2023. "Unsupervised Transformer-Based Anomaly Detection in ECG Signals" *Algorithms* 16, no. 3: 152.
https://doi.org/10.3390/a16030152