1. Introduction
The heart is a muscle that contracts rhythmically to pump blood throughout the body. The sinoatrial node, which functions as a natural pacemaker, initiates each contraction, which then spreads to the rest of the muscle. This electrical impulse spreads in a characteristic pattern [1]. The resulting electric currents produce fluctuations in the electrical potential on the body’s surface. These signals can be recorded with electrodes and appropriate instrumentation, and the recording is known as an electrocardiogram (ECG) [2].
An ECG signal is composed of three major components, shown in Figure 1 [3]: the P-wave; the QRS complex, which contains three waves, i.e., Q, R, and S; and the T-wave [4]. The P-wave is a small deflection indicating atrial depolarization, the QRS complex represents ventricular depolarization, and the T-wave indicates ventricular repolarization (atrial repolarization is hidden by the large QRS complex) [5]. The amplitudes and frequencies of these waves are shown in Table 1 [4,6].
When the waveform of the ECG signal shows no disease or abnormality, the heart’s regular rhythm is known as normal sinus rhythm (NSR). The heart rate in NSR typically ranges from 60 to 100 beats per minute, and the breathing cycle causes small variations in the regularity of the R-R interval. Sinus tachycardia is the rhythm in which the heart rate rises above 100 beats per minute and the R-R interval decreases. This is the heart’s normal response to a need for increased blood circulation; it is not an arrhythmia. However, overly rapid heartbeats result in incomplete filling of the ventricles before contraction, which lowers pumping effectiveness and negatively impacts perfusion. In bradycardia, the heart rate drops below 60 beats per minute and the R-R interval increases; an extremely slow heartbeat can have a significant negative impact on vital organs [7].
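The rate thresholds described above can be illustrated with a minimal sketch. It is not the classification method of this paper, which relies on time-frequency images and CNNs; it only shows how the 60 and 100 beats-per-minute limits relate to the R-R interval. The function names and the use of the mean R-R interval are assumptions made for illustration.

```python
from statistics import mean


def heart_rate_bpm(rr_intervals_s):
    """Mean heart rate in beats per minute from R-R intervals given in seconds."""
    if not rr_intervals_s:
        raise ValueError("at least one R-R interval is required")
    return 60.0 / mean(rr_intervals_s)


def classify_rhythm(rr_intervals_s):
    """Label a rhythm by rate only (hypothetical helper, thresholds from the text)."""
    bpm = heart_rate_bpm(rr_intervals_s)
    if bpm > 100:
        return "tachycardia"   # shorter R-R interval, rate above 100 bpm
    if bpm < 60:
        return "bradycardia"   # longer R-R interval, rate below 60 bpm
    return "normal"            # 60 to 100 bpm
```

For example, R-R intervals of about 0.8 s correspond to roughly 75 beats per minute and fall in the normal range, whereas intervals of 0.5 s correspond to 120 beats per minute.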
The paper is organized as follows: Section 2 reviews the state-of-the-art related works. Section 3 describes the dataset used, the preprocessing and segmentation of the ECG, and the iris spectrogram and scalogram representations. Section 4 presents the results, including the performance of the proposed CNN classifiers, and discusses them. Finally, Section 5 presents the conclusions of the work.
2. Literature Review
Classifying normal and arrhythmia-associated ECG signals is an important step toward better detection and proper identification of various cardiovascular diseases (CVDs). However, the small amplitude and short duration of ECG arrhythmias can make them difficult to classify. With the rise of deep learning techniques, several recent studies have used very deep networks for ECG classification. Here, we review the latest related works that use time-frequency methods for ECG classification.
Rashed Al-Mahfuz et al. proposed a novel ECG beat classifier using a customized VGG16-based Convolutional Neural Network (CNN) with two advanced time-frequency representation techniques, the Continuous Wavelet Transform (CWT) and the Hilbert-Huang transform (HHT), to identify the best time-frequency representation of ECG beats. The adapted CNN with the CWT scalogram achieved 100% classification accuracy on the MIT-BIH arrhythmia database for 2–4 classes and 99.90% for 5 classes, and the CWT scalogram outperformed the HHT spectrum in all cases [8].
Swain et al. introduced an automated method for identifying myocardial infarction (MI) using modified Stockwell transform (MST)-based time-frequency analysis and a phase information distribution pattern method. Both healthy and MI ECG signals were collected from the PTB diagnostic ECG database with 12-lead ECG signals; the proposed method detected MI with an accuracy, sensitivity, and specificity of 99.93%, 99.97%, and 99.30%, respectively [9]. Additionally, Lekhal et al. introduced an ECG beat classifier based on features observed in time-frequency analysis using a variant of the Stockwell transform; an SVM with asymmetric costs (AS3VM) was then applied to assess the performance of the features. The method was evaluated on the MIT-BIH arrhythmia database using four beat types: normal beats (N), left and right bundle branch blocks (L and R), and premature ventricular contractions (V). The results show accuracies of 99.35%, 98.73%, 98.57%, and 99.44% for N, L, R, and V beats, respectively [10].
Kayikcioglu et al. presented a method suitable for telemedicine systems that classifies ST segments into four classes using time-frequency distribution features from multi-lead ECG signals, and tested it on three databases: the MIT-BIH Arrhythmia database, the European ST-T database, and the Long-Term ST database. Among the classifiers evaluated, which also included SVM and ensemble methods, the weighted k-NN algorithm achieved the best average performance with the Choi–Williams time-frequency distribution features: an accuracy of 94.23%, a sensitivity of 95.72%, and a specificity of 98.15% [11].
Kłosowski et al. proposed an effective method for ECG classification using a deep long short-term memory (LSTM) network. Feature extraction consists of converting the ECG signal into a series of spectral images using the short-time Fourier transform. The images are then converted, again using the Fourier transform, into two signals, instantaneous frequency and spectral entropy, which are used to train the LSTM network [12].
In 2021, Wang et al. proposed a simple and accurate automatic ECG classification method that can serve as a clinical auxiliary diagnostic tool. It uses the Continuous Wavelet Transform (CWT) to obtain different time-frequency components and a Convolutional Neural Network (CNN) to extract features from the 2D scalogram composed of these components. The method achieved an accuracy of 98.74%, a sensitivity of 67.47%, a positive predictive value of 70.75%, and an F1-score of 68.76%; the F1-score is 4.75~16.85% higher than that of existing methods [13].
In the same year, Hussein et al. presented a novel method that extracts ST and PR features from the Choi–Williams time-frequency distribution for myocardial ischemia identification. Using these features, a multi-class SVM classifier is trained to assess whether unseen cases are ischemic or normal. Detection performance improves when multi-lead ECG and 1 min intervals, rather than beats or frames, are used for classification. The proposed strategy achieved an overall accuracy, sensitivity, and specificity of 99.09%, 99.49%, and 98.44%, respectively [14].
Furthermore, in 2022, Alqudah et al. presented a method that is efficient, simple, fast, and deployable on mobile devices. They developed a deep learning methodology to detect up to 17 classes of cardiac arrhythmia by analyzing a single ECG beat and computing its iris spectrogram, which is fed to a convolutional neural network. The results show an overall recognition accuracy of 99.13% ± 0.25, 98.223% ± 0.85, and 97.494% ± 1.26 for 13, 15, and 17 arrhythmia classes, respectively. Training and testing were performed using tenfold cross-validation [15].
Farag et al. proposed a short-time Fourier transform (STFT) Convolutional Neural Network (CNN) model for real-time ECG classification at the edge. To extract the spectrogram from the input ECG signal, they developed an STFT-based 1D convolutional (Conv1D) layer, whose output is reshaped into a 2D heat-map image and fed to a 2D convolutional (Conv2D) neural network for classification. The proposed classifier achieved 99.1% accuracy and a 95% F1-score at the edge with a maximum model size of 90 KB, an average inference time of 9 ms, and a maximum memory usage of 12 MB [16].
This study compares two advanced time-frequency methods, i.e., the iris spectrogram and the scalogram, for classifying the ECG types described above, i.e., Normal, Tachycardia, and Bradycardia, using deep learning with the ResNet101 and ShuffleNet convolutional neural networks.
4. Results
The resulting images were used to build deep learning models with either ResNet101 or ShuffleNet. Each ECG signal is segmented into three waves: P, QRS, and T. An irisgram and a scalogram are computed separately for each segment, so each wave yields two colored images, one scalogram and one irisgram. The data are labeled according to the ECG diagnosis: normal, bradycardia, or tachycardia. For each class, six datasets are obtained: three ECG segments in each of the two signal representations, i.e., irisgrams and scalograms. Classification is performed using two pre-trained deep learning architectures, ResNet101 and ShuffleNet. The resulting images are divided into 70% for training and 30% for testing. The following sections analyze the results.
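A 70%/30% division of labeled images could be sketched as follows. The text does not state whether the split was stratified or how it was randomized, so the per-class (stratified) splitting, the fixed seed, and the function name are assumptions made for illustration only.

```python
import random


def split_70_30(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately.

    Stratification and the fixed seed are assumptions, not details from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # randomize within the class
        cut = round(train_fraction * len(indices)) # size of the training part
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Splitting within each class keeps the proportions of normal, bradycardia, and tachycardia images similar in the training and testing sets, which matters when the classes are of unequal size.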
4.1. Irisgram Representation
4.1.1. ResNet
The irisgram images are classified using the pre-trained ResNet101 architecture, and the corresponding confusion matrices illustrate its performance. The first matrix shows the performance on the P-wave irisgrams, the second on the T-wave irisgrams, and the third on the QRS-wave irisgrams.
For the P-wave irisgram, as shown in Figure 10a, the sensitivity for bradycardia is 85.4%, with 35 of 41 cases classified correctly, and the precision is 94.6%. For normal subjects, 87 of 95 P-wave segments are classified correctly, giving a sensitivity of 91.6% and a positive predictive value of 98.9%. Tachycardia has the highest true positive rate, at 98.9%. On the other hand, its precision is the lowest because 14 cases were misclassified as tachycardia. The overall accuracy is 92.7% for all classes.
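The sensitivity (recall), precision (positive predictive value), and overall accuracy quoted in this section are all derived from a confusion matrix. The following minimal sketch shows the computation; the function name, the row/column orientation (rows as true classes, columns as predicted classes), and the example counts are assumptions for illustration and are not taken from the figures of this paper.

```python
def per_class_metrics(confusion, class_names):
    """Sensitivity and precision per class, plus overall accuracy.

    Assumes confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(class_names)))
    metrics = {}
    for k, name in enumerate(class_names):
        true_positive = confusion[k][k]
        actual = sum(confusion[k])                    # row sum: all samples of class k
        predicted = sum(row[k] for row in confusion)  # column sum: all predicted as k
        metrics[name] = {
            "sensitivity": true_positive / actual if actual else 0.0,
            "precision": true_positive / predicted if predicted else 0.0,
        }
    return metrics, correct / total


# Hypothetical counts for illustration only.
example = [
    [8, 1, 1],  # true bradycardia
    [0, 9, 1],  # true normal
    [1, 0, 9],  # true tachycardia
]
metrics, accuracy = per_class_metrics(
    example, ["bradycardia", "normal", "tachycardia"]
)
```

As a check against the values reported above, 35 correctly classified cases out of 41 give a sensitivity of 35/41 ≈ 85.4%, and 87 out of 95 give 87/95 ≈ 91.6%.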
The second confusion matrix describes the results for the T-waves, which are not promising. Of the bradycardia cases, seven are misclassified as healthy and 18 as tachycardia, so the sensitivity for bradycardia is very low, at 39%. Moreover, 14 cases from other classes are misclassified as bradycardia, giving the worst precision, 57%. ResNet performs better on the normal class: nine samples are misclassified, one as bradycardia and the rest as tachycardia, for a sensitivity of 90.5% and a misclassification rate of 9.5%. Seven bradycardia cases are classified as normal, and the precision of the normal class is 92.5%. The sensitivity for tachycardia reaches 95.9%, with a misclassification rate of 4.1%. Furthermore, 19 bradycardia cases are recognized as tachycardia, which reduces the precision to 82.2%. The overall accuracy is the lowest of the three waves, at 83.3%.
For the QRS-wave irisgram, the results show improved discrimination among the three classes in terms of sensitivity, accuracy, and precision. In the bradycardia class, 37 of 41 cases are classified correctly, and six samples from other classes are misclassified as bradycardia, so the precision falls to 86%, below its value in the P-wave case. Recall and PPV are highest for healthy segments, at almost 97% each. Tachycardia attains high precision with the QRS-waves: only three samples in the whole dataset are misclassified as tachycardia, and the sensitivity is 94.9%. The overall accuracy of the proposed approach for the QRS-wave irisgram is 94.9%.
4.1.2. ShuffleNet
The irisgram images are processed using ShuffleNet. The dataset is first split into 70% for training and 30% for testing, and this operation is performed for each ECG segment. The following confusion matrices characterize the performance in the test phase for the whole database.
The first confusion matrix, in Figure 11a, represents the P-wave irisgram images. The sensitivity for bradycardia is 90.2%, with 37 of 41 cases classified correctly, and the precision is 78.7%. For normal subjects, 94 of 95 P-wave segments are classified correctly, giving the highest sensitivity, 98.9%, and the best positive predictive value, 100%. The true positive rate for tachycardia is 89.8%. On the other hand, its precision is the lowest because 14 cases were misclassified as tachycardia. The overall accuracy is 93.6% for all classes.
The second confusion matrix represents the results for the T-wave representation, which are not adequate. Nine segments are misclassified as healthy, and seven cases of bradycardia are misclassified as tachycardia, so the sensitivity for bradycardia is low, at 61%. In addition, nine cases from other classes are misclassified as bradycardia, giving the worst precision, 73.5%. ShuffleNet performs better on the normal class: seven samples are misclassified as bradycardia and none as tachycardia, for a high sensitivity of 92.9% and a misclassification rate of 7.1%. Seven bradycardia cases are classified as normal, and the precision of the normal class does not exceed 92.5%. The sensitivity for tachycardia reaches 95.9%, with a misclassification rate of 4.1%. Moreover, 19 bradycardia cases are classified as tachycardia, which reduces the precision to 82.2%. The overall accuracy is the lowest, and it does not exceed 83.3%.
For the QRS-wave irisgram, the results show improved discrimination among the three classes in terms of sensitivity, accuracy, and precision. In the bradycardia class, 31 of 41 cases are classified correctly, and four samples from other classes are misclassified as bradycardia, so the precision is 88.6%, compared with 78.7% in the P-wave case. Recall and PPV are highest for normal ECG segments, at almost 94.8% each. Tachycardia attains high precision with the QRS-waves: only five segments in the whole dataset are misclassified as tachycardia, and its sensitivity is the highest, reaching 100%. The overall accuracy of the proposed approach for the QRS-wave irisgram is 94.0%.
4.2. Scalogram Representation
4.2.1. ResNet
The scalogram images are classified using the pre-trained ResNet101 architecture, and the following confusion matrices represent its performance. The first matrix describes the performance on the P-wave scalograms, the second on the T-wave scalograms, and the third on the QRS-wave scalograms.
The first confusion matrix, in Figure 12a, represents the P-wave scalogram images. The sensitivity for bradycardia is 85.4%, with 35 of 41 cases classified correctly, and the precision is moderate, at 87.5%. For healthy subjects, 93 of 95 P-wave segments are classified correctly, giving the highest sensitivity, 97.9%, and the best positive predictive value, 100%. The true positive rate for tachycardia is 94.9%. The precision is 92.1% because 14 cases were misclassified as tachycardia. The overall accuracy is 94.4% for all classes.
The second confusion matrix describes the results for the T-wave representation, which are better than those for the P-wave and QRS-wave scalograms. Two segments are misclassified as healthy, and no cases of bradycardia are misclassified as tachycardia, so the sensitivity for bradycardia is high, at 95.1%. Only two cases from other classes are misclassified as bradycardia, giving the highest precision, 95.1%. ResNet also performs well on the normal class: two samples are misclassified as bradycardia and none as tachycardia, for a high sensitivity of 97.9% and a misclassification rate of 2.1%. In discriminating between the tachycardia and normal classes, no segments are misclassified, and the precision reaches its highest value, 100%. The sensitivity for tachycardia is the best, at 100%. Moreover, only two bradycardia cases are misclassified as tachycardia, so the precision is high, at 98%. The overall accuracy is the highest and reaches 98.3%.
The QRS-wave scalogram (SG) is presented in the third confusion matrix. In the bradycardia class, 18 of 41 cases are classified correctly, and 19 segments from other classes are misclassified as bradycardia, so the precision is very low, at 48.6%, well below its value in the P-wave case. Recall and PPV are highest for normal ECG segments, at almost 92.8% each. Tachycardia attains an acceptable level of precision with the QRS-waves, with 18 segments in the whole dataset misclassified as tachycardia, while its sensitivity reaches 83.7%. The overall accuracy of the proposed approach for the QRS-wave SG is 81.0%.
4.2.2. ShuffleNet
The scalogram images are classified using the pre-trained ShuffleNet architecture, and the corresponding confusion matrices illustrate its performance. The first matrix represents the performance on the P-wave scalograms, the second on the T-wave scalograms, and the last on the QRS-wave scalograms.
The first confusion matrix, in Figure 13a, represents the P-wave scalogram images. The sensitivity for bradycardia is 2.4%, with just one of 41 cases classified correctly, and the precision is very low, at 14.3%. For healthy subjects, 80 of 95 P-wave segments are classified correctly, giving a moderate sensitivity of 84.2% and a positive predictive value of 89.9%. The true positive rate for tachycardia is 91.8%. The precision is 65.2% because eight cases were misclassified as tachycardia. The overall accuracy is 73.1% for all classes.
The second confusion matrix describes the results for the T-wave representation, which are similar to those for the P-wave SG. Eight segments are misclassified as healthy, and five cases of bradycardia are misclassified as tachycardia, so the sensitivity for bradycardia is low, at 80.5%. In addition, 13 cases from other classes are misclassified as bradycardia, giving the lowest precision, 38.1%. ShuffleNet performs better on the normal class: eight samples are misclassified as bradycardia and five as tachycardia, for a moderate sensitivity of 86.3% and a misclassification rate of 13.7%. In discriminating between the tachycardia and normal classes, five segments are misclassified, and 32 segments are misclassified for bradycardia, with a low precision of 71.1%. The sensitivity for tachycardia is the best, at 93.9%. Moreover, only five bradycardia cases are misclassified as tachycardia, so the precision is high, at 98%. The overall accuracy reaches 77.6%.
The QRS-wave SG is presented in the third confusion matrix. In the bradycardia class, 36 of 41 cases are classified correctly, and four segments from other classes are misclassified as bradycardia, so the precision is fairly high, at 90.0%, well above its value in the P-wave case. Recall and PPV are highest for normal ECG segments, at almost 96.8% and 100%, respectively. Tachycardia attains the best level of precision with the QRS-waves, with just one segment in the whole dataset misclassified as tachycardia, and its sensitivity is also the best, reaching 99.0%. The overall accuracy of the proposed approach for the QRS-wave SG is 96.2%.
Table 2, Table 3, Table 4 and Table 5 summarize the obtained results. Using the scalogram representation of the T-waves with the pre-trained ResNet101 yields the highest accuracy, 98.3%.
To check the validity of the features extracted by the models, we examined the class activation maps (CAMs) and found that all the trained models selected the most significant regions of the scalogram and irisgram. Figure 14, Figure 15 and Figure 16 show sample CAMs obtained with ShuffleNet, using the last ReLU layer and the scalogram, for the three classes Normal, Bradycardia, and Tachycardia, respectively.
4.3. K-Fold Results
To ensure the validity of the proposed methodology, the datasets were also evaluated using 5-fold cross-validation. This technique was applied to the two highest results obtained with ShuffleNet and ResNet101. The overall confusion matrix and ROC of ResNet101 with T-wave scalograms are shown in Figure 17, and those of the second scenario are shown in Figure 18.
The performance results of the 5-fold evaluation for the two scenarios are shown in Table 6. From these results, we conclude that the performance of the proposed model is stable across different training and testing sets, which indicates that it is robust.
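The partitioning behind a 5-fold evaluation can be sketched as follows. This is a generic illustration rather than the exact procedure used here: the shuffling, the fixed seed, and the function name are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample appears in exactly one test fold and in the training
    set of the remaining k - 1 folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A model is trained and tested once per fold, and the fold-wise metrics are then compared or averaged; similar values across the folds are what support the stability claim above.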