Article

A CNN Approach for Emotion Recognition via EEG

by Aseel Mahmoud 1, Khalid Amin 1, Mohamad Mahmoud Al Rahhal 2,*, Wail S. Elkilani 2, Mohamed Lamine Mekhalfi 3 and Mina Ibrahim 1

1 Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shebin El-Kom 32511, Egypt
2 Applied Computer Science Department, College of Applied Computer Science, King Saud University, Riyadh 11543, Saudi Arabia
3 Digital Industry Center, Technologies of Vision Unit, Fondazione Bruno Kessler, 38123 Trento, Italy
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(10), 1822; https://doi.org/10.3390/sym15101822
Submission received: 31 August 2023 / Revised: 22 September 2023 / Accepted: 23 September 2023 / Published: 25 September 2023
(This article belongs to the Special Issue Symmetry in Mechanical and Biomedical Mechanical Engineering II)

Abstract:
Emotion recognition via electroencephalography (EEG) has been gaining increasing attention in applications such as human–computer interaction, mental health assessment, and affective computing. However, it poses several challenges, primarily stemming from the complex and noisy nature of EEG signals. Commonly adopted strategies involve feature extraction and machine learning techniques, which often struggle to capture intricate emotional nuances and may require extensive handcrafted feature engineering. To address these limitations, we propose a novel approach utilizing convolutional neural networks (CNNs) for EEG emotion recognition. Unlike traditional methods, our CNN-based approach learns discriminative cues directly from raw EEG signals, bypassing the need for intricate feature engineering. This approach not only simplifies the preprocessing pipeline but also allows for the extraction of more informative features. We achieve state-of-the-art performance on benchmark emotion datasets, namely DEAP and SEED datasets, showcasing the superiority of our approach in capturing subtle emotional cues. In particular, accuracies of 96.32% and 92.54% were achieved on SEED and DEAP datasets, respectively. Further, our pipeline is robust against noise and artefact interference, enhancing its applicability in real-world scenarios.

1. Introduction

Emotions are recognized as a collection of innate cognitive states within the human mind. They imbue human actions with significance and hold a pivotal role in human interaction, intellect, and perception. A distinct emotion can be elicited by a particular sentiment and subsequently influence behavioral shifts. The study explores symmetry in EEG data, touches upon pattern analysis, offers practical applications, introduces methodological innovations, and fosters potential cross-disciplinary collaboration [1]. This alignment positions our research as a pertinent contribution to the study of symmetry, enriching the exploration of emotion recognition and neural symmetry. Emotion recognition (ER) is a fundamental aspect of interaction and communication between humans and machines, playing a crucial role in understanding human behavior and enhancing user experiences in varied applications, such as virtual reality, affective computing, and human–computer interaction [2]. Traditional approaches to ER have primarily focused on analyzing facial expressions and vocal patterns [3,4]. However, these methods may not capture the rich and intricate nuances of human emotions, particularly in scenarios where facial expressions and vocal cues may be limited or ambiguous [5].
Emotions are inherently multifaceted, varying across individuals and contexts. CNNs are renowned for their capacity to generalize patterns across diverse inputs, offering adaptability to a wide range of emotional scenarios [6]. This adaptability is pivotal, especially in applications where emotions are dynamic and subject to significant variation [7]. Within recent years, the advancement of neuroscience and brain–computer interface (BCI) technologies has paved the way for a new paradigm in ER using EEG signals [8]. EEG records the electrical signals produced by neural activities, offering a direct measurement of brain function [9]. Signals reflect the cognitive and affective processes associated with various emotional states, making EEG-ER a promising avenue for more robust and accurate emotion analysis [7]. EEG-ER has profound implications for diverse fields, including psychology, neuroscience, and mental health care [10,11,12].
EEG recordings are susceptible to various forms of noise and artifacts, such as muscle artifacts, eye blinks, and external interference [13]. CNNs possess intrinsic noise tolerance as they can automatically learn relevant features while disregarding irrelevant noise [12]. This robustness is indispensable for reliable ER in real-world settings where noisy EEG data are the norm [14]. Many applications, including affective computing and brain–computer interfaces, demand real-time or near-real-time ER [15]. CNNs are well-suited for such applications due to their inherent parallel processing capabilities. This makes them ideal for on-the-fly emotion analysis, expanding the possibilities for real-time interaction and feedback systems [16].
Traditional EEG-ER methods often struggle to achieve high accuracy and sensitivity in discerning subtle emotional nuances [10]. EEG signals carry complex spatial and temporal patterns that are not easily captured by handcrafted features and traditional machine learning algorithms [17]. CNNs, with their ability to automatically learn hierarchical features, offer the potential to extract more informative representations from raw EEG data [18]. This promises improved accuracy in ER. Moreover, emotions are complex and multifaceted, often varying significantly among individuals and situations [19]. CNNs have demonstrated their ability to generalize across different input patterns and adapt to various emotional contexts. This adaptability is crucial in real-world applications where emotions are dynamic and diverse [20]. The combination of CNNs and EEG signals for ER offers an exciting opportunity to leverage the power of deep learning and neuroscience to decode and understand emotional states directly from brain activity [21]. By transforming EEG signals into image-like representations, such as spectrograms, CNNs can effectively capture the complex spatiotemporal patterns associated with different emotions, enabling more accurate and comprehensive ER [13].
In this paper, we explore the application of CNN-ER depending on EEG signals by proposing a novel approach that utilizes CNN for recognizing emotions. Two extensively employed EEG datasets (i.e., DEAP and SEED) are used to evaluate the proposed approach. Our approach starts with preprocessing the employed datasets and then converting them into spectrogram-like representations. After that, our CNN model is trained to learn discriminative features for ER. The experimental results show the effectiveness of our approach as it achieves 96.32% accuracy on SEED dataset and 92.54% on DEAP dataset. Our main motivation is to investigate the performance and generalization ability of the CNN-ER model across these datasets and demonstrate the feasibility of using CNNs as a powerful tool for EEG-ER. The main contributions of our approach are:
  • Advancing ER Technology: The research contributes to the field of ER by applying CNNs to EEG data. This approach leverages the power of deep learning to improve the accuracy and sensitivity of ER, potentially surpassing traditional methods.
  • Utilizing EEG Signals: EEG data offers a unique window into the human brain’s emotional states. By focusing on EEG signals, the research taps into valuable physiological data that can reveal subtle emotional nuances, providing a deeper understanding of emotions.
  • Dataset Selection (SEED and DEAP): The choice of the SEED and DEAP datasets adds credibility and relevance to the research. SEED contains labeled EEG data related to specific emotional states, while DEAP includes multimodal data, allowing exploration of the integration of EEG with other physiological signals.
  • Improving Model Generalization: The research aims to improve the generalization of models. By applying CNNs to EEG data, it develops models that can adapt to a broader range of emotional contexts, enhancing their real-world applicability.
  • Potential Real-time Applications: The application of CNNs to EEG data for ER opens the door to real-time or near-real-time ER. This has implications for affective computing, human–computer interaction, and other domains where timely emotional feedback is valuable.
  • Comparison and Evaluation: The research contributes by conducting a comparative evaluation of CNN-based emotion recognition on both the SEED and DEAP datasets. This comparison can provide insights into the model’s performance across different datasets and help identify its strengths and limitations.
The subsequent sections of this paper are structured in the following manner: Section 2 provides a review of related works in EEG-based ER and the application of CNNs in this domain. Section 3 outlines the proposed work, encompassing data acquisition, data preprocessing, feature extraction, and classification. Section 4 showcases the experimental outcomes and performance assessment of the CNN-ER model across the DEAP and SEED datasets. Finally, Section 5 discusses the findings, limitations, and prospects, highlighting CNN-ER’s potential implications in diverse real-world contexts.

2. Related Work

EEG-ER has garnered substantial interest in recent times, driven by the growing interest in affective computing, human–computer interaction, and virtual reality applications [8,22]. Numerous investigations have delved into the utilization of CNNs for ER based on EEG data, with a notable focus on the DEAP and SEED datasets. This section offers a synopsis of the primary discoveries and contributions within this domain of related research.
In the study by Salama et al. [23], a novel methodology was introduced employing a 3D-CNN to extract spatiotemporal features from EEG signals. This approach encompassed both spatial and frequency features. The authors harnessed data from diverse channels to capture inter-channel correlations, and subsequently applied this method to the DEAP database to assess its efficacy. The results indicated significant success, with achieved recognition accuracies of 87.44% for valence and 88.49% for arousal classes, affirming the potency of the proposed approach. The research successfully applies 3D CNNs to EEG data, which offer improved accuracy and sensitivity in recognizing emotions, potentially outperforming traditional methods. Moreover, 3D CNNs are well-suited to capture spatiotemporal patterns in EEG data, allowing for a more comprehensive understanding of how emotions manifest over time and across brain regions. In addition, by using 3D CNNs, the need for extensive feature engineering may be reduced, simplifying the preprocessing pipeline and making the model more adaptable to diverse emotional contexts. The main limitation of the research lies in its data dependence: its success hinges on the availability of high-quality and diverse EEG datasets for training and testing, and limited or biased data can impact the model’s generalizability. Additionally, 3D CNNs can be computationally intensive and may require significant computational resources, making them less practical for some applications.
In [24], Tengfei et al. introduced a novel approach to multichannel EEG-ER by leveraging a dynamical graph convolutional neural network (DGCNN). This method involved modeling multichannel EEG traits through a graph structure, facilitating EEG emotion categorization. Unlike GCNN methods, the proposed DGCNN dynamically learned the intrinsic relationships among various EEG channels via a neural network. The use of DGCNNs offers the potential for enhanced accuracy in recognizing emotions from EEG signals. DGCNNs are designed to capture complex relationships and dependencies among EEG channels, enabling a more comprehensive understanding of emotional states. Finally, DGCNNs reduce the need for extensive feature engineering, simplifying the preprocessing pipeline and potentially making the model more adaptable to diverse emotional contexts. Notably, the achieved average accuracy for the SEED dataset reached 90.4%. However, the research has some limitations related to data dependency. Moreover, DGCNNs are computationally intensive and may require substantial resources, making them less practical for some applications or low-resource environments. Finally, like many deep learning models, DGCNNs can be challenging to interpret, which can be a limitation in applications where understanding the model’s decision-making process is crucial.
Wei-Long Zheng et al. [25] presented deep belief networks (DBNs) that were trained using differential entropy features extracted from multichannel EEG data. They investigated the crucial frequency bands and channels and evaluated the weights of the trained DBNs. Four distinct electrode profiles, comprising 4, 6, 9, and 12 channels, respectively, were chosen. The DBN’s average accuracy on the SEED dataset was 86.08%. The research identifies critical frequency bands within the data that are most informative for ER. This feature selection enhances the efficiency of the model, reducing computational complexity. In addition, DBNs have demonstrated a capacity to learn intricate patterns and representations from raw EEG data. They can uncover complex relationships, which is pivotal for capturing subtle emotional nuances. Moreover, the research minimizes the need for extensive feature engineering, simplifying the preprocessing pipeline and rendering the model adaptable to a variety of emotional contexts. However, there is a risk of over-fitting: deep learning models, if not carefully regularized and validated, may be susceptible to over-fitting, especially with limited training data. In addition, the research may lack external validation on independent datasets or real-world scenarios, which is crucial for assessing the model’s generalizability and real-world applicability.
R. Gill and J. Singh [16] proposed a method that uses EEG signals and power spectral density features to determine the emotional states of the consumer. During the data acquisition phase, the DEAP dataset was employed. When compared to support vector machines (SVMs) and k-nearest neighbors (k-NN), the deep neural network (DNN) exhibited elevated recall, enhanced precision, and higher accuracy, as evidenced by the outcomes. Notably, the average accuracy reached 93% for the DNN model. One of the research limitations is the inability to assess the robustness and real-world performance of the deep learning model; external validation on independent datasets or in real-world scenarios is crucial. Moreover, the research should demonstrate the generalizability of the deep learning approach across various consumer scenarios and populations to ensure its broader applicability.
Haqque et al. [20] used 2D CNNs in a variety of designs and configurations, with a wavelet filter first applied to extract the EEG signal representing the emotion variable. Instead of 62 channels, just 12 channels from the SEED dataset were used. Compared to the one-dimensional CNN’s accuracy of 75.97%, the two-dimensional CNN’s accuracy was 83.44%. The use of wavelet filtering can help extract relevant features from EEG data, which may enhance the discriminative power of the model by focusing on specific frequency components. In addition, combining wavelet filtering and CNNs may pave the way for multimodal ER, where EEG data are integrated with other modalities such as facial expressions or speech for more robust results. However, complexity is one of the risks in this research because the combined approach of wavelet filtering and CNNs can introduce complexity into the model, potentially making it computationally expensive and harder to interpret.
J. Zhou et al. [26] proposed a convolutional auto-encoder (CAE)-based multimodal emotion identification technique. First, a CAE was created to extract the fusion features of numerous external physiological inputs and multichannel EEG signals. For recognizing emotions, a fully connected neural network classifier was created. The final average score using the DEAP dataset was 92.07%. The paper focuses on multimodal ER, which combines information from different modalities such as facial expressions, speech, and physiological signals. This approach can potentially enhance ER accuracy by leveraging complementary information. Moreover, the use of CAE as a feature extraction method is advantageous. CAEs are capable of learning hierarchical features from raw data, which can capture complex patterns in multimodal inputs. However, the research should demonstrate the generalizability of the multimodal approach across various emotional contexts and diverse populations to ensure its broader applicability.
Zhao et al. [27] proposed the implementation of a 3D CNN model to autonomously capture both spatial and temporal information from EEG recordings. In the four-class classification task of the DEAP dataset (the combinations of low/high valence and low/high arousal), a commendable accuracy rate of 93.53% was achieved. Ozdemir et al. [28] utilized a singular model based on the SEED dataset, leveraging the ResNet50 architecture and the Adam optimizer. A distinct model solely utilizing EEG data was employed for recognizing positive, neutral, and negative emotional states. The dataset was partitioned into training and testing sets, followed by shuffling before input to the CNN model. Impressively, the accuracy for negative ER attained the highest value at 94.86%, trailed by neutral ER at 94.29% and positive ER at 93.25%, culminating in an average accuracy of 94.13%. Ramzan et al. [29] combined a CNN and an LSTM-RNN, two deep learning models that appeared to work better together when analyzing EEG signals for emotions. The fused deep learning classification model achieved an average accuracy of 93.74% on the SEED dataset in detecting both positive and negative emotions. In the previous three papers, the authors hint at the potential for real-time or near-real-time ER, which is valuable for applications such as affective computing and human–computer interaction. However, deep learning models, including 3D CNNs, are often considered black-box models. Understanding how and why the model makes specific predictions can be challenging, especially in applications where transparency is important.

3. Proposed Work

In this section, the proposed approach is discussed in detail. Figure 1 shows the main processes in the proposed approach, which are: (1) Data acquisition, which covers the acquisition of EEG data using the DEAP and SEED datasets; (2) Data preprocessing, which preprocesses the EEG data to make it suitable for the CNN; (3) Feature extraction, which implements feature extraction methods to derive significant attributes from the preprocessed EEG signals; and (4) Classification, which classifies the output of the previous stage using the CNN’s classifier.

3.1. EEG Data Acquisition

Data acquisition includes rigorous quality checks and validation specific to the DEAP and SEED datasets. The utilization of the DEAP EEG dataset allows for a qualitative assessment of the derived features, as opposed to relying on multiple, comparatively smaller datasets. The DEAP EEG dataset encompasses EEG and peripheral signals from 32 individuals, constituting a publicly accessible, preprocessed resource [27]. A total of 40 video clips with a maximum runtime of one minute were employed as emotional stimuli. Each clip was scored by the 32 viewers on the three factors of arousal, dominance, and valence; a scale from 1 to 9 was also used to rate the degree of familiarity and liking. Each dimension’s label corresponds to the self-rated assessment provided by the participants. The valence dimension represents a measure of pleasantness, extending from unhappy to cheerful. The arousal dimension gauges the intensity of emotions, ranging from the least to the most intense. Lastly, the dominance dimension quantifies the extent of emotional control, spanning from total control to complete loss of control.
On the other hand, the SEED dataset [30,31] includes various emotional stimuli that are carefully selected to elicit specific emotional responses from the participants. These emotional stimuli are classified into distinct emotion categories, including but not limited to happiness, sadness, fear, and anger. The SEED dataset primarily focuses on EEG signals, which are recorded using EEG caps with multiple electrodes placed on the scalp to capture brain activity related to emotions. It includes the EEG and eye movement of 12 participants and EEG data of another 3 subjects. Data were gathered while they watched movie snippets. The movie snippets were chosen with care to elicit various emotions, including neutral, negative, and positive ones as shown in Figure 2.
Each movie clip lasts for almost four minutes. Each experiment comprises a total of 15 trials. To avoid the consecutive presentation of two movie snippets that elicit the same emotion, a specific arrangement is followed in the presentation. Participants were instructed to promptly complete a questionnaire after viewing each film clip, offering feedback on their emotional reactions, as depicted in the protocol outlined in Figure 3.

3.2. Data Preprocessing

Preprocessing readies the raw EEG data for subsequent feature extraction and classification processes, augmenting the model’s capacity to assimilate significant patterns and enhance accuracy in ER. The next step in the data preprocessing phase in EEG-ER utilizing CNNs for both the SEED and DEAP datasets is data cleaning with two essential methods:
  • Remove noise and artifacts [32]: EEG signals are susceptible to noise and artifacts, such as eye blinks, muscle movements, and environmental interference [4]. Implement suitable filtering methods, including notch filters, band-pass filters, and high-pass filters, to eliminate undesirable noise and artifacts [33].
  • Baseline correction [11]: Conduct baseline correction to standardize the EEG data with respect to a predefined baseline interval [34]. This procedure aids in mitigating the influence of baseline drift on the EEG signals, as shown in Figure 4 (an illustrative sketch of both cleaning steps follows this list).
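Below is a minimal sketch, assuming SciPy and a NumPy array of shape (channels × samples), of how the two cleaning steps could be implemented; the 4–45 Hz band, the 128 Hz sampling rate, and the one-second baseline interval are illustrative values rather than the exact parameters of this work.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_filter(eeg, low_hz, high_hz, fs, order=4):
    """Zero-phase band-pass filter applied channel-wise (eeg: channels x samples)."""
    nyq = fs / 2.0
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=-1)

def baseline_correct(eeg, fs, baseline_sec=1.0):
    """Subtract the mean of an initial baseline interval from every channel."""
    n_base = int(baseline_sec * fs)
    return eeg - eeg[:, :n_base].mean(axis=-1, keepdims=True)

# Hypothetical 32-channel, 60-s DEAP-style trial sampled at 128 Hz (placeholder data)
fs = 128
raw = np.random.randn(32, 60 * fs)
cleaned = baseline_correct(bandpass_filter(raw, 4.0, 45.0, fs), fs)
```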
According to what was covered in the previous section, each of the 32 subjects has 40 one-minute EEG recordings. The first 20 s of each recording were removed because mood fluctuations carried over from the preceding film can contaminate the elicited emotions, making the responses perceived during that interval unreliable. The size of the temporal window is a vital choice that must be handled very carefully when using electrical brain signals to identify emotions. After removing the first 20 s, 40 s remained in each clip; we divided them into 10-s time windows, so each video clip yields four windows and each subject provides a total of 160 segments (40 videos × 4 windows). The 32 EEG channels, chosen in accordance with the international 10–20 system as shown in Figure 5 [26], are then extracted for each of the 32 subjects; every channel carries the 40 videos, and processing proceeds channel by channel from the first channel up to channel 32.
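A minimal sketch of this segmentation, assuming one DEAP subject stored as a (trials × channels × samples) NumPy array sampled at 128 Hz; array names and shapes are placeholders.

```python
import numpy as np

def segment_trial(trial, fs=128, drop_sec=20, win_sec=10):
    """Drop the first drop_sec seconds of a (channels x samples) trial and
    split the remaining signal into non-overlapping win_sec windows."""
    usable = trial[:, drop_sec * fs:]
    win = win_sec * fs
    n_windows = usable.shape[1] // win            # 40 s remaining / 10 s = 4 windows
    return [usable[:, i * win:(i + 1) * win] for i in range(n_windows)]

# Placeholder subject: 40 one-minute videos x 32 channels sampled at 128 Hz
subject = np.random.randn(40, 32, 60 * 128)
segments = [w for trial in subject for w in segment_trial(trial)]
assert len(segments) == 160                       # 40 videos x 4 windows per subject
```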
The SEED dataset underwent down-sampling to a frequency of 200 Hz. A band-pass frequency filter ranging from 0 Hz to 75 Hz was applied. EEG segments, corresponding in length to each movie clip, were extracted. The experiment was repeated thrice for each participant, with approximately one week between each session. Each subject file comprised 16 arrays. Specifically, the segmented and preprocessed EEG data from 15 trials of one experiment were stored in fifteen arrays (eeg_1 to eeg_15, channel × data). Emotional labels were assigned to each array entry (−1 for negative, 0 for neutral, and +1 for positive). The dataset maintained a precise channel ordering. The EEG cap for 62 channels adhered to the international 10–20 system as depicted in Figure 6. Our data collection and preprocessing strategies will enhance dataset quality, ensuring that the CNN models can effectively capture emotional nuances. This attention to dataset-specific details is a key novelty.

3.3. Feature Extraction

Within a CNN framework for EEG-based ER, the feature extraction phase assumes critical importance, with its objective being the identification and extraction of pertinent and distinguishing attributes from the preprocessed EEG data. The extracted features are used to represent the unique patterns present in the EEG signals that are associated with different emotional states. This process helps the CNN learn meaningful representations, making it more effective in classifying emotions accurately [35].
The power spectral density (PSD) is used to extract features as it stands as a prevalent method in EEG-ER through CNNs [36]. PSD illustrates the power distribution across varied frequency bands within EEG signals, showcasing efficacy in capturing frequency–domain attributes linked to diverse emotional states [21]. The incorporation of PSD as features into a CNN empowers the model to discern and identify unique emotion-associated patterns, rendering it a valuable strategy within the realm of ER research.
Here is a detailed explanation of the PSD feature extraction process for EEG-ER using CNN:
  • Preprocessed EEG Data: Before extracting PSD features, the raw EEG data are preprocessed to remove noise, artifacts, and baseline drift. Common preprocessing techniques include filtering, artifact removal, and normalization as mentioned in the preprocessing subsection before.
  • Windowing: Windowing the EEG signal from the DEAP or SEED dataset for use with a CNN involves dividing the continuous EEG data into n smaller overlapping windows or segments. Each window $w$ represents a specific time interval of brain activity, and this process allows the CNN to capture local and temporal variations in the EEG signal. Windowing the EEG signal can be represented as shown in Equation (1):
    $w = \{w_1, w_2, \ldots, w_n\}$  (1)
    where each window $w_i$ spans $N$ samples. The starting time of each window is determined by the overlap size $O$, as given in Equation (2):
    $w_n = x(t), \quad t = (n-1) \times O \ \text{to} \ (n-1) \times O + N$  (2)
    where $x(t)$ is the continuous time-domain EEG signal from the dataset, $N$ represents the duration of each window, and $O$ represents the overlap between adjacent windows [37]. The windows are usually fed into the CNN as individual inputs for ER. The CNN learns and recognizes patterns in each window, and the overlap between adjacent windows ensures that temporal variations in the EEG signal are captured effectively.
  • Fourier Transform: For each EEG window, the fast Fourier transform (FFT) is employed to transform the signal from the time domain to the frequency domain. The FFT computes the complex amplitudes of the distinct frequency components inherent in the EEG signal [38]. The frequency-domain depiction acquired through the FFT can serve as a feature in ER endeavors. The FFT is sampled at a rate of 128 Hz. Equation (3) outlines how the periodogram $p_{w_n}$ is constructed for each window [34]:
    $p_{w_n} = \mathrm{fft}(w_n)$  (3)
Following the computation of FFT for each EEG signal window, the frequency domain portrayal (power spectrum) can be harnessed as an input feature for CNN-ER. The CNN will subsequently acquire the ability to identify frequency domain patterns that signify distinct emotional states. This enhancement in the model’s proficiency contributes to precise classification of emotions from EEG signals.
  • Power Spectrum Calculation: The PSD represents the distribution of power across distinct frequency bands within a signal [30]. Within the framework of EEG-ER through CNNs, the PSD is determined from the EEG signal using the FFT. This representation denotes the squared magnitude of the frequency components, normalized by the window size $N$. The computed PSD values capture the power distribution across different frequency bands in the EEG signal, which is essential for identifying patterns associated with different emotional states.
  • Frequency Bands: The power spectrum is partitioned into multiple frequency bands using the Welch technique [34], encompassing the alpha, beta, theta, and gamma bands, while excluding the delta band (slow and fast delta waves, detected in the ranges 0–2 Hz and 2–4 Hz, respectively), as shown in Figure 7. Delta activity is associated with deep sleep, which is not relevant to our research; as such, the DEAP preprocessed signals were filtered between 4 and 45 Hz.
The power of the four bands is computed for each channel, yielding a power feature vector with a dimension of 128 (4 bands × 32 channels = 128) for the DEAP dataset; for the SEED dataset, the power of the four bands is likewise computed for each channel, resulting in a 248-dimensional feature vector (4 bands × 62 channels = 248). Effective EEG power features have been discovered for recognizing emotional changes following music stimulation [32].
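As a concrete, hedged illustration of these band-power features, the sketch below uses SciPy's Welch estimator on a single 10-s, 32-channel window; the band edges and Welch parameters are assumptions for illustration and may differ from the exact settings used in this work.

```python
import numpy as np
from scipy.signal import welch

# Illustrative band edges (Hz) for theta, alpha, beta, and gamma within 4-45 Hz
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_power_features(window, fs=128):
    """Return a (4 * n_channels,) vector of band powers for one EEG window."""
    freqs, psd = welch(window, fs=fs, nperseg=min(256, window.shape[-1]), axis=-1)
    feats = []
    for low, high in BANDS.values():
        mask = (freqs >= low) & (freqs < high)
        feats.append(psd[:, mask].sum(axis=-1))   # power per channel within this band
    return np.concatenate(feats)

window = np.random.randn(32, 10 * 128)             # placeholder 10-s, 32-channel DEAP segment
features = band_power_features(window)
assert features.shape == (128,)                    # 4 bands x 32 channels; 4 x 62 = 248 for SEED
```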

3.4. Classification

In our approach, the CNN classifier is used to apply EEG-ER. The architecture is designed to process the frequency–domain features obtained from the EEG data. The architecture typically includes three main layers as shown in Figure 8. The main functions of each layer are:
  • Input layer, which receives the preprocessed EEG data, including features such as the PSD obtained from the EEG signals.
  • Hidden layers, which consist of many layers: (a) Convolutional layers, which are used to detect spatial and temporal patterns in the frequency–domain features of EEG data. These layers consist of filters (i.e., kernels) that slide over the input data to extract relevant features; (b) Activation function layer (e.g., ReLU), which is applied after each convolutional layer to introduce non-linearity and enhance the model’s learning capacity; (c) Pooling layers, which undertake down-sampling of the feature maps, aiming to decrease computational intricacy and enhance translation invariance. The adoption of max pooling or average pooling techniques is a prevalent practice [20]; (d) Batch normalization layer, which stabilizes training and speeds up convergence in the CNN; and (e) Dropout layer (optional layer), which prevents over-fitting and enhances model generalization by randomly dropping out neurons during training.
  • Fully connected layers and the activation layer, where fully connected layers are utilized to establish a mapping from high-level features to emotion classes. The count of neurons in the output layer aligns with the number of emotional states intended for classification. Furthermore, the output layer is endowed with the SoftMax activation function, generating probability scores for each emotion class. This facilitates the model’s capacity to formulate class predictions.
Figure 8. Identification of CNN architecture.
The determination of the layer count, filter quantity, kernel sizes, and activation functions is contingent upon the task’s complexity and the dataset’s magnitude. Extracted features are frequently subjected to normalization, scaling values within a designated range (e.g., 0 to 1). This normalization contributes to improved convergence during training and enhances overall model stability. The network processes the EEG signal through a 3 × 3 convolution kernel filter with a stride of one. The architecture of the 2D CNN is depicted in Figure 8. The CNN model is compiled with an appropriate loss function, optimizer, and evaluation metrics [33]. The loss function gauges the alignment between the network’s predictions and the true target values during training. It numerically expresses the disparity between the forecasted output and the factual ground truth labels for a specific training instance. The goal of training the CNN is to minimize the value of the loss function, which corresponds to making accurate predictions. The selection of an appropriate loss function hinges on the specific characteristics of the task undertaken by the CNN. Based on our specific task, we utilize sparse categorical cross-entropy loss. Examples of other loss functions that can be used with other types of tasks include:
  • Binary Cross-Entropy Loss, also known as Log Loss.
  • Mean Squared Error (MSE) Loss.
  • Connectionist Temporal Classification (CTC) Loss.
Sparse categorical cross-entropy is therefore the loss function employed in our approach to optimize the network parameters, as shown in Equation (4) [39]:
$\mathrm{Loss} = -\sum_{i=1}^{c} t_i \log(s_i)$  (4)
where $t_i$ signifies the true label value, $c$ represents the number of class labels, and $s_i$ corresponds to the SoftMax output for class $i$.
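For concreteness, a worked instance of Equation (4) with assumed values (a three-class case where the true class is the second one and the SoftMax output is $s = (0.2, 0.7, 0.1)$) gives

$\mathrm{Loss} = -\sum_{i=1}^{3} t_i \log(s_i) = -\log(0.7) \approx 0.357,$

so the loss shrinks toward zero as the probability assigned to the true class approaches one.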
An optimizer holds pivotal significance in the training of machine learning models. Its central function is to iteratively modify the model’s parameters (weights and biases) throughout training, aiming to minimize the loss function and enhance the model’s proficiency in the given task. Optimization entails the pursuit of an optimal parameter configuration that empowers the model to yield precise predictions on training data and extend its effectiveness to new, unobserved data. In our research, the Adam optimizer is used because of its efficiency compared to other optimizers [18,28,29].
The CNN undergoes training on the training dataset, iteratively refining its weights through back propagation and gradient descent. The model endeavors to minimize the loss function, enhancing its capability to proficiently classify emotions via frequency–domain attributes. The trained CNN’s evaluation is conducted on the testing dataset to gauge its performance.
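The following Keras sketch illustrates a 2D CNN of the kind outlined in this section: 3 × 3 convolutions with a stride of one, ReLU activations, max pooling, batch normalization, an optional dropout layer, and a SoftMax output, compiled with sparse categorical cross-entropy and the Adam optimizer. The layer counts, filter numbers, and input shape are illustrative assumptions, not the authors' published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_emotion_cnn(input_shape, n_classes=3):
    """Minimal 2D CNN sketch: conv + ReLU -> pooling -> batch norm -> dropout -> dense SoftMax."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        layers.Dropout(0.25),                      # optional regularization layer
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical input: band-power features arranged as a (channels x bands x 1) "image"
model = build_emotion_cnn(input_shape=(62, 4, 1), n_classes=3)   # e.g., SEED: 62 channels, 4 bands
model.summary()
```

Arranging the four band powers per channel as a small two-dimensional map is only one plausible way to feed the extracted features to a 2D CNN; spectrogram-like inputs, as mentioned in the Introduction, would simply change the assumed input shape.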

4. Results and Discussion

In this section, we start by presenting the evaluation metrics used to evaluate the applied approaches. After that, the employed datasets are described in detail and the employed training models are discussed. Finally, the analysis of the obtained results is presented.

4.1. Evaluation Metrics

The performance of the applied approaches is evaluated using classification accuracy, precision, recall, F1-score, and confusion matrix for quantifying the model’s prowess in emotion classification. Our evaluation metrics will be adapted to the nuances of DEAP and SEED datasets, considering variations in emotional labeling and annotation protocols. Comparative analysis will underscore the model’s versatility and its potential for cross-dataset generalization. Moreover, by employing dataset-specific evaluation metrics, our research offers a comprehensive assessment of model performance. The comparative analysis highlights the model’s adaptability, showcasing its applicability across different datasets. Equation (5) is used to determine the classification accuracy, where T P = True positive, FP = False positive, T N = True negative, and F N = False negative.
$\mathrm{Classification\ accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$  (5)
Precision measures the proportion of true positive predictions (correctly predicted positive cases) out of all positive predictions made by the model. It quantifies the model’s ability to avoid false positives. Equation (6) is used to determine the precision.
$\mathrm{Precision} = \frac{TP}{TP + FP}$  (6)
Recall, also known as sensitivity or true positive rate, measures the proportion of true positive predictions out of all actual positive cases. It quantifies the model’s ability to capture positive cases correctly. Equation (7) is used to determine the recall.
$\mathrm{Recall} = \frac{TP}{TP + FN}$  (7)
The F1 score is the harmonic mean of precision and recall. It provides a balanced measure that considers both false positives and false negatives. A higher F1 score indicates a better balance between precision and recall. Equation (8) is used to determine the F1 score.
$F1\ \mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$  (8)
These evaluation metrics help assess the performance of the EEG-ER model using 2D CNNs by considering aspects of precision, recall, and the balance between them. They are valuable tools for understanding the model’s ability to correctly classify emotions in EEG data. Creating 2D CNN models specifically tailored to EEG characteristics in the DEAP and SEED datasets is a novelty. Utilizing transfer learning and dataset-specific hyperparameter tuning enhances model adaptability and performance.
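As a brief illustration, the metrics defined in Equations (5)-(8), together with the confusion matrix, can be computed from predicted and true labels with scikit-learn; the label vectors below are placeholders, not experimental outputs.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # placeholder ground-truth emotion labels
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]   # placeholder model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))
```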

4.2. Datasets

As mentioned before, the DEAP and SEED datasets are utilized in our approach. These datasets are the benchmark datasets for EEG-ER. In this subsection, more detailed information about the structure of these datasets is given. Moreover, the employed cross validation over these datasets is discussed.
For the DEAP dataset, a total of 32 subjects were used as mentioned in Section 3. Each subject’s data were segmented into 10-s time windows, each assigned a single emotion label. Three categories—valence, arousal, and dominance—were used to organize the data. A total of 32 EEG electrodes (i.e., channels) were considered. The positions of these electrodes in relation to the 10–20 system are: Fp1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3, P7, PO3, O1, Oz, Pz, Fp2, AF4, Fz, F4, F8, FC6, FC2, Cz, C4, T8, CP6, CP2, P4, P8, PO4, and O2 [40]. Each participant had a total of 160 observations (40 videos × 4 segments = 160). A cross-validation approach was employed, segregating participant data into training and testing sets with $k = 5$. By employing the 5-fold cross-validation technique, the data were partitioned into five equally sized portions. In each round, four segments, constituting 128 observations in total, were allocated for individual classifier training, while the remaining segment, encompassing 32 observations, served as the testing data. This process was repeated for a total of five rounds, with a different segment serving as the testing data in each round.
For the SEED dataset, a total of 15 subjects (i.e., 7 male and 8 female) were used as mentioned in Section 3. Each subject’s data were likewise segmented into 10-s time windows, each assigned a single emotion label. The data were organized into three categories: positive, neutral, and negative. A total of 62 EEG electrodes (channels) were considered, where the positions of these electrodes are shown in Figure 6. A total of 15 video clips, encompassing positive, neutral, and negative emotions, were meticulously chosen as stimuli for the experiment. Each participant in the SEED dataset underwent three experimental sessions with a one-week gap between each session, resulting in a cumulative count of 180 observations per participant (15 videos × 3 sessions × 4 segments). To organize the data, a cross-validation approach was adopted, segmenting participant data into training and testing sets with $k = 5$ as used for the DEAP dataset. In each round, four segments, amounting to a total of 144 observations, were allocated to train the classifier for each individual, while the remaining segment, containing 36 observations, was designated as the test data.
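The 5-fold split described above could be reproduced, in sketch form, with scikit-learn's KFold; the feature matrix and labels below are placeholders sized like one DEAP subject (160 segments with 128-dimensional band-power features).

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.randn(160, 128)            # 160 segments per DEAP subject, 128-dim features
y = np.random.randint(0, 3, size=160)    # placeholder emotion labels

kfold = KFold(n_splits=5, shuffle=False)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    X_train, X_test = X[train_idx], X[test_idx]   # 128 training / 32 testing observations
    y_train, y_test = y[train_idx], y[test_idx]
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```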

4.3. Training Model

In our conducted experiments, a convolution kernel sized 3 × 3 was employed to discern emotions. The model, built from the EEG signals, makes use of Keras [18], an open-source deep learning framework for Python that proficiently supports CNNs [20]. The model utilized within this research was of the two-dimensional variant. The quantity and dimensions of filters were hyperparameters that distinctly impacted the network’s aptitude for learning distinctive features. Subsequently, the activation function introduced non-linearity to the CNN, thereby empowering it to capture intricate relationships between input and output. Widely used activation functions include Rectified Linear Unit (ReLU), Leaky ReLU, and Parametric ReLU (PReLU) [23]. In our research, we adopted ReLU due to its straightforwardness and computational efficiency [41]. The utilization of max pooling was integral to our approach. Pooling effectively curtails computations, moderates overfitting, and upholds translation invariance.
For the stride, we used a stride of one to determine the step size of the filter while traversing the input data. Larger strides lead to a reduction in spatial dimensions, and smaller strides help preserve more spatial information [29]. The training batch size was set to 32, with batch normalization applied. We selected the number of epochs based on the concept of balance between under-fitting and over-fitting [42], which was 50 epochs in our case. The appropriate number of epochs leads to a well-trained model that generalizes effectively to unseen data and accurately classifies emotions in EEG signals. Table 1 illustrates the details of the training model for both the SEED and DEAP datasets. Our interpretation of results goes beyond generic patterns, offering dataset-specific insights. This not only enhances the scientific understanding of EEG-ER but also aids in refining our models for these specific datasets.
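Training with the batch size and epoch count listed above could be sketched as follows; the model is a pared-down stand-in for the architecture of Section 3.4, and the training arrays are placeholders sized like one SEED fold.

```python
import numpy as np
from tensorflow.keras import layers, models

# Pared-down stand-in for the 2D CNN of Section 3.4 (assumed input: 62 channels x 4 bands x 1)
model = models.Sequential([
    layers.Input(shape=(62, 4, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Placeholder data sized like one SEED fold (144 training / 36 validation observations)
X_train = np.random.randn(144, 62, 4, 1).astype("float32")
y_train = np.random.randint(0, 3, size=144)
X_val = np.random.randn(36, 62, 4, 1).astype("float32")
y_val = np.random.randint(0, 3, size=36)

# Batch size 32 and 50 epochs, as stated in this subsection
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    batch_size=32, epochs=50, verbose=0)
```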

4.4. Result Analysis

In this discussion, we contextualize our findings, emphasizing how they address dataset-specific challenges in EEG-ER within the DEAP and SEED datasets, thereby advancing the field’s understanding of EEG-ER. Wavelet technology was employed to filter the EEG signals within the frequency range of 4–45 Hz. The complete training dataset was then engaged, using a 5-fold cross-validation approach. The aim was to categorize the EEG signals, with the output of the wavelet filter used as input to the 2D-CNN approach, into positive, neutral, and negative emotions for the SEED dataset and arousal, valence, and dominance for the DEAP dataset. The investigation pursued a twofold objective in emotion analysis and recognition. Initially, the verification of a three-dimensional emotion model took place. Subsequently, scalograms were deployed for the evaluation and prediction of diverse emotions extracted from EEG data. An integral challenge involved the multi-task classification of emotions.
The entirety of the experiments was conducted through Google Colab. Table 2 displays the results of the classification analysis conducted on the SEED dataset. From the table, the achieved accuracy results are as follows: 98.23% for the positive dimensional space, 97.95% for the neutral dimensional space, and 94.54% for the negative dimensional space. The recall results are 90.78% for positive, 91.02% for neutral, and 90.32% for negative. When assessing the F1 score metric, the results are 93.68% for positive, 93.32% for neutral, and 92.65% for negative. Furthermore, the precision results are 94.012% for positive, 92.89% for neutral, and 92.087% for negative.
Turning our attention to the DEAP dataset, the results of the classification analysis conducted on the DEAP dataset are shown in Table 3. From the table, we achieved the following results: 94.23% accuracy for the valence dimensional space, 93.78% accuracy for the arousal dimensional space, and 89.54% accuracy for the dominance dimensional space. In terms of recall, the results are 86.64% for valence, 87.26% for arousal, and 75.85% for dominance. Assessing the F1 score, we recorded values of 89.32% for valence, 90.31% for arousal, and 81.23% for dominance. Lastly, the precision scores are 88.32% for valence, 91.45% for arousal, and 81.10% for dominance.
The performance evaluation and interpretation of the model’s effectiveness in recognizing emotions are based on the utilization of a confusion matrix [26], which is visually represented in Figure 9. The confusion matrix serves as a standard tool for gauging the classification accuracy of the machine learning model in the domain of ER, specifically focusing on the identification of positive, neutral, and negative emotions. In the context of the confusion matrix, the rows are indicative of the actual or target emotion classes, while the columns represent the predicted emotion classes generated by the model. The diagonal elements within the matrix illustrate instances where the model’s predictions align with the true emotions, thus signifying successful recognition of emotions by the model. The model’s performance was assessed across three emotional categories for each dataset: positive, neutral, and negative for the SEED dataset, and valence, arousal, and dominance for the DEAP dataset. The model achieved an average accuracy of 96.32% on the SEED dataset and 92.54% on the DEAP dataset, indicating its exceptional capability to recognize emotions with remarkable accuracy.
The progression of the training and validation processes is visually depicted in Figure 10 and Figure 11 for the SEED and DEAP datasets, respectively, where the vertical axis, ranging from 0 to 1, represents the model’s skill, while the horizontal axis signifies the training epochs. The observations gleaned from Figure 10 indicate that the model seamlessly fits the training dataset, demonstrating its suitability and efficiency. Furthermore, the graph illustrates a clear correlation between the number of training epochs and the loss function. As the number of epochs increases, the loss steadily decreases and, conversely, the model’s performance improves. Both the training and testing datasets exhibit a marginal reduction in loss values over the course of training. Notably, the training converges efficiently, as evidenced by the smooth trajectory without oscillations. This rapid decline in loss contributes to heightened training accuracy and signifies the effective training of the model.
In contrast to prior studies that validated their methodologies using the DEAP, SEED, or both datasets, the current research showcases a convincing classification rate. A comparison with recent ER investigations utilizing the DEAP dataset is presented in Table 4. In [28], CNN-based classification of multi-spectral topological images derived from EEG signals was proposed. This approach retained temporal, spectral, and spatial EEG signal information, in contrast to most techniques that discard spatial details. The accuracy for the DEAP dataset reached 90.62% for negative and positive valence, 86.1% for high and low arousal, 88.48% for high and low dominance, 86.2% for like–unlike, and an average accuracy of 88%. In [26], a convolutional auto-encoder-based multimodal ER approach was suggested. It utilized a CAE to extract fusion features from multichannel EEG signals and multiple EP signals, followed by a fully connected neural network classifier for ER. The average accuracy achieved was 92%. In [20], the authors proposed a wavelet filter-based extraction of EEG signals within the 4–45 Hz frequency range, encompassing theta, alpha, beta, and gamma waves. Using a CNN to classify the extracted results into three emotion classes, an average accuracy of 83.44% was achieved. Furthermore, the authors in [16] introduced a learning model based on a DNN for identifying emotional states using EEG signals, focusing solely on arousal with an accuracy of 93%. On the other hand, the accuracy of our approach is 92.54%, which is higher than the accuracies achieved in [20,26,28]. In [16], the authors only address the arousal dimension, while our approach addresses all three dimensions (i.e., valence, arousal, and dominance).
Turning our attention to the SEED dataset, a comparison with recent ER investigations utilizing the SEED dataset is presented in Table 5. In [36], the authors introduced a technique that employed a CNN for emotion categorization by transforming EEG signals into topographic images incorporating frequency and spatial information. The maximum accuracy achieved was 89.06%. On the other hand, the accuracy of our approach is 96.32%, which is higher than the accuracy achieved in [36]. Our discussion highlights the unique contribution of our research within the context of the DEAP and SEED datasets. It underscores the practical relevance and scholarly significance of our work, specifically tailored to these datasets.

5. Conclusions

Our proposed work demonstrated the effectiveness of 2D CNNs in EEG-ER, offering a non-invasive method for analyzing emotions and understanding the neural mechanisms associated with them. The results contribute to the advancement of ER technology, with potential applications in emotion-aware systems, mental health monitoring, and personalized affective computing. The study encourages further exploration of deep learning models, hyperparameter tuning, and advanced feature extraction techniques to advance EEG-ER. This study showcases the potential of using deep learning techniques, specifically 2D CNNs, for analyzing emotions through EEG signals and underscores its importance in various real-world applications. Finally, by specifying practical applications and future research directions, we emphasize the direct and continued relevance of our work within the EEG-ER domain, particularly tailored to DEAP and SEED datasets.

Author Contributions

Methodology, A.M., K.A. and M.I.; Software, A.M. and K.A.; Formal analysis, M.L.M.; Writing—original draft, A.M. and W.S.E.; Writing—review & editing, M.M.A.R.; Funding acquisition, M.M.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the researchers supporting project number (RSPD2023R995), King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

This study did not involve the creation of any new data. The SEED dataset is available at https://bcmi.sjtu.edu.cn/home/seed/ (accessed on 1 March 2023) and DEAP dataset is available at https://www.eecs.qmul.ac.uk/mmv/datasets/deap/ (accessed on 20 June 2022).

Acknowledgments

This research was supported by the researchers supporting project number (RSPD2023R995), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zaman, K.; Sun, Z.; Shah, S.M.; Shoaib, M.; Pei, L.; Hussain, A. Driver Emotions Recognition Based on Improved Faster R-CNN and Neural Architectural Search Network. Symmetry 2022, 14, 687. [Google Scholar] [CrossRef]
  2. Schaaff, K.; Schultz, T. Towards emotion recognition from electroencephalographic signals. In Proceedings of the 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, Amsterdam, The Netherlands, 10–12 September 2009; pp. 1–6. [Google Scholar]
  3. Soleymani, M.; Asghari-Esfeden, S.; Fu, Y.; Pantic, M. Analysis of EEG Signals and Facial Expressions for Continuous Emotion Detection. IEEE Trans. Affect. Comput. 2016, 7, 17–28. [Google Scholar] [CrossRef]
  4. Luo, X.; Fu, Q.J.; Galvin, J.J., III. Cochlear Implants Special Issue Article: Vocal Emotion Recognition by Normal-Hearing Listeners and Cochlear Implant Users. Trends Amplif. 2007, 11, 301–315. [Google Scholar] [CrossRef]
  5. Black, M.J.; Yacoob, Y. Recognizing Facial Expressions in Image Sequences Using Local Parameterized Models of Image Motion. Int. J. Comput. Vis. 1997, 25, 23–48. [Google Scholar] [CrossRef]
  6. Aznan, N.K.N.; Bonner, S.; Connolly, J.; Al Moubayed, N.; Breckon, T. On the Classification of SSVEP-Based Dry-EEG Signals via Convolutional Neural Networks. In Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018; pp. 3726–3731. [Google Scholar]
  7. Al-Nafjan, A.; Hosny, M.; Al-Ohali, Y.; Al-Wabil, A. Review and Classification of Emotion Recognition Based on EEG Brain-Computer Interface System Research: A Systematic Review. Appl. Sci. 2017, 7, 1239. [Google Scholar] [CrossRef]
  8. Pollatos, O.; Kirsch, W.; Schandry, R. On the relationship between interoceptive awareness, emotional experience, and brain processes. Cogn. Brain Res. 2005, 25, 948–962. [Google Scholar] [CrossRef]
  9. Garg, D.; Verma, G.K. Emotion Recognition in Valence-Arousal Space from Multi-channel EEG data and Wavelet based Deep Learning Framework. Procedia Comput. Sci. 2020, 171, 857–867. [Google Scholar] [CrossRef]
  10. Chang, C.-C.; Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27. [Google Scholar] [CrossRef]
11. Duan, R.-N.; Zhu, J.-Y.; Lu, B.-L. Differential entropy feature for EEG-based emotion classification. In Proceedings of the 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), San Diego, CA, USA, 6–8 November 2013; pp. 81–84.
12. Wen, Z.; Xu, R.; Du, J. A novel convolutional neural networks for emotion recognition based on EEG signal. In Proceedings of the 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Shenzhen, China, 15–17 December 2017; pp. 672–677.
13. Currie, J. Music After All. J. Am. Music. Soc. 2009, 62, 145–203.
14. Mehrabian, A. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in Temperament. Curr. Psychol. 1996, 14, 261–292.
15. Brosschot, J.F.; Thayer, J.F. Heart rate response is longer after negative emotions than after positive emotions. Int. J. Psychophysiol. 2003, 50, 181–187.
16. Gill, R.; Singh, J. Consumer Emotional State Evaluation Using EEG Based Emotion Recognition Using Deep Learning Approach. In Advanced Computing; Springer: Singapore, 2021; pp. 113–127.
17. Jenke, R.; Peer, A.; Buss, M. Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 2014, 5, 327–339.
18. Gao, Y.; Gao, B.; Chen, Q.; Liu, J.; Zhang, Y. Deep Convolutional Neural Network-Based Epileptic Electroencephalogram (EEG) Signal Classification. Front. Neurol. 2020, 11, 375.
19. Tong, L.; Zhao, J.; Fu, W. Emotion Recognition and Channel Selection Based on EEG Signal. In Proceedings of the 2018 11th International Conference on Intelligent Computation Technology and Automation (ICICTA), Changsha, China, 22–23 September 2018; pp. 101–105.
20. Haqque, R.H.D.; Djamal, E.C.; Wulandari, A. Emotion Recognition of EEG Signals Using Wavelet Filter and Convolutional Neural Networks. In Proceedings of the 2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), Bandung, Indonesia, 29–30 September 2021; pp. 1–6.
21. Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Imaging 2018, 9, 611–629.
22. Bhatti, A.M.; Majid, M.; Anwar, S.M.; Khan, B. Human emotion recognition and analysis in response to audio music using brain signals. Comput. Hum. Behav. 2016, 65, 267–275.
23. Salama, E.S.; El-Khoribi, R.A.; Shoman, M.E.; Wahby, M.A. EEG-Based Emotion Recognition using 3D Convolutional Neural Networks. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 329–337.
24. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks. IEEE Trans. Affect. Comput. 2020, 11, 532–541.
25. Zheng, W.-L.; Lu, B.-L. Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175.
26. Zhou, J.; Wei, X.; Cheng, C.; Yang, Q.; Li, Q. Multimodal Emotion Recognition Method Based on Convolutional Auto-Encoder. Int. J. Comput. Intell. Syst. 2019, 12, 351–358.
27. Zhao, Y.; Yang, J.; Lin, J.; Yu, D.; Cao, X. A 3D Convolutional Neural Network for Emotion Recognition based on EEG Signals. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–6.
28. Ozdemir, M.A.; Degirmenci, M.; Izci, E.; Akan, A. EEG-based emotion recognition with deep convolutional neural networks. Biomed. Eng./Biomed. Tech. 2020, 66, 43–57.
29. Ramzan, M.; Dawn, S. Fused CNN-LSTM deep learning emotion recognition model using electroencephalography signals. Int. J. Neurosci. 2023, 133, 587–597.
30. Wang, R.; Wang, J.; Yu, H.; Wei, X.; Yang, C.; Deng, B. Power spectral density and coherence analysis of Alzheimer’s EEG. Cogn. Neurodynamics 2015, 9, 291–304.
31. Yoon, H.J.; Chung, S.Y. EEG-based emotion estimation using Bayesian weighted-log-posterior function and perceptron convergence algorithm. Comput. Biol. Med. 2013, 43, 2230–2237.
32. Nawaz, R.; Nisar, H.; Voon, Y.V. The Effect of Music on Human Brain; Frequency Domain and Time Series Analysis Using Electroencephalogram. IEEE Access 2018, 6, 45191–45205.
33. Khare, S.K.; Bajaj, V. Time-Frequency Representation and Convolutional Neural Network-Based Emotion Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2901–2909.
34. Nawaz, R.; Cheah, K.H.; Nisar, H.; Yap, V.V. Comparison of different feature extraction methods for EEG-based emotion recognition. Biocybern. Biomed. Eng. 2020, 40, 910–926.
35. Picard, R.W.; Vyzas, E.; Healey, J. Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1175–1191.
36. Rahman, A.; Anjum, A.; Milu, M.H.; Khanam, F.; Uddin, M.S.; Mollah, N. Emotion recognition from EEG-based relative power spectral topography using convolutional neural network. Array 2021, 11, 100072.
37. Khosla, A.; Khandnor, P.; Chand, T. A comparative analysis of signal processing and classification methods for different applications based on EEG signals. Biocybern. Biomed. Eng. 2020, 40, 649–690.
38. Novitasari, D.; Suwanto, S.; Bisri, M.; Asyhar, A. Classification of EEG Signals using Fast Fourier Transform (FFT) and Adaptive Neuro Fuzzy Inference System (ANFIS). Mantik J. Mat. 2019, 5, 35–44.
39. Wang, Z.; Tong, Y.; Heng, X. Phase-Locking Value Based Graph Convolutional Neural Networks for Emotion Recognition. IEEE Access 2019, 7, 93711–93722.
40. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.-S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A Database for Emotion Analysis; Using Physiological Signals. IEEE Trans. Affect. Comput. 2012, 3, 18–31.
41. Ho, N.-H.; Yang, H.-J.; Kim, S.-H.; Lee, G. Multimodal Approach of Speech Emotion Recognition Using Multi-Level Multi-Head Fusion Attention-Based Recurrent Neural Network. IEEE Access 2020, 8, 61672–61686.
42. Gao, Q.; Yang, Y.; Kang, Q.; Tian, Z.; Song, Y. EEG-based Emotion Recognition with Feature Fusion Networks. Int. J. Mach. Learn. Cybern. 2022, 13, 421–429.
Figure 1. The proposed EEG-ER approach.
Figure 2. Illustrative film segments selected to elicit positive, negative, and neutral emotional responses within the experimental context.
Figure 3. The protocol of the SEED dataset.
Figure 4. The pre-processed EEG signal.
Figure 5. The 10–20 electrode distribution (black circles).
Figure 6. The 10–20 electrode distribution.
Figure 7. PSD band areas computed using Welch’s periodogram (blue area).
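For readers who wish to reproduce the kind of PSD band-power features illustrated in Figure 7, the following minimal Python sketch estimates band power with SciPy’s Welch periodogram. The 128 Hz sampling rate, 2 s window length, and band limits are illustrative assumptions, not values taken from this work.

# Minimal sketch of Welch-based PSD band-power extraction for one EEG channel.
# Assumed (not from the paper): 128 Hz sampling rate, 2 s Hann windows, and
# conventional theta/alpha/beta/gamma band limits.
import numpy as np
from scipy.signal import welch

def band_powers(eeg_channel, fs=128, nperseg=256):
    """Return the power within standard EEG bands for a 1-D signal."""
    freqs, psd = welch(eeg_channel, fs=fs, nperseg=nperseg)  # PSD estimate
    freq_res = freqs[1] - freqs[0]
    bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}
    powers = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)
        # Approximate the area under the PSD curve within the band.
        powers[name] = np.sum(psd[mask]) * freq_res
    return powers

# Example with synthetic data standing in for a pre-processed channel (60 s).
rng = np.random.default_rng(0)
print(band_powers(rng.standard_normal(60 * 128)))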
Figure 9. Confusion matrices for the SEED and DEAP datasets.
Figure 10. Model loss and accuracy of the training and validation of the proposed CNN over the SEED dataset with respect to iterations.
Figure 11. Model loss and accuracy of the training and validation of the proposed CNN over the DEAP dataset with respect to iterations.
Table 1. The hyperparameters of the 2D CNN model in our work.
Parameter of CNN Model | Value
Mini Batch Size | 32
Max Epochs | 50
Learning Rate | 1 × 10−4
Loss Function | Cross Entropy
Activation Function | ReLU
Pooling | Max Pooling
Optimizer | Adam
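To make the settings in Table 1 concrete, the sketch below wires them into a small Keras model. Only the training settings (Adam with a learning rate of 1 × 10−4, cross-entropy loss, ReLU, max pooling, batch size 32, 50 epochs) come from Table 1; the layer layout, filter counts, and 32 × 32 × 1 input shape are illustrative assumptions, not the paper’s exact architecture.

# Minimal Keras sketch using the Table 1 training settings.
# Layer sizes and the input shape are assumptions for illustration only.
import tensorflow as tf

def build_model(input_shape=(32, 32, 1), n_classes=3):
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),          # max pooling, as in Table 1
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # Adam, lr 1e-4
    loss="categorical_crossentropy",                          # cross-entropy loss
    metrics=["accuracy"],
)
# Training would then use the Table 1 batch size and epoch count, e.g.:
# model.fit(x_train, y_train, batch_size=32, epochs=50, validation_split=0.1)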
Table 2. The classification performance results of the 2D CNN on the SEED dataset.
Class | Accuracy | F1 Score | Recall | Precision
Positive | 98.23% | 93.68% | 90.78% | 94.012%
Neutral | 97.95% | 93.32% | 91.02% | 92.89%
Negative | 94.54% | 92.65% | 90.32% | 92.087%
Table 3. The classification performance results of the 2D CNN on the DEAP dataset.
Dimension | Accuracy | F1 Score | Recall | Precision
Valence | 94.23% | 89.32% | 86.64% | 88.32%
Arousal | 93.78% | 90.31% | 87.26% | 91.45%
Dominance | 89.54% | 81.23% | 75.85% | 81.10%
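Per-class metrics of the kind reported in Tables 2 and 3, as well as the confusion matrices in Figure 9, can be computed directly from model predictions. The short sketch below uses scikit-learn; the example labels are illustrative placeholders (matching the SEED class names), not results from this study.

# Minimal sketch: per-class precision, recall, F1, and a confusion matrix.
# The y_true / y_pred values are illustrative, not the paper's results.
from sklearn.metrics import classification_report, confusion_matrix

labels = ["negative", "neutral", "positive"]   # SEED classes
y_true = ["positive", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral", "neutral"]

# Per-class precision, recall, and F1 score.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))

# Row i, column j counts samples of true class i predicted as class j.
print(confusion_matrix(y_true, y_pred, labels=labels))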
Table 4. Comparison of the accuracy of research approaches utilizing the DEAP dataset.
Reference | Features | Classifier | Average Accuracy (%)
[28] | FFT | CNN | 88 (for 3D VAD)
[26] | CAE | FCNN | 92
[20] | Wavelet feature | 2D CNN | 83.44
[16] | PSD | DNN | 93 (for arousal)
Ours | PSD | 2D CNN | 92.54
Table 5. Comparison of the accuracy of several research approaches utilizing the SEED dataset.
Reference | Features | Classifier | Average Accuracy (%)
[36] | RPSD-based topographic image for emotional EEG data | CNN | 89.06
Ours | PSD | 2D CNN | 96.32
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
