Article

Cross-Subject Emotion Recognition Using Fused Entropy Features of EEG

1 School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
2 Faculty of Information Technology, University of Jyväskylä, 40014 Jyväskylä, Finland
3 Liaoning Key Laboratory of Integrated Circuit and Biomedical Electronic System, Dalian 116024, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(9), 1281; https://doi.org/10.3390/e24091281
Submission received: 20 June 2022 / Revised: 4 September 2022 / Accepted: 5 September 2022 / Published: 11 September 2022
(This article belongs to the Collection Feature Papers in Information Theory)

Abstract

Emotion recognition based on electroencephalography (EEG) has attracted high interest in fields such as health care, user experience evaluation, and human–computer interaction (HCI), as it plays an important role in human daily life. Although various approaches have been proposed to detect emotion states in previous studies, there is still a need to further study the dynamic changes of EEG in different emotions to detect emotion states accurately. Entropy-based features have been proven effective in mining the complexity information in EEG in many areas. However, different entropy features vary in revealing the implicit information of EEG. To improve system reliability, in this paper, we propose a framework for EEG-based cross-subject emotion recognition using fused entropy features and a Bidirectional Long Short-term Memory (BiLSTM) network. Features including approximate entropy (AE), fuzzy entropy (FE), Rényi entropy (RE), differential entropy (DE), and multi-scale entropy (MSE) are first calculated to study dynamic emotional information. Then, we train a BiLSTM classifier with the entropy features as inputs to identify different emotions. Our results show that MSE of EEG is more efficient than other single-entropy features in recognizing emotions. The performance of BiLSTM is further improved when using fused entropy features, reaching an accuracy of 70.05%, compared with that of any single-type feature.

1. Introduction

Emotion is a specific psychological and physiological response generated by perceiving external and inner stimuli. It is a complex state combining thoughts, feelings, and behaviors and is an important part of daily human life [1]. Previous studies have demonstrated that emotion plays a vital role not only in the processes of perception, decision making, and communication but also in learning and memory [2]. As a result, the measurement and characterization of different emotion states are of great importance to emotion recognition-related studies both theoretically and practically. For example, emotion recognition can be widely used in areas closely related to humans, such as health care, distance learning, and user experience evaluation of products [3]. Furthermore, it contributes to computers' ability to recognize and express emotions in the human–computer interaction (HCI) field [4]. As emotion is often accompanied by high cognitive activities of the brain involving complex psychological and physiological processes [5], further study on how to recognize different emotions accurately is necessary.
There are many kinds of approaches to recognizing emotion states in existing studies. According to the data used in emotion recognition, they can be roughly divided into two categories: one is based on non-physiological signals, whereas the other is based on physiological signals. Conventional emotion recognition methods based on non-physiological signals usually use facial expressions, behaviors, voice-based signals, etc. The features of these signals are more obvious for observation and easier to extract. Jain et al. proposed deep convolutional neural networks for recognizing emotion states based on different facial motions in emotional images [6]. Meng et al. developed a speech emotion recognition method using spectrum features of speech signals [7]. There is also emotion recognition research combining different types of non-physiological signals. For instance, Kessous et al. studied a multimodal automatic emotion recognition method using the Bayesian classifier based on a mixture of facial expressions, gestures, and acoustic signals, and they found that fusing the multimodal signals largely increased classification accuracy compared with unimodal systems [8]. Although the data collection process of these methods is easier, their availability and reliability cannot be completely guaranteed, as they are mainly affected by two factors [9]. On the one hand, effective non-physiological signals are hard to obtain from participants who have trouble expressing their feelings through body language. On the other hand, participants can deliberately control their expressions, tone, and postures to hide their real feelings. Contrary to the non-physiological measurements, physiological signals are more reliable and effective, as these signals originate from spontaneous activities of the nervous system which cannot be controlled intentionally [1]. The most frequently used physiological signals include autonomic nervous system (ANS) signals such as the electrocardiogram (ECG), the electromyogram (EMG), skin resistance, and blood pressure, and central nervous system signals such as EEG, functional magnetic resonance imaging (fMRI), and so on. Kim et al. analyzed the multimodal autonomic physiological signals (i.e., ECG, EMG, respiration, and skin conductivity) induced by music and developed a scheme of emotion-specific classification [10]. The brain signals obtained directly from the central nervous system can reflect dynamic neuro-electrical changes in real time with high resolution, compared to ANS signals, which often include a time delay [9]. In addition, the activity of EEG signals varies across brain regions while emotional processes occur. Particularly, the lateral temporal brain areas are more active than other areas, and the energy of EEG increases for positive emotion, whereas lower energy appears in neutral and negative emotions [11]. Therefore, the emotional changes in different emotion states can be measured by EEG signals in the lateral temporal region. What is more, EEG acquisition equipment is small, portable, and much cheaper than that of fMRI. EEG-based emotion recognition has therefore become one of the most flourishing research fields.
To recognize emotion states accurately based on EEG signals, features revealing the dynamic changes of EEG under different emotions should first be extracted. There are four main types of features used in EEG-based emotion recognition [12]: time domain features (e.g., statistical features and auto-regression coefficients), frequency domain features (e.g., power spectral density and energy spectrum), time-frequency features (e.g., wavelet coefficients), and non-linear dynamic features (e.g., fractal dimension and entropy features). Different features vary in their ability to reflect emotion states. Energy-based features in different brain regions have been commonly adopted in emotion recognition. As different brain regions are activated in different emotions, the energy of different frequency bands in these brain regions can be used for emotion recognition [13]. Du et al. selected sound clips of three affective states (i.e., happy (high arousal), afraid (high arousal), and neutral (low arousal)) to explore frontal asymmetry [14]. Their research demonstrated that the right frontal region is more related to high-arousal emotions (i.e., happiness and fear), whereas the left frontal region correlates with low-arousal emotions (i.e., neutral); thus, the energy asymmetry between the left and right brain can be used to classify emotion states. Further, Liu et al. found that there is a correlation between emotional states and EEG frequency bands and that high-frequency bands contain more emotional information than low-frequency bands [15]. EEG signals, which are a direct reflection of brain activities, are non-stationary signals with a low signal-to-noise ratio, and the activation of EEG and the information it contains vary in different emotions. It is difficult to analyze EEG signals using only traditional time- or frequency-domain features. In recent years, entropy-based features have been proven to manifest more complex dynamic information in EEG than conventional features, leading to wide use in many fields [16]. Wang et al. extracted the sample entropy (SE) feature of overnight sleeping EEG data utilizing an assisted sliding-box algorithm to capture the dynamic changes while reducing computation time [17]. Chen et al. proposed a method using the approximate entropy (AE) feature and its transformation to identify four human emotions based on EEG with an accuracy of 83.34% [18]. Zheng et al. trained a Deep Belief Network (DBN) to classify three emotions with the input of the differential entropy (DE) feature [19]. Their results showed that the DE feature of certain brain regions could reflect the dynamic changes of EEG in different emotions and can be used for recognizing emotion states. What is more, entropy features deployed to analyze brain states in other areas may also contribute to emotion recognition. For instance, multiscale entropy (MSE), which calculates entropy at multiple time scales, has been proven to achieve better robustness of results than conventional features in the fields of disease diagnosis and sleep studies [20]. Hadoush et al. adopted MSE to explore patterns in children with mild and severe autism spectrum disorders (ASD) and found that MSE could serve as an effective index for the severity of ASD [21]. Miskovic et al. explored the changes in brain signal complexity across several distinct global states of consciousness using MSE [22].
The results indicated that MSE changes throughout the sleep cycle and is strongly time-scale dependent, which makes it possible to use MSE for sleep staging. However, a challenge still exists in analyzing EEG signals based on entropy features. These features characterize the implicit complexity information in EEG to varying degrees, but it remains unclear which type of entropy feature is more effective for describing emotional states. Further, previous studies have demonstrated that any feature could add complementary information to the other features [23]. Hence, there is a necessity to integrate the advantages of different entropy features to enhance the performance of emotion classification.
To take advantage of the EEG features, researchers have trained a variety of classifiers to recognize different types of emotion states. Traditional approaches such as the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and transfer learning are widely used for emotion classification. Liu et al. established a real-time EEG-based emotion recognition system using SVM that could successfully classify positive and negative emotions with acceptable results [24]. Kolodyazhniy et al. extracted features from physiological signals induced by different emotional film clips and proposed an affective computing approach based on the KNN classifier [25]. Lan et al. [26] utilized the DE feature of EEG and the transfer learning technique to detect three emotions, reaching an accuracy of 72.47% on the SEED dataset. Although traditional classifiers have achieved different recognition performance in simple tasks (e.g., 86.43% accuracy for three positive emotions in [24], 83.34% accuracy for four different emotions in [18], and 77.5% accuracy via different types of signals in [25]), they are not efficient enough to learn the contextual dependency in a time series and do not perform well in cross-subject emotion recognition [27]. Human emotions form a continuous time series, and the current emotion state is influenced by both the current stimulus and previous emotions. In this case, it is difficult for traditional classifiers to recognize human emotion based only on the current feature. Compared with the one-directional Long Short-term Memory (LSTM) network [28], which is widely used in speech synthesis, pathological voice detection, and motion prediction [29,30,31], the Bidirectional Long Short-term Memory (BiLSTM) network has the ability to learn long- and short-term dependency between time steps and to memorize both forward and backward contextual information in a time series. It has been proven to perform effectively in pattern recognition and has been successfully deployed in sequence-to-sequence classification tasks in many fields [32,33,34,35]. Mahmud et al. trained an automated BiLSTM model to detect sleep apnea based on EEG and reached a high accuracy on different publicly available datasets [36]. Chang et al. proposed a depression assessment framework based on the spatiotemporal network of EEG and BiLSTM and achieved more than 70% accuracy on the SEED dataset [37].
Two emotional models widely used in existing research are the circumplex model of affects (CMA) and the discrete emotion model (DEM). CMA defines emotions in a two-dimensional space of arousal and valence. In their work on CMA, Posner et al. presented that emotion states arise from cognitive interpretations of core neural sensations and that CMA is a useful tool to study the development of emotion disorders and the cognitive underpinnings of affective processing [38]. DEM, in contrast, treats each emotion as a separate discrete state. In the study of Kılıç et al., the authors proposed that assigning each emotion to a separate discrete state based on DEM is important in recognizing different emotional states, particularly in neuropsychiatric diseases [39].
Motivated by the fact that discrete emotions are vital for emotion recognition and that different entropy features represent implicit EEG complexity to different degrees, in the present study we focus on finding out which entropy feature best characterizes three discrete emotional states (i.e., positive, neutral, and negative) and whether integrating different entropy features can enhance the performance of emotion classification with BiLSTM. In this paper, we propose a novel framework for cross-subject emotion recognition based on fused entropy features of EEG and BiLSTM. Our approach is to model a BiLSTM classifier based on the fusion of entropy features of EEG induced by different emotional film clips. We first calculate the MSE feature and four other entropy features of EEG to mine the dynamic changes of EEG in different emotion states. Then, a BiLSTM classifier is trained to learn the bidirectional time dependency in the extracted EEG features and to realize emotion recognition.
This paper is organized as follows. Section 2 addresses the EEG dataset used in our work. Section 3 details the adopted methodologies. The results of the research are presented in Section 4 and discussed in Section 5. Finally, Section 6 concludes the paper.

2. Data Resource

The EEG data in this study came from the 2020 World Robot Competition—BCI Control Brain Robot Contest. It consisted of two public datasets, namely SEED [11] and SEED-FRA [40] (the SJTU Emotion EEG Dataset, https://bcmi.sjtu.edu.cn/home/seed/, accessed on 30 October 2014), which are EEG collections provided by Shanghai Jiao Tong University for various emotion research purposes. Prior to the data collection, the experiment was approved by the Ethics Committee of Shanghai Jiao Tong University. The data were gathered from 23 healthy subjects (15 Chinese and 8 French) while they watched different emotional film clips in their native language. First, 50 cinema managers were asked to fill in a questionnaire in which they were supposed to describe the emotional valence of at least three film excerpts for each emotion state (i.e., positive, neutral, and negative). The cinema managers were selected because they were likely to have significant knowledge about films, which might contribute to creating a large preliminary list of film scenes [41]. Then, the listed emotional film excerpts were discussed and viewed by the cinema managers to rate their valence on a scale from 1 (sad) to 9 (happy) using the Self-Assessment Manikin (SAM). The mean and standard deviation of the ratings of each film excerpt were calculated to analyze the rating results, and the initial pool of film clips was established. After this step, a pilot trial was executed to test whether the selected film clips could elicit the expected emotions. According to the SAM rating results of the subjects, the mean and standard deviation of each film clip in the pilot trial could be obtained. Emotional film excerpts from five Chinese films and seven French films with the largest mean values and similar standard deviations were finally selected as the positive stimuli. Twelve film excerpts with the smallest mean values and similar standard deviations were chosen as the negative stimuli. As for the neutral stimuli, they consisted of film excerpts whose mean values were close to five (the midpoint of the valence scale) and whose standard deviations were similar. Before the experiment, subjects were asked to finish the Eysenck Personality Questionnaire (EPQ), and only those with stable moods were selected. Three types of emotions (i.e., positive, neutral, and negative) were included in the experiment, and each type of emotion had five (for Chinese subjects) or seven (for French subjects) corresponding film clips. Each emotional film clip lasted for 2 min.
The experiment was performed in a quiet room. Figure 1 shows the experiment scene. A 62-channel electrode cap arranged according to the international 10–20 system was used to collect the EEG data. The sampling rate was set to 1000 Hz. Before starting, all subjects were given written and oral instructions on the experiment and were asked to stay as still as possible and refrain from moving. In the experiment, the subjects sat comfortably and paid attention to watching the forthcoming film clips. Eight of the subjects watched 21 film clips (i.e., 21 trials), with seven film clips corresponding to each emotion. The other fifteen subjects were shown 15 film clips, and there were five corresponding film clips for each emotion. The detailed protocol of the experiment is shown in Figure 2. A 5 s picture hint was set before each clip, and there was a 45 s interval after each clip, allowing the subjects to report their emotional states concerning the film clips based on their feelings. The self-reported emotional states were then used to validate the emotion classification results of the study. The details about the database can be found in [11,37,40].

3. Methodology

The analysis process of EEG-based emotion recognition includes three steps: data preprocessing, feature extraction, and emotion recognition. This section describes in detail how we processed the data. The analysis was implemented in MATLAB 2019b.

3.1. Preprocessing

As previous studies have demonstrated that some electrodes are irrelevant to emotion changes [42], and Zheng et al. found that the lateral temporal brain area is activated more than other brain areas during emotion processing [11], we first selected twelve electrodes (i.e., FT7, T7, TP7, P7, C5, CP5, FT8, T8, TP8, P8, C6, CP6) in the lateral temporal brain area for further research in this paper. The EEG data were then down-sampled to 256 Hz to improve calculation efficiency. After that, the EEG segments corresponding to each film clip's duration were extracted to obtain the EEG data recorded while watching the film clips, as the raw EEG data contained EEG signals not only from watching the films but also from the preparation and self-assessment stages. To reject interference from the power line, we used a 50 Hz bandstop filter. Five frequency bands (i.e., delta, theta, alpha, beta, and gamma) of the EEG signals were then roughly extracted by applying wavelet decomposition. Finally, a wavelet-based technique was used to remove the artifacts in each band.
Wavelet transform, an effective time-frequency analysis method with good local representation of signals in both the time and frequency domains, is commonly used to analyze EEG signals [43]. By decomposing the signal level by level, the detail and approximation wavelet coefficients of each level can be obtained. The wavelet coefficients reflect the detailed information of the signal as well as its correlation with the mother wavelet. In fact, the coefficients of artifacts are usually larger than those of a normal EEG signal. Therefore, artifacts were eliminated by setting a threshold value [44]. This wavelet-based method has been validated to be effective in the field of driver fatigue assessment [45,46]. The threshold is calculated by
$$T_j = \operatorname{mean}(C_j) + 2 \times \operatorname{std}(C_j), \qquad (1)$$
where $C_j$ denotes the wavelet coefficients at the $j$th level of wavelet decomposition. If the value of a coefficient is larger than $T_j$, it is considered an artifact coefficient and is halved to eliminate its influence. Then, the wavelet-corrected signal can be reconstructed with the new set of wavelet coefficients. More details about the preprocessing process can be found in Appendix A.
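As an illustration of this thresholding rule, the following sketch applies Equation (1) to one 1 s segment, assuming a three-level db6 decomposition as described in Appendix A. It is only a minimal Python/PyWavelets sketch (the authors' implementation was in MATLAB), and comparing the coefficient magnitudes against $T_j$ is an assumption of this sketch.

```python
import numpy as np
import pywt

def remove_artifacts(segment, wavelet="db6", level=3):
    """Halve wavelet coefficients exceeding T_j = mean(C_j) + 2*std(C_j), Equation (1)."""
    coeffs = pywt.wavedec(segment, wavelet, level=level)     # [cA_level, cD_level, ..., cD_1]
    cleaned = []
    for c in coeffs:
        t = np.mean(c) + 2 * np.std(c)                       # per-level threshold T_j
        cleaned.append(np.where(np.abs(c) > t, c / 2.0, c))  # halve artifact coefficients
    # Reconstruct the artifact-corrected segment from the modified coefficients
    return pywt.waverec(cleaned, wavelet)[: len(segment)]
```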
As db6 is commonly used to represent EEG signals in the literature [47], it was selected as the mother wavelet. The EEG signals from all twelve electrodes were preprocessed in the same way. Figure 3 shows the preprocessing results of the gamma frequency band at the FT8 electrode. Figure 3a is a 10 s duration of the original EEG signal with various artifacts. Figure 3b gives the gamma frequency band extracted by wavelet decomposition. The large fluctuations in the gamma band caused by body movements were clearly removed, though artifacts induced by blinks still exist. The result of the artifact removal is shown in Figure 3c. It is clear that the wavelet-based thresholding technique can reduce the interference of the artifacts in Figure 3b.

3.2. Feature Extraction

Five entropy features were calculated to explore the dynamic changes in EEG induced by different emotional film clips: MSE, AE, FE, Rényi entropy (RE), and DE.

3.2.1. Multi-Scale Entropy

MSE, with the ability to reduce the interference of residual noise on the results by calculating features at different time scales, was chosen to manifest the dynamic changes while subjects were viewing emotional films [48]. MSE was first proposed by Costa et al. in 2003 [49]; it reflects the complexity of signals at different scale factors by extending the idea of SE to several time scales. For the EEG signal {x1, …, xi, …, xN}, the signal is first coarse-grained according to a specified scaling factor τ. In this process, the original signal is divided into consecutive non-overlapping windows of length τ, and the average value is then calculated in each window to obtain the coarse-grained time series {y(τ)}. It is defined as
$$y_j^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \frac{N}{\tau}, \qquad (2)$$
Then, the SE of the simplified time series {y(τ)} is calculated at each time scale. For more information about the calculation of SE and the parameter setting, please see Appendix B.
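The coarse-graining step of Equation (2) can be written compactly. The following NumPy function is an illustrative sketch (the authors' implementation was in MATLAB); the five-scale MSE is then the SE of the coarse-grained series for τ = 1, …, 5, and a sample entropy sketch is given in Appendix B.

```python
import numpy as np

def coarse_grain(x, tau):
    """Average consecutive non-overlapping windows of length tau (Equation (2));
    trailing samples that do not fill a complete window are dropped."""
    x = np.asarray(x, dtype=float)
    n = len(x) // tau
    return x[: n * tau].reshape(n, tau).mean(axis=1)
```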

3.2.2. Approximate Entropy

AE, proposed by Pincus in 1991, is a nonlinear dynamics parameter used to measure the complexity and the statistical quantification characteristics of a signal [50]. Due to its effectiveness in reflecting the structural characteristics and complexity information of signals with fewer data points, it is widely used in time series classification studies. Its formula is
$$\mathrm{AE}(m,r,N) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\log C_i^m(r) - \frac{1}{N-m}\sum_{i=1}^{N-m}\log C_i^{m+1}(r), \qquad (3)$$
where $C_i^m(r)$ can be calculated by
$$C_i^m(r) = \frac{B_i^m}{N-m+1}, \qquad (4)$$
where $B_i^m$ is the number of matches of dimension $m$.
The mode dimension m is set to 2, and the tolerance r is equal to 0.2 times the standard deviation of the signal. N is the data length, which is set to 256 and equals the number of data points in a 1 s time window without overlap.
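A direct NumPy sketch of Equations (3) and (4) with these parameters is shown below; it is illustrative only (the authors' implementation was in MATLAB), and self-matches are included in $C_i^m(r)$ as in the standard AE definition.

```python
import numpy as np

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy of a 1 s window (N = 256), Equations (3)-(4)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)

    def phi(mm):
        # Overlapping embedding vectors of length mm
        emb = np.array([x[i:i + mm] for i in range(N - mm + 1)])
        # Chebyshev distance between every pair of vectors (self-matches included)
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        C = np.mean(dist <= r, axis=1)          # C_i^mm(r), Equation (4)
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)                  # Equation (3)
```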

3.2.3. Fuzzy Entropy

Like AE, FE is also a measure of signal complexity. Instead of the Heaviside step function used in AE, the concept of the fuzzy set is introduced into FE to measure the similarity of two vectors. An exponential function is chosen as the fuzzy function, which enables the FE values to change smoothly and continuously as the parameters change [51]. FE is therefore also calculated in our study for comparison with the other entropy features. It can be calculated by
$$\mathrm{FE}(m,n,r,N) = \ln\frac{O^m(n,r)}{O^{m+1}(n,r)}, \qquad (5)$$
where $O^m(n,r)$ is the mean value of the fuzzy membership of the time series with length N in dimension m and tolerance r, and n is used to determine the gradient of the similarity tolerance boundary. The definition of $O^m(n,r)$ and the parameters in Equation (5) are described in detail in Appendix B.

3.2.4. Rényi Entropy

RE is a generalization of Shannon entropy that reflects the time-frequency features and randomness of signals. It is widely used in information-theoretic applications such as classification problems. For a given EEG signal X = {x1, …, xi, …, xN}, its RE can be calculated by
$$\mathrm{RE} = \frac{1}{1-q}\log\left(\sum_{i=1}^{N} p(i)^q\right), \quad q \ge 0,\ q \ne 1, \qquad (6)$$
where q is the entropic index, p(i) is the probability of choosing $x_i$ in X, and $\sum_{i=1}^{N} p(i) = 1$.
Following the study of Kar et al. [52], we use q = 2 to calculate the second-order entropy in a sliding window with a length of one second.
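The sketch below computes the order-2 RE of one 1 s window. How p(i) is estimated is not detailed in the text, so an amplitude histogram with 32 bins is assumed here; this is an illustrative Python sketch rather than the authors' MATLAB implementation.

```python
import numpy as np

def renyi_entropy(window, q=2, n_bins=32):
    """Order-q Rényi entropy of a 1 s EEG window, Equation (6)."""
    # Assumed amplitude-histogram estimate of p(i); bin count is illustrative
    counts, _ = np.histogram(window, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return np.log(np.sum(p ** q)) / (1 - q)
```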

3.2.5. Differential Entropy

As an extension of Shannon entropy, DE can be utilized to reveal the complexity of a time series [53]. Previous studies have shown that DE performs better than energy spectrum and asymmetry features in EEG-based emotion state detection [54]; thus, we calculated DE to represent the changes in EEG signals during different emotional films. It is defined as
$$\mathrm{DE} = -\int_a^b f(x)\log\big(f(x)\big)\,dx, \qquad (7)$$
where f(x) is the probability density function of the time series and [a, b] is the interval over which the time series takes values. If the time series approximately follows a Gaussian distribution N(μ, σ²), its DE can then be calculated by
$$\mathrm{DE} = -\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \log\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right) dx = \frac{1}{2}\log\left(2\pi e \sigma^2\right), \qquad (8)$$
where μ and σ² are the expectation and variance of the time series. In the present work, DE is extracted from the signals in a non-overlapping sliding window of length 1 s.
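Under the Gaussian assumption, Equation (8) reduces DE to a function of the window variance; a minimal illustrative sketch (not the authors' MATLAB code) is:

```python
import numpy as np

def differential_entropy(window):
    """DE of an approximately Gaussian window, Equation (8): 0.5 * log(2*pi*e*sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(window))

def de_feature(band_signal, fs=256):
    """DE over non-overlapping 1 s windows of one frequency band of one channel."""
    n_win = len(band_signal) // fs
    return np.array([differential_entropy(band_signal[i * fs:(i + 1) * fs])
                     for i in range(n_win)])
```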

3.3. BiLSTM

As an extension of LSTM, BiLSTM not only avoids the vanishing gradient problem but also memorizes the long- and short-term dependency of EEG in both the forward and backward directions [55]. As shown in Figure 4, BiLSTM integrates two LSTMs of opposite directions, each composed of LSTM memory cells.
An LSTM memory cell contains four neural network layers, compared to a conventional RNN cell with only one layer, to model the long-term context. The structures called "gates" consist of neural network layers, and their interactions allow an LSTM memory cell to add or remove information from the cell state [56]. The details about how the memory cell works can be found in Appendix C. Figure 5 shows the structure of an LSTM memory cell.

4. Results

4.1. Feature Extraction

In this paper, the five-scale MSE feature was first extracted from the five frequency bands of the selected twelve electrodes after preprocessing. The four other kinds of entropy-based features mentioned in Section 3.2 were also calculated. A non-overlapping sliding window of 1 s was used in the feature extraction procedure. The dimension of the obtained MSE feature matrix for each subject is 300 × N, and the other features (i.e., AE, FE, RE, and DE) share the same dimension of 60 × N, where N stands for the sampling time. Figure 6 shows part of the preprocessed gamma frequency band and its entropy features at FT8. The red numbers in Figure 6 indicate the emotion type of each film clip's duration, in accordance with the self-reported emotional states. A positive emotion is marked as "1", and "0" and "−1" represent neutral and negative emotions, respectively. The interval between two adjacent purple dashed lines corresponds to one film clip.
It can be seen from Figure 6 that the waveform of the gamma band after preprocessing varies between different emotional films, and the five entropy-based features change regularly in correspondence with the film clips. In Figure 6a, the amplitudes while watching a positive film are obviously larger than those for the other two types of emotional films. The amplitudes for negative films rank second, followed by those for neutral films. The fluctuations of DE and FE are similar, as shown in Figure 6b,d. The highest peak values occur during positive film clips, and the lowest valleys appear while subjects watch neutral film clips. The feature values while watching negative film clips lie between these two conditions. Additionally, AE and RE share similar waveforms, as can be seen in Figure 6c,e, which are opposite to those of DE and FE. As for the result of MSE in Figure 6f, there is a slightly increasing tendency in the values of MSE when subjects were watching positive and negative films compared with the neutral films, whereas no obvious differences can be seen between positive and negative films in MSE.

4.2. The Classification Results of BiLSTM

BiLSTM is applied to classify the emotion states of the subjects in order to explore the long-term dependency and interplay of the extracted features at different times. We trained BiLSTM models for each kind of feature and for the fused entropy features separately. Then the performance of BiLSTM utilizing a single-type feature was compared with that using the fused entropy features. Further, the results were also compared with those of a conventional LSTM to make them more convincing. The five types of feature matrices obtained in Section 4.1 were first normalized to (−1, 1) to eliminate the effect of individual differences. The normalized feature matrices could then be directly fed into the classifier. As for the output (i.e., the category label vectors), it was set according to the sequence of the film clips and the results of the self-assessment. The dimension of the label vectors is 1 × N, where N is the sampling time. There are in total three categories in this paper: positive, neutral, and negative. The training data come from eighteen subjects selected randomly from all subjects, and the data of the remaining five subjects were used as testing data. The recognition accuracy was defined as the average accuracy of the five subjects in the testing group. The results of different entropy features are shown in Table 1. "ALL" denotes the fusion of all five entropy features.
From Table 1, it can be seen that the performance of BiLSTM is clearly better than that of LSTM. Comparing the two models on single-feature inputs, LSTM and BiLSTM with MSE as the input achieve the best results, reaching 66.12% and 67.9%, respectively, which are slightly higher than those of DE, a feature proven in a previous study to be well suited for classifying different emotion states [11]. The RE feature leads to the lowest accuracy of 54.23% in LSTM and 57.15% in BiLSTM, and the accuracies of AE and FE lie in between. Furthermore, the performance of both models is clearly enhanced when using fused entropy features to detect emotion states, peaking at 67.22% for LSTM and 70.05% for BiLSTM.

5. Discussion

Different types of entropy features of EEG were calculated to explore the dynamic changes of EEG while subjects were watching emotional film clips, and the results of the gamma frequency band at FT8 are shown in Figure 6. It can be seen that the entropy values differ considerably between different emotional films. In the results of DE, FE, and MSE, the higher peaks appear during positive and negative films, and the lowest valleys occur during neutral films. Subjects are highly stimulated emotionally when exposed to positive and negative films; in these cases, brain activity is usually active and complex (see Figure 6a), which manifests as high complexity in the gamma band. However, subjects are not as immersed in the neutral films as during the other two conditions. Thus, there is a decrease in complexity, and the lowest entropy values appear. These results accord with previous studies showing that the greatest activity of the gamma band is found in positive states and the lowest activity in neutral conditions [15,57]. As for the results of RE, the positive films induce the largest absolute values, followed by negative films, and the smallest absolute values are caused by neutral films; this trend is consistent with the results of DE, FE, and MSE. RE is a reflection of the amplitude distribution; a smaller RE is obtained when the EEG amplitude concentrates in a certain subsequence, indicating the signal is more ordered and less complex [58]. Hence, a larger absolute value of RE can be seen in the more active gamma band when subjects are greatly affected by positive and negative films. Different from the changing rules of the above four features, AE reaches its highest value in the neutral emotion and its lowest value in the positive emotion, as shown in Figure 6c. As we know, AE was proposed to measure the average logarithmic conditional probability of a new pattern's occurrence in a signal as the dimension changes [50]. It introduces self-matches into the calculation, which inevitably leads to calculation bias and can result in insensitivity to small changes in complexity [59]. Subjects are usually more engaged in viewing positive film clips than neutral clips, generating more active brain activity, but the corresponding complexity changes in EEG may not be large enough to be detected by AE. Thus, the small complexity changes in brain activity during positive films may be ignored, which decreases the number of detected new patterns and leads to the lower AE value in the positive emotion.
The extracted entropy features were used to detect emotion states by training a BiLSTM model. The classification results in Table 1 illustrate that the mean accuracy of MSE is slightly higher than that of the other four entropy features using BiLSTM, as it can reveal the complex information of EEG during film watching and, to some extent, reduce the influence of residual noise on the results [60]. Moreover, the performance of BiLSTM is further improved with multiple entropy features, reaching 70.05%. According to the study in [23], different features can compensate for each other; thus, more useful information in the EEG features can be learned by BiLSTM, contributing to higher accuracy in emotion recognition. Additionally, a traditional one-directional LSTM was also trained for comparison with BiLSTM. Table 1 indicates that the trained BiLSTM performs better than LSTM, since it can learn the long- and short-term dependency among EEG features in both the forward and reverse directions [55]. This is consistent with the finding that utilizing the BiLSTM classifier is an efficient way to decrease the training and test errors and to increase the classification accuracy [61]. The accuracy is comparable with that of the research in [26], which utilized the DE feature and transfer learning to detect three emotions and reached an accuracy of 72.47% on the SEED dataset. However, our accuracy is lower than that in [11], which used the DE feature and a DBN to recognize different emotions on the SEED dataset. This might be because SEED includes only Chinese subjects, whereas the data we obtained from the competition include not only Chinese but also French subjects (i.e., the data used in our study consist of two datasets: SEED and SEED-FRA). Although the subjects watched films in their native language during the experiment to elicit emotional changes more easily, and the stimulus types of the films are the same, there are differences between the Chinese and French subjects, which may lead to the lower accuracy in our study. In a study utilizing the same data as ours, a similar accuracy of about 70% was obtained in [37] for depression recognition.
This study shows the feasibility of recognizing emotion states by deploying multiple entropy features. However, as the dataset we used contains subjects and stimuli of two native languages, and film clips in different emotional categories vary in the degree of induced emotion, there are still limitations in the present work. First, the dataset consisted of two public datasets involving both Chinese and French subjects and stimuli. Each subject watched films in his or her native language to elicit emotional changes more easily, but the number of film clips differed between the two languages. Since the stimulus types of the films were the same, we assume that the different numbers of film clips had no effect on the results in the present work, though influences we are not aware of might exist because of the different native languages and numbers of film clips. Further, subjective labeling of the participants' emotional states was adopted to recognize the emotions in our study, as it benefits the feature analysis within the same type of emotion state and helps ensure the reliability of the results by providing more accurate reports of the subjects' feelings. Although most emotion studies in the literature choose subjective labeling, there is research that uses objective labeling of the emotional states, which can be further studied in our future experiment design. Additionally, the results obtained in the present work could be further improved by utilizing new, more effective algorithms. Electrical Source Imaging (ESI), an emerging technique for reconstructing brain or cardiac electrical activity from electrical potentials measured away from the source, can determine the location of current sources from measurements of voltages [62]. This would be a novel and interesting topic for estimating the cortical brain regions involved in each video viewing to improve the emotion recognition accuracy in our future work.

6. Conclusions

In this paper, we proposed a cross-subject emotion recognition framework based on fused EEG entropy features and a BiLSTM classifier. The results demonstrate that, when a single EEG feature is adopted, MSE is more effective than the other entropy features in analyzing the complex emotion information in EEG. What is more, the classification accuracy can be clearly increased by combining all entropy features, which indicates that there is information compensation among different types of features.
Future work mainly includes two aspects. One aspect is to extract more features from different perspectives, such as time-frequency domain features and non-linear dynamic features, for feature fusion; the other is to enhance the performance of the classifier with a new feature fusion algorithm or by applying Electrical Source Imaging to estimate the cortical brain regions involved in watching emotional film clips.

Author Contributions

Conceptualization, C.Z.; methodology, X.Z. and C.Z.; software, X.Z.; validation, X.Z.; formal analysis, X.Z.; investigation, X.Z.; data curation, H.G. and Y.F.; writing—original draft preparation, X.Z.; writing—review and editing, C.Z. and X.Z.; supervision, C.Z., T.H., and F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (no. 61703069 and 62001312), the National Foundation in China (no. JCKY2019110B009), the Fundamental Research Funds for the Central Universities (no. DUT21GF301) and the Science and Technology Planning Project of Liaoning Province (no. 2021JH1/10400049).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://bcmi.sjtu.edu.cn/home/seed/, accessed on 30 October 2014].

Acknowledgments

We would like to thank Shuo Cao from Dalian University of Technology for her support and helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Preprocessing

This appendix details how the EEG signals were preprocessed to remove artifacts and obtain the five frequency bands. We first downsampled the signals from 1000 Hz to 256 Hz to enhance calculation efficiency. Then, the EEG segments recorded during film clip watching were extracted. After that, a fourth-order Butterworth bandstop filter with normalized cutoff frequencies of 0.19 and 0.2 was used to remove the 50 Hz power line interference, and wavelet decomposition was applied to roughly obtain the five frequency bands of the EEG signals. Finally, a wavelet-based technique was used to remove the artifacts.
Wavelet transform is a well-known time-frequency analysis algorithm and has been widely adopted to deal with non-stationary signals. The original signal is decomposed and expressed with scaled and shifted versions of the mother wavelet ψ(t) and a scaling function ϕ(t) [52]. The discrete mother wavelet can be expressed as
$$\psi_{j,k}(t) = 2^{j/2}\,\psi\!\left(2^{j} t - k\right), \quad j, k \in \mathbb{Z}, \qquad (A1)$$
The signal S(t) can then be represented as
$$S(t) = \sum_k s_j(k)\,\phi_{j,k}(t) + \sum_k d_j(k)\,\psi_{j,k}(t), \qquad (A2)$$
where $s_j(k)$ is the approximation coefficient at the $j$th level, and $d_j(k)$ represents the detail coefficient.
In this study, the EEG signal was decomposed into seven levels using db6 (Daubechies family), as the waveform of db6 is similar to the EEG signal and it has been widely used in decomposing EEG in the literature [47]. The delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–50 Hz) frequency bands were roughly obtained by reconstructing the detailed components at levels of seven, five, four, three, and two, respectively.
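A sketch of this band extraction scheme is shown below. It reconstructs each rhythm from a single detail level of a seven-level db6 decomposition, following the level-to-band mapping stated above; this is an illustrative Python/PyWavelets sketch, not the authors' MATLAB implementation.

```python
import numpy as np
import pywt

# Detail level used to approximate each band at fs = 256 Hz (see text):
# delta -> D7, theta -> D5, alpha -> D4, beta -> D3, gamma -> D2.
BAND_LEVELS = {"delta": 7, "theta": 5, "alpha": 4, "beta": 3, "gamma": 2}

def extract_bands(eeg, wavelet="db6", max_level=7):
    """Roughly reconstruct the five EEG rhythms from single detail levels."""
    coeffs = pywt.wavedec(eeg, wavelet, level=max_level)   # [cA7, cD7, cD6, ..., cD1]
    bands = {}
    for name, level in BAND_LEVELS.items():
        kept = [np.zeros_like(c) for c in coeffs]
        idx = max_level - level + 1                        # position of cD_level in the list
        kept[idx] = coeffs[idx]
        bands[name] = pywt.waverec(kept, wavelet)[: len(eeg)]
    return bands
```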
To eliminate the artifacts’ interference, the five frequency band rhythms are decomposed separately into three levels with a 1 s sliding window, and the thresholds of each level are calculated by Equation (1). The coefficient is halved if its value is larger than the calculated threshold. In this way, a new set of signals is generated without artifacts.

Appendix B. Feature Extraction

This appendix lists details of the well-known formulas and the parameter settings used to extract the entropy features of the five frequency bands, providing supplementary information for Section 3.2.
After the coarse-graining step in extracting the MSE feature (see Section 3.2.1), the SE of the coarse-grained time series is calculated at each scale by the following formula:
$$\mathrm{SE}(m,r,N) = -\ln\!\left[\frac{B^{m+1}(r)}{B^{m}(r)}\right], \qquad (A3)$$
where m is the mode dimension of the data vector, r is the tolerance for similarity matches, and N is the number of data points. $B^m(r)$ and $B^{m+1}(r)$ are the numbers of matches for dimensions m and m + 1, respectively.
In this paper, the five-scale MSE feature is calculated with a 1 s sliding window for the five frequency bands. The mode dimension m is set to 2, and the tolerance r is equal to 0.2 times the standard deviation of the signal.
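A compact illustrative sketch of SE and the five-scale MSE under these parameter settings is given below (Python/NumPy rather than the authors' MATLAB code). Self-matches are excluded from the match counts, and fixing the tolerance from the original window rather than recomputing it per scale is an assumption of this sketch.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SE(m, r, N) of Equation (A3); r defaults to 0.2 * std(x)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    N = len(x)

    def matches(mm):
        emb = np.array([x[i:i + mm] for i in range(N - mm + 1)])
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        return np.sum(dist <= r) - (N - mm + 1)        # exclude self-matches (i == j)

    B_m, B_m1 = matches(m), matches(m + 1)
    return -np.log(B_m1 / B_m) if B_m > 0 and B_m1 > 0 else np.inf

def multiscale_entropy(x, scales=range(1, 6), m=2):
    """Five-scale MSE of one 1 s window: SE of the coarse-grained series at each scale."""
    x = np.asarray(x, dtype=float)
    r = 0.2 * np.std(x)                                # tolerance fixed from the original window
    mse = []
    for tau in scales:
        n = len(x) // tau
        y = x[: n * tau].reshape(n, tau).mean(axis=1)  # coarse graining, Equation (2)
        mse.append(sample_entropy(y, m, r))
    return np.array(mse)
```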
As for FE in Section 3.2.3, the parameter $O^m(n,r)$ in Equation (5) is defined as
$$O^m(n,r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\,j\ne i}^{N-m} D_{ij}^m\right), \qquad (A4)$$
$$D_{ij}^m = f\!\left(d_{ij}^m, n, r\right), \qquad (A5)$$
$$X^m(i) = \left[x(i), x(i+1), \ldots, x(i+m-1)\right], \qquad (A6)$$
$$X^m(j) = \left[x(j), x(j+1), \ldots, x(j+m-1)\right], \qquad (A7)$$
where n is the fuzzy exponent, and $d_{ij}^m$, the distance between the vectors $X^m(i)$ and $X^m(j)$, is defined as the absolute value of the maximum difference between the corresponding elements of the two vectors. $D_{ij}^m$ is then calculated by the fuzzy function $f(d_{ij}^m, n, r)$ to express the similarity of the vectors $X^m(i)$ and $X^m(j)$.
The same values of m, r, and N as for AE are used, and n is set to 2 according to the research of Chen et al. [51].
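The following sketch implements Equations (A4)-(A7) and Equation (5) with these parameters. The exponential fuzzy function exp(−(d^n)/r) of Chen et al. [51] is assumed, and the vectors are used as written in Equations (A6) and (A7); this is an illustrative Python sketch, not the authors' MATLAB code.

```python
import numpy as np

def fuzzy_entropy(x, m=2, n=2, r=None):
    """FE(m, n, r, N) per Equations (5) and (A4)-(A7); r defaults to 0.2 * std(x)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    N = len(x)

    def O(dim):
        # Use the first N - m vectors of the given dimension, as in [51]
        emb = np.array([x[i:i + dim] for i in range(N - m)])
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)   # d_ij, Eqs. (A6)-(A7)
        D = np.exp(-(d ** n) / r)                                       # fuzzy similarity, Eq. (A5)
        np.fill_diagonal(D, 0.0)                                        # exclude j == i
        return D.sum() / ((N - m) * (N - m - 1))                        # Eq. (A4)

    return np.log(O(m)) - np.log(O(m + 1))                              # Eq. (5)
```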

Appendix C. BiLSTM Classifier

As mentioned in Section 3.3, BiLSTM is an integration of two opposite-direction LSTMs and consists of LSTM memory cells. An LSTM cell contains an input gate, an output gate, and a forget gate (see Figure 5), which serve to protect and control the cell state $C_t$. The first step in an LSTM cell is to select what should be discarded from $C_{t-1}$, which is realized by the forget gate:
$$f_t = \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad (A8)$$
Then, the input gate, consisting of a sigmoid layer, chooses which values need to be updated (Equation (A9)), and a tanh layer creates a new candidate cell state $\tilde{C}_t$ from Equation (A10). Following these two steps, the new cell state $C_t$ can be obtained from Equation (A11):
$$i_t = \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right), \qquad (A9)$$
$$\tilde{C}_t = \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right), \qquad (A10)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad (A11)$$
In the end, the output gate and a tanh layer are activated to decide the final output $h_t$, as expressed in Equations (A12) and (A13):
$$o_t = \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad (A12)$$
$$h_t = o_t \odot \tanh(C_t), \qquad (A13)$$
In the equations, σ and tanh represent the activation functions, $\odot$ denotes element-wise multiplication, $x_t$ denotes the input of the cell (the EEG feature at time t), and W, b, and h are the weights, biases, and hidden state, respectively.
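To make Equations (A8)-(A13) concrete, the sketch below performs one forward step of a single LSTM memory cell with NumPy. The weight layout (one matrix per gate acting on the concatenation [h_{t-1}, x_t]) and the random toy weights are assumptions for illustration; in practice the weights are learned during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM memory-cell step following Equations (A8)-(A13)."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate, Eq. (A8)
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate, Eq. (A9)
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate cell state, Eq. (A10)
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state, Eq. (A11)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate, Eq. (A12)
    h_t = o_t * np.tanh(c_t)                     # hidden state, Eq. (A13)
    return h_t, c_t

# Toy usage: 60-dimensional entropy-feature input, 100 hidden units, random weights
rng = np.random.default_rng(0)
hid, inp = 100, 60
W = {k: rng.standard_normal((hid, hid + inp)) * 0.1 for k in "fico"}
b = {k: np.zeros(hid) for k in "fico"}
h, c = lstm_step(rng.standard_normal(inp), np.zeros(hid), np.zeros(hid), W, b)
```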
In this paper, we trained and tested BiLSTM and LSTM classifiers with different parameters to find the best parameters for each feature. The Adam optimizer was used to train the classifiers. The data of 18 subjects were selected as the training set, and those of the remaining five subjects formed the testing set. The average accuracy on the test set was used to evaluate the classifiers' performance. The details of the parameters tested are listed in Table A1.
Table A1. The detailed parameters tested for BiLSTM and LSTM.

Name | Value
Hidden units | [10:150] with a step of 10
Epochs | [50:150] with a step of 20
Mini batch size | [50:100] with a step of 10
Learning rate | 0.001
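For reference, a minimal sequence-to-sequence BiLSTM classifier consistent with the setup described above is sketched below in PyTorch (the authors trained their models in MATLAB). The layer sizes, sequence length, and placeholder tensors are illustrative assumptions; the hidden-unit count and learning rate are example values drawn from Table A1.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Sequence-to-sequence BiLSTM emotion classifier (illustrative sketch)."""
    def __init__(self, n_features=60, hidden=100, n_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)    # forward + backward hidden states

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.bilstm(x)            # out: (batch, time, 2 * hidden)
        return self.fc(out)                # per-time-step class scores

model = BiLSTMClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning rate from Table A1
criterion = nn.CrossEntropyLoss()

# Placeholder feature sequences and per-second labels (positive/neutral/negative)
x = torch.randn(8, 120, 60)
y = torch.randint(0, 3, (8, 120))
for epoch in range(5):
    optimizer.zero_grad()
    logits = model(x)                                        # (batch, time, classes)
    loss = criterion(logits.reshape(-1, 3), y.reshape(-1))
    loss.backward()
    optimizer.step()
```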

References

  1. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A Review of Emotion Recognition Using Physiological Signals. Sensors 2018, 18, 2074. [Google Scholar] [CrossRef] [PubMed]
  2. Picard, R.W. Affective Computing: Challenges. Int. J. Hum. Comput. Stud. 2003, 59, 55–64. [Google Scholar] [CrossRef]
  3. Siriwardhana, S.; Kaluarachchi, T.; Billinghurst, M.; Nanayakkara, S. Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion. IEEE Access 2020, 8, 176274–176285. [Google Scholar] [CrossRef]
  4. Batbaatar, E.; Li, M.; Ryu, K.H. Semantic-Emotion Neural Network for Emotion Recognition from Text. IEEE Access 2019, 7, 111866–111878. [Google Scholar] [CrossRef]
  5. Martinez, H.P.; Bengio, Y.; Yannakakis, G.N. Learning Deep Physiological Models of Affect. IEEE Comput. Intell. Mag. 2013, 8, 20–33. [Google Scholar] [CrossRef]
  6. Jain, D.K.; Shamsolmoali, P.; Sehdev, P. Extended Deep Neural Network for Facial Emotion Recognition. Pattern Recognit. Lett. 2019, 120, 69–74. [Google Scholar] [CrossRef]
  7. Meng, H.; Yan, T.; Yuan, F.; Wei, H. Speech Emotion Recognition from 3D Log-Mel Spectrograms with Deep Learning Network. IEEE Access 2019, 7, 125868–125881. [Google Scholar] [CrossRef]
  8. Kessous, L.; Castellano, G.; Caridakis, G. Multimodal Emotion Recognition in Speech-Based Interaction Using Facial Expression, Body Gesture and Acoustic Analysis. J. Multimodal User Interfaces 2010, 3, 33–48. [Google Scholar] [CrossRef]
  9. Zhang, J.; Yin, Z.; Chen, P.; Nichele, S. Emotion Recognition Using Multi-Modal Data and Machine Learning Techniques: A Tutorial and Review. Inf. Fusion 2020, 59, 103–126. [Google Scholar] [CrossRef]
  10. Kim, J.; André, E. Emotion Recognition Based on Physiological Changes in Music Listening. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 2067–2083. [Google Scholar] [CrossRef]
  11. Zheng, W.-L.; Lu, B.-L. Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175. [Google Scholar] [CrossRef]
  12. Egger, M.; Ley, M.; Hanke, S. Emotion Recognition from Physiological Signal Analysis: A Review. Electron. Notes Theor. Comput. Sci. 2019, 343, 35–55. [Google Scholar] [CrossRef]
  13. Du, R.; Lee, H.J. Power Spectral Performance Analysis of EEG during Emotional Auditory Experiment. In Proceedings of the 2014 International Conference on Audio, Language and Image Processing, Shanghai, China, 7–9 July 2014; pp. 64–68. [Google Scholar] [CrossRef]
  14. Du, R.; Lee, H.J. Frontal Alpha Asymmetry during the Audio Emotional Experiment Revealed by Event-Related Spectral Perturbation. In Proceedings of the 2015 8th International Conference on Biomedical Engineering and Informatics (BMEI), Shenyang, China, 14–16 October 2015; pp. 531–536. [Google Scholar] [CrossRef]
  15. Liu, S.; Meng, J.; Zhang, D.; Yang, J.; Zhao, X.; He, F.; Qi, H.; Ming, D. Emotion Recognition Based on EEG Changes in Movie Viewing. In Proceedings of the 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), Montpellier, France, 22–24 April 2015; pp. 1036–1039. [Google Scholar] [CrossRef]
  16. Mehmood, R.M.; Lee, H.J. A Novel Feature Extraction Method Based on Late Positive Potential for Emotion Recognition in Human Brain Signal Patterns. Comput. Electr. Eng. 2016, 53, 444–457. [Google Scholar] [CrossRef]
  17. Wang, Y.-H.; Chen, I.-Y.; Chiueh, H.; Liang, S.-F. A Low-Cost Implementation of Sample Entropy in Wearable Embedded Systems: An Example of Online Analysis for Sleep EEG. IEEE Trans. Instrum. Meas. 2021, 70, 1–12. [Google Scholar] [CrossRef]
  18. Chen, T.; Ju, S.; Yuan, X.; Elhoseny, M.; Ren, F.; Fan, M.; Chen, Z. Emotion Recognition Using Empirical Mode Decomposition and Approximation Entropy. Comput. Electr. Eng. 2018, 72, 383–392. [Google Scholar] [CrossRef]
  19. Zheng, W.-L.; Guo, H.-T.; Lu, B.-L. Revealing Critical Channels and Frequency Bands for Emotion Recognition from EEG with Deep Belief Network. In Proceedings of the 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), Montpellier, France, 22–24 April 2015; pp. 154–157. [Google Scholar]
  20. Ferrario, M.; Signorini, M.; Magenes, G.; Cerutti, S. Comparison of Entropy-Based Regularity Estimators: Application to the Fetal Heart Rate Signal for the Identification of Fetal Distress. IEEE Trans. Biomed. Eng. 2006, 53, 119–125. [Google Scholar] [CrossRef]
  21. Hadoush, H.; Alafeef, M.; Abdulhay, E. Brain Complexity in Children with Mild and Severe Autism Spectrum Disorders: Analysis of Multiscale Entropy in EEG. Brain Topogr. 2019, 32, 914–921. [Google Scholar] [CrossRef]
  22. Miskovic, V.; MacDonald, K.J.; Rhodes, L.J.; Cote, K.A. Changes in EEG Multiscale Entropy and Power-Law Frequency Scaling During the Human Sleep Cycle. Hum. Brain Mapp. 2019, 40, 538–551. [Google Scholar] [CrossRef]
  23. Hasan, J.; Kim, J.-M. A Hybrid Feature Pool-Based Emotional Stress State Detection Algorithm Using EEG Signals. Brain Sci. 2019, 9, 376. [Google Scholar] [CrossRef]
  24. Liu, Y.-J.; Yu, M.; Zhao, G.; Song, J.; Ge, Y.; Shi, Y. Real-Time Movie-Induced Discrete Emotion Recognition from EEG Signals. IEEE Trans. Affect. Comput. 2017, 9, 550–562. [Google Scholar] [CrossRef]
  25. Kolodyazhniy, V.; Kreibig, S.D.; Gross, J.J.; Roth, W.T.; Wilhelm, F.H. An Affective Computing Approach to Physiological Emotion Specificity: Toward Subject-Independent and Stimulus-Independent Classification of Film-Induced Emotions. Psychophysiology 2011, 48, 908–922. [Google Scholar] [CrossRef] [PubMed]
  26. Lan, Z.; Sourina, O.; Wang, L.; Scherer, R.; Muller-Putz, G.R. Domain Adaptation Techniques for EEG-Based Emotion Recognition: A Comparative Study on Two Public Datasets. IEEE Trans. Cogn. Dev. Syst. 2019, 11, 85–94. [Google Scholar] [CrossRef]
  27. Wöllmer, M.; Kaiser, M.; Eyben, F.; Schuller, B.; Rigoll, G. LSTM-Modeling of Continuous Emotions in an Audiovisual Affect Recognition Framework. Image Vis. Comput. 2013, 31, 153–163. [Google Scholar] [CrossRef]
  28. Liu, G.; Guo, J. Bidirectional LSTM with Attention Mechanism and Convolutional Layer for Text Classification. Neurocomputing 2019, 337, 325–338. [Google Scholar] [CrossRef]
  29. Narendra, N.P.; Alku, P. Glottal Source Information for Pathological Voice Detection. IEEE Access 2020, 8, 67745–67755. [Google Scholar] [CrossRef]
  30. Bollepalli, B.; Airaksinen, M.; Alku, P. Lombard Speech Synthesis Using Long Short-Term Memory Recurrent Neural Networks. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5505–5509. [Google Scholar] [CrossRef]
  31. Carrara, F.; Elias, P.; Sedmidubsky, J.; Zezula, P. LSTM-Based Real-Time Action Detection and Prediction in Human Motion Streams. Multimed. Tools Appl. 2019, 78, 27309–27331. [Google Scholar] [CrossRef]
  32. Sun, Q.; Wang, C.; Guo, Y.; Yuan, W.; Fu, R. Research on a Cognitive Distraction Recognition Model for Intelligent Driving Systems Based on Real Vehicle Experiments. Sensors 2020, 20, 4426. [Google Scholar] [CrossRef]
  33. Manoharan, T.A.; Radhakrishnan, M. Region-Wise Brain Response Classification of ASD Children Using EEG and BiLSTM RNN. Clin. EEG Neurosci. 2021, 15500594211054990. [Google Scholar] [CrossRef]
  34. Fernando, T.; Denman, S.; Sridharan, S.; Fookes, C. Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection. Neural Netw. 2018, 108, 466–478. [Google Scholar] [CrossRef]
  35. Joshi, V.M.; Ghongade, R.B. EEG Based Emotion Detection Using Fourth Order Spectral Moment and Deep Learning. Biomed. Signal Process. Control 2021, 68, 102755. [Google Scholar] [CrossRef]
  36. Mahmud, T.; Khan, I.A.; Mahmud, T.I.; Fattah, S.A.; Zhu, W.-P.; Ahmad, M.O. Sleep Apnea Detection from Variational Mode Decomposed EEG Signal Using a Hybrid CNN-BiLSTM. IEEE Access 2021, 9, 102355–102367. [Google Scholar] [CrossRef]
  37. Chang, H.; Zong, Y.; Zheng, W.; Tang, C.; Zhu, J.; Li, X. Depression Assessment Method: An EEG Emotion Recognition Framework Based on Spatiotemporal Neural Network. Front. Psychiatry 2022, 12, 837149. [Google Scholar] [CrossRef] [PubMed]
  38. Posner, J.; Russell, J.A.; Peterson, B.S. The Circumplex Model of Affect: An Integrative Approach to Affective Neuroscience, Cognitive Development, and Psychopathology. Dev. Psychopathol. 2005, 17, 715–734. [Google Scholar] [CrossRef] [PubMed]
  39. Kılıç, B.; Aydın, S. Classification of Contrasting Discrete Emotional States Indicated by EEG Based Graph Theoretical Network Measures. Neuroinformatics 2022, 1–15. [Google Scholar] [CrossRef]
  40. Liu, W.; Zheng, W.-L.; Li, Z.; Wu, S.-Y.; Gan, L.; Lu, B.-L. Identifying Similarities and Differences in Emotion Recognition with EEG and Eye Movements among Chinese, German, and French People. J. Neural Eng. 2022, 19, 026012. [Google Scholar] [CrossRef]
  41. Schaefer, A.; Nils, F.; Sanchez, X.; Philippot, P. Assessing the Effectiveness of a Large Database of Emotion-Eliciting Films: A New Tool for Emotion Researchers. Cogn. Emot. 2010, 24, 1153–1172. [Google Scholar] [CrossRef]
  42. Nie, D.; Wang, X.-W.; Shi, L.-C.; Lu, B.-L. EEG-Based Emotion Recognition during Watching Movies. In Proceedings of the 2011 5th International IEEE/EMBS Conference on Neural Engineering, Cancun, Mexico, 27 April–1 May 2011; pp. 667–670. [Google Scholar] [CrossRef]
  43. Gurudath, N.; Riley, H.B. Drowsy Driving Detection by EEG Analysis Using Wavelet Transform and K-means Clustering. Procedia Comput. Sci. 2014, 34, 400–409. [Google Scholar] [CrossRef]
  44. Kumar, P.S.; Arumuganathan, R.; Sivakumar, K.; Vimal, C. A Wavelet Based Statistical Method for De-Noising of Ocular Artifacts in EEG Signals. IJCSNS Int. J. Comput. Sci. Netw. Secur. 2008, 8, 87–92. [Google Scholar]
  45. Zhang, C.; Sun, L.; Cong, F.; Ristaniemi, T. Spatiotemporal Dynamical Analysis of Brain Activity During Mental Fatigue Process. IEEE Trans. Cogn. Dev. Syst. 2020, 13, 593–606. [Google Scholar] [CrossRef]
  46. Zhang, C.; Cong, F.; Kujala, T.; Liu, W.; Liu, J.; Parviainen, T.; Ristaniemi, T. Network Entropy for the Sequence Analysis of Functional Connectivity Graphs of the Brain. Entropy 2018, 20, 311. [Google Scholar] [CrossRef]
  47. Poorna, S.S.; Raghav, R.; Nandan, A.; Nair, G.J. EEG Based Control—A Study Using Wavelet Features. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 19–22 September 2018; pp. 550–553. [Google Scholar] [CrossRef]
  48. Costa, M.; Goldberger, A.L.; Peng, C.-K. Multiscale Entropy Analysis of Biological Signals. Phys. Rev. E 2005, 71, 21906. [Google Scholar] [CrossRef] [PubMed]
  49. Costa, M.; Peng, C.-K.; Goldberger, A.L.; Hausdorff, J.M. Multiscale Entropy Analysis of Human Gait Dynamics. Phys. A Stat. Mech. Appl. 2003, 330, 53–60. [Google Scholar] [CrossRef] [PubMed]
  50. Pincus, S.M. Approximate Entropy As a Measure of System Complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301. [Google Scholar] [CrossRef] [PubMed]
  51. Chen, W.; Wang, Z.; Xie, H.; Yu, W. Characterization of Surface EMG Signal Based on Fuzzy Entropy. IEEE Trans. Neural Syst. Rehabil. Eng. 2007, 15, 266–272. [Google Scholar] [CrossRef]
  52. Kar, S.; Bhagat, M.; Routray, A. EEG Signal Analysis for the Assessment and Quantification of Driver’s Fatigue. Transp. Res. Part F Traffic Psychol. Behav. 2010, 13, 297–306. [Google Scholar] [CrossRef]
  53. Feutrill, A.; Roughan, M. A Review of Shannon and Differential Entropy Rate Estimation. Entropy 2021, 23, 1046. [Google Scholar] [CrossRef]
  54. Duan, R.-N.; Zhu, J.-Y.; Lu, B.-L. Differential Entropy Feature for EEG-Based Emotion Classification. In Proceedings of the 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), Diego, CA, USA, 6–8 November 2013; pp. 81–84. [Google Scholar]
  55. Yang, J.; Huang, X.; Wu, H.; Yang, X. EEG-based emotion classification based on Bidirectional Long Short-Term Memory Network. Procedia Comput. Sci. 2020, 174, 491–504. [Google Scholar] [CrossRef]
  56. Wu, E.Q.; Xiong, P.; Tang, Z.-R.; Li, G.-J.; Song, A.; Zhu, L.-M. Detecting Dynamic Behavior of Brain Fatigue Through 3-D-CNN-LSTM. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 90–100. [Google Scholar] [CrossRef]
  57. Martini, N.; Menicucci, D.; Sebastiani, L.; Bedini, R.; Pingitore, A.; Vanello, N.; Milanesi, M.; Landini, L.; Gemignani, A. The Dynamics of EEG Gamma Responses to Unpleasant Visual Stimuli: From Local Activity to Functional Connectivity. NeuroImage 2012, 60, 922–932. [Google Scholar] [CrossRef]
  58. Xie, O.; Liu, Z.-T.; Ding, X.-W. Electroencephalogram Emotion Recognition Based on a Stacking Classification Model. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 5544–5548. [Google Scholar] [CrossRef]
  59. Richman, J.S.; Moorman, J.R. Physiological Time-Series Analysis Using Approximate Entropy and Sample Entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049. [Google Scholar] [CrossRef]
  60. Bhattacharyya, A.; Tripathy, R.K.; Garg, L.; Pachori, R.B. A Novel Multivariate-Multiscale Approach for Computing EEG Spectral and Temporal Complexity for Human Emotion Recognition. IEEE Sens. J. 2021, 21, 3579–3591. [Google Scholar] [CrossRef]
  61. Kouchak, S.M.; Gaffar, A. Using Bidirectional Long-Short Term Memory with Attention Layer to Estimate Driver Behavior. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 315–320. [Google Scholar] [CrossRef]
  62. National Research Council (US) and Institute of Medicine (US) Committee on the Mathematics and Physics of Emerging Dynamic Biomedical Imaging. Chapter 8, Electrical Source Imaging. In Mathematics and Physics of Emerging Biomedical Imaging; National Academies Press (US): Washington, DC, USA, 1996. Available online: https://www.ncbi.nlm.nih.gov/books/NBK232494/ (accessed on 4 September 2022).
Figure 1. The emotion experiment scene.
Figure 2. Protocol of the experiment.
Figure 3. EEG signal preprocessing; the units are µV. (a) The original EEG signal; (b) the gamma wave obtained by wavelet decomposition; (c) the gamma wave after artifact removal.
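As an illustration of the decomposition step in Figure 3b, the minimal sketch below keeps a single detail subband of a discrete wavelet transform and reconstructs it as an approximate gamma-band signal. The mother wavelet (db4), the decomposition depth, the assumed 200 Hz sampling rate (under which the level-2 detail coefficients span roughly 25–50 Hz, overlapping the gamma band), and the `extract_gamma` helper are all illustrative assumptions rather than the parameters used in the study.

```python
import numpy as np
import pywt

def extract_gamma(eeg, wavelet="db4", level=4, keep_level=2):
    """Approximate gamma-band extraction by keeping one wavelet detail subband.

    Assumes fs = 200 Hz, so the level-2 detail coefficients cover roughly
    25-50 Hz; wavelet family and levels are illustrative choices only.
    """
    coeffs = pywt.wavedec(eeg, wavelet, level=level)
    # coeffs = [cA_level, cD_level, ..., cD_2, cD_1]
    kept = [np.zeros_like(c) for c in coeffs]
    kept[-keep_level] = coeffs[-keep_level]   # keep only the cD_2 subband
    return pywt.waverec(kept, wavelet)[: len(eeg)]

# Usage on a single channel (random data stands in for a real recording):
gamma = extract_gamma(np.random.randn(2000))
```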
Figure 4. BiLSTM network architecture. It consists of five layers (input layer, BiLSTM layer, fully connected layer, softmax layer, and classification layer). xt is the EEG feature at time t, and ht is the hidden state of the LSTM cell at time t.
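The layer sequence named in Figure 4 (input, BiLSTM, fully connected, softmax, classification) can be written compactly in Keras. The sketch below is illustrative only: the hidden size, sequence length, feature dimension, optimizer, and the use of TensorFlow/Keras itself are assumptions, not values reported for the trained model.

```python
import tensorflow as tf

# Hypothetical shapes: 30 time steps, 62 feature channels, 3 emotion classes
# (positive, neutral, negative); none of these are stated as the paper's values.
n_steps, n_features, n_classes = 30, 62, 3

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_steps, n_features)),       # input layer
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # BiLSTM layer
    tf.keras.layers.Dense(n_classes),                         # fully connected layer
    tf.keras.layers.Softmax(),                                 # softmax layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```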
Figure 5. The details of an LSTM memory cell. It contains two kinds of activation functions (σ and tanh). Ct is the LSTM cell state at time t; ft, it, and ot represent the outputs of the forget gate, input gate, and output gate at time t, respectively.
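For reference, the gate and state updates sketched in Figure 5 follow the standard LSTM formulation, where W and b denote the learned weights and biases of each gate and ⊙ is element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\!\left(W_i[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\!\left(W_C[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\!\left(W_o[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```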
Figure 6. The results of five entropy-based features of EEG. The numbers “1”, “0”, and “−1” represent positive, neutral, and negative emotions, respectively. The purple dashed lines show the boundaries of different film clips. (a) Preprocessed gamma frequency band; (b) DE; (c) AE; (d) FE; (e) RE; (f) MSE.
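Among the features plotted in Figure 6, multi-scale entropy is the only one that coarse-grains the signal before estimating an entropy value. The sketch below pairs coarse-graining with a plain sample-entropy estimate, following the conventional MSE formulation; the embedding dimension, tolerance, and scale range are common defaults, not necessarily the parameters used in this study.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D signal; m and r = r_factor*std(x) are common defaults."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)

    def count_matches(length):
        # All templates of the given length and their Chebyshev distances.
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        dists = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return np.sum(dists <= r) - len(templates)   # exclude self-matches

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=range(1, 11)):
    """Coarse-grain the signal at each scale, then take sample entropy."""
    out = []
    for s in scales:
        n = len(x) // s
        coarse = np.mean(np.reshape(x[:n * s], (n, s)), axis=1)
        out.append(sample_entropy(coarse))
    return np.array(out)

# Usage on one gamma-band segment: mse_curve = multiscale_entropy(gamma_segment)
```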
Table 1. The mean accuracies of BiLSTM and LSTM for different features (%).

Feature   AE      FE      RE      DE      MSE     ALL
LSTM      61.1    59.47   54.23   65.09   66.12   67.22
BiLSTM    63.43   61.1    57.15   66.34   67.9    70.05