Article

EEG-Based Emotion Recognition by Retargeted Semi-Supervised Regression with Robust Weights

1 Zhuoyue Honors College, Hangzhou Dianzi University, Hangzhou 310018, China
2 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
3 Zhejiang Key Laboratory of Brain-Machine Collaborative Intelligence, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
Systems 2022, 10(6), 236; https://doi.org/10.3390/systems10060236
Submission received: 27 October 2022 / Revised: 24 November 2022 / Accepted: 25 November 2022 / Published: 29 November 2022
(This article belongs to the Topic Human–Machine Interaction)

Abstract

The electroencephalogram (EEG) can objectively reflect the emotional state of human beings and has attracted much academic attention in recent years. However, because EEG signals are weak, non-stationary, and have a low signal-to-noise ratio, the collected EEG data are prone to noise. In addition, EEG features extracted from different frequency bands and channels usually exhibit different levels of emotional expression ability in emotion recognition tasks. In this paper, we fully consider these characteristics of EEG and propose a new model, RSRRW (retargeted semi-supervised regression with robust weights). Its advantages are as follows. (1) A probability weight is attached to each sample, which helps the model identify noisy samples in the dataset and suppress their effect. (2) The distance between samples from different categories is widened by extending the ϵ-dragging method to the semi-supervised paradigm. (3) The EEG emotional activation pattern is discovered automatically by adaptively measuring the contribution of each feature through feature weights. Across the three cross-session emotion recognition tasks on the SEED-IV dataset, the average accuracy of the RSRRW model is 81.51%. Moreover, as supported by the Friedman test and the Nemenyi test, the classification of RSRRW is significantly more accurate than that of competing models.

1. Introduction

As a complex psychological state, emotion plays a key role in human cognition, including rational decision-making, perception, interpersonal communication, and human intelligence [1]. Therefore, emotion recognition has attracted the attention of researchers from various disciplines. Usually, researchers investigate emotion recognition from data sources such as language, body movements, speech, and facial expressions. However, these modalities have certain drawbacks. (1) When subjects deliberately disguise their emotions, the collected data can be deceptive, and the performance of methods built on such data may be significantly affected. (2) These modalities are impractical for people with certain physical disabilities (e.g., deafness or aphasia). Therefore, we need a more objective mode of emotion recognition. First, although emotion is spontaneous and therefore hard to measure directly, it can be observed through the accompanying physiological reactions in the central nervous system and periphery [2,3]. Second, EEG is a central nervous system signal with several advantages, such as rich information, simple operation, and low cost [4]. Together with the rapid development of non-stationary signal processing and analysis techniques, EEG-based emotion recognition has thus become a research hotspot [5].
In recent years, emotion recognition based on EEG signals has aroused much discussion among researchers. Li et al. [6] summarized preceding research results, including traditional machine learning works [7], deep learning based works [8], transfer learning based works [9,10], and ensemble learning based works [11]. Because EEG signals are weak and unstable, noise can contaminate the data during acquisition, so the quality of different EEG samples may vary considerably. In addition, due to the multi-channel and multi-rhythm properties of EEG data, features from different frequency bands and channels have different correlations with emotional effects. However, existing studies either ignore both sample quality and feature importance or consider only one of them.
Therefore, we establish the Retargeted Semi-supervised Regression with Robust Weights (RSRRW) model. Firstly, to improve the robustness of the model, we introduce a probability weight factor with a clear physical meaning: when the weight is 0, the corresponding sample is a noise point; otherwise, it is a normal point. If a sample deviates too much from the whole, the RSRRW model discards its error contribution entirely, ensuring that the model does not skew toward such outliers. To maintain good discriminative ability, the ϵ-dragging method is introduced to widen the gap between different classes. Additionally, a feature weight factor is introduced to explore the extent to which different feature dimensions contribute to the emotion recognition task.
Consequently, the main contributions of this work can be summarized as follows.
  • A new factor, the probability weight, is added to each sample in the model. With its help, the model can identify noisy samples and remove their negative effects. At the same time, the value of this variable has a clear physical meaning.
  • We innovatively apply the ϵ-dragging technique to the semi-supervised paradigm, estimating the dragging direction matrix by gradually optimizing the labels of the unlabeled samples during learning, which effectively increases the margin between classes.
  • Compared with similar models, RSRRW achieves higher recognition accuracy. With the help of the feature weight factor, RSRRW can discover task-related EEG activation patterns and thereby determine the comparatively significant frequency bands and predominant leads of the EEG under the current task.
Notations. In this paper, the frequency bands of EEG are denoted by Delta, Theta, Alpha, Beta, and Gamma. Greek letters, such as η, λ, γ, represent model parameters (e.g., α represents the confidence level). Matrices and vectors are identified by boldface uppercase and lowercase letters, respectively. The $\ell_{2,1}$-norm of a matrix $\mathbf{W} \in \mathbb{R}^{d \times c}$ is defined as $\|\mathbf{W}\|_{2,1} = \sum_{i=1}^{d}\sqrt{\sum_{j=1}^{c} w_{ij}^2} = \sum_{i=1}^{d}\|\mathbf{w}^i\|_2$, where $\mathbf{w}^i$ is the $i$-th row of $\mathbf{W}$ and $\mathbf{w}_j$ is the $j$-th column of $\mathbf{W}$. The symbol ⊙ denotes the Hadamard (element-wise) product of matrices.
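For concreteness, here is a minimal NumPy sketch of this norm (our own illustration, not code from the paper):

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm of W: the sum of the Euclidean norms of its rows."""
    return np.linalg.norm(W, axis=1).sum()

# Example: for W = [[3, 4], [0, 0]], the norm is 5 + 0 = 5.
print(l21_norm(np.array([[3.0, 4.0], [0.0, 0.0]])))  # 5.0
```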
The main body of this article is organized as follows. Section 2 briefly reviews the background about EEG-based emotion recognition and some related techniques. In Section 3, we introduce the formulation and optimization of the RSRRW model in detail. In Section 4, we conduct experimental studies to illustrate the performance of RSRRW. Section 5 concludes the whole paper.

2. Related Work

2.1. EEG-Based Emotion Recognition

A typical EEG-based emotion recognition process includes three stages: data preprocessing, feature extraction, and model training.
Since EEG signals are easily disturbed during acquisition, preprocessing is necessary to provide high-quality data. EEG data preprocessing usually includes sampling and artifact removal. The sampling frequency of EEG is typically 128∼1024 Hz [12]. The higher the sampling rate, the more detail the EEG captures, but the more noise is recorded as well. The main sources of interference include electrooculography, electrocardiography, electromyography, vascular pulsation, and so on. Common processing methods include principal component analysis, independent component analysis, and various filtering algorithms.
Feature extraction occupies an important position in the EEG emotion recognition task, and the way features are extracted directly affects the performance of emotion classification. EEG feature extraction methods can be classified into four categories: time domain, frequency domain, time-frequency domain, and spatial domain. Specifically, time domain methods include the histogram analysis method [13], Hjorth parameters [14], event-related potentials [15], and so on. Time domain analysis often obtains information from the geometric features of EEG signals, with a very low rate of information loss. Frequency domain methods transform the EEG signal from the time domain to the frequency domain and divide the acquired spectrum into multiple sub-bands; typical methods include power spectral density [16], higher-order crossings [17], differential entropy (DE) [18], and higher-order spectra [19]. Time-frequency feature extraction includes the short-time Fourier transform [20], wavelet transform [21], wavelet packet transform [22], Hilbert–Huang transform [23], and more. Spatial domain feature extraction mainly explores correlations between EEG signals at different locations; the methods include common spatial patterns [24], sub-band common spatial patterns [25], rational asymmetry [26], etc. Some review studies summarize the feature extraction methods in EEG-based emotion recognition [23,27].
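Since the differential entropy (DE) feature reappears in our experiments (Section 4.1), a minimal sketch may help. It relies on the usual Gaussian assumption for band-filtered EEG segments; the function name is ours:

```python
import numpy as np

def de_feature(segment):
    """Differential entropy of a band-filtered EEG segment.

    Under the common Gaussian assumption for band-filtered EEG,
    DE = 0.5 * ln(2 * pi * e * sigma^2), where sigma^2 is the variance.
    """
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(segment))

# Example: DE of a 1 s synthetic segment sampled at 200 Hz.
print(de_feature(np.random.randn(200)))
```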
As for the model training process, much effort has been made in the past decades. Roughly, existing models can be categorized into linear and nonlinear models. In recent studies, deep learning has attracted much attention due to its powerful nonlinear learning ability. Thammasan et al. utilized the deep belief network (DBN) to classify EEG emotions based on handcrafted features extracted from EEG (fractal dimension, power spectral density, etc.) [28]. Li et al. used the differential entropy features of EEG to build 2D images and used a CNN to complete the emotion recognition task, achieving good results [29]. Building on these studies, some deep learning models further unify feature extraction and recognition into an end-to-end learning model [30,31]. Although the above deep learning models have achieved relatively good results in EEG emotion recognition tasks, most of them remain black-box training modes: their outcomes are poorly interpretable, and the underlying mechanism is relatively abstract [32]. To improve interpretability, Peng et al. proposed a unified framework named GFIL for discovering feature importance in EEG emotion recognition tasks [33]. In [34], GFIL was extended to a semi-supervised graph learning framework. Recent advances in EEG-based emotion recognition can be found in [6].

2.2. Related Techniques

2.2.1. Emotion Models

According to existing research, affective models can be roughly divided into two categories: discrete affective state models and dimension-space affective models. In a discrete model, the emotional space covers a limited number of basic emotions. The six generally recognized basic emotions are anger, disgust, surprise, sadness, happiness, and fear, and most other emotions are considered combinations of them [35,36]. In typical dimensional space models, emotion is considered to be distributed in a two-dimensional or three-dimensional space, and the attributes of the different dimensions provide evidence for locating emotions. Among these spatial models, researchers commonly employ the valence-arousal model (VA) [37] and the valence-arousal-dominance model (VAD) [38]. In VA, the valence axis measures the positive-negative degree of an emotion while the arousal axis measures its intensity, as exemplified in Figure 1a. Compared with VA, VAD in Figure 1b adds a dominance axis to measure the degree to which an emotion can be controlled.

2.2.2. Rescaled Least Squares Regression

Given a data matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$, where $d$ is the feature dimension and $n$ is the number of samples, we use $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n] \in \mathbb{R}^{c \times n}$ to denote the binary indicator matrix. Specifically, if sample $\mathbf{x}_i$ belongs to the $j$-th class and $\mathbf{y}_i$ is the $i$-th column of $\mathbf{Y}$, then the $j$-th element of $\mathbf{y}_i$ is 1 and all the others are 0. Its mathematical form is
$$Y_{ij} = \begin{cases} 1, & \mathbf{x}_j \text{ belongs to the } i\text{-th emotion}, \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$
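A small NumPy sketch of this construction (an illustration under our own naming, not code from the paper):

```python
import numpy as np

def indicator_matrix(labels, c):
    """Build the c x n binary indicator matrix Y of Eq. (1) from class labels."""
    n = len(labels)
    Y = np.zeros((c, n))
    Y[labels, np.arange(n)] = 1.0  # Y[i, j] = 1 iff sample x_j belongs to emotion i
    return Y

# Example: four emotions, five samples with labels 0, 2, 1, 3, 0.
print(indicator_matrix(np.array([0, 2, 1, 3, 0]), c=4))
```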
Least squares regression is a typical statistical analysis technique with high validity for data analysis. It has been favored in previous research, and its general semi-supervised form is
$$\min_{\mathbf{W}, \mathbf{b}, \mathbf{Y}_u} \|\mathbf{W}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y}\|_F^2 + \gamma\|\mathbf{W}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{Y}_u \geq 0,\ \mathbf{Y}_u^T\mathbf{1} = \mathbf{1}, \qquad (2)$$
where $\mathbf{Y}_u$ denotes the label matrix of the unlabeled samples.
In the EEG emotion recognition task, EEG data are naturally multi-channel, so the importance of data from different channels can differ in classification; however, objective function (2) fails to model such differentiated feature contributions. Thus, Chen et al. [39] constructed the following model,
$$\min_{\mathbf{W}, \mathbf{b}, \boldsymbol{\Theta}, \mathbf{Y}_u} \|\mathbf{W}^T\boldsymbol{\Theta}\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y}\|_F^2 + \lambda\|\mathbf{W}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{Y}_u \geq 0,\ \mathbf{Y}_u^T\mathbf{1} = \mathbf{1},\ \boldsymbol{\Theta} = \mathrm{diag}(\boldsymbol{\theta}),\ \boldsymbol{\theta} \geq 0,\ \boldsymbol{\theta}^T\mathbf{1} = 1, \qquad (3)$$
where $\boldsymbol{\Theta}$ is a diagonal matrix with $\Theta_{jj} = \theta_j$, and $\theta_j$ describes the significance of the $j$-th feature. Performing a simple transformation of objective function (3), we obtain
$$\min_{\tilde{\mathbf{W}}, \mathbf{b}, \mathbf{Y}_u} \|\tilde{\mathbf{W}}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y}\|_F^2 + \lambda\|\tilde{\mathbf{W}}\|_{2,1}^2, \quad \mathrm{s.t.}\ \mathbf{Y}_u \geq 0,\ \mathbf{Y}_u^T\mathbf{1} = \mathbf{1}, \qquad (4)$$
where $\tilde{\mathbf{W}} = \boldsymbol{\Theta}\mathbf{W}$.

2.2.3. Discriminative Least Squares Regression

The traditional LSR method is usually employed for data fitting. When it is used for classification, the regression target $y_i$ of sample $\mathbf{x}_i$ is represented as a discrete value (such as +1 for the first category and −1 for the second) or by one-hot encoding. In classification, data points from different classes are expected to be farther apart; however, traditional LSR cannot achieve this goal.
Therefore, Xiang et al. proposed the Discriminative Least Squares Regression (DLSR) method [40]. By introducing the ϵ-dragging technique, class label information is embedded in the LSR framework, and the regression targets of different categories of samples are moved in opposite directions, which expands the distance between sample points from different classes. Specifically, for a positive slack variable $\epsilon_{ij}$, if $y_{ij} = 1$, the output becomes $1 + \epsilon_{ij}$; if $y_{ij} = 0$, the output becomes $-\epsilon_{ij}$. This method extends naturally to multi-class classification. The mathematical form of DLSR is
$$\min_{\mathbf{W}, \mathbf{b}, \mathbf{M}} \|\mathbf{X}^T\mathbf{W} + \mathbf{1}\mathbf{b}^T - \mathbf{Y} - \mathbf{B} \odot \mathbf{M}\|_F^2 + \lambda\|\mathbf{W}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{M} \geq 0. \qquad (5)$$
The matrix $\mathbf{B}$ is defined as
$$B_{ij} = \begin{cases} +1, & y_{ij} = 1, \\ -1, & y_{ij} = 0, \end{cases} \qquad (6)$$
and determines the target dragging direction. The variable $\mathbf{M}$ is the traction (dragging) matrix, whose elements are updated iteratively during model learning.
In the optimization process, ϵ-dragging turns the loss into a two-branch piecewise function so as to increase the distance between samples of different categories. Consider a four-class classification problem with one-hot label vectors. Let the label of a sample from the first class be $[1, 0, 0, 0]^T$, and let the model prediction be $\mathbf{W}^T\mathbf{x} + \mathbf{b} = [\hat{y}_1, \hat{y}_2, \hat{y}_3, \hat{y}_4]^T$. For the category the sample belongs to (the first category, #1), if the predicted value $\hat{y}_1 > 1$, then $\epsilon$ is updated to $\hat{y}_1 - 1$, so that the error of the sample under this category is 0. For the other categories (#2, #3, #4), if the predicted value $\hat{y}_j < 0$ ($j \neq 1$), then $\epsilon$ is updated to $-\hat{y}_j$, which likewise makes the error of this sample under those categories 0. The detailed mathematical principle is explained in the third step of the optimization process in Section 3.2.
Table 1 graphically illustrates the purpose of the ϵ-dragging method. For a four-class dataset with eight sample points, the regression targets of models (2) and (5) are listed in the third and fourth columns of Table 1, respectively.
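To make the dragging concrete, the following hedged NumPy sketch relaxes one-hot targets for fixed predictions; the column orientation (c × n) and the helper name are our own choices:

```python
import numpy as np

def drag_targets(Y, Yhat):
    """One epsilon-dragging step for fixed predictions (cf. Eqs. (5)-(6)).

    Y:    c x n one-hot label matrix; Yhat: c x n predictions.
    Returns the relaxed regression targets Y + B * M.
    """
    B = np.where(Y == 1, 1.0, -1.0)      # dragging directions, Eq. (6)
    M = np.maximum((Yhat - Y) * B, 0.0)  # optimal non-negative slack
    return Y + B * M

# A one-hot target [1, 0]^T with prediction [1.3, -0.2]^T is relaxed to
# [1.3, -0.2]^T, so an already well-separated prediction incurs zero loss.
print(drag_targets(np.array([[1.0], [0.0]]), np.array([[1.3], [-0.2]])))
```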

3. Method

In this section, we present the model formulation of RSRRW firstly and then its detailed optimization procedure.
In the EEG-based semi-supervised emotion recognition problem, we use the matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ to represent the EEG data. $\mathbf{X}$ consists of two subsets: the labeled samples $\mathbf{X}_l \in \mathbb{R}^{d \times n_l}$ with labels $\mathbf{Y}_l \in \mathbb{R}^{c \times n_l}$, and the unlabeled samples $\mathbf{X}_u \in \mathbb{R}^{d \times n_u}$ with unknown labels $\mathbf{Y}_u \in \mathbb{R}^{c \times n_u}$. Here $d$ is the feature dimension of the EEG samples, $c$ is the number of emotion categories, $n_l$ is the number of labeled samples, $n_u$ is the number of unlabeled samples, and $n = n_l + n_u$.
The target of this algorithm is to accurately predict Y u using the given data X = [ X l , X u ] and Y l .

3.1. Model Formulation

The RSRRW model framework is shown in Figure 2. The model part mainly includes four components: sample weight learning, feature weight learning, the ϵ-dragging process, and coefficient matrix learning. The functions of each part are summarized as follows. (1) Sample weight learning attaches a probability weight to each sample point in the data: when the sample is a normal point, its probability weight is 1; otherwise, its probability weight is 0. (2) Feature weight learning differentiates the contributions of features. (3) The ϵ-dragging process forces the regression targets of different categories to move in opposite directions, expanding the distance between categories. (4) Coefficient matrix learning uses the least squares method to learn $\mathbf{W}$ and $\mathbf{b}$.
The ϵ-dragging strategy was proposed in [40], and its mathematical form is
$$\min_{\mathbf{V}, \mathbf{b}, \mathbf{M}} \|\mathbf{V}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y} - \mathbf{B} \odot \mathbf{M}\|_F^2 + \lambda\|\mathbf{V}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{M} \geq 0, \qquad (7)$$
where $\mathbf{V}$ is the original coefficient matrix.
Considering that EEG data are multi-channel time series, features from different frequency bands and leads contribute differently to a specific task, and objective function (7) cannot distinguish the importance of different features, which is a defect. Therefore, we introduce the feature weight matrix $\boldsymbol{\Theta}$ to describe the importance of sample features and construct the following objective function,
$$\min_{\mathbf{V}, \mathbf{b}, \mathbf{M}} \|\mathbf{V}^T\boldsymbol{\Theta}\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y} - \mathbf{B} \odot \mathbf{M}\|_F^2 + \lambda\|\mathbf{V}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{M} \geq 0,\ \boldsymbol{\Theta} = \mathrm{diag}(\boldsymbol{\theta}),\ \boldsymbol{\theta} \geq 0,\ \boldsymbol{\theta}^T\mathbf{1} = 1. \qquad (8)$$
Considering the weak and unstable characteristics of EEG data, the quality of different samples varies greatly. To improve the robustness of the model to samples, we introduce the sample probability weights $\mathbf{s}$, so that the model can automatically filter outliers during training. The resulting objective function is
$$\min \sum_{i=1}^{n} s_i \|\mathbf{V}^T\boldsymbol{\Theta}\mathbf{x}_i + \mathbf{b} - \mathbf{y}_i - \mathbf{d}_i \odot \mathbf{m}_i\|_2 + \lambda\|\mathbf{V}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{s}^T\mathbf{1} = k,\ 0 \leq s_i \leq 1,\ \boldsymbol{\Theta} = \mathrm{diag}(\boldsymbol{\theta}),\ \boldsymbol{\theta} \geq 0,\ \boldsymbol{\theta}^T\mathbf{1} = 1,\ \mathbf{m}_i \geq 0. \qquad (9)$$
In model (9), $\mathbf{s}$ is the sample weight vector, $k$ is the number of normal points among the samples, and $\mathbf{d}_i$ is the $i$-th column of $\mathbf{B}$. However, model (9) can only be used in a supervised setting; here we extend it to a semi-supervised setting,
$$\min \sum_{i=1}^{n} s_i \|\mathbf{V}^T\boldsymbol{\Theta}\mathbf{x}_i + \mathbf{b} - \mathbf{y}_i - (2\mathbf{y}_i - \mathbf{1}) \odot \mathbf{m}_i\|_2 + \lambda\|\mathbf{V}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{s}^T\mathbf{1} = k,\ 0 \leq s_i \leq 1,\ \boldsymbol{\Theta} = \mathrm{diag}(\boldsymbol{\theta}),\ \boldsymbol{\theta} \geq 0,\ \boldsymbol{\theta}^T\mathbf{1} = 1,\ \mathbf{Y} = [\mathbf{Y}_l, \mathbf{Y}_u],\ \mathbf{Y}_u \geq 0,\ \mathbf{Y}_u^T\mathbf{1} = \mathbf{1},\ \mathbf{m}_i \geq 0. \qquad (10)$$
By setting $\mathbf{W} = \boldsymbol{\Theta}\mathbf{V}$, we have $\mathbf{V} = \boldsymbol{\Theta}^{-1}\mathbf{W}$, and the above objective function can be rewritten as
$$\min \sum_{i=1}^{n} s_i \|\mathbf{W}^T\mathbf{x}_i + \mathbf{b} - \mathbf{y}_i - (2\mathbf{y}_i - \mathbf{1}) \odot \mathbf{m}_i\|_2 + \lambda\|\boldsymbol{\Theta}^{-1}\mathbf{W}\|_F^2, \quad \mathrm{s.t.}\ \mathbf{s}^T\mathbf{1} = k,\ 0 \leq s_i \leq 1,\ \boldsymbol{\Theta} = \mathrm{diag}(\boldsymbol{\theta}),\ \boldsymbol{\theta} \geq 0,\ \boldsymbol{\theta}^T\mathbf{1} = 1,\ \mathbf{Y} = [\mathbf{Y}_l, \mathbf{Y}_u],\ \mathbf{Y}_u \geq 0,\ \mathbf{Y}_u^T\mathbf{1} = \mathbf{1},\ \mathbf{m}_i \geq 0. \qquad (11)$$
When $\mathbf{s}$, $\mathbf{W}$, $\mathbf{b}$, $\mathbf{M}$ are fixed, since $\boldsymbol{\Theta} = \mathrm{diag}(\boldsymbol{\theta})$ and $\boldsymbol{\theta}^T\mathbf{1} = 1$, the second term of objective function (11) can be rewritten as
$$\min_{\boldsymbol{\theta} \geq 0,\ \boldsymbol{\theta}^T\mathbf{1} = 1} \sum_{j=1}^{d} \frac{\|\mathbf{w}^j\|_2^2}{\theta_j}. \qquad (12)$$
According to the Lagrange multiplier method, we write the Lagrangian of (12), set its derivative with respect to $\theta_j$ to 0, and obtain
$$\theta_j = \frac{\|\mathbf{w}^j\|_2}{\sum_{l=1}^{d}\|\mathbf{w}^l\|_2}. \qquad (13)$$
With this solution for $\boldsymbol{\theta}$, objective function (12) is equivalent to
$$\|\mathbf{W}\|_{2,1}^2. \qquad (14)$$
Now model (11) can be rewritten as
$$\min \sum_{i=1}^{n} s_i \|\mathbf{W}^T\mathbf{x}_i + \mathbf{b} - \mathbf{y}_i - (2\mathbf{y}_i - \mathbf{1}) \odot \mathbf{m}_i\|_2 + \lambda\|\mathbf{W}\|_{2,1}^2, \quad \mathrm{s.t.}\ \mathbf{s}^T\mathbf{1} = k,\ 0 \leq s_i \leq 1,\ \mathbf{Y} = [\mathbf{Y}_l, \mathbf{Y}_u],\ \mathbf{Y}_u \geq 0,\ \mathbf{Y}_u^T\mathbf{1} = \mathbf{1},\ \mathbf{m}_i \geq 0. \qquad (15)$$
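As a side note, once $\mathbf{W}$ is learned, the feature weights can be recovered from (13); a minimal sketch (our illustration):

```python
import numpy as np

def feature_weights(W):
    """Recover theta from W via Eq. (13): theta_j is proportional to ||w^j||_2."""
    row_norms = np.linalg.norm(W, axis=1)   # ||w^j||_2 for each feature j
    return row_norms / row_norms.sum()      # normalized so that theta^T 1 = 1
```

This is exactly how the band and channel importance values analyzed in Section 4.4 can be obtained from the learned coefficient matrix.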

3.2. Model Optimization

There are five variables in model (15) to optimize: $\mathbf{W}$, $\mathbf{b}$, $\mathbf{Y}_u$, $\mathbf{s}$, and $\mathbf{M}$. We design a joint iterative optimization algorithm that divides the problem into four sub-problems, solves them separately, and iterates until convergence.
  • Update W , b by fixing Y u , s and M .
Performing a simple variable substitution on objective function (15), we get
$$\min_{\mathbf{W}, \mathbf{b}} \sum_{i=1}^{n} s_i \mu(e_i) + \lambda\|\mathbf{W}\|_{2,1}^2, \qquad (16)$$
where $e_i = \|\mathbf{W}^T\mathbf{x}_i + \mathbf{b} - \mathbf{z}_i\|_2^2$, $\mathbf{z}_i$ is the $i$-th column of $\mathbf{Z} = \mathbf{Y} + (2\mathbf{Y} - \mathbf{1}\mathbf{1}^T) \odot \mathbf{M}$, and $\mu(e) = e^{\frac{1}{2}}$.
We adopt an effective algorithm [41] to solve objective function (16). It addresses the general problem
$$\min_{x \in C} \sum_i h_i(g_i(x)) + f(x). \qquad (17)$$
Here $x$ and $g(x)$ may be scalars, vectors, or matrices. Algorithm 1 describes the detailed procedure. Comparing (16) with (17), $\mu(e_i)$, $e_i(\mathbf{W}, \mathbf{b})$, and $\lambda\|\mathbf{W}\|_{2,1}^2$ in (16) play the roles of $h_i(x)$, $g_i(x)$, and $f(x)$ in (17), respectively.
Algorithm 1 Solution of (17).
Input: Initialize $x \in C$
Output: The optimal $x$
  1: while not converged do
  2:  Calculate $d_i = h_i'(g_i(x))$;
  3:  Solve the minimization problem $\min_{x \in C} \sum_i \mathrm{Tr}(d_i^T g_i(x)) + f(x)$;
  4: end while
Firstly, we need to calculate $d_i$:
$$d_i = \mu'(e_i) = \frac{1}{2\|\mathbf{W}^T\mathbf{x}_i + \mathbf{b} - \mathbf{z}_i\|_2}. \qquad (18)$$
Secondly, we need to solve
$$\min_{\mathbf{W}, \mathbf{b}} \sum_{i=1}^{n} s_i d_i \|\mathbf{W}^T\mathbf{x}_i + \mathbf{b} - \mathbf{z}_i\|_2^2 + \lambda\|\mathbf{W}\|_{2,1}^2. \qquad (19)$$
To facilitate subsequent calculations, we rewrite (19) in matrix form:
$$J = \mathrm{Tr}\big((\mathbf{W}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Z})\boldsymbol{\Lambda}(\mathbf{W}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Z})^T\big) + \lambda\|\mathbf{W}\|_{2,1}^2, \qquad (20)$$
where $\boldsymbol{\Lambda} = \mathbf{S}\mathbf{D}$, $\mathbf{S} = \mathrm{diag}(s_1, s_2, \ldots, s_n)$, and $\mathbf{D} = \mathrm{diag}(d_1, d_2, \ldots, d_n)$. Since (19) is an unconstrained optimization problem, we solve it by setting the partial derivatives of $J$ to zero. Taking the partial derivative with respect to $\mathbf{b}$, we have
$$\frac{\partial J}{\partial \mathbf{b}} = 2(\mathbf{W}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Z})\boldsymbol{\Lambda}\mathbf{1}. \qquad (21)$$
Setting (21) to 0, we obtain the update formula for $\mathbf{b}$:
$$\mathbf{b} = \frac{\mathbf{Z}\boldsymbol{\Lambda}\mathbf{1} - \mathbf{W}^T\mathbf{X}\boldsymbol{\Lambda}\mathbf{1}}{\mathbf{1}^T\boldsymbol{\Lambda}\mathbf{1}}. \qquad (22)$$
Replacing $\mathbf{b}$ in (20) with the result obtained in (22) and simplifying, we get
$$J = \mathrm{Tr}\big((\mathbf{W}^T\mathbf{X} - \mathbf{Z})\mathbf{K}(\mathbf{W}^T\mathbf{X} - \mathbf{Z})^T\big) + \lambda\Big(\sum_{i=1}^{d}\sqrt{\|\mathbf{w}^i\|_2^2 + \delta}\Big)^2, \qquad (23)$$
where $\mathbf{K} = \big(\mathbf{I} - \frac{\boldsymbol{\Lambda}\mathbf{1}\mathbf{1}^T}{\mathbf{1}^T\boldsymbol{\Lambda}\mathbf{1}}\big)\boldsymbol{\Lambda}\big(\mathbf{I} - \frac{\boldsymbol{\Lambda}\mathbf{1}\mathbf{1}^T}{\mathbf{1}^T\boldsymbol{\Lambda}\mathbf{1}}\big)^T$ and $\mathbf{K} = \mathbf{K}^T$. Similarly, (23) is an unconstrained optimization problem, so we take its partial derivative with respect to $\mathbf{W}$:
$$\frac{\partial J}{\partial \mathbf{W}} = 2(\mathbf{X}\mathbf{K}\mathbf{X}^T\mathbf{W} - \mathbf{X}\mathbf{K}\mathbf{Z}^T + \lambda\mathbf{Q}\mathbf{W}), \qquad (24)$$
where $\mathbf{Q} \in \mathbb{R}^{d \times d}$ is a diagonal matrix whose $i$-th diagonal element is
$$q_{ii} = \frac{\sum_{l=1}^{d}\sqrt{\|\mathbf{w}^l\|_2^2 + \delta}}{\sqrt{\|\mathbf{w}^i\|_2^2 + \delta}}, \qquad (25)$$
and $\delta$ is a small fixed constant. Setting the partial derivative to 0, the expression for $\mathbf{W}$ is obtained:
$$\mathbf{W} = (\mathbf{X}\mathbf{K}\mathbf{X}^T + \lambda\mathbf{Q})^{-1}\mathbf{X}\mathbf{K}\mathbf{Z}^T. \qquad (26)$$
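A compact NumPy sketch of this closed-form $(\mathbf{W}, \mathbf{b})$ update, assuming the sample weights $s_i$ (Theorem 1), the reweighting coefficients $d_i$ of (18), and the retargeted labels $\mathbf{Z}$ are already available; the function and argument names are our own:

```python
import numpy as np

def update_W_b(X, Z, s, d, W_prev, lam, delta=1e-8):
    """One (W, b) update with s, M, Y_u fixed; cf. Eqs. (22), (25), (26).

    X: d x n data; Z: c x n retargeted labels; s, d: per-sample weights;
    W_prev: previous iterate of W, used to form Q in Eq. (25).
    """
    n = X.shape[1]
    one = np.ones((n, 1))
    Lam = np.diag(s * d)                                   # Lambda = S D
    H = np.eye(n) - (Lam @ one @ one.T) / (one.T @ Lam @ one)
    K = H @ Lam @ H.T                                      # symmetric, K = K^T
    rn = np.sqrt(np.sum(W_prev ** 2, axis=1) + delta)      # smoothed row norms
    Q = np.diag(rn.sum() / rn)                             # Eq. (25)
    W = np.linalg.solve(X @ K @ X.T + lam * Q, X @ K @ Z.T)          # Eq. (26)
    b = (Z @ Lam @ one - W.T @ (X @ Lam @ one)) / (one.T @ Lam @ one)  # Eq. (22)
    return W, b.ravel()
```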
  • Update s by fixing Y u , W , b and M .
The corresponding objective function in terms of the variable $\mathbf{s}$ is
$$\min_{\mathbf{s}} \sum_{i=1}^{n} s_i e_i, \quad \mathrm{s.t.}\ \mathbf{s} \geq 0,\ \mathbf{s}^T\mathbf{1} = k, \qquad (27)$$
where $e_i$ measures the approximation error on sample $\mathbf{x}_i$ by the $\ell_2$-norm. The parameter $k$ and the variable $\mathbf{s}$ are closely related in model (27): $k$ is the number of ones in $\mathbf{s}$, i.e., the number of normal samples involved in model training. By computing and ranking the loss $e_i$ of each sample, the optimal $\mathbf{s}$ is obtained by setting the weights of the $k$ samples with the smallest errors to one and the remaining values to zero. This regularity is formalized in the following Theorem 1.
Theorem 1. 
The optimal $\mathbf{s}$ in problem (27) is a binary vector, in which the weights of the $k$ samples with the smallest errors are one and the others are zero.
Proof 1. 
Let $\mathbf{s}$ be the binary vector described above, and suppose there is another weight vector $\mathbf{s}'$ that satisfies
$$\sum_{j=1}^{k} s'_j + \sum_{j=k+1}^{n} s'_j = k, \qquad (28)$$
such that
$$\sum_{j=1}^{n} s'_j \mu(e_{(j)}) \leq \sum_{j=1}^{n} s_j \mu(e_{(j)}). \qquad (29)$$
Firstly, we sort the samples in ascending order of error, i.e.,
$$\mu(e_{(1)}) \leq \mu(e_{(2)}) \leq \cdots \leq \mu(e_{(j)}) \leq \cdots \leq \mu(e_{(n)}). \qquad (30)$$
After a simple split of (29), we get
$$\sum_{j=1}^{k} s'_j \mu(e_{(j)}) + \sum_{j=k+1}^{n} s'_j \mu(e_{(j)}) \leq \sum_{j=1}^{k} \mu(e_{(j)}). \qquad (31)$$
Moving the first term on the left of the inequality to the right, we have
$$\sum_{j=k+1}^{n} s'_j \mu(e_{(j)}) \leq \sum_{j=1}^{k} (1 - s'_j)\mu(e_{(j)}). \qquad (32)$$
According to (28) and (30), we can get
$$\sum_{j=k+1}^{n} s'_j \mu(e_{(j)}) \geq \sum_{j=1}^{k} (1 - s'_j)\mu(e_{(k+1)}). \qquad (33)$$
Combining (32) and (33), we get
$$\sum_{j=1}^{k} (1 - s'_j)\mu(e_{(k+1)}) \leq \sum_{j=1}^{k} (1 - s'_j)\mu(e_{(j)}). \qquad (34)$$
Since $\mu(e_{(j)}) \leq \mu(e_{(k+1)})$ for $j \leq k$, inequality (34) cannot hold when the inequalities in (30) are not all equalities, so the assumption is logically impossible. Therefore, the optimal $\mathbf{s}$ in problem (27) is a binary vector, in which the weights of the $k$ samples with the smallest errors are one and the others are zero.    □
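Theorem 1 makes the $\mathbf{s}$-update a simple sort-and-threshold step; a minimal sketch (our illustration):

```python
import numpy as np

def update_s(errors, k):
    """Optimal s for (27) per Theorem 1: weight 1 for the k smallest errors."""
    s = np.zeros_like(errors, dtype=float)
    s[np.argsort(errors)[:k]] = 1.0
    return s

# Example: with errors [0.9, 0.1, 3.2, 0.4] and k = 3, the third sample
# (the largest error) is treated as an outlier and excluded from training.
print(update_s(np.array([0.9, 0.1, 3.2, 0.4]), k=3))  # [1. 1. 0. 1.]
```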
  • Update M by fixing Y u , W , b and s .
From problem (15), the objective function in terms of the variable $\mathbf{M}$ is
$$\min_{\mathbf{M} \geq 0} \|\mathbf{W}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y} - (2\mathbf{Y} - \mathbf{1}\mathbf{1}^T) \odot \mathbf{M}\|^2 \triangleq \|\mathbf{P} - \mathbf{B} \odot \mathbf{M}\|^2, \qquad (35)$$
where $\mathbf{P} \triangleq \mathbf{W}^T\mathbf{X} + \mathbf{b}\mathbf{1}^T - \mathbf{Y}$ and $\mathbf{B} \triangleq 2\mathbf{Y} - \mathbf{1}\mathbf{1}^T$. It is easily verified that matrix $\mathbf{B}$ is analogous to the one defined in [40] for supervised learning. Since the elements of $\mathbf{M}$ are optimized independently of each other in (35), for a single element we can convert (35) into
$$\min_{M_{ij} \geq 0} (P_{ij} - B_{ij} M_{ij})^2. \qquad (36)$$
Obviously, the solution to this problem is
$$M_{ij} = \max\Big(\frac{P_{ij}}{B_{ij}}, 0\Big). \qquad (37)$$
Then, the solution to (35) is
$$\mathbf{M} = \max(\mathbf{P} \,./\, \mathbf{B}, 0), \qquad (38)$$
where $./$ denotes element-wise division (equivalent to the Hadamard product here, since the entries of $\mathbf{B}$ are $\pm 1$).
From formula (38), we find that the ϵ-dragging method essentially encourages the predicted value $\hat{y}$ to move toward $1 + \epsilon_{ij}$ or $-\epsilon_{ij}$ by constructing the loss as a two-branch piecewise function. Below we use an example to detail the update during optimization. Note that the algorithm implements the ϵ-dragging method through the matrices $\mathbf{M}$ and $\mathbf{B}$, and the elements of $\mathbf{M}$ can be regarded as $\epsilon_{ij}$. To maintain consistency of symbols, each element is written as $M_{ij}$.
For a sample $\mathbf{x}_i$, let its emotion label be $\mathbf{y}_i = [1, 0, 0, 0]^T$. Under model (35), its prediction is $\hat{\mathbf{y}}_i = \mathbf{W}^T\mathbf{x}_i + \mathbf{b} = [\hat{y}_1, \hat{y}_2, \hat{y}_3, \hat{y}_4]^T$. To distinguish it from $\mathbf{b}$ in the LSR, we write the columns of the direction matrix as $\mathbf{B} = [\tilde{\mathbf{b}}_1, \tilde{\mathbf{b}}_2, \ldots, \tilde{\mathbf{b}}_n]$; the direction vector of $\mathbf{x}_i$ is $\tilde{\mathbf{b}}_i = [1, -1, -1, -1]^T$. The error generated by this sample is
$$\|\mathrm{error}_i\|^2 = \|\hat{\mathbf{y}}_i - \mathbf{y}_i - \tilde{\mathbf{b}}_i \odot \mathbf{m}_i\|_2^2 = (\hat{y}_1 - 1 - M_{1,i})^2 + (\hat{y}_2 + M_{2,i})^2 + (\hat{y}_3 + M_{3,i})^2 + (\hat{y}_4 + M_{4,i})^2. \qquad (39)$$
Based on (39), $\|\mathrm{error}_i\|^2$ consists of four items, which fall into two categories: the error generated by the predicted value of the category the sample belongs to, corresponding to the first item in (39), and the errors generated by the predicted values of the other categories, corresponding to the second, third, and fourth items in (39). We can easily draw the following conclusions.
The error of the sample's own category (the first item in (39)) is calculated in two steps. Firstly, $M_{1,i}$ can be represented by the following piecewise function,
$$M_{1,i} = \begin{cases} \hat{y}_1 - 1, & \hat{y}_1 \geq 1, \\ 0, & \hat{y}_1 < 1. \end{cases} \qquad (40)$$
Then the corresponding error is calculated as
$$\mathrm{error}_{1,i}^2 = \begin{cases} 0, & \hat{y}_1 \geq 1, \\ (\hat{y}_1 - 1)^2, & \hat{y}_1 < 1. \end{cases} \qquad (41)$$
Formula (41) indicates that when the predicted value $\hat{y}_1$ of the sample's own category is greater than 1, the error generated by $\hat{y}_1$ is 0. In traditional LSR, however, an error of $(\hat{y}_1 - 1)^2$ would still be generated in this case. The ϵ-dragging method uses the non-negative slack variable $M_{1,i}$ to offset this error, thereby encouraging $\hat{y}_1$ to move in the direction of $1 + \epsilon$.
Similarly, for the other categories (items 2, 3, and 4 in (39)), the analysis is as follows. The variable $M_{c_j,i}$ ($c_j \neq 1$) can be represented as
$$M_{c_j,i} = \begin{cases} 0, & \hat{y}_{c_j} \geq 0, \\ -\hat{y}_{c_j}, & \hat{y}_{c_j} < 0. \end{cases} \qquad (42)$$
Secondly, the corresponding error is calculated as
$$\mathrm{error}_{c_j,i}^2 = \begin{cases} \hat{y}_{c_j}^2, & \hat{y}_{c_j} \geq 0, \\ 0, & \hat{y}_{c_j} < 0. \end{cases} \qquad (43)$$
Formula (43) indicates that when the predicted value $\hat{y}_{c_j}$ of another category is less than 0, the error generated by $\hat{y}_{c_j}$ is 0, whereas in LSR an error of $\hat{y}_{c_j}^2$ would still be generated. The ϵ-dragging method uses the non-negative slack variable $M_{c_j,i}$ to offset this error, thereby encouraging $\hat{y}_{c_j}$ to move in the direction of $-\epsilon$.
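The whole $\mathbf{M}$-update therefore reduces to one vectorized operation; a minimal sketch of (38) (our illustration):

```python
import numpy as np

def update_M(P, B):
    """Eq. (38): M = max(P ./ B, 0); since B has entries +/-1, P ./ B == P * B."""
    return np.maximum(P * B, 0.0)
```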
  • Update Y u by fixing s , W , b and M .
The columns of $\mathbf{Y}_u$ are independent of each other, so we can optimize $\mathbf{Y}_u$ in a column-wise manner. Specifically, for $i = n_l + 1, n_l + 2, \ldots, n_l + n_u$, the corresponding objective function in terms of $\mathbf{y}_i$ is
$$\min_{\mathbf{y}_i \geq 0,\ \mathbf{y}_i^T\mathbf{1} = 1} \|\mathbf{W}^T\mathbf{x}_i + \mathbf{b} + \mathbf{m}_i - \mathbf{y}_i \odot (\mathbf{1} + 2\mathbf{m}_i)\|^2 = \|(\mathbf{1} + 2\mathbf{m}_i) \odot (\mathbf{a}_i - \mathbf{y}_i)\|^2, \qquad (44)$$
where $\mathbf{a}_i \triangleq (\mathbf{W}^T\mathbf{x}_i + \mathbf{b} + \mathbf{m}_i) \,./\, (\mathbf{1} + 2\mathbf{m}_i)$. To simplify the derivation, we optimize the equivalent form of problem (44),
$$\min_{\mathbf{y}_i \geq 0,\ \mathbf{y}_i^T\mathbf{1} = 1} \frac{1}{2}\|\mathbf{a}_i - \mathbf{y}_i\|_2^2. \qquad (45)$$
Problem (45) can be solved by the Lagrange multiplier method combined with the KKT conditions. The corresponding Lagrangian is
$$L = \frac{1}{2}\|\mathbf{y}_i - \mathbf{a}_i\|_2^2 - \eta(\mathbf{y}_i^T\mathbf{1} - 1) - \boldsymbol{\beta}^T\mathbf{y}_i, \qquad (46)$$
where $\eta$ and $\boldsymbol{\beta} \in \mathbb{R}^c$ are the Lagrangian multipliers in scalar and vector form, respectively. Assume that $\tilde{\mathbf{y}}_i$ is the optimal solution and the associated optimal Lagrangian multipliers are $\tilde{\eta}$ and $\tilde{\boldsymbol{\beta}}$. According to the KKT conditions, we have
$$\tilde{y}_j \geq 0, \quad \tilde{\beta}_j \geq 0, \quad \tilde{y}_j\tilde{\beta}_j = 0, \quad \tilde{y}_j - a_j - \tilde{\eta} - \tilde{\beta}_j = 0, \qquad (47)$$
for each $j \in \{1, 2, \ldots, c\}$. The last expression in (47) can be written in vector form as
$$\tilde{\mathbf{y}} - \mathbf{a} - \tilde{\eta}\mathbf{1} - \tilde{\boldsymbol{\beta}} = \mathbf{0}. \qquad (48)$$
Because $\tilde{\mathbf{y}}^T\mathbf{1} = 1$, the multiplier $\tilde{\eta}$ in (48) can be written as
$$\tilde{\eta} = \frac{1 - \mathbf{1}^T\mathbf{a} - \mathbf{1}^T\tilde{\boldsymbol{\beta}}}{c}. \qquad (49)$$
Substituting (49) into (48), we have
$$\tilde{\mathbf{y}} = \mathbf{a} - \frac{\mathbf{1}\mathbf{1}^T}{c}\mathbf{a} + \frac{1}{c}\mathbf{1} - \frac{\mathbf{1}^T\tilde{\boldsymbol{\beta}}}{c}\mathbf{1} + \tilde{\boldsymbol{\beta}}. \qquad (50)$$
Denoting $\bar{\tilde{\beta}} = \frac{\mathbf{1}^T\tilde{\boldsymbol{\beta}}}{c}$ and $\mathbf{g} = \mathbf{a} - \frac{\mathbf{1}\mathbf{1}^T}{c}\mathbf{a} + \frac{1}{c}\mathbf{1}$, (50) can be rewritten as
$$\tilde{\mathbf{y}} = \mathbf{g} + \tilde{\boldsymbol{\beta}} - \bar{\tilde{\beta}}\mathbf{1}. \qquad (51)$$
Therefore, elementwise we have
$$\tilde{y}_j = g_j + \tilde{\beta}_j - \bar{\tilde{\beta}}. \qquad (52)$$
Through (47) and (52), we can get $g_j + \tilde{\beta}_j - \bar{\tilde{\beta}} = (g_j - \bar{\tilde{\beta}})_+$, where
$$(f(\cdot))_+ = \begin{cases} 0, & f(\cdot) < 0, \\ f(\cdot), & \text{otherwise}. \end{cases} \qquad (53)$$
Therefore, (52) can be written as
$$\tilde{y}_j = (g_j - \bar{\tilde{\beta}})_+. \qquad (54)$$
If the optimal $\bar{\tilde{\beta}}$ can be determined, then according to (54) the optimal $\tilde{\mathbf{y}}$ can also be determined. Similar to (54), (52) can be rewritten as $\tilde{\beta}_j = \tilde{y}_j + \bar{\tilde{\beta}} - g_j$ such that $\tilde{\beta}_j = (\bar{\tilde{\beta}} - g_j)_+$. Therefore, $\bar{\tilde{\beta}}$ satisfies
$$\bar{\tilde{\beta}} = \frac{1}{c}\sum_{j=1}^{c} (\bar{\tilde{\beta}} - g_j)_+. \qquad (55)$$
According to the constraint $\tilde{\mathbf{y}}^T\mathbf{1} = 1$ and (54), we can define the function
$$f(\bar{\beta}) = \sum_{j=1}^{c} (g_j - \bar{\beta})_+ - 1, \qquad (56)$$
and the optimal $\bar{\tilde{\beta}}$ should satisfy $f(\bar{\tilde{\beta}}) = 0$. The root of (56) can be found by Newton's method:
$$\bar{\beta}^{(k+1)} = \bar{\beta}^{(k)} - \frac{f(\bar{\beta}^{(k)})}{f'(\bar{\beta}^{(k)})}. \qquad (57)$$
Finally, the optimization procedure for sub-problem (44) is listed in Algorithm 2, and the procedure for solving problem (15) is summarized in Algorithm 3.
Algorithm 2 The algorithm to solve sub-problem (45).
Input: vector $\mathbf{a}_i \in \mathbb{R}^c$
Output: the optimal $\mathbf{y}_i \in \mathbb{R}^c$
  1: Compute $\mathbf{g} = \mathbf{a}_i - \frac{\mathbf{1}\mathbf{1}^T}{c}\mathbf{a}_i + \frac{1}{c}\mathbf{1}$;
  2: Use Newton's method to obtain the root $\bar{\tilde{\beta}}$ of (56);
  3: Obtain the optimal solution by (54);
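A hedged Python sketch of Algorithm 2, combining the Newton iteration (57) on (56) with the recovery step (54); the initialization and tolerances are our own choices:

```python
import numpy as np

def simplex_project(a, tol=1e-8, max_iter=100):
    """Solve min 0.5*||y - a||^2 s.t. y >= 0, y^T 1 = 1 (Algorithm 2 sketch).

    Finds the root beta of f(beta) = sum_j max(g_j - beta, 0) - 1 by Newton's
    method, then recovers y via Eq. (54).
    """
    c = len(a)
    g = a - a.mean() + 1.0 / c           # g = a - (1 1^T / c) a + (1/c) 1
    beta = 0.0                           # f(0) >= 0, so Newton moves rightward
    for _ in range(max_iter):
        resid = g - beta
        active = resid > 0
        f = resid[active].sum() - 1.0    # f(beta), Eq. (56)
        fp = -active.sum()               # f'(beta) = -#{j : g_j > beta}
        if abs(f) < tol or fp == 0:
            break
        beta -= f / fp                   # Newton step, Eq. (57)
    return np.maximum(g - beta, 0.0)     # Eq. (54)

# Example: project a = [0.4, 0.8, -0.3] onto the probability simplex.
print(simplex_project(np.array([0.4, 0.8, -0.3])))  # [0.3 0.7 0. ]
```

Because $f$ is piecewise linear and convex, starting from $\beta = 0$ (where $f \geq 0$) the Newton iteration converges in at most $c$ steps.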
Algorithm 3 The optimization algorithm of RSRRW.
Input: labeled EEG data $\mathbf{X}_l \in \mathbb{R}^{d \times n_l}$ and its label matrix $\mathbf{Y}_l \in \mathbb{R}^{c \times n_l}$, unlabeled EEG data $\mathbf{X}_u \in \mathbb{R}^{d \times n_u}$, parameter $\lambda$;
Output: the estimated labels $\mathbf{Y}_u \in \mathbb{R}^{c \times n_u}$.
  1: Initialize $\mathbf{W}$ randomly; initialize each element of $\mathbf{Y}_u$ as $\frac{1}{c}$, each element of $\mathbf{M}$ as 0, and each element of $\mathbf{s}$ as 1;
  2: while not converged do
  3:  Update $\mathbf{b}$ via (22);
  4:  Update $\mathbf{W}$ via (26);
  5:  Update $\mathbf{s}$ by solving (27);
  6:  Update $\mathbf{M}$ via (38);
  7:  Update $\mathbf{Y}_u$ by solving (45) for each $i \in \{n_l + 1, \ldots, n_l + n_u\}$;
  8: end while

3.3. Discussion on RSRRW

In this section, we discuss the sample weight $\mathbf{s}$ and the ϵ-dragging method in RSRRW, and briefly analyze the complexity of the RSRRW model.
  • The weight $s_i$ ($i = 1, \ldots, n$) reflects the importance of the $i$-th sample. As proved in Section 3.2, $\mathbf{s}$ is a binary vector. When sample $\mathbf{x}_i$ is a noise point, the loss it causes is 0; that is, $\mathbf{x}_i$ does not get involved in the training process. On the contrary, when $\mathbf{x}_i$ is a normal point, its weight is 1, meaning that it contributes to training. Therefore, with the help of the sample weights $\mathbf{s}$, RSRRW can automatically find outliers in the data and remove them, preventing the model from skewing toward outliers. Comparison experiments suggest that RSRRW performs better than DLSR. At the same time, by visualizing $\mathbf{s}$ along the time dimension, we can intuitively observe the temporal distribution of outliers during the experiment, which can also provide references for future experimental design. We elaborate on this in Section 4.3.
  • In [40], the ϵ-dragging method was applied to supervised learning tasks: $\mathbf{B}$ was obtained from the labels of the training samples, giving the dragging direction of each sample and thus expanding the distance between classes. In the EEG emotion recognition task, however, labeled samples are usually scarce, and the data distributions of different sessions differ considerably, so relying wholly on the supervised learning paradigm can hardly yield ideal results. We therefore extend the ϵ-dragging method to the semi-supervised learning process and estimate the dragging direction of each sample from the unlabeled-sample category probabilities obtained during optimization, realizing a semi-supervised ϵ-dragging method. Ideally, if $\mathbf{Y}_u$ is estimated accurately, we obtain the exact dragging direction of each sample. In the following experiments, we intuitively show the performance of the ϵ-dragging method.
  • We analyze the asymptotic time complexity of our model. Firstly, the optimization of $\mathbf{b}$ costs $O(n^2c + ndc)$. Secondly, in the optimization of $\mathbf{W}$, computing $(\mathbf{X}\mathbf{K}\mathbf{X}^T + \lambda\mathbf{Q})^{-1}$ costs $O(n^2d + d^2c + d^3)$, and computing $\mathbf{X}\mathbf{K}\mathbf{Z}^T$ costs $O(n^2d)$; forming $\mathbf{K}$ involves only diagonal and identity matrices, so its cost is clearly lower than $O(n^2d)$. The update of $\mathbf{s}$ costs $O(n\log n)$ for sorting. The optimization of $\mathbf{M}$ costs $O(dcn)$. In the optimization of $\mathbf{Y}_u$, since $\mathbf{W}^T\mathbf{X}$ has already been computed when updating $\mathbf{M}$, the cost is $O(uc)$, where $u$ is the number of unlabeled samples. In general, $n$ and $u$ are much larger than $d$ and $c$, so the overall asymptotic time complexity of the RSRRW algorithm is $O(n^2dt)$, where $t$ is the number of iterations.

4. Experiments

4.1. Data Sets

The SEED-IV dataset contains EEG signals collected from subjects while they were watching movie clips. Seventy-two film clips with highly emotional content were selected as stimuli for emotion induction. A total of 15 healthy right-handed subjects (7 males and 8 females) participated in the experiment, and each attended three sessions on different days. The four emotional states evoked by the videos are happy, fearful, neutral, and sad. In each session, each affective state corresponds to six trials. The experimental procedure of each session is shown in Figure 3. A 5 s start prompt precedes each video clip, and the subjects perform a 45 s self-assessment after playback ends.
The EEG acquisition equipment used in the experiment included the ESI Neuroscan system and a 62-electrode cap conforming to the international 10–20 placement standard. In the experiments reported in this paper, we used differential entropy (DE) features extracted from five frequency bands, namely Delta (1–3 Hz), Theta (4–7 Hz), Alpha (8–13 Hz), Beta (14–30 Hz), and Gamma (31–50 Hz). The sample $\mathbf{x}_i$ is formed by concatenating the values of the 62 leads in each of the 5 frequency bands, resulting in a dimension of 310. More details can be found in [42].
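As an illustration of how one sample is assembled (the exact concatenation order is our assumption, not specified above):

```python
import numpy as np

# Hypothetical assembly of one 310-d sample: DE features for 5 bands x 62
# channels, flattened into a single vector.
de = np.random.randn(5, 62)   # placeholder DE values for one time window
x_i = de.reshape(-1)
print(x_i.shape)              # (310,)
```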

4.2. Experimental Settings

During the experiments, we performed the cross-session emotion recognition task in chronological order. In other words, each subject is supposed to complete three tasks, including “session 1–session 2”, “session 1–session 3”, “session 2–session 3”. In the “session 1–session 2” task, the samples in session 1 are labeled data, and the samples in session 2 are unlabeled data. During the model training process, the samples from session 2 will be gradually labeled.
We compare the RSRRW algorithm with five closely related models: the traditional semi-supervised least squares regression model (sLSR), the semi-supervised support vector machine (sSVM), the rescaled least squares regression model (RLSR) [39], the discriminative least squares regression model (DLSR) [40], and the robust semi-supervised least squares regression model (RSLSR) [43]. RLSR extends sLSR by adding a feature weight factor to describe feature importance. DLSR is another extension of sLSR that adds the ϵ-dragging method to increase the distance between categories. In sSVM, we use a linear kernel. RSLSR adds sample weights to sLSR, which can filter out outliers among the samples. These methods involve hyperparameters that need to be tuned ($\lambda$ in sLSR, sSVM, RLSR, DLSR, RSLSR, and RSRRW; $k$ in RSRRW and RSLSR). In this experiment, $\lambda$ is tuned over $\{2^{-10}, 2^{-9}, \ldots, 2^{10}\}$. For RSLSR and RSRRW, $k$ is tuned over $\{0.8a, 0.81a, \ldots, a\}$, where $a$ is the number of samples. Iteration stops when the number of iterations reaches 100 or the relative change of the objective, $\delta_{obj} = \frac{obj(t+1) - obj(t)}{obj(t)}$, is at most $1 \times 10^{-5}$. The typical number of iterations of RSRRW is around 70.
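For reproducibility, the grids and stopping rule can be encoded as follows; the sample count a is an assumed placeholder, not a value taken from the dataset:

```python
import numpy as np

# Hypothetical encoding of the search grids and stopping rule described above.
lambdas = [2.0 ** p for p in range(-10, 11)]                      # {2^-10, ..., 2^10}
a = 1000                                                          # assumed sample count
ks = [int(round(r * a)) for r in np.arange(0.80, 1.0001, 0.01)]   # {0.8a, 0.81a, ..., a}

def converged(obj_prev, obj_curr, it, tol=1e-5, max_iter=100):
    """Stop after 100 iterations or when the relative objective change <= 1e-5."""
    return it >= max_iter or abs(obj_curr - obj_prev) / abs(obj_prev) <= tol
```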

4.3. Results and Analysis

Table 2, Table 3 and Table 4 show the recognition accuracies of the above models in the cross-session EEG emotional state recognition tasks. Table 2 shows the results with session 1 as labeled data and session 2 as unlabeled data; Table 3 with session 1 labeled and session 3 unlabeled; Table 4 with session 2 labeled and session 3 unlabeled. The highest accuracy in each group is marked in bold.
From these results, we find the following meaningful points.
  • Although the EEG data from different sessions have significant differences in distribution, our RSRRW model still achieves high recognition accuracy. Specifically, the average accuracies of RSRRW in the three cross-session tasks are 79.50%, 81.25%, and 83.77%, respectively, which are 7.68%, 3.49%, and 6.28% higher than the runner-up.
  • Compared with sLSR, the RSLSR model adds the sample weight factor $\mathbf{s}$. In terms of average performance, RSLSR is 5.78%, 12.01%, and 5.37% higher than sLSR in the three cross-session tasks, which shows that dynamically screening samples by the sample probability weight $\mathbf{s}$ works well: when a sample's error is too large, discarding it ensures the model does not shift toward noise.
  • Compared with sLSR, the RLSR model adds the feature weight factor $\boldsymbol{\Theta}$. In terms of average performance, RLSR is 3.09%, 6.98%, and 2.10% higher than sLSR, respectively, indicating that adaptive feature weight learning improves performance to a certain extent; that is, for the EEG emotion recognition task, different frequency bands and leads contribute differently.
  • Compared with sLSR, the DLSR model adds ϵ -dragging method. From the average performance of the model, DLSR is 3.62%, 5.00%, and 0.53% higher than sLSR, respectively. It can be found that the ϵ -dragging method can effectively improve the model performance.
To illustrate the performance advantage of the RSRRW model over the other models, we use the Friedman test [44] and the Nemenyi test [45] for significance testing. Firstly, we put forward the null hypothesis $H_0$ that "the experimental results of each group come from populations with no significant difference in value"; the alternative hypothesis $H_1$ is that "the experimental results of each group come from populations with obvious differences in value". Secondly, we determine the number of models $K = 6$ and the number of groups of cross-session EEG emotion recognition tasks $N = 45$. Thirdly, we rank the accuracies under each cross-session emotion recognition task (the higher the recognition accuracy, the better the rank). After sorting, the average rank $r_i$ of each model is calculated, with the results shown in Table 5.
Fourthly, we calculate the statistic
$$\tau_F = \frac{(N-1)\,\tau_{\chi^2}}{N(K-1) - \tau_{\chi^2}}, \qquad (58)$$
where
$$\tau_{\chi^2} = \frac{12N}{K(K+1)}\Big(\sum_{i=1}^{K} r_i^2 - \frac{K(K+1)^2}{4}\Big). \qquad (59)$$
In this experiment, since the number of models $K = 6$ and the number of task groups $N = 45$ are relatively large, we can assume $\tau_{\chi^2} \sim \chi^2(K-1)$. Computed in MATLAB, $\tau_F = 39.9503$, which is much larger than the critical value 2.2551 of the F-test for $K = 6$ and $N = 45$. Therefore, at confidence level $\alpha = 0.05$, we reject the null hypothesis $H_0$ and accept the alternative hypothesis $H_1$ (that is, the probability of error in rejecting the null hypothesis is 5%).
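The two statistics are straightforward to compute from the average ranks; the text's value was obtained in MATLAB, and the following Python sketch mirroring (58) and (59) is our own illustration:

```python
import numpy as np

def friedman_stats(avg_ranks, N):
    """Friedman statistics of Eqs. (58)-(59) from K average ranks over N groups."""
    r = np.asarray(avg_ranks, dtype=float)
    K = len(r)
    tau_chi2 = 12.0 * N / (K * (K + 1)) * (np.sum(r ** 2) - K * (K + 1) ** 2 / 4.0)
    tau_F = (N - 1) * tau_chi2 / (N * (K - 1) - tau_chi2)
    return tau_chi2, tau_F
```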
Based on this, to further distinguish the performance of the models, we performed post hoc detection based on the Nemenyi test on the above experimental results. First, we calculate the critical distance (CD) of the average ranking difference under this set of data through
$$CD = q_\alpha\sqrt{\frac{K(K+1)}{6N}}, \qquad (60)$$
where $q_\alpha$ represents the critical value of the Nemenyi test. For $K = 6$ and $\alpha = 0.05$, the critical value is $q_\alpha = 2.850$; by calculation, $CD = 1.1241$. According to the above experimental results, we have drawn Figure 4. In the figure, we use a line segment to represent the critical region, with the midpoint of the segment at the average rank of the model. If two segments do not overlap, we can conclude that the two models differ significantly; otherwise, the difference is not significant. For example, the rank value of RSRRW is 1.22 and that of RSLSR is 2.77; their corresponding segments do not overlap, so there is a significant difference between the two at the 0.05 confidence level. Therefore, RSRRW significantly outperforms RSLSR in the cross-session emotion recognition task.
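The critical distance itself is a one-liner; a sketch of (60) (our illustration):

```python
import numpy as np

def nemenyi_cd(q_alpha, K, N):
    """Critical distance of the Nemenyi test, Eq. (60)."""
    return q_alpha * np.sqrt(K * (K + 1) / (6.0 * N))

print(nemenyi_cd(2.850, K=6, N=45))   # ~1.1241, matching the value above
```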

4.4. EEG Spatial-Frequency Activation Patterns Mining

From the feature weights $\boldsymbol{\Theta}$ obtained in our experiments, we can determine the contribution of each frequency band and lead to emotion recognition according to the correspondence between spectral features and EEG frequency bands and channels [33]. The average band importance over all 45 experiments is shown in Figure 5a, from which we find that the Gamma band contributes the most to distinguishing different emotional states. Based on the consensus that different brain regions correlate differently with affective effects, we can also determine the importance of different EEG channels from the feature weights $\boldsymbol{\Theta}$. As shown in Figure 5b, we plot channel importance on the EEG topography and find that channels in the temporal and parietal lobes have higher importance. In Figure 6, we show the weight values of the top 10 channels. We believe that P7, CP2, TP8, CZ, FC5, TP7, F8, T7, FPZ, and P8 are highly important for the cross-session emotion recognition task. The layout of these 10 key leads is shown in Figure 7.

4.5. Effect of the Dragging-Matrix

For samples from different categories, we want to embed the sample labels into the objective function so as to increase the distance between categories. Therefore, we introduce the ϵ-dragging method to drag the regression targets of different classes in opposite directions through the matrices $\mathbf{M}$ and $\mathbf{B}$. The dragging matrix $\mathbf{M}$ can significantly expand the distance between categories, and each column can be regarded as a binary regressor. For samples belonging to class $j$, the corresponding labels become $1 + m_{ij}$, and for samples not belonging to class $j$, the labels become $-m_{ij}$, where $m_{ij} \geq 0$.
To observe the performance of the ϵ-dragging method, we selected the data of subject 15 (session 1–session 2) and subject 5 (session 2–session 3) to plot the predicted $\mathbf{Y}$ values of the unlabeled samples. In Figure 8, we plot certain columns of the predicted labels $\mathbf{Y}$ for these experiments. We then reorder the samples so that samples of the same category are adjacent, making it easy to observe the distance between samples under each emotion category. Each color represents an affective state, with dashed lines for negative targets and solid lines for positive targets. It can be seen that the method significantly increases the distance between samples of different classes.

4.6. Effect of the Sample Probability Weight

In the RSRRW algorithm, we add the weight factor s to the objective function. During the learning process, all samples are sorted according to the error after training. The k samples with the smallest error are marked as 1, and the others are 0. Through the above process, outliers will be automatically located and eliminated, avoiding the loss of accuracy caused by the deviation of the model from these values.
Below, we select the EEG data of subject 1 (session 3), subject 9 (session 2), and subject 10 (session 2) to plot the distribution of noisy samples. In Figure 9, the background color represents the true emotion category, the horizontal axis represents time, and the vertical axis represents the $\mathbf{s}$ values of different samples. Because the number of actual samples is large, we focus only on the positions of the noisy samples, so each point in the figure does not exactly correspond to one sample. Blue points indicate normal samples, and red points indicate abnormal samples.
In the experimental paradigm of SEED-IV, each subject had only 45 s to self-assess and rest between trials. We believe such a short time cannot let some subjects completely recover from the previous emotional state. Therefore, in follow-up EEG emotion experiments, the rest time after each video clip should be appropriately extended so that subjects can rest sufficiently before the next trial. This would also improve sample quality to a certain extent and ensure better consistency between samples and labels.

5. Conclusions

In this paper, we propose the Retargeted Semi-supervised Regression with Robust Weights (RSRRW) model. We summarize the paper from two aspects: advantages and results.
Advantages: (1) Compared with the DLSR method, the robustness of the model is fully considered: a binary weight is attached to each sample to determine whether it is a noise point, which can ensure the performance of the model when some labels are inaccurate. (2) Compared with the traditional LSR method, RSRRW introduces the feature weight variable $\boldsymbol{\Theta}$ to distinguish the weights of different features in the EEG emotion recognition task, thus obtaining the emotional activation pattern. (3) RSRRW implements the ϵ-dragging method under the semi-supervised learning paradigm to expand the distance between samples of different categories and improve model performance. (4) The model achieved good experimental results on the SEED-IV emotion recognition dataset: the average accuracies of RSRRW in the three cross-session tasks are 79.50%, 81.25%, and 83.77%, respectively, which are 7.68%, 3.49%, and 6.28% higher than the runner-up.
Results: (1) The RSRRW model greatly improves the accuracy of emotion recognition. (2) In emotion recognition tasks, the Gamma and Delta EEG frequency bands, together with the EEG channels located in the temporal and (central) parietal lobes, deserve particular attention.

Author Contributions

Conceptualization, Y.P. and Z.C.; Data curation, Z.C. and S.D.; Investigation, Y.P.; Methodology, Z.C. and Y.P.; Software, Z.C. and Y.P.; Validation, Y.P. and S.D.; Writing—original draft preparation, Z.C. and Y.P.; Writing—review and editing, Z.C. and Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61971173), Fundamental Research Funds for the Provincial Universities of Zhejiang (GK209907299001-008) and National Innovation Training Program for College Students (202210336051).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Shanghai Jiao Tong University (protocol code 2017060).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Acknowledgments

The authors also would like to thank the anonymous reviewers for their comments on this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Papero, D.; Frost, R.; Havstad, L.; Noone, R. Natural Systems Thinking and the Human Family. Systems 2018, 6, 19. [Google Scholar] [CrossRef] [Green Version]
  2. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals. Sensors 2018, 18, 2074. [Google Scholar] [CrossRef] [Green Version]
  3. Chang, C.; Chen, J.E. Multimodal EEG-fMRI: Advancing insight into large-scale human brain dynamics. Curr. Opin. Biomed. Eng. 2021, 18, 100279. [Google Scholar] [CrossRef]
  4. He, Z.; Li, Z.; Yang, F.; Wang, L.; Li, J.; Zhou, C.; Pan, J. Advances in multimodal emotion recognition based on brain–computer interfaces. Brain Sci. 2020, 10, 687. [Google Scholar] [CrossRef]
  5. MacNamara, A.; Joyner, K.; Klawohn, J. Event-related potential studies of emotion regulation: A review of recent progress and future directions. Int. J. Psychophysiol. 2022, 176, 73–88. [Google Scholar] [CrossRef] [PubMed]
  6. Li, X.; Zhang, Y.; Tiwari, P.; Song, D.; Hu, B.; Yang, M.; Zhao, Z.; Kumar, N.; Marttinen, P. EEG based Emotion Recognition: A Tutorial and Review. ACM Comput. Surv. 2022. [Google Scholar] [CrossRef]
  7. Peng, Y.; Kong, W.; Qin, F.; Nie, F.; Fang, J.; Lu, B.L.; Cichocki, A. Self-weighted semi-supervised classification for joint EEG-based emotion recognition and affective activation patterns mining. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  8. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef] [Green Version]
  9. Wang, F.; Zhang, W.; Xu, Z.; Ping, J.; Chu, H. A deep multi-source adaptation transfer network for cross-subject electroencephalogram emotion recognition. Neural. Comput. Appl. 2021, 33, 9061–9073. [Google Scholar] [CrossRef]
  10. Li, W.; Peng, Y. Transfer EEG Emotion Recognition by Combining Semi-Supervised Regression with Bipartite Graph Label Propagation. Systems 2022, 10, 111. [Google Scholar] [CrossRef]
  11. Chen, Y.; Chang, R.; Guo, J. Emotion recognition of eeg signals based on the ensemble learning method: Adaboost. Math. Probl. Eng. 2021, 2021, 8896062. [Google Scholar] [CrossRef]
12. Rahman, M.M.; Sarkar, A.K.; Hossain, M.A.; Hossain, M.S.; Islam, M.R.; Hossain, M.B.; Quinn, J.M.; Moni, M.A. Recognition of human emotions using EEG signals: A review. Comput. Biol. Med. 2021, 136, 104696.
13. Bahador, N.; Erikson, K.; Laurila, J.; Koskenkari, J.; Ala-Kokko, T.; Kortelainen, J. Automatic detection of artifacts in EEG by combining deep learning and histogram contour processing. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 138–141.
14. Mehmood, R.M.; Bilal, M.; Vimal, S.; Lee, S.W. EEG-based affective state recognition from human brain signals by using Hjorth-activity. Measurement 2022, 202, 111738.
15. Singh, M.I.; Singh, M. Emotion recognition: An evaluation of ERP features acquired from frontal EEG electrodes. Appl. Sci. 2021, 11, 4131.
16. Alsolamy, M.; Fattouh, A. Emotion estimation from EEG signals during listening to Quran using PSD features. In Proceedings of the 2016 7th International Conference on Computer Science and Information Technology (CSIT), Amman, Jordan, 13–15 July 2016; pp. 1–5.
17. Fan, C.; Liu, X.; Gu, X.; Zhou, L.; Li, X. Research on emotion recognition of EEG signal based on convolutional neural networks and high-order cross-analysis. J. Healthc. Eng. 2022, 2022, 6238172.
18. Hwang, S.; Hong, K.; Son, G.; Byun, H. Learning CNN features from DE features for EEG-based emotion recognition. Pattern Anal. Appl. 2020, 23, 1323–1335.
19. Sharma, R.; Pachori, R.B.; Sircar, P. Automated emotion recognition based on higher order statistics and deep learning algorithm. Biomed. Signal Process. Control 2020, 58, 101867.
20. Donmez, H.; Ozkurt, N. Emotion classification from EEG signals in convolutional neural networks. In Proceedings of the 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), Izmir, Turkey, 31 October–2 November 2019; pp. 1–6.
21. Gupta, V.; Chopda, M.D.; Pachori, R.B. Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals. IEEE Sens. J. 2018, 19, 2266–2274.
22. Chen, J.; Jiang, D.; Zhang, Y. A common spatial pattern and wavelet packet decomposition combined method for EEG-based emotion recognition. J. Adv. Comput. Intell. Intell. Inform. 2019, 23, 274–281.
23. Wang, J.; Wang, M. Review of the emotional feature extraction and classification using EEG signals. Cognit. Robot. 2021, 1, 29–40.
24. Basar, M.D.; Duru, A.D.; Akan, A. Emotional state detection based on common spatial patterns of EEG. Signal Image Video Process. 2020, 14, 473–481.
25. Novi, Q.; Guan, C.; Dat, T.H.; Xue, P. Sub-band common spatial pattern (SBCSP) for brain-computer interface. In Proceedings of the 2007 3rd International IEEE/EMBS Conference on Neural Engineering, Kohala Coast, HI, USA, 2–5 May 2007; pp. 204–207.
26. Li, Z.; Tian, X.; Shu, L.; Xu, X.; Hu, B. Emotion recognition from EEG using RASM and LSTM. In Proceedings of the International Conference on Internet Multimedia Computing and Service (ICIMS), Qingdao, China, 23–25 August 2017; pp. 310–318.
27. Zhang, G.; Yu, M.J.; Chen, G.; Han, Y.H.; Zhang, D.; Zhao, G.Z.; Liu, Y.-J. A review of EEG features for emotion recognition. Sci. China Inf. Sci. 2019, 49, 1097–1118. (In Chinese)
28. Thammasan, N.; Fukui, K.I.; Numao, M. Application of deep belief networks in EEG-based dynamic music-emotion recognition. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 881–888.
29. Li, J.; Zhang, Z.; He, H. Hierarchical convolutional neural networks for EEG-based emotion recognition. Cognit. Comput. 2018, 10, 368–380.
30. Cui, H.; Liu, A.; Zhang, X.; Chen, X.; Wang, K.; Chen, X. EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network. Knowl. Based Syst. 2020, 205, 106243.
31. Liu, Y.; Ding, Y.; Li, C.; Cheng, J.; Song, R.; Wan, F.; Chen, X. Multi-channel EEG-based emotion recognition via a multi-level features guided capsule network. Comput. Biol. Med. 2020, 123, 103927.
32. Gong, S.; Xing, K.; Cichocki, A.; Li, J. Deep learning in EEG: Advance of the last ten-year critical period. IEEE Trans. Cogn. Develop. Syst. 2022, 14, 348–365.
33. Peng, Y.; Qin, F.; Kong, W.; Ge, Y.; Nie, F.; Cichocki, A. GFIL: A unified framework for the importance analysis of features, frequency bands, and channels in EEG-based emotion recognition. IEEE Trans. Cogn. Develop. Syst. 2022, 14, 935–947.
34. Peng, Y.; Jin, F.; Kong, W.; Nie, F.; Lu, B.L.; Cichocki, A. OGSSL: A semi-supervised classification model coupled with optimal graph learning for EEG emotion recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 1288–1297.
35. Ekman, P. Facial expression and emotion. Am. Psychol. 1993, 48, 384.
36. Van den Broek, E.L. Ubiquitous emotion-aware computing. Pers. Ubiquitous Comput. 2013, 17, 53–67.
37. Posner, J.; Russell, J.A.; Peterson, B.S. The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev. Psychopathol. 2005, 17, 715–734.
38. Lang, P.J. The emotion probe: Studies of motivation and attention. Am. Psychol. 1995, 50, 372.
39. Chen, X.; Nie, F.; Yuan, G.; Huang, J.Z. Semi-supervised feature selection via rescaled linear regression. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 1525–1531.
40. Xiang, S.; Nie, F.; Meng, G.; Pan, C.; Zhang, C. Discriminative least squares regression for multiclass classification and feature selection. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1738–1754.
41. Nie, F.; Yuan, J.; Huang, H. Optimal mean robust principal component analysis. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1062–1070.
42. Zheng, W.L.; Liu, W.; Lu, Y.; Lu, B.L.; Cichocki, A. EmotionMeter: A multimodal framework for recognizing human emotions. IEEE Trans. Cybern. 2018, 49, 1110–1122.
43. Wang, J.; Xie, F.; Nie, F.; Li, X. Robust supervised and semisupervised least squares regression using ℓ2,p-norm minimization. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–15.
44. Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937, 32, 675–701.
45. Zhou, Z. Machine Learning; Tsinghua University Press: Beijing, China, 2016; pp. 41–45. (In Chinese)
Figure 1. Emotion models VA (a) and VAD (b). (a) The VA model consists of the valence and arousal dimensions; (b) the VAD model consists of the valence, arousal, and dominance dimensions.
Figure 2. The general framework of the RSRRW model.
Figure 3. Experimental protocol for SEED-IV [42].
Figure 4. Nemenyi test result.
Figure 5. The average importance of EEG channels (a) and frequency bands (b) obtained by RSRRW.
Figure 6. Top 10 EEG channels.
Figure 7. Top 10 EEG channels.
Figure 8. Examples to show the effectiveness of the ϵ-dragging method.
Figure 9. Visualization of sample weights s.
Table 1. Illustration of the ϵ-dragging method (c = 4 classes; all ϵij ≥ 0).

| Sample | Class | LSR Targets | DLSR Targets | Constraint |
|---|---|---|---|---|
| x1 | 1 | [1, 0, 0, 0]^T | [1 + ϵ11, -ϵ12, -ϵ13, -ϵ14]^T | ϵ11, ϵ12, ϵ13, ϵ14 ≥ 0 |
| x2 | 1 | [1, 0, 0, 0]^T | [1 + ϵ21, -ϵ22, -ϵ23, -ϵ24]^T | ϵ21, ϵ22, ϵ23, ϵ24 ≥ 0 |
| x3 | 2 | [0, 1, 0, 0]^T | [-ϵ31, 1 + ϵ32, -ϵ33, -ϵ34]^T | ϵ31, ϵ32, ϵ33, ϵ34 ≥ 0 |
| x4 | 2 | [0, 1, 0, 0]^T | [-ϵ41, 1 + ϵ42, -ϵ43, -ϵ44]^T | ϵ41, ϵ42, ϵ43, ϵ44 ≥ 0 |
| x5 | 3 | [0, 0, 1, 0]^T | [-ϵ51, -ϵ52, 1 + ϵ53, -ϵ54]^T | ϵ51, ϵ52, ϵ53, ϵ54 ≥ 0 |
| x6 | 3 | [0, 0, 1, 0]^T | [-ϵ61, -ϵ62, 1 + ϵ63, -ϵ64]^T | ϵ61, ϵ62, ϵ63, ϵ64 ≥ 0 |
| x7 | 4 | [0, 0, 0, 1]^T | [-ϵ71, -ϵ72, -ϵ73, 1 + ϵ74]^T | ϵ71, ϵ72, ϵ73, ϵ74 ≥ 0 |
| x8 | 4 | [0, 0, 0, 1]^T | [-ϵ81, -ϵ82, -ϵ83, 1 + ϵ84]^T | ϵ81, ϵ82, ϵ83, ϵ84 ≥ 0 |
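The retargeting in Table 1 can be written compactly as T = Y + B ⊙ M, where Y is the one-hot label matrix, M collects the nonnegative ϵ values, and B holds +1 at true-class entries and -1 elsewhere, following the DLSR formulation [40]. The sketch below is our own minimal illustration of this step, not the authors' released code; the function and variable names are ours.

```python
# Minimal sketch of the eps-dragging retargeting illustrated in Table 1
# (our illustration of the DLSR formulation [40]; names are ours).
import numpy as np

def drag_targets(Y, M):
    """Y: n x c one-hot labels; M: n x c nonnegative dragging values.
    Returns T = Y + B * M, where B is +1 at the true-class entry of each
    row and -1 elsewhere, so correct-class outputs are pushed above 1 and
    wrong-class outputs below 0."""
    B = np.where(Y == 1, 1.0, -1.0)
    return Y + B * M

# Two rows of Table 1: one class-1 sample (x1) and one class-2 sample (x3)
Y = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
M = np.full(Y.shape, 0.2)  # toy epsilons; in DLSR/RSRRW they are learned
print(drag_targets(Y, M))
# [[ 1.2 -0.2 -0.2 -0.2]
#  [-0.2  1.2 -0.2 -0.2]]
```

Because the dragging moves the targets of different classes in opposite directions, their pairwise distances grow with ϵ, which is the margin-enlarging effect that RSRRW extends to the semi-supervised setting.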
Table 2. Cross-session emotion recognition results (%) of session 1–session 2.

| Subject | sLSR | sSVM | RLSR | DLSR | RSLSR | RSRRW |
|---|---|---|---|---|---|---|
| subject 1 | 60.22 | 72.72 | 75.60 | 62.50 | 77.04 | 76.20 |
| subject 2 | 79.21 | 82.45 | 83.53 | 76.44 | 83.53 | 86.78 |
| subject 3 | 65.26 | 71.39 | 77.88 | 79.21 | 77.88 | 78.00 |
| subject 4 | 68.39 | 55.29 | 68.51 | 72.24 | 68.51 | 80.65 |
| subject 5 | 61.78 | 72.60 | 54.33 | 66.83 | 56.85 | 73.44 |
| subject 6 | 56.97 | 64.66 | 53.25 | 57.93 | 58.29 | 75.36 |
| subject 7 | 73.80 | 70.19 | 80.89 | 79.33 | 82.33 | 89.90 |
| subject 8 | 75.36 | 67.55 | 74.76 | 68.99 | 74.76 | 88.70 |
| subject 9 | 60.82 | 68.63 | 62.38 | 63.70 | 70.55 | 70.91 |
| subject 10 | 45.55 | 52.76 | 47.00 | 54.33 | 59.74 | 66.83 |
| subject 11 | 54.33 | 54.69 | 59.74 | 52.40 | 61.30 | 68.15 |
| subject 12 | 69.23 | 56.01 | 56.49 | 63.58 | 62.14 | 75.00 |
| subject 13 | 63.10 | 62.38 | 58.77 | 71.15 | 60.58 | 73.80 |
| subject 14 | 84.01 | 87.62 | 85.22 | 78.85 | 85.22 | 90.26 |
| subject 15 | 72.60 | 92.79 | 98.56 | 97.36 | 98.56 | 98.56 |
| Avg. | 66.04 | 68.78 | 69.13 | 69.66 | 71.82 | 79.50 |
Table 3. Cross-session emotion recognition results (%) of session 1–session 3.

| Subject | sLSR | sSVM | RLSR | DLSR | RSLSR | RSRRW |
|---|---|---|---|---|---|---|
| subject 1 | 73.84 | 81.02 | 80.78 | 81.63 | 83.09 | 92.58 |
| subject 2 | 80.29 | 86.86 | 91.00 | 86.13 | 92.21 | 92.34 |
| subject 3 | 41.12 | 53.65 | 57.06 | 37.47 | 61.68 | 63.38 |
| subject 4 | 81.39 | 60.58 | 80.29 | 92.94 | 80.29 | 74.21 |
| subject 5 | 68.86 | 79.93 | 72.51 | 82.60 | 74.33 | 82.73 |
| subject 6 | 74.70 | 74.45 | 77.13 | 82.48 | 79.93 | 83.45 |
| subject 7 | 61.31 | 84.91 | 80.66 | 82.12 | 87.23 | 92.34 |
| subject 8 | 86.25 | 54.87 | 83.21 | 84.55 | 84.91 | 92.46 |
| subject 9 | 65.33 | 54.87 | 53.77 | 57.91 | 63.75 | 66.79 |
| subject 10 | 45.01 | 60.83 | 41.85 | 39.90 | 64.48 | 66.55 |
| subject 11 | 61.19 | 54.74 | 71.65 | 63.38 | 73.72 | 80.54 |
| subject 12 | 55.60 | 58.39 | 67.64 | 61.92 | 70.92 | 76.28 |
| subject 13 | 47.57 | 52.31 | 60.95 | 55.47 | 63.99 | 71.41 |
| subject 14 | 70.92 | 71.05 | 79.44 | 67.76 | 90.88 | 87.23 |
| subject 15 | 72.87 | 80.54 | 93.07 | 85.04 | 95.01 | 96.47 |
| Avg. | 65.75 | 67.27 | 72.73 | 70.75 | 77.76 | 81.25 |
Table 4. Cross-session emotion recognition results (%) of session 2–session 3.

| Subject | sLSR | sSVM | RLSR | DLSR | RSLSR | RSRRW |
|---|---|---|---|---|---|---|
| subject 1 | 56.20 | 70.92 | 71.41 | 71.90 | 70.92 | 82.24 |
| subject 2 | 80.29 | 77.37 | 86.73 | 85.89 | 86.01 | 91.50 |
| subject 3 | 59.85 | 65.09 | 70.81 | 69.83 | 71.78 | 78.59 |
| subject 4 | 83.09 | 72.14 | 77.65 | 77.74 | 76.76 | 87.71 |
| subject 5 | 73.84 | 69.49 | 71.90 | 77.37 | 78.95 | 82.70 |
| subject 6 | 88.32 | 65.69 | 87.43 | 79.56 | 87.10 | 91.73 |
| subject 7 | 82.60 | 87.83 | 88.59 | 83.45 | 88.93 | 93.80 |
| subject 8 | 77.98 | 75.30 | 77.11 | 70.32 | 81.39 | 85.85 |
| subject 9 | 50.24 | 51.19 | 49.78 | 42.09 | 59.85 | 69.95 |
| subject 10 | 66.91 | 60.34 | 73.27 | 65.94 | 72.87 | 76.28 |
| subject 11 | 47.93 | 61.68 | 49.36 | 52.55 | 52.31 | 60.71 |
| subject 12 | 80.41 | 66.79 | 72.74 | 85.28 | 79.56 | 81.27 |
| subject 13 | 54.50 | 56.33 | 53.01 | 46.47 | 67.03 | 79.44 |
| subject 14 | 91.48 | 94.89 | 90.69 | 91.00 | 94.04 | 95.86 |
| subject 15 | 88.08 | 78.71 | 92.88 | 90.39 | 94.89 | 98.91 |
| Avg. | 72.12 | 70.25 | 74.22 | 72.65 | 77.49 | 83.77 |
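As a sanity check on the tables above, each "Avg." row is a plain column mean, and averaging the three task-level means of RSRRW reproduces the overall 81.51% quoted in the abstract. A minimal sketch with values copied from Table 4 (variable names are ours):

```python
import numpy as np

# RSRRW column of Table 4 (session 2-session 3), one value per subject (%)
rsrrw_s2_s3 = np.array([82.24, 91.50, 78.59, 87.71, 82.70, 91.73, 93.80,
                        85.85, 69.95, 76.28, 60.71, 81.27, 79.44, 95.86, 98.91])
print(round(rsrrw_s2_s3.mean(), 2))              # 83.77, matching the Avg. row

# Averaging the three per-task means gives the figure quoted in the abstract
print(round(np.mean([79.50, 81.25, 83.77]), 2))  # 81.51
```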
Table 5. The average rank of each model.

| Model | RSRRW | RSLSR | RLSR | DLSR | sLSR | sSVM |
|---|---|---|---|---|---|---|
| r | 1.22 | 2.77 | 3.96 | 3.98 | 4.60 | 4.48 |

r: Average rank.
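The ranks in Table 5 and the critical distance behind Figure 4 follow the standard Friedman/Nemenyi procedure [44,45]: the six models are ranked on each of the 45 datasets (15 subjects × 3 cross-session tasks), the ranks are averaged, and two models are judged significantly different when their average ranks are farther apart than the Nemenyi critical distance. Below is a minimal sketch of this procedure, assuming an accuracy matrix acc of shape 45 × 6 filled from Tables 2–4; the critical value q ≈ 2.850 for six models at α = 0.05 is the commonly tabulated one.

```python
import numpy as np
from scipy.stats import rankdata

def friedman_nemenyi(acc, q_alpha=2.850):  # tabulated q at alpha = 0.05, k = 6
    """acc: n x k accuracy matrix (n datasets, k models).
    Returns each model's average rank and the Nemenyi critical distance."""
    n, k = acc.shape
    # rank models within each dataset; higher accuracy gets the better (lower) rank
    ranks = np.vstack([rankdata(-row) for row in acc])
    avg_ranks = ranks.mean(axis=0)                   # the "r" row of Table 5
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))  # Nemenyi critical distance
    return avg_ranks, cd
```

With n = 45 and k = 6 this gives a critical distance of roughly 1.12, so the gap between RSRRW (1.22) and the runner-up RSLSR (2.77) comfortably exceeds it, which is what Figure 4 visualizes.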
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
