Machine and Deep Learning for Affective Computing

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Signal and Data Analysis".

Deadline for manuscript submissions: closed (31 May 2023) | Viewed by 20567

Special Issue Editor

Dr. Yuan Zong
School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
Interests: facial expression analysis; micro-expression analysis; speech emotion recognition; multi-modal emotion recognition; EEG emotion recognition; domain adaptation

Special Issue Information

Dear Colleagues,

Research on affective computing aims to enable machines to automatically perceive, recognize, and express emotions from/with multimodal signals, e.g., video, audio, and text. If machines were able to understand emotions from the above multimodal signals in a way similar to human beings, existing human–computer interaction (HCI) systems would undoubtedly be more natural. For this reason, affective computing has become a hot research topic in HCI and artificial intelligence (AI), and has drawn widespread attention from many researchers from HCI, AI, and other related communities over the past several decades.

On the other hand, thanks to cross-entropy, a significant achievement of information theory, machine learning (especially deep learning) models have recently made great progress and been applied to many AI-related areas, including affective computing. By resorting to well-designed structures (e.g., convolutional neural networks (CNNs)) and well-performing loss functions (i.e., cross-entropy and its variants), these models can learn discriminative features from affective signals to accurately describe and recognize human emotions. Hence, the gap between affective computing techniques and practical applications has been remarkably narrowed.
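
To make the role of cross-entropy concrete, the minimal PyTorch sketch below computes the cross-entropy loss for a batch of emotion predictions; the seven-class label set and the random logits are purely illustrative assumptions, not tied to any paper in this issue.

```python
import torch
import torch.nn as nn

# Toy setup: 4 samples, 7 emotion classes -- an illustrative assumption only.
logits = torch.randn(4, 7)            # raw scores from some emotion classifier
targets = torch.tensor([0, 3, 6, 2])  # ground-truth emotion indices

# Cross-entropy = negative log-likelihood of the softmax-normalized scores.
loss = nn.CrossEntropyLoss()(logits, targets)

# Equivalent explicit computation:
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), targets].mean()
print(loss.item(), manual.item())  # the two values match
```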

Under the above considerations, we are organizing this Special Issue to provide a platform to gather novel contributions on machine/deep-learning methods related to entropy, information, or probability theory for affective computing. We encourage all contributions on topics including but not limited to: (1) novel machine/deep-learning methods related to entropy, information, or probability theory for unimodal affective computing; (2) novel machine/deep-learning methods related to entropy, information, or probability theory for multimodal affective computing; (3) novel machine/deep-learning methods related to entropy, information, or probability theory for emotional signal synthesis and conversion; (4) novel entropy-based methods for designing loss functions and network structures in affective computing; (5) large-scale databases for unimodal and multimodal affective computing; (6) surveys of recent advances in affective computing; (7) applications of affective computing techniques in healthcare, education, entertainment, etc.

Dr. Yuan Zong
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • affective computing
  • emotion recognition
  • multimodal information fusion
  • emotion database
  • emotional signal generation and conversion
  • machine learning
  • deep learning

Published Papers (11 papers)


Research


18 pages, 3547 KiB  
Article
Multimodal Attention Dynamic Fusion Network for Facial Micro-Expression Recognition
by Hongling Yang, Lun Xie, Hang Pan, Chiqin Li, Zhiliang Wang and Jialiang Zhong
Entropy 2023, 25(9), 1246; https://doi.org/10.3390/e25091246 - 22 Aug 2023
Cited by 1 | Viewed by 1114
Abstract
The emotional changes in facial micro-expressions are combinations of action units. Researchers have shown that action units can be used as additional auxiliary data to improve facial micro-expression recognition, and most existing work attempts to fuse image features with action unit information. However, such work ignores the impact of action units on the facial image feature extraction process. Therefore, this paper proposes a local detail feature enhancement model based on a multimodal attention dynamic fusion network (MADFN) for micro-expression recognition. The method uses a masked autoencoder based on learnable class tokens to remove local areas with low emotional expressiveness in micro-expression images, and then applies an action unit dynamic fusion module that fuses action unit representations to improve the latent representation ability of the image features. The proposed model is evaluated on the SMIC, CASME II, and SAMM datasets and their 3DB-Combined composite, achieving competitive accuracy rates of 81.71%, 82.11%, and 77.21% on SMIC, CASME II, and SAMM, respectively, which shows that the MADFN model helps to improve the discrimination of facial image emotional features. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
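
The abstract does not specify how the action unit dynamic fusion module is built; as a loose illustration of fusing action-unit information into image features, the sketch below gates an image feature vector with weights predicted from an AU vector. The dimensions, the gating form, and the module name are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class AUGatedFusion(nn.Module):
    """Illustrative fusion of an image feature with an action-unit (AU) vector.

    A generic gated-fusion sketch, not the MADFN module itself.
    """
    def __init__(self, img_dim=512, au_dim=17):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(au_dim, img_dim), nn.Sigmoid())
        self.proj = nn.Linear(au_dim, img_dim)

    def forward(self, img_feat, au_vec):
        g = self.gate(au_vec)                    # per-channel weights from AUs
        return g * img_feat + self.proj(au_vec)  # reweighted image feature + AU term

fused = AUGatedFusion()(torch.randn(8, 512), torch.rand(8, 17))
print(fused.shape)  # torch.Size([8, 512])
```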

19 pages, 12890 KiB  
Article
Task-Decoupled Knowledge Transfer for Cross-Modality Object Detection
by Chiheng Wei, Lianfa Bai, Xiaoyu Chen and Jing Han
Entropy 2023, 25(8), 1166; https://doi.org/10.3390/e25081166 - 4 Aug 2023
Viewed by 778
Abstract
In harsh weather conditions, the infrared modality can supplement or even replace the visible modality. However, the lack of a large-scale dataset for infrared features hinders the generation of a robust pre-training model. Most existing infrared object-detection algorithms rely on pre-training models from the visible modality, which can accelerate network convergence but also limit performance due to modality differences. In order to provide more reliable feature representation for cross-modality object detection and enhance its performance, this paper investigates the impact of various task-relevant features on cross-modality object detection and proposes a knowledge transfer algorithm based on classification and localization decoupling analysis. A task-decoupled pre-training method is introduced to adjust the attributes of various tasks learned by the pre-training model. For the training phase, a task-relevant hyperparameter evolution method is proposed to increase the network’s adaptability to attribute changes in pre-training weights. Our proposed method improves the accuracy of multiple modalities in multiple datasets, with experimental results on the FLIR ADAS dataset reaching a state-of-the-art level and surpassing most multi-spectral object-detection methods. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)

17 pages, 1173 KiB  
Article
AM3F-FlowNet: Attention-Based Multi-Scale Multi-Branch Flow Network
by Chenghao Fu, Wenzhong Yang, Danny Chen and Fuyuan Wei
Entropy 2023, 25(7), 1064; https://doi.org/10.3390/e25071064 - 14 Jul 2023
Viewed by 916
Abstract
Micro-expressions are the small, brief facial expression changes that humans momentarily show during emotional experiences, and their annotation is complicated, which leads to the scarcity of micro-expression data. To extract salient and distinguishing features from a limited dataset, we propose an attention-based multi-scale, multi-modal, multi-branch flow network that thoroughly learns the motion information of micro-expressions by exploiting the attention mechanism and the complementary properties of different kinds of optical flow information. First, we extract optical flow information (horizontal optical flow, vertical optical flow, and optical strain) based on the onset and apex frames of micro-expression videos, and each branch learns one kind of optical flow information separately. Second, we propose a multi-scale fusion module to extract richer and more stable feature representations, using spatial attention to focus on locally important information at each scale. Then, we design a multi-optical-flow feature reweighting module to adaptively select features for each optical flow separately via channel attention. Finally, to better integrate the information of the three branches and to alleviate the problem of uneven distribution of micro-expression samples, we introduce a logarithmically adjusted prior knowledge weighting loss. This loss function weights the prediction scores of samples from different categories to mitigate the negative impact of category imbalance during classification. The effectiveness of the proposed model is demonstrated through extensive experiments and feature visualization on three benchmark datasets (CASME II, SAMM, and SMIC), and its performance is comparable to that of state-of-the-art methods. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
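
The exact form of the logarithmically adjusted prior knowledge weighting loss is not given in the abstract; one standard way to realize the idea is logit adjustment, where class priors enter the cross-entropy through their logarithm, as in the hedged sketch below (the class counts and scaling factor tau are placeholders).

```python
import torch
import torch.nn.functional as F

def prior_adjusted_ce(logits, targets, class_counts, tau=1.0):
    """Cross-entropy with logits shifted by tau * log(prior).

    A generic logit-adjustment sketch for class imbalance, not the exact
    loss proposed in the paper.
    """
    prior = class_counts / class_counts.sum()
    adjusted = logits + tau * torch.log(prior)  # rare classes must earn larger margins
    return F.cross_entropy(adjusted, targets)

# Example: 3 micro-expression classes with made-up, imbalanced sample counts.
counts = torch.tensor([250.0, 90.0, 30.0])
loss = prior_adjusted_ce(torch.randn(16, 3), torch.randint(0, 3, (16,)), counts)
print(loss.item())
```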

15 pages, 647 KiB  
Article
Enhanced Semantic Representation Learning for Sarcasm Detection by Integrating Context-Aware Attention and Fusion Network
by Shufeng Hao, Jikun Yao, Chongyang Shi, Yu Zhou, Shuang Xu, Dengao Li and Yinghan Cheng
Entropy 2023, 25(6), 878; https://doi.org/10.3390/e25060878 - 30 May 2023
Viewed by 1202
Abstract
Sarcasm is a sophisticated form of figurative language that is prevalent on social media platforms. Automatic sarcasm detection is significant for understanding the real sentiment tendencies of users. Traditional approaches mostly focus on content features by using lexicon, n-gram, and pragmatic feature-based models. However, these methods ignore the diverse contextual clues that could provide more evidence of the sarcastic nature of sentences. In this work, we propose a Contextual Sarcasm Detection Model (CSDM) that models enhanced semantic representations with user profiling and forum topic information, where context-aware attention and a user-forum fusion network are used to obtain diverse representations from distinct aspects. In particular, we employ a Bi-LSTM encoder with context-aware attention to obtain a refined comment representation by capturing sentence composition information and the corresponding context situations. Then, we employ a user-forum fusion network to obtain a comprehensive context representation by capturing the corresponding sarcastic tendencies of the user and the background knowledge about the comments. Our proposed method achieves accuracies of 0.69, 0.70, and 0.83 on the Main balanced, Pol balanced, and Pol imbalanced datasets, respectively. The experimental results on a large Reddit corpus, SARC, demonstrate that our proposed method achieves a significant performance improvement over state-of-the-art textual sarcasm detection methods. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
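
As a rough sketch of the Bi-LSTM encoder with attention pooling mentioned above (the paper's context-aware conditioning on user and forum information is omitted, and all dimensions are assumptions):

```python
import torch
import torch.nn as nn

class BiLSTMAttnEncoder(nn.Module):
    """Bi-LSTM over token embeddings followed by attention pooling.

    A generic encoder sketch; CSDM additionally conditions attention on
    user/forum context, which is omitted here.
    """
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)

    def forward(self, embeddings):             # (batch, seq_len, emb_dim)
        h, _ = self.lstm(embeddings)           # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        return (weights * h).sum(dim=1)        # (batch, 2*hidden) comment vector

vec = BiLSTMAttnEncoder()(torch.randn(4, 20, 300))
print(vec.shape)  # torch.Size([4, 256])
```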

15 pages, 897 KiB  
Article
An Entropy-Based Method with a New Benchmark Dataset for Chinese Textual Affective Structure Analysis
by Shufeng Xiong, Xiaobo Fan, Vishwash Batra, Yiming Zeng, Guipei Zhang, Lei Xi, Hebing Liu and Lei Shi
Entropy 2023, 25(5), 794; https://doi.org/10.3390/e25050794 - 13 May 2023
Cited by 1 | Viewed by 1175
Abstract
Affective understanding of language is an important research focus in artificial intelligence. Large-scale annotated datasets of Chinese textual affective structure (CTAS) are the foundation for subsequent higher-level analysis of documents; however, very few datasets have been published for CTAS. This paper introduces a new benchmark dataset for the task of CTAS to promote development in this research direction. Specifically, our benchmark dataset has the following advantages: (a) it is based on Weibo, the most popular Chinese social media platform used by the public to express their opinions; (b) it includes the most comprehensive affective structure labels available at present; and (c) we propose a maximum entropy Markov model that incorporates neural network features and experimentally demonstrate that it outperforms the two baseline models. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
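
A maximum entropy Markov model factorizes the label sequence into locally normalized distributions P(y_t | y_{t-1}, x_t); the sketch below shows one such local classifier over neural features. The tag-set size and feature dimension are placeholders, and the paper's actual feature architecture is not described in the abstract.

```python
import torch
import torch.nn as nn

class MEMMStep(nn.Module):
    """One locally normalized MEMM step: P(y_t | y_{t-1}, features_t).

    A generic sketch of combining a previous-label embedding with neural
    features, not the authors' model.
    """
    def __init__(self, feat_dim=256, num_tags=9):
        super().__init__()
        self.prev_emb = nn.Embedding(num_tags, 32)
        self.score = nn.Linear(feat_dim + 32, num_tags)

    def forward(self, feat_t, prev_tag):
        x = torch.cat([feat_t, self.prev_emb(prev_tag)], dim=-1)
        return torch.log_softmax(self.score(x), dim=-1)  # log P(y_t | y_{t-1}, x_t)

log_p = MEMMStep()(torch.randn(4, 256), torch.tensor([0, 2, 1, 5]))
print(log_p.shape)  # torch.Size([4, 9])
```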

24 pages, 532 KiB  
Article
A Two-Stage Voting-Boosting Technique for Ensemble Learning in Social Network Sentiment Classification
by Su Cui, Yiliang Han, Yifei Duan, Yu Li, Shuaishuai Zhu and Chaoyue Song
Entropy 2023, 25(4), 555; https://doi.org/10.3390/e25040555 - 24 Mar 2023
Cited by 2 | Viewed by 1611
Abstract
In recent years, social network sentiment classification has been extensively researched and applied in various fields, such as opinion monitoring, market analysis, and commodity feedback. The ensemble approach has achieved remarkable results in sentiment classification tasks due to its superior performance. The primary reason behind the success of ensemble methods is the enhanced diversity of the base classifiers. The boosting method employs a sequential ensemble structure to construct diverse data while also utilizing erroneous data by assigning higher weights to misclassified samples in the next training round. However, this method tends to use a sequential ensemble structure, resulting in a long computation time. Conversely, the voting method employs a concurrent ensemble structure to reduce computation time but neglects the utilization of erroneous data. To address this issue, this study combines the advantages of the voting and boosting methods and proposes a new two-stage voting-boosting (2SVB) concurrent ensemble learning method for social network sentiment classification. This novel method not only establishes a concurrent ensemble framework to decrease computation time but also optimizes the utilization of erroneous data and enhances ensemble performance. To optimize the utilization of erroneous data, a two-stage training approach is implemented. Stage-1 training is performed on the datasets by employing a 3-fold cross-segmentation approach. Stage-2 training is carried out on datasets that have been augmented with the erroneous data predicted in stage 1. To augment the diversity of base classifiers, the training stage employs five pre-trained deep learning (PDL) models with heterogeneous pre-training frameworks as base classifiers. To reduce the computation time, a two-stage concurrent ensemble framework is established. The experimental results demonstrate that the proposed method achieves an F1 score of 0.8942 on the coronavirus tweet sentiment dataset, surpassing other comparable ensemble methods. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
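
Very loosely, the two-stage recipe described above can be illustrated with small scikit-learn classifiers standing in for the five pre-trained deep models: train a first round on 3-fold splits, collect the samples it misclassifies, add them back for a second round of training, and soft-vote over all models. This is an interpretation of the abstract, not the 2SVB implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Stage 1: fit fold-wise models and gather the misclassified samples.
stage1, wrong_idx = [], []
for tr, va in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    stage1.append(clf)
    wrong_idx.extend(va[clf.predict(X[va]) != y[va]])

# Stage 2: retrain on data augmented with the (duplicated) erroneous samples.
X2 = np.vstack([X, X[wrong_idx]])
y2 = np.concatenate([y, y[wrong_idx]])
stage2 = [LogisticRegression(max_iter=1000, C=c).fit(X2, y2) for c in (0.5, 1.0, 2.0)]

# Soft voting over every trained model's predicted probabilities.
proba = np.mean([m.predict_proba(X) for m in stage1 + stage2], axis=0)
print("ensemble accuracy:", (proba.argmax(axis=1) == y).mean())
```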

19 pages, 639 KiB  
Article
Dual-ATME: Dual-Branch Attention Network for Micro-Expression Recognition
by Haoliang Zhou, Shucheng Huang, Jingting Li and Su-Jing Wang
Entropy 2023, 25(3), 460; https://doi.org/10.3390/e25030460 - 6 Mar 2023
Cited by 6 | Viewed by 1662
Abstract
Micro-expression recognition (MER) is challenging due to the difficulty of capturing the instantaneous and subtle motion changes of micro-expressions (MEs). Early works based on hand-crafted features extracted from prior knowledge showed some promising results but have recently been replaced by deep learning methods based on the attention mechanism. However, with limited ME sample sizes, features extracted by these methods lack discriminative ME representations, resulting in MER performance that has yet to be improved. This paper proposes the Dual-branch Attention Network (Dual-ATME) for MER to address the problem of ineffective single-scale features representing MEs. Specifically, Dual-ATME consists of two components: Hand-crafted Attention Region Selection (HARS) and Automated Attention Region Selection (AARS). HARS uses prior knowledge to manually extract features from regions of interest (ROIs), while AARS is based on attention mechanisms and extracts hidden information from data automatically. Finally, through similarity comparison and feature fusion, the dual-scale features can be used to learn ME representations effectively. Experiments on spontaneous ME datasets (including CASME II, SAMM, and SMIC) and their composite dataset, MEGC2019-CD, showed that Dual-ATME achieves performance better than, or competitive with, state-of-the-art MER methods. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)

16 pages, 27860 KiB  
Article
Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation
by Huawei Tao, Shuai Shan, Ziyi Hu, Chunhua Zhu and Hongyi Ge
Entropy 2023, 25(1), 68; https://doi.org/10.3390/e25010068 - 30 Dec 2022
Viewed by 1507
Abstract
The scarcity of labeled samples limits the development of speech emotion recognition (SER). Data augmentation is an effective way to address sample sparsity, but there is a lack of research on data augmentation algorithms in the field of SER. In this paper, the effectiveness of classical acoustic data augmentation methods in SER is analyzed, and on this basis a strong generalized speech emotion recognition model based on effective data augmentation is proposed. The model uses a multi-channel feature extractor consisting of multiple sub-networks to extract emotional representations. Different kinds of augmented data that can effectively improve SER performance are fed into the sub-networks, and the emotional representations are obtained by the weighted fusion of the output feature maps of each sub-network. In order to make the model robust to unseen speakers, we employ adversarial training to generalize the emotion representations: a discriminator is used to estimate the Wasserstein distance between the feature distributions of different speakers and to force the feature extractor to learn speaker-invariant emotional representations. The simulation experiments on the IEMOCAP corpus show that the performance of the proposed method is 2–9% ahead of related SER algorithms, which proves the effectiveness of the proposed method. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
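
As a rough illustration of the adversarial scheme described above, the snippet below uses a critic to estimate a Wasserstein-style distance between feature batches from two speakers, which the feature extractor is then trained to minimize. Weight clipping stands in for the Lipschitz constraint, and all modules and dimensions are placeholders rather than the authors' architecture.

```python
import torch
import torch.nn as nn

feat_dim = 128
extractor = nn.Sequential(nn.Linear(40, feat_dim), nn.ReLU())   # stand-in encoder
critic = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def wasserstein_estimate(f_a, f_b):
    # Difference of mean critic scores approximates the Wasserstein-1 distance
    # between the two speakers' feature distributions.
    return critic(f_a).mean() - critic(f_b).mean()

x_a, x_b = torch.randn(32, 40), torch.randn(32, 40)   # batches from two speakers

# Critic step: maximize the estimate (i.e., minimize its negative).
critic_loss = -wasserstein_estimate(extractor(x_a).detach(), extractor(x_b).detach())
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)   # crude Lipschitz constraint (WGAN-style clipping)

# Extractor step: minimize the estimate so features become speaker-invariant.
extractor_loss = wasserstein_estimate(extractor(x_a), extractor(x_b))
print(critic_loss.item(), extractor_loss.item())
```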

14 pages, 3419 KiB  
Article
DIA-TTS: Deep-Inherited Attention-Based Text-to-Speech Synthesizer
by Junxiao Yu, Zhengyuan Xu, Xu He, Jian Wang, Bin Liu, Rui Feng, Songsheng Zhu, Wei Wang and Jianqing Li
Entropy 2023, 25(1), 41; https://doi.org/10.3390/e25010041 - 26 Dec 2022
Cited by 4 | Viewed by 2431
Abstract
Text-to-speech (TTS) synthesizers have been widely used as a vital assistive tool in various fields. Traditional sequence-to-sequence (seq2seq) TTS models such as Tacotron2 use a single soft attention mechanism for encoder and decoder alignment, whose biggest shortcoming is that words may be generated incorrectly or repeatedly when dealing with long sentences. Such models may also produce run-on sentences or wrongly placed breaks regardless of punctuation marks, which causes the synthesized waveform to lack emotion and sound unnatural. In this paper, we propose an end-to-end neural generative TTS model based on the deep-inherited attention (DIA) mechanism along with an adjustable local-sensitive factor (LSF). The inheritance mechanism allows multiple iterations of the DIA by sharing the same training parameters, which tightens the token–frame correlation and speeds up the alignment process. In addition, the LSF is adopted to enhance the context connection by expanding the DIA concentration region. Furthermore, a multi-RNN block is used in the decoder for better acoustic feature extraction and generation, and hidden-state information from the multi-RNN layers is utilized for attention alignment. The collaborative work of the DIA and the multi-RNN layers yields high-quality prediction of the phrase breaks of the synthesized speech. We used WaveGlow as a vocoder for real-time, human-like audio synthesis. Human subjective experiments show that DIA-TTS achieved a mean opinion score (MOS) of 4.48 in terms of naturalness. Ablation studies further prove the superiority of the DIA mechanism for the enhancement of phrase breaks and attention robustness. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)

13 pages, 2210 KiB  
Article
BHGAttN: A Feature-Enhanced Hierarchical Graph Attention Network for Sentiment Analysis
by Junjun Zhang, Zhengyan Cui, Hyun Jun Park and Giseop Noh
Entropy 2022, 24(11), 1691; https://doi.org/10.3390/e24111691 - 18 Nov 2022
Cited by 1 | Viewed by 1732
Abstract
Recently, with the rise of deep learning, text classification techniques have developed rapidly. However, existing work usually takes the entire text as the modeling object and pays less attention to the hierarchical structure within the text, ignoring the internal connections between upper and lower sentences. To address these issues, this paper proposes a BERT-based hierarchical graph attention network model (BHGAttN), which builds on a large-scale pretrained model and a graph attention network to model the hierarchical relationships of texts. During modeling, the semantic features are enhanced by the outputs of the intermediate layers of BERT, and a multilevel hierarchical graph network corresponding to each layer of BERT is constructed by using the dependencies between whole sentences and subsentences. This model attends to the layer-by-layer semantic information and the hierarchical relationships within the text. The experimental results show that the BHGAttN model exhibits significant competitive advantages compared with the current state-of-the-art baseline models. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
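
One ingredient of the model above, taking features from BERT's intermediate layers rather than only the final one, can be reproduced with the Hugging Face Transformers library roughly as follows; the checkpoint name, example sentence, and layer choices are arbitrary.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("The service was absolutely wonderful... I waited an hour.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the embedding layer plus all 12 encoder layers (13 tensors),
# so intermediate-layer features can be tapped for downstream modules.
hidden_states = outputs.hidden_states
layer_4, layer_8, layer_12 = hidden_states[4], hidden_states[8], hidden_states[12]
print(len(hidden_states), layer_4.shape)  # 13, (1, seq_len, 768)
```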

Review


33 pages, 497 KiB  
Review
A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face
by Hailun Lian, Cheng Lu, Sunan Li, Yan Zhao, Chuangao Tang and Yuan Zong
Entropy 2023, 25(10), 1440; https://doi.org/10.3390/e25101440 - 12 Oct 2023
Cited by 2 | Viewed by 4903
Abstract
Multimodal emotion recognition (MER) refers to the identification and understanding of human emotional states by combining different signals, including—but not limited to—text, speech, and face cues. MER plays a crucial role in the human–computer interaction (HCI) domain. With the recent progression of deep learning technologies and the increasing availability of multimodal datasets, the MER domain has witnessed considerable development, resulting in numerous significant research breakthroughs. However, a conspicuous absence of thorough and focused reviews on these deep learning-based MER achievements is observed. This survey aims to bridge this gap by providing a comprehensive overview of the recent advancements in MER based on deep learning. For an orderly exposition, this paper first outlines a meticulous analysis of the current multimodal datasets, emphasizing their advantages and constraints. Subsequently, we thoroughly scrutinize diverse methods for multimodal emotional feature extraction, highlighting the merits and demerits of each method. Moreover, we perform an exhaustive analysis of various MER algorithms, with particular focus on the model-agnostic fusion methods (including early fusion, late fusion, and hybrid fusion) and fusion based on intermediate layers of deep models (encompassing simple concatenation fusion, utterance-level interaction fusion, and fine-grained interaction fusion). We assess the strengths and weaknesses of these fusion strategies, providing guidance to researchers to help them select the most suitable techniques for their studies. In summary, this survey aims to provide a thorough and insightful review of the field of deep learning-based MER. It is intended as a valuable guide to aid researchers in furthering the evolution of this dynamic and impactful field. Full article
(This article belongs to the Special Issue Machine and Deep Learning for Affective Computing)
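
The model-agnostic fusion strategies contrasted in the survey can be summarized in a few lines: early fusion concatenates unimodal features before a shared classifier, while late fusion averages unimodal decisions. The sketch below uses arbitrary feature dimensions and a seven-class emotion set purely for illustration.

```python
import torch
import torch.nn as nn

text_f, speech_f, face_f = torch.randn(8, 768), torch.randn(8, 128), torch.randn(8, 512)
num_emotions = 7  # illustrative label set

# Early fusion: concatenate unimodal features, then classify jointly.
early_clf = nn.Linear(768 + 128 + 512, num_emotions)
early_logits = early_clf(torch.cat([text_f, speech_f, face_f], dim=1))

# Late fusion: classify each modality separately, then average the decisions.
clfs = [nn.Linear(d, num_emotions) for d in (768, 128, 512)]
late_logits = torch.stack([c(f) for c, f in zip(clfs, (text_f, speech_f, face_f))]).mean(0)

print(early_logits.shape, late_logits.shape)  # both (8, 7)
```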
