Applied AI in Emotion Recognition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 October 2024 | Viewed by 5199

Special Issue Editors


Dr. Chengwei Huang
Guest Editor
Zhejiang Lab, Hangzhou 311121, China
Interests: affective computing; speech signal processing; machine learning; digital health

Prof. Dr. Yongqiang Bao
Guest Editor
School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China
Interests: signal processing; affective computing; speech signal processing

Prof. Dr. Li Zhao
Guest Editor
School of Information Science and Engineering, Southeast University, Nanjing 210096, China
Interests: speech signal processing; speech emotion recognition; machine learning

Special Issue Information

Dear Colleagues,

In recent years, the use of artificial intelligence in emotion recognition has become increasingly important, and the development of intelligent computing technology has significantly aided the growth of emotion-related research. Thus, application scenarios have expanded beyond human–computer collaboration to include health care, security, business intelligence, and education platforms.

Artificial intelligence has improved machine emotional intelligence, and this progress is not limited to the classification of emotion types; it also sets higher standards for how systems interact with emotional information. New methods and mechanisms therefore need to be investigated from the perspectives of emotion perception, cognition, expression, and generation.

The field of emotion recognition currently faces a number of intriguing challenges and research topics. For emotional databases, one challenge is how to collect data in a natural and non-intrusive manner. There are also challenges associated with learning and adaptation: adapting to differences in language, individual speakers, and context, as well as to facial masks during the COVID-19 pandemic, is essential for comprehending emotional meaning across a variety of circumstances. Meanwhile, the rapid development and widespread application of physiological sensors have given us access to emotional data 24 hours a day, 7 days a week, and new technologies for sensors and embedded emotion and health systems are becoming increasingly popular research topics.

The primary purpose of this Special Issue of Electronics is to present newly emerging research interests in specific emotions and novel AI approaches. In addition, it will present novel developments in intelligent computing methods and in hardware systems that advance emotion recognition technology for the scientific community and industry.

Topics include, but are not limited to, the following:

  • Emotion recognition, conversion and synthesis;
  • Identification of emotions related to learning and cognitive processes;
  • Studies of health-related emotional states, such as depression, ASD, and fatigue;
  • Analysis of novel emotional features to improve generality and robustness;
  • Micro-expression recognition, vocal burst recognition, and deceptive speech detection;
  • Personalized and contextual adaptation;
  • Embedded systems for emotion recognition, including new methods and new protocols;
  • Practical applications of emotion recognition technology, including call-center, health and medical, and education and online-learning applications.

Dr. Chengwei Huang
Prof. Dr. Yongqiang Bao
Prof. Dr. Li Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • emotion recognition
  • health-related emotions
  • learning and cognitive process
  • deceptive speech detection
  • cross-database emotion recognition
  • personalized adaptation
  • embedded systems

Published Papers (5 papers)


Research

17 pages, 2467 KiB  
Article
Multi-Representation Joint Dynamic Domain Adaptation Network for Cross-Database Facial Expression Recognition
by Jingjie Yan, Yuebo Yue, Kai Yu, Xiaoyang Zhou, Ying Liu, Jinsheng Wei and Yuan Yang
Electronics 2024, 13(8), 1470; https://doi.org/10.3390/electronics13081470 - 12 Apr 2024
Viewed by 290
Abstract
In order to obtain more fine-grained information from multiple sub-feature spaces for domain adaptation, this paper proposes a novel multi-representation joint dynamic domain adaptation network (MJDDAN) and applies it to cross-database facial expression recognition. The MJDDAN uses a hybrid structure to extract multi-representation features and maps the original facial expression features into multiple sub-feature spaces, aligning the expression features of the source and target domains in these sub-feature spaces from different angles to extract features more comprehensively. Moreover, the MJDDAN introduces a Joint Dynamic Maximum Mean Difference (JD-MMD) model to reduce the difference in feature distribution between subdomains by simultaneously minimizing the maximum mean difference and the local maximum mean difference in each substructure. Three databases, eNTERFACE, FABO, and RAVDESS, are used to design a large number of cross-database transfer learning facial expression recognition experiments. The emotion recognition accuracies with eNTERFACE, FABO, and RAVDESS as target domains reach 53.64%, 43.66%, and 35.87%, respectively, improvements of 1.79%, 0.85%, and 1.02% over the best comparison method considered in the article.
(This article belongs to the Special Issue Applied AI in Emotion Recognition)
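The JD-MMD objective builds on the standard maximum mean discrepancy between source- and target-domain features. As a point of reference only, the following is a minimal NumPy sketch of a Gaussian-kernel MMD estimate; the class-conditional (local) term and dynamic weighting described in the abstract are omitted, and the bandwidth and toy data are assumptions rather than the authors' setup.

```python
# Minimal Gaussian-kernel MMD sketch (not the authors' JD-MMD implementation).
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Pairwise RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(source: np.ndarray, target: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two feature sets."""
    k_ss = gaussian_kernel(source, source, bandwidth).mean()
    k_tt = gaussian_kernel(target, target, bandwidth).mean()
    k_st = gaussian_kernel(source, target, bandwidth).mean()
    return k_ss + k_tt - 2.0 * k_st

# Toy usage: features from two expression databases drawn from shifted Gaussians.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 128))   # e.g., source-domain features
tgt = rng.normal(0.5, 1.0, size=(64, 128))   # e.g., target-domain features
print(f"squared MMD = {mmd2(src, tgt):.4f}")
```

Minimizing such a term alongside the classification loss is what pulls the two feature distributions together during adaptation.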

14 pages, 567 KiB  
Article
Fidgety Speech Emotion Recognition for Learning Process Modeling
by Ming Zhu, Chunchieh Wang and Chengwei Huang
Electronics 2024, 13(1), 146; https://doi.org/10.3390/electronics13010146 - 28 Dec 2023
Viewed by 444
Abstract
In this paper, the recognition of fidgety speech emotion is studied, and real-world speech emotions are collected to enhance emotion recognition in practical scenarios, especially for cognitive tasks. We first focused on eliciting fidgety emotion and on data acquisition during general math learning: students practice mathematics by performing operations, solving problems, and orally responding to questions, all of which are recorded as audio data, and the teacher then scores the accuracy of these exercises, which reflects the students' cognitive outcomes. Secondly, we propose an end-to-end speech emotion model based on a multi-scale one-dimensional (1-D) residual convolutional neural network. Finally, we conducted experiments on recognizing fidgety speech emotion with various classifiers, including an SVM, an LSTM, a 1-D CNN, and the proposed multi-scale 1-D CNN. The experimental results show that the proposed classifier identifies fidgety emotion well. A thorough analysis of fidgety emotion and its influence on the learning process revealed a clear relationship between the two. The automatic recognition of fidgety emotion is valuable for assisting online math teaching.
(This article belongs to the Special Issue Applied AI in Emotion Recognition)
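The end-to-end model described in the abstract stacks 1-D convolutions at several kernel scales with residual connections over the input signal. Below is a minimal PyTorch sketch of that general idea, assuming illustrative kernel sizes, channel widths, and a four-class output head; it is not the authors' configuration.

```python
# Multi-scale 1-D residual CNN sketch for speech emotion classification
# (illustrative architecture only).
import torch
import torch.nn as nn

class MultiScaleResBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_ch, branch_ch, k, padding=k // 2),
                nn.BatchNorm1d(branch_ch),
                nn.ReLU(),
            )
            for k in kernel_sizes
        ])
        # 1x1 projection so the residual matches the concatenated branch width.
        self.proj = nn.Conv1d(in_ch, branch_ch * len(kernel_sizes), 1)

    def forward(self, x):  # x: (batch, channels, time)
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(out + self.proj(x))

class SpeechEmotionNet(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            MultiScaleResBlock(1, 48),
            nn.MaxPool1d(4),
            MultiScaleResBlock(48, 96),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(96, n_classes)

    def forward(self, wave):                       # wave: (batch, 1, samples)
        feats = self.backbone(wave).squeeze(-1)    # (batch, 96)
        return self.head(feats)

logits = SpeechEmotionNet()(torch.randn(2, 1, 16000))  # two 1-second 16 kHz clips
print(logits.shape)  # torch.Size([2, 4])
```

The parallel kernel sizes let the block respond to both short- and long-duration cues in the signal, which is the motivation for the multi-scale design.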

15 pages, 5269 KiB  
Article
Multimodal Emotion Recognition in Conversation Based on Hypergraphs
by Jiaze Li, Hongyan Mei, Liyun Jia and Xing Zhang
Electronics 2023, 12(22), 4703; https://doi.org/10.3390/electronics12224703 - 19 Nov 2023
Viewed by 873
Abstract
In recent years, sentiment analysis in conversation has garnered increasing attention due to its widespread applications in areas such as social media analytics, sentiment mining, and electronic healthcare. Existing research primarily focuses on sequence learning and graph-based approaches, yet it overlooks the high-order interactions between different modalities and the long-term dependencies within each modality. To address these problems, this paper proposes a novel hypergraph-based method for multimodal emotion recognition in conversation (MER-HGraph). MER-HGraph extracts features from three modalities: acoustic, text, and visual. It treats each utterance of each modality in a conversation as a node and constructs intra-modal hypergraphs (Intra-HGraph) and inter-modal hypergraphs (Inter-HGraph) using hyperedges; the hypergraphs are then updated using hypergraph convolutional networks. Additionally, to mitigate noise in acoustic data and the impact of fixed time scales, we introduce a dynamic time window module to capture local-global information from acoustic signals. Extensive experiments on the IEMOCAP and MELD datasets demonstrate that MER-HGraph outperforms existing models in multimodal emotion recognition tasks, leveraging high-order information from multimodal data to enhance recognition capabilities.
(This article belongs to the Special Issue Applied AI in Emotion Recognition)
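Hypergraph convolution propagates utterance-node features over shared hyperedges. A minimal NumPy sketch of one HGNN-style layer is given below; the incidence pattern, sizes, and uniform edge weights are toy assumptions, not the Intra-/Inter-HGraph construction used in the paper.

```python
# One hypergraph convolution layer in the HGNN style:
#   X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)
# (toy sizes and incidence; not the paper's implementation).
import numpy as np

def hypergraph_conv(X, H, Theta):
    """X: (n_nodes, d_in) node features; H: (n_nodes, n_edges) incidence; Theta: (d_in, d_out)."""
    W = np.eye(H.shape[1])                      # uniform hyperedge weights
    Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1)))  # node-degree normalization
    De = np.diag(1.0 / H.sum(axis=0))           # hyperedge-degree normalization
    prop = Dv @ H @ W @ De @ H.T @ Dv           # propagation over shared hyperedges
    return np.maximum(prop @ X @ Theta, 0.0)    # ReLU

# Toy example: 4 utterance nodes grouped by 2 hyperedges.
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 1]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 8))
Theta = np.random.default_rng(1).normal(size=(8, 4))
print(hypergraph_conv(X, H, Theta).shape)  # (4, 4)
```

Because a hyperedge can join many nodes at once, a single layer already mixes information across whole groups of utterances, which is how the method captures the high-order interactions the abstract refers to.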

15 pages, 2332 KiB  
Article
Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network
by Zhichao Peng, Hua Zeng, Yongwei Li, Yegang Du and Jianwu Dang
Electronics 2023, 12(22), 4620; https://doi.org/10.3390/electronics12224620 - 12 Nov 2023
Viewed by 748
Abstract
Dimensional emotion can better describe rich and fine-grained emotional states than categorical emotion. In the realm of human–robot interaction, the ability to continuously recognize dimensional emotions from speech empowers robots to capture the temporal dynamics of a speaker's emotional state and adjust their interaction strategies in real time. In this study, we present an approach that enhances dimensional emotion recognition through a modulation-filtered cochleagram and a parallel attention recurrent neural network (PA-net). Firstly, the multi-resolution modulation-filtered cochleagram is derived from speech signals through auditory signal processing. Subsequently, the PA-net is employed to establish multi-temporal dependencies from diverse scales of features, enabling the tracking of dynamic variations in dimensional emotion within auditory modulation sequences. Experiments on the RECOLA dataset demonstrate that, at the feature level, the modulation-filtered cochleagram surpasses the other assessed features in forecasting valence and arousal, with a particularly pronounced advantage in scenarios with a high signal-to-noise ratio. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Furthermore, experiments on the SEWA dataset demonstrate the substantial improvements brought about by the proposed method in valence and arousal prediction. These results collectively highlight the effectiveness of our approach in advancing dimensional speech emotion recognition.
(This article belongs to the Special Issue Applied AI in Emotion Recognition)
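Continuous valence and arousal prediction on corpora such as RECOLA and SEWA is commonly scored with the concordance correlation coefficient (CCC). The abstract does not name its metric, so the following is only a hedged sketch of the standard CCC definition, not the authors' evaluation code.

```python
# Concordance correlation coefficient (CCC), the usual metric for
# dimensional (valence/arousal) emotion prediction (illustrative sketch).
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """CCC between a reference trace and a predicted trace."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Toy check: a noisy copy of an arousal trace scores high, an unrelated one low.
rng = np.random.default_rng(0)
arousal = np.sin(np.linspace(0.0, 6.28, 500))
print(round(ccc(arousal, arousal + 0.1 * rng.normal(size=500)), 3))
print(round(ccc(arousal, rng.normal(size=500)), 3))
```

Unlike plain correlation, CCC also penalizes offsets and scale mismatches between the predicted and reference traces, which is why it is preferred for continuous emotion regression.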

15 pages, 4532 KiB  
Article
MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation
by Xingwei Liang, You Zou, Xinnan Zhuang, Jie Yang, Taiyu Niu and Ruifeng Xu
Electronics 2023, 12(7), 1534; https://doi.org/10.3390/electronics12071534 - 24 Mar 2023
Cited by 3 | Viewed by 1744
Abstract
The accurate recognition of emotions in conversations helps understand speakers' intentions and facilitates various analyses in artificial intelligence, especially in human–computer interaction systems. However, most previous methods lack the ability to track the different emotional states of each speaker in a dialogue. To alleviate this dilemma, we propose a new approach, Multi-Task Learning and Multi-Fusion AudioText Emotion Recognition in Conversation (MMATERIC). MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with a speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. Meanwhile, we adopt multiple fusion strategies at different stages, mainly model fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and allows potential adaptation flows from one modality to another. Experimental results on two benchmark datasets show that the proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of MMATERIC's three core modules and the different fusion methods adopted at each stage.
(This article belongs to the Special Issue Applied AI in Emotion Recognition)
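A simple way to combine the two tasks the abstract refers to is to train separate audio and text emotion heads jointly and fuse their decisions. The PyTorch sketch below illustrates that multi-task, decision-level-fusion idea under assumed feature dimensions and a fixed fusion weight; it is not the MMATERIC architecture.

```python
# Multi-task audio/text emotion heads with weighted decision-level fusion
# (illustrative sketch, not the paper's model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioTextEmotion(nn.Module):
    def __init__(self, audio_dim=128, text_dim=256, n_classes=6, fusion_weight=0.5):
        super().__init__()
        self.audio_head = nn.Linear(audio_dim, n_classes)
        self.text_head = nn.Linear(text_dim, n_classes)
        self.fusion_weight = fusion_weight

    def forward(self, audio_feat, text_feat):
        audio_logits = self.audio_head(audio_feat)
        text_logits = self.text_head(text_feat)
        # Decision-level fusion: weighted average of per-modality probabilities.
        fused = (self.fusion_weight * F.softmax(audio_logits, dim=-1)
                 + (1.0 - self.fusion_weight) * F.softmax(text_logits, dim=-1))
        return audio_logits, text_logits, fused

model = AudioTextEmotion()
audio = torch.randn(4, 128)          # pre-extracted audio utterance features
text = torch.randn(4, 256)           # pre-extracted text utterance features
labels = torch.tensor([0, 2, 1, 5])
audio_logits, text_logits, fused = model(audio, text)

# Multi-task objective: each modality keeps its own emotion-recognition loss,
# plus a loss on the fused decision (negative log-likelihood of fused probs).
loss = (F.cross_entropy(audio_logits, labels)
        + F.cross_entropy(text_logits, labels)
        + F.nll_loss(torch.log(fused + 1e-8), labels))
print(float(loss))
```

Keeping a per-modality loss alongside the fused loss is what makes this multi-task: each branch stays a competent single-modality recognizer while the fusion term encourages agreement.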
