Advances in Emotion Recognition and Affective Computing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 October 2023) | Viewed by 19656

Special Issue Editors


Dr. Paula Viana
Guest Editor
School of Engineering, Polytechnic of Porto, 4200-072 Porto, Portugal
Interests: multimedia; content annotation; computer vision; machine learning; visualization

Dr. Pedro Carvalho
Guest Editor
Centre for Telecommunications and Multimedia at INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
Interests: computer vision; image processing; multimedia; machine learning; video analytics

Dr. Teresa Chambel
Guest Editor
Lasige, Faculty of Sciences, University of Lisbon, Lisbon, Portugal
Interests: multimedia; HCI; hypervideo; visualization; affective computing

Special Issue Information

Dear Colleagues,

The ability to automatically recognize and interpret emotions may have a great impact on several application areas. Examples include describing data based on the emotions it conveys, personalizing and recommending content based on the impact it may have on the viewer, and aiding decision making in hostile environments.

Boosted by the increasing amount of user-generated content posted on social media and online platforms, by the spread of professional media streaming services, and by the need to automatically repurpose or generate new emotion-aware content and to process the ever-larger amounts of data being captured, research has been attempting to endow machines with the cognitive capabilities to recognize, interpret, and express emotions and sentiments.

Additionally, the advent of new and challenging applications such as self-driving technologies and intelligent urban transport systems, including car-sharing services and autonomous vehicles, increases the need to recognize emotions that may help identify dangerous situations.

Affective computing is a domain in which emotion recognition plays an important role, enabling machines to interpret the emotional state of humans, adapt their behavior accordingly, and respond appropriately to those emotions.

This Special Issue aims to cover different aspects of emotion recognition and affective computing that can lead to significant advances in this field. The goal is to collect a diverse set of works that span a wide range of topics, data modalities, approaches and application areas, including media content analysis for emotion-aware content retrieval, automatic content creation, decision making and new paradigms for content visualization, navigation and interaction.

Dr. Paula Viana
Dr. Pedro Carvalho
Dr. Teresa Chambel
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com after registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimodal emotion recognition
  • emotion-aware content visualization
  • datasets for emotion recognition
  • facial emotion recognition
  • body gesture analysis for automatic emotion recognition
  • technologies for human emotion recognition
  • NLP for emotion recognition
  • music emotion classification
  • video emotion classification

Published Papers (10 papers)


Research

13 pages, 1151 KiB  
Article
Aspect-Level Sentiment Analysis Based on Syntax-Aware and Graph Convolutional Networks
by Qun Gu, Zhidong Wang, Hai Zhang, Siyi Sui and Rui Wang
Appl. Sci. 2024, 14(2), 729; https://doi.org/10.3390/app14020729 - 15 Jan 2024
Viewed by 710
Abstract
Aspect-level sentiment analysis is a task of identifying and understanding the sentiment polarity of specific aspects of a sentence. In recent years, significant progress has been made in aspect-level sentiment analysis models based on graph convolutional neural networks. However, existing models still have some shortcomings, such as aspect-level sentiment analysis models based on graph convolutional networks not making full use of the information of specific aspects in a sentence and ignoring the enhancement of the model by external general knowledge of sentiment. In order to solve these problems, this paper proposes a sentiment analysis model based on the Syntax-Aware and Graph Convolutional Network (SAGCN). The model first integrates aspect-specific features into contextual information, and second incorporates external sentiment knowledge to enhance the model’s ability to perceive sentiment information. Finally, a multi-head self-attention mechanism and Point-wise Convolutional Transformer (PCT) are applied to capture the semantic information of the sentence. The semantic and syntactic information of the sentences are considered together. Experimental results on three benchmark datasets show that the SAGCN model is able to achieve superior performance compared to the benchmark methods. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
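
The abstract does not include implementation details; as a rough illustration of the graph-convolution building block that models such as SAGCN rely on, a single GCN layer over a dependency-tree adjacency matrix can be sketched in PyTorch as follows. The class name, tensor shapes, and mean-style aggregation are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph-convolution layer over a dependency-tree adjacency matrix
    (a generic building block, not the SAGCN implementation itself)."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (batch, tokens, dim)     token representations, e.g. encoder outputs
        # adj: (batch, tokens, tokens)  dependency-graph adjacency with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)  # node degrees for mean aggregation
        return torch.relu(self.linear(adj @ h) / deg)     # aggregate neighbours, then project
```

Aspect-specific features would then typically be obtained by masking or pooling the output positions of the aspect term before classification.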

14 pages, 4845 KiB  
Article
Comparative Analysis of Vision Transformer Models for Facial Emotion Recognition Using Augmented Balanced Datasets
by Sukhrob Bobojanov, Byeong Man Kim, Mukhriddin Arabboev and Shohruh Begmatov
Appl. Sci. 2023, 13(22), 12271; https://doi.org/10.3390/app132212271 - 13 Nov 2023
Cited by 3 | Viewed by 1988
Abstract
Facial emotion recognition (FER) has a huge importance in the field of human–machine interface. Given the intricacies of human facial expressions and the inherent variations in images, which are characterized by diverse facial poses and lighting conditions, the task of FER remains a challenging endeavour for computer-based models. Recent advancements have seen vision transformer (ViT) models attain state-of-the-art results across various computer vision tasks, encompassing image classification, object detection, and segmentation. Moreover, one of the most important aspects of creating strong machine learning models is correcting data imbalances. To avoid biased predictions and guarantee reliable findings, it is essential to maintain the distribution equilibrium of the training dataset. In this work, we have chosen two widely used open-source datasets, RAF-DB and FER2013. As well as resolving the imbalance problem, we present a new, balanced dataset, applying data augmentation techniques and cleaning poor-quality images from the FER2013 dataset. We then conduct a comprehensive evaluation of thirteen different ViT models with these three datasets. Our investigation concludes that ViT models present a promising approach for FER tasks. Among these ViT models, Mobile ViT and Tokens-to-Token ViT models appear to be the most effective, followed by PiT and Cross Former models. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
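
The abstract mentions augmenting FER2013 and balancing the class distribution, but not the exact recipe. The sketch below shows one sampling-based way to balance emotion classes with light augmentation using torchvision; the transforms, the folder layout, and the WeightedRandomSampler approach are assumptions, not the authors' offline dataset construction.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Illustrative augmentation for FER images fed to a ViT; the paper's exact recipe may differ.
augment = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),   # FER2013 is grayscale; ViTs expect 3 channels
    transforms.Resize(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# Assumes the dataset has been exported as one folder per emotion class (hypothetical path).
train_set = datasets.ImageFolder("fer2013/train", transform=augment)

# Re-weight sampling so each emotion class is drawn with equal probability.
targets = torch.tensor(train_set.targets)
class_counts = torch.bincount(targets)
weights = (1.0 / class_counts.float())[targets]
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(train_set, batch_size=64, sampler=sampler)
```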

17 pages, 2974 KiB  
Article
Attention-Guided Network Model for Image-Based Emotion Recognition
by Herag Arabian, Alberto Battistel, J. Geoffrey Chase and Knut Moeller
Appl. Sci. 2023, 13(18), 10179; https://doi.org/10.3390/app131810179 - 10 Sep 2023
Cited by 1 | Viewed by 1002
Abstract
Neural networks are increasingly able to outperform traditional machine learning and filtering approaches in classification tasks. However, with the rise in their popularity, many unknowns still exist when it comes to the internal learning processes of the networks in terms of how they make the right decisions for prediction. As a result, in this work, different attention modules integrated into a convolutional neural network coupled with an attention-guided strategy were examined for facial emotion recognition performance. A custom attention block, AGFER, was developed and evaluated against two other well-known modules of squeeze–excitation and convolution block attention modules and compared with the base model architecture. All models were trained and validated using a subset from the OULU-CASIA database. Afterward, cross-database testing was performed using the FACES dataset to assess the generalization capability of the trained models. The results showed that the proposed attention module with the guidance strategy showed better performance than the base architecture while maintaining similar results versus other popular attention modules. The developed AGFER attention-integrated model focused on relevant features for facial emotion recognition, highlighting the efficacy of guiding the model during the integral training process. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
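
As context for the attention modules compared in this paper, the following is a minimal PyTorch sketch of a squeeze-and-excitation (SE) block, one of the two well-known baselines named in the abstract; the custom AGFER block itself is not described in enough detail there to reproduce.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Channel attention: global pooling -> bottleneck MLP -> per-channel scaling."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # excitation weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                    # B x C channel descriptors
        w = self.fc(w).view(b, c, 1, 1)                # per-channel gates
        return x * w                                   # reweight the feature maps
```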

28 pages, 3449 KiB  
Article
Evaluating the Influence of Room Illumination on Camera-Based Physiological Measurements for the Assessment of Screen-Based Media
by Joseph Williams, Jon Francombe and Damian Murphy
Appl. Sci. 2023, 13(14), 8482; https://doi.org/10.3390/app13148482 - 22 Jul 2023
Viewed by 1121
Abstract
Camera-based solutions can be a convenient means of collecting physiological measurements indicative of psychological responses to stimuli. However, the low illumination playback conditions commonly associated with viewing screen-based media oppose the bright conditions recommended for accurately recording physiological data with a camera. A study was designed to determine the feasibility of obtaining physiological data, for psychological insight, in illumination conditions representative of real world viewing experiences. In this study, a novel method was applied for testing a first-of-its-kind system for measuring both heart rate and facial actions from video footage recorded with a single discretely placed camera. Results suggest that conditions representative of a bright domestic setting should be maintained when using this technology, despite this being considered a sub-optimal playback condition. Further analyses highlight that even within this bright condition, both the camera-measured facial action and heart rate data contained characteristic errors. In future research, the influence of these performance issues on psychological insights may be mitigated by reducing the temporal resolution of the heart rate measurements and ignoring fast and low-intensity facial movements. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
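
The measurement system evaluated in the study is not described at code level, but a heavily simplified sketch of how a heart-rate estimate can be derived from the mean green-channel intensity of a face region over time (a common remote-photoplethysmography baseline) is shown below; the frequency band and windowing choices are illustrative assumptions.

```python
import numpy as np

def heart_rate_from_green_trace(green_trace: np.ndarray, fps: float) -> float:
    """Estimate heart rate (BPM) from a per-frame mean green intensity signal.
    A simplified rPPG baseline, not the system evaluated in the paper."""
    signal = green_trace - green_trace.mean()          # remove the DC component
    signal = signal * np.hanning(len(signal))          # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.0)             # roughly 42 to 180 BPM
    peak = freqs[band][np.argmax(spectrum[band])]      # dominant pulse frequency
    return float(peak * 60.0)
```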

15 pages, 750 KiB  
Article
Wearable-Based Intelligent Emotion Monitoring in Older Adults during Daily Life Activities
by Eduardo Gutierrez Maestro, Tiago Rodrigues De Almeida, Erik Schaffernicht and Óscar Martinez Mozos
Appl. Sci. 2023, 13(9), 5637; https://doi.org/10.3390/app13095637 - 3 May 2023
Cited by 1 | Viewed by 1819
Abstract
We present a system designed to monitor the well-being of older adults during their daily activities. To automatically detect and classify their emotional state, we collect physiological data through a wearable medical sensor. Ground truth data are obtained using a simple smartphone app that provides ecological momentary assessment (EMA), a method for repeatedly sampling people’s current experiences in real time in their natural environments. We are making the resulting dataset publicly available as a benchmark for future comparisons and methods. We are evaluating two feature selection methods to improve classification performance and proposing a feature set that augments and contrasts domain expert knowledge based on time-analysis features. The results demonstrate an improvement in classification accuracy when using the proposed feature selection methods. Furthermore, the feature set we present is better suited for predicting emotional states in a leave-one-day-out experimental setup, as it identifies more patterns. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
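
The leave-one-day-out setup described in the abstract can be expressed with scikit-learn's LeaveOneGroupOut by treating the recording day as the group label. The synthetic arrays and the random-forest classifier below are placeholders, not the authors' feature set or pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

# Placeholder data: 300 feature windows, 3 emotional-state labels, 10 recording days.
rng = np.random.default_rng(0)
X = rng.random((300, 20))
y = rng.integers(0, 3, size=300)
days = np.repeat(np.arange(10), 30)

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=days):
    # Train on all days but one, test on the held-out day.
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print(f"leave-one-day-out accuracy: {np.mean(scores):.3f}")
```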

23 pages, 756 KiB  
Article
Utilizing Machine Learning for Detecting Harmful Situations by Audio and Text
by Merav Allouch, Noa Mansbach, Amos Azaria and Rina Azoulay
Appl. Sci. 2023, 13(6), 3927; https://doi.org/10.3390/app13063927 - 20 Mar 2023
Cited by 2 | Viewed by 2300
Abstract
Children with special needs may struggle to identify uncomfortable and unsafe situations. In this study, we aimed at developing an automated system that can detect such situations based on audio and text cues to encourage children’s safety and prevent situations of violence toward them. We composed a text and audio database with over 1891 sentences extracted from videos presenting real-world situations, and categorized them into three classes: neutral sentences, insulting sentences, and sentences indicating unsafe conditions. We compared insulting and unsafe sentence-detection abilities of various machine-learning methods. In particular, we found that a deep neural network that accepts the text embedding vectors of bidirectional encoder representations from transformers (BERT) and audio embedding vectors of Wav2Vec as input attains the highest accuracy in detecting unsafe and insulting situations. Our results indicate that it may be applicable to build an automated agent that can detect unsafe and unpleasant situations that children with special needs may encounter, given the dialogue contexts conducted with these children. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
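
A rough sketch of the text-plus-audio embedding fusion the abstract describes, using Hugging Face Transformers, might look like the following; the specific checkpoints, the [CLS] and mean-pooling choices, and the classifier head sizes are assumptions rather than the authors' configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer, Wav2Vec2Model, Wav2Vec2Processor

# Checkpoint names are illustrative defaults, not necessarily those used in the paper.
text_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
text_enc = AutoModel.from_pretrained("bert-base-uncased")
audio_proc = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
audio_enc = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

def fuse(sentence: str, waveform: np.ndarray, sample_rate: int = 16_000) -> torch.Tensor:
    """Concatenate a BERT [CLS] embedding with a mean-pooled Wav2Vec embedding (both 768-d)."""
    with torch.no_grad():
        tokens = text_tok(sentence, return_tensors="pt")
        text_vec = text_enc(**tokens).last_hidden_state[:, 0]                    # (1, 768)
        audio = audio_proc(waveform, sampling_rate=sample_rate, return_tensors="pt")
        audio_vec = audio_enc(audio.input_values).last_hidden_state.mean(dim=1)  # (1, 768)
    return torch.cat([text_vec, audio_vec], dim=-1)                              # (1, 1536)

# Small feed-forward head over the fused features; the three classes follow the abstract
# (neutral, insulting, unsafe), while the hidden size and dropout rate are arbitrary.
classifier = nn.Sequential(
    nn.Linear(1536, 256), nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 3),
)
```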

11 pages, 3930 KiB  
Communication
DevEmo—Software Developers’ Facial Expression Dataset
by Michalina Manikowska, Damian Sadowski, Adam Sowinski and Michal R. Wrobel
Appl. Sci. 2023, 13(6), 3839; https://doi.org/10.3390/app13063839 - 17 Mar 2023
Cited by 2 | Viewed by 1369
Abstract
The COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able to effectively engage in computer-based tasks remotely. This paper presents a new dataset, DevEmo, that can be used to train deep learning models for the purpose of emotion recognition of computer users. The dataset consists of 217 video clips of 33 students solving programming tasks. The recordings were collected in the participants’ actual work environment, capturing the students’ facial expressions as they engaged in programming tasks. The DevEmo dataset is labeled to indicate the presence of the four emotions (anger, confusion, happiness, and surprise) and a neutral state. The dataset provides a unique opportunity to explore the relationship between emotions and computer-related activities, and has the potential to support the development of more personalized and effective tools for computer-based learning environments. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)

27 pages, 7665 KiB  
Article
Poetry in Pandemic: A Multimodal Neuroaesthetic Study on the Emotional Reaction to the Divina Commedia Poem
by Bianca Maria Serena Inguscio, Giulia Cartocci, Simone Palmieri, Stefano Menicocci, Alessia Vozzi, Andrea Giorgi, Silvia Ferrara, Paolo Canettieri and Fabio Babiloni
Appl. Sci. 2023, 13(6), 3720; https://doi.org/10.3390/app13063720 - 14 Mar 2023
Cited by 2 | Viewed by 1813
Abstract
Poetry elicits emotions, and emotion is a fundamental component of human ontogeny. Although neuroaesthetics is a rapidly developing field of research, few studies focus on poetry, and none address its different modalities of fruition (MOF) of universal cultural heritage works, such as the Divina Commedia (DC) poem. Moreover, alexithymia (AX) resulted in being a psychological risk factor during the COVID-19 pandemic. The present study aims to investigate the emotional response to poetry excerpts from different cantica (Inferno, Purgatorio, Paradiso) of DC with the dual objective of assessing the impact of both the structure of the poem and MOF and that of the characteristics of the acting voice in experts and non-experts, also considering AX. Online emotion facial coding biosignal (BS) techniques, self-reported and psychometric measures were applied to 131 literary (LS) and scientific (SS) university students. BS results show that LS globally manifest more JOY than SS in both reading and listening MOF and more FEAR towards Inferno. Furthermore, LS and SS present different results regarding NEUTRAL emotion about acting voice. AX influences listening in NEUTRAL and SURPRISE expressions. DC’s structure affects DISGUST and SADNESS during listening, regardless of participant characteristics. PLEASANTNESS varies according to DC’s structure and the acting voice, as well as AROUSAL, which is also correlated with AX. Results are discussed in light of recent findings in affective neuroscience and neuroaesthetics, suggesting the critical role of poetry and listening in supporting human emotional processing. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)

16 pages, 715 KiB  
Article
Meta Learning Based Deception Detection from Speech
by Noa Mansbach and Amos Azaria
Appl. Sci. 2023, 13(1), 626; https://doi.org/10.3390/app13010626 - 3 Jan 2023
Cited by 2 | Viewed by 2166
Abstract
It is difficult to overestimate the importance of detecting human deception, specifically by using speech cues. Indeed, several works attempt to detect deception from speech. Unfortunately, most works use the same people and environments in training and in testing. That is, they do not separate training samples from test samples according to the people who said each statement or by the environments in which each sample was recorded. This may result in less reliable detection results. In this paper, we take a meta-learning approach in which a model is trained on a variety of learning tasks to enable it to solve new learning tasks using only a few samples. In our approach, we split the data according to the persons (and recording environment), i.e., some people are used for training, and others are used for testing only, but we do assume a few labeled samples for each person in the data set. We introduce CHAML, a novel deep learning architecture that receives as input the sample in question along with two more truthful samples and non-truthful samples from the same person. We show that our method outperforms other state-of-the-art methods of deception detection based on speech and other approaches for meta-learning on our data-set. Namely, CHAML reaches an accuracy of 61.34% and an F1-Score of 0.3857, compared to an accuracy of only 55.82% and an F1-score of only 0.3444, achieved by a previous, most recent approach. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
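
The key evaluation point in the abstract, splitting data by speaker so that no person appears in both the training and test sets, can be sketched with scikit-learn's GroupShuffleSplit; the toy arrays below are placeholders, not the CHAML dataset or architecture.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: 500 utterance feature vectors, binary truthful/deceptive labels,
# and a speaker id per utterance (hypothetical shapes, not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.random((500, 128))
y = rng.integers(0, 2, size=500)
speakers = rng.integers(0, 40, size=500)

# Every speaker's samples end up entirely in train or entirely in test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=speakers))
assert set(speakers[train_idx]).isdisjoint(speakers[test_idx])
```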

24 pages, 2989 KiB  
Article
Emotion Detection Using Facial Expression Involving Occlusions and Tilt
by Awais Salman Qazi, Muhammad Shoaib Farooq, Furqan Rustam, Mónica Gracia Villar, Carmen Lili Rodríguez and Imran Ashraf
Appl. Sci. 2022, 12(22), 11797; https://doi.org/10.3390/app122211797 - 20 Nov 2022
Cited by 9 | Viewed by 2379
Abstract
Facial emotion recognition (FER) is an important and developing topic of research in the field of pattern recognition. The effective application of facial emotion analysis is gaining popularity in surveillance footage, expression analysis, activity recognition, home automation, computer games, stress treatment, patient observation, depression, psychoanalysis, and robotics. Robot interfaces, emotion-aware smart agent systems, and efficient human–computer interaction all benefit greatly from facial expression recognition. This has garnered attention as a key prospect in recent years. However, due to shortcomings in the presence of occlusions, fluctuations in lighting, and changes in physical appearance, research on emotion recognition has to be improved. This paper proposes a new architecture design of a convolutional neural network (CNN) for the FER system and contains five convolution layers, one fully connected layer with rectified linear unit activation function, and a SoftMax layer. Additionally, the feature map enhancement is applied to accomplish a higher detection rate and higher precision. Lastly, an application is developed that mitigates the effects of the aforementioned problems and can identify the basic expressions of human emotions, such as joy, grief, surprise, fear, contempt, anger, etc. Results indicate that the proposed CNN achieves 92.66% accuracy with mixed datasets, while the accuracy for the cross dataset is 94.94%. Full article
(This article belongs to the Special Issue Advances in Emotion Recognition and Affective Computing)
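
The abstract specifies five convolution layers, a fully connected layer with ReLU activation, and a softmax output; a hedged PyTorch sketch consistent with that outline is given below, with the channel counts, kernel sizes, pooling, and 48x48 grayscale input all being assumptions not stated in the paper.

```python
import torch.nn as nn

class FerCnn(nn.Module):
    """Sketch of a five-conv-layer FER classifier; layer sizes are illustrative."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        chans = [1, 32, 64, 128, 128, 256]            # assumed channel progression
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*layers)        # assumed 48x48 input -> 1x1 feature map
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(256, 128), nn.ReLU(inplace=True))
        self.out = nn.Linear(128, num_classes)        # softmax applied by the loss at training time

    def forward(self, x):
        return self.out(self.fc(self.features(x)))
```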
