Article

SAFEPA: An Expandable Multi-Pose Facial Expressions Pain Assessment Method

1 Department of Computer Science and Engineering, College of Engineering, University of Colorado Denver, Denver, CO 80204, USA
2 Department of Information Systems, College of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(12), 7206; https://doi.org/10.3390/app13127206
Submission received: 21 April 2023 / Revised: 4 June 2023 / Accepted: 12 June 2023 / Published: 16 June 2023

Abstract

Accurately assessing the intensity of pain from facial expressions captured in videos is crucial for effective pain management and critical for a wide range of healthcare applications. However, in uncontrolled environments, detecting facial expressions from full left and right profiles remains a significant challenge, and even the most advanced models for recognizing pain levels based on facial expressions can suffer from declining performance. In this study, we present a novel model designed to overcome the challenges posed by full left and right profiles—Sparse Autoencoders for Facial Expressions-based Pain Assessment (SAFEPA). Our model utilizes Sparse Autoencoders (SAE) to reconstruct the upper part of the face from the input image, and feeds both the original image and the reconstructed upper face into two pre-trained concurrent and coupled Convolutional Neural Networks (CNNs). This approach gives more weight to the upper part of the face, resulting in superior recognition performance. Moreover, SAFEPA’s design leverages CNNs’ strengths while also accommodating variations in head poses, thus eliminating the need for face detection and upper-face extraction preprocessing steps needed in other models. SAFEPA achieves high accuracy in recognizing four levels of pain on the widely used UNBC-McMaster shoulder pain expression archive dataset. SAFEPA is extended for facial expression recognition, where we show it to outperform state-of-the-art models in recognizing seven facial expressions viewed from five different angles, including the challenging full left and right profiles, on the Karolinska Directed Emotional Faces (KDEF) dataset. Furthermore, the SAFEPA system is capable of processing BioVid Heat Pain datasets with an average processing time of 17.82 s per video (5 s in length), while maintaining a competitive accuracy compared to other state-of-the-art pain detection systems. This experiment demonstrates its applicability in real-life scenarios for monitoring systems. With SAFEPA, we have opened new possibilities for accurate pain assessment, even in challenging situations with varying head poses.

1. Introduction

Assessing and tracking patients’ pain levels over time is critical in determining the effectiveness of medical treatments [1,2] and preventing the development of chronic pain [2]. The current standard methods of pain assessment, which rely on patient self-reporting or medical staff observations, are highly subjective and imprecise, making it difficult to accurately monitor pain levels in patients, especially those with communication impairments [1,2,3,4]. Furthermore, medical staff observations of pain are highly subjective and require experienced observers [5], whose scarcity can hinder the continuity of pain assessment monitoring [6]. The need for accurate pain-level assessment has led to a growing demand for automated pain monitoring systems that can be used both in hospitals and at home for elderly and injured patients [5,6]. Such a system would provide a more objective and reliable method of pain assessment, helping to ensure that patients receive appropriate pain management and avoid unnecessary suffering.
Developing such systems is no easy task due to a variety of challenges, including variations in head poses, illumination [7], and occlusion [8]. Despite significant progress in recent years, many deep-learning-based pain assessment models rely on face detection algorithms as a critical data processing step, which may not work accurately for full left/right profile views. Our previous work, the Facial Expressions-based Pain Assessment System (FEAPAS) [9], employs the Multi-Task Cascaded Convolutional Neural Network (MTCNN) algorithm to detect a face in the input image and then uses the OpenCV library to segment the detected face into two parts in order to extract the upper part of the face (eyes and eyebrows). These parts are then fed to two concurrent pre-trained CNNs to mimic the Prkachin and Solomon Pain Intensity (PSPI) measurement [10], thereby assigning a higher weight to the upper part of the face in the pain evaluation process. FEAPAS achieved an accuracy of 99.10% on 10-fold cross-validation, outperforming the state-of-the-art model on the UNBC-McMaster shoulder pain expression archive dataset [11], and scored 90.56% on unseen subject data with an average response time of 6.49 s. However, it is restricted by the limitations of the MTCNN algorithm, which cannot detect and extract the upper part of the face from full left/right profile images (note that, as in other referenced pain-assessment work, frames showing full left/right facial profiles are effectively excluded from the input). Consequently, the absence of these input frames renders the FEAPAS model ineffective for left/right profile facial images.
To overcome these challenges and develop a more robust pain assessment model, we have turned to different approaches such as the Sparse Autoencoders for Facial Expressions-based Pain Assessment (SAFEPA). Unlike traditional models, in the SAFEPA, we utilize Sparse Autoencoders (SAE) to reconstruct the upper part of the face from input images, allowing us to effectively handle variations in head poses. By leveraging this approach, we achieve high recognition performance and surpass the accuracy of state-of-the-art models, even when faced with challenging full left/right profile views.
Autoencoders (AE) have gained popularity as an unsupervised neural network that can reconstruct data while minimizing the error between input and output [12] with an encoder mapping the input to a hidden representation through a nonlinear function and a decoder reconstructing the input [13]. AE has been shown to perform well in facial reconstruction, resolving issues of partial occlusion [14]. In this paper, we present a novel and advanced model, named Sparse Autoencoders for Facial Expressions-based Pain Assessment (SAFEPA), which builds upon our previous FEAPAS described in [9].
Our contribution lies in:
  • Developing a new system, SAFEPA, for pain assessment that accounts for variations in head poses by utilizing a custom SAE. The SAE can reconstruct the upper part of the face in any pose without requiring face detection or face-splitting steps, overcoming limitations of face detection algorithms, reducing preprocessing, and thereby improving the accuracy of pain assessment. The SAFEPA model achieved 98.93% on 10-fold cross-validation and 84.06% on unseen subject data on the UNBC-McMaster dataset (Section 4.1). To the best of our knowledge, sparse autoencoders have not been used in this manner in relevant publications.
  • Extending the capabilities of the SAFEPA system to recognize seven facial expressions on data with different poses, including full left and right profiles. Our results demonstrate SAFEPA’s high accuracy in facial expression recognition, achieving 94.29% on 10-fold cross-validation on the Karolinska Directed Emotional Faces (KDEF) dataset [15] and outperforming state-of-the-art models, highlighting SAFEPA’s efficient performance across head poses, including full left/right profiles (Section 4.2).
  • Investigating the performance of the SAFEPA system in real-world situations on a new pain assessment dataset, measuring its accuracy and average processing time. We demonstrate that the SAFEPA system generalizes well, as evidenced by its performance on the unseen BioVid Heat Pain dataset [16], processing each video in 17.82 s on average, which makes SAFEPA suitable for real-world situations (Section 4.3).
The rest of this article is organized as follows. In Section 2, related works are discussed, while Section 3 provides a detailed description of the proposed framework, including the SAE structure and the entire system. Section 4 presents the results of three main experiments and compares them with existing methods in addition to an ablation study to show the positive impact of our autoencoder approach. Finally, Section 5 provides a conclusion and future works.

2. Related Works

Recent studies in Facial Expressions Recognition (FER) and Pain Assessment have introduced multiple cutting-edge models that achieve an acceptable performance. These models have tackled various challenges related to performance improvement by leveraging state-of-the-art Convolutional Neural Networks (CNN) [17,18] or by adopting Long Short-Term Memory (LSTM) [19]. Moreover, researchers have explored the impact of various techniques such as augmentation [20] and batch normalization [21]. Recent studies turned to fusion models of two [5,9,22] or more CNNs [23]. Furthermore, a recent study implemented IoT in the FER system [24].
Despite this progress, some researchers continue to use handcrafted feature descriptor algorithms such as the Histogram of Oriented Gradient (HOG) [25,26] and the Local Binary Pattern (LBP) [26,27] in their work due to their simplicity and efficiency.
While most studies focus on improving recognition accuracy, challenges such as poses, occlusion, and illumination demand further investigation and more efficient solutions. In order to compare the SAFEPA with the state-of-the-art, we have selected four recent studies [17,24,25,26] that use CNN, HOG, and LBP for FER on the same datasets as our study. Additionally, we included three more recent studies that focus on facial expression-based pain assessment [3,22,23].
In [17], Bentoumi and his colleagues developed a facial expression classifier that uses two of the most common deep learning models, VGG16 and ResNet50, to extract features. Their model employs a multilayer perceptron (MLP) to classify the features extracted by VGG16, ResNet50, or both (Ensemble). The team conducted testing on the Extended Cohn-Kanade (CK+) and JAFFE datasets and on a 980-image subset of the KDEF dataset (frontal view).
In [24], Barra and his colleagues developed a facial expression recognition system based on a social IoT solution. To detect the face region, they applied the tree-structured model. To extract the features, they applied several different texture descriptors such as LBP and HOG. To extract more discriminating and distinctive features, they used a sparse representation technique, followed by the Spatial Pyramid Mapping technique. Finally, they used SVM for the classification task. The performance of the proposed system is tested on KDEF and GENKI4K datasets.
In [25], Eng and his colleagues proposed a model for the FER. Their model utilizes HOG to extract features from multiple cells in the image after detecting and cropping the face, followed by a support vector machine (SVM) for classification. This approach was applied to a subset of 980 images from KDEF dataset, focusing on images with frontal faces excluding other poses. Additionally, they utilized the Japanese Female Facial Expression (JAFFE) [28] database.
In his work [26], Yaddaden combined LBP and HOG to extract features from the image and then used an SVM to classify the emotions. The model was evaluated on JAFFE, KDEF, and the Radboud Faces Database (RaFD).
In [3], the authors proposed a computer-aided pain assessment system based on facial expressions with satisfactory accuracy and reduced computational work. The proposed model consists of two shallow neural networks: a spatial appearance network (SANET), which extracts spatial appearance-based features from the RGB image, and a shape descriptors network (SDNET), which extracts facial shape-related features from the landmarks input. The two outputs are then concatenated and passed to a joint feature learning stage that learns from the entire fusion network.
In [22], Bargshady and his colleagues developed an ensemble deep learning model (EDLM) for pain intensity detection. Their model consists of two phases: the early fusion and the late fusion. In the early fusion, the features are extracted through a combination of pre-trained CNN and linear Principal Component Analysis (PCA). In the late fusion, an ensemble of a three stream CNN + RNN hybrid deep learning network is used for classification. They used two databases in their study: the MIntPAIN and UNBC-McMaster databases.
In [23], the authors proposed a pain severity assessment model to work in a real environment. For that purpose, the authors collected their own data in addition to a sub-set of the UNBC-McMaster dataset. The model inputs were obtained by using MTCNN to extract the face localization from an RGB image, and the numpy matrix slicing operation for cropping the face. To extract the spatial appearance information from the raw RGB input images, they used the CNN-TL network. To extract local features from entropy images, they used ETNet. To learn jointly from RGB and entropy, they used concurrent networks of DSCNN. The classification decision is taken based on the fusion of the three outputs.
In [9], we developed a concurrent model consisting of two Convolutional Neural Network (CNN) branches called Facial Expression-based Automatic Pain Assessment System (FEAPAS). The first branch processes the detected face, while the second branch focuses specifically on the upper part of the face to improve attention and overall performance. All the models described above rely on face detection algorithms such as Haar or MTCNN as an essential step for data preprocessing. These models are unable to process frames that contain faces captured in full left or right profiles, leaving room for further improvement presented in this paper.
In contrast to the previous work, by developing our SAE and training it on the KDEF dataset that includes images with full left or right profile poses, the SAFEPA improves the facial expression capabilities of FEAPAS. The developed SAE can reconstruct the upper part of the face in any pose, without the need for face detection algorithms. After reconstructing the upper part of the face with the SAE, the resulting image is fed into InceptionV3, while the original image is processed in parallel with another InceptionV3. The outputs of the two InceptionV3 networks are then concatenated and sent to a fully connected layer for classification. The use of the SAE eliminates the need to use face detection algorithms and reduces the data preprocessing steps.

3. Materials and Methods

This section provides a comprehensive overview of our approach, including the datasets we used, the classifier we developed using SAE and concurrent CNNs, the SAFEPA system we built, and the evaluation processes we undertook.

3.1. Datasets

To ensure robustness, we utilized two widely recognized facial expression datasets, namely the Japanese Female Facial Expression (JAFFE) [28] and the Karolinska Directed Emotional Faces (KDEF) [15], in addition to the UNBC-McMaster shoulder pain expression archive dataset [11], and BioVid Heat Pain dataset [16] which are widely used for pain assessment. These datasets and the purpose for which they are used are described in the following subsections. Table 1 provides a summary of the various datasets we used in this study along with their characteristics.

3.1.1. JAFFE Dataset

This dataset contains 213 grayscale images that capture 7 distinct facial expressions: Afraid, Angry, Disgusted, Happy, Neutral, Sad, and Surprised. The images were captured from 10 female participants from a frontal view only. As one of the most widely used and trusted datasets in the field of facial expression recognition, the JAFFE dataset was selected to test the capability of extending our previous pain assessment monitoring system, FEAPAS, to facial expression recognition and to compare with state-of-the-art studies [17,25]. The JAFFE dataset cannot be used with our new model, SAFEPA, as it consists of grayscale images that are not compatible with our system’s requirements.

3.1.2. KDEF Dataset

KDEF contains 4900 colored images from 70 participants, evenly split between 35 females and 35 males. The dataset includes 7 distinct emotional facial expressions—Afraid, Angry, Disgusted, Happy, Neutral, Sad, and Surprised—each captured from 5 different angles. In Figure 1, we present samples of the KDEF dataset with different facial expressions captured from a variety of angles: (a) full left profile, (b) full right profile, (c) half left profile, (d) half right profile, and (e) straight-on. Although many studies in FER focus solely on a subset of the KDEF dataset—namely, the 980 images that represent the seven emotions from a frontal view—we also used the 3920 images that represent the seven emotions from full and half left and right poses, excluding the frontal view, as well as the full dataset. To test the accuracy and effectiveness of our model, we applied SAFEPA to these KDEF subsets as well as to the full dataset.

3.1.3. The UNBC-McMaster Dataset

The UNBC-McMaster shoulder pain expression archive dataset contains 200 sequences comprising 48,398 colored frames, each measuring 320 × 240 pixels. This dataset captures the facial expressions of 25 adult participants, 12 male and 13 female, who suffer from shoulder pain. Figure 2 shows samples of the UNBC-McMaster dataset with different levels of pain. The dataset only contains images captured from a frontal perspective or in partial view. It is also highly imbalanced: 82.71% of the data (40,029 images) represents the “No Pain” class, while only 17.29% (8369 images) covers the varying levels of pain. To address the biased classification [5,22] and ensure maximum accuracy, a subset of 24 participants’ data was randomly selected, resulting in a total of 6000 frames divided into four classes: No Pain, Low Pain, Moderate Pain, and Severe Pain. Each class contains 1500 images, with the Severe Pain class representing images from level 3 and greater. Testing the model with unseen subjects is critical for avoiding overfitting and ensuring generalizability; in this study, we used all frames belonging to participant 25, coded as “064-ak064”, resulting in a total of 1611 frames for testing. The subset of the UNBC-McMaster dataset used in this study is identical to that used in FEAPAS [9], providing continuity and ensuring comparability across studies.

3.1.4. The BioVid Heat Pain Database

The BioVid Heat Pain Database, on the other hand, is newer than the UNBC-McMaster dataset and offers greater variety; it was therefore chosen to further test the capability of the SAFEPA system in real-world situations. This database was collected through a collaboration between the Neuro-Information Technology group of the University of Magdeburg and the Medical Psychology group of the University of Ulm. The participants were captured in short videos while being exposed to four levels of experimentally induced heat pain at subject-specific stimulation temperatures. The dataset comprises videos from 69 participants, each with 20 baseline (no pain) samples and 20 samples for each of the 4 pain levels, resulting in a total of 6900 samples. It is worth noting that the participants were explicitly given the freedom to move their heads, providing an even greater range of motion in the captured footage. Table 1 presents statistical and functional information for the four datasets described above, providing a comprehensive overview of the scope and detail of each dataset.

3.2. The Model

SAFEPA model builds on the strengths of our earlier FEAPAS framework [9] which features two concurrent and coupled pretrained Convolutional Neural Networks (CNNs) for high-performance facial expression-based pain assessment. By integrating custom-designed Sparse Autoencoders (SAE) into the FEAPAS classifier, the SAFEPA model can eliminate the facial detection step and enable more accurate pain assessment across a wide range of facial poses and expressions. In this section, we delve into the details of FEAPAS, SAE, and the combination of these two techniques that underpin the SAFEPA model. Section 3.2.1 focuses on the FEAPAS methodology, which utilizes two concurrent InceptionV3 models to construct the system architecture. We provide a detailed description of this approach, highlighting the key features that make it effective for feature extraction and classification. Section 3.2.2 provides a description of the encoder and decoder layers employed in the SAE model and data preparation for SAE training. Finally, in Section 3.2.3, we describe the implementation of the SAFEPA classifier, which is the result of the amalgamation of the SAE and the FEAPAS techniques.

3.2.1. FEAPAS

In FEAPAS, we used InceptionV3 as the building block of our classifier. The choice was made because InceptionV3 can provide accurate and efficient results in the absence of a very large amount of training data [29]. The InceptionV3 architecture is inspired by the Network in Network (NIN) model, which employs 1 × 1 filters for feature extraction. InceptionV3 takes this approach a step further by utilizing 1 × 1 filters before the larger filters, which are then replaced with small and asymmetric filters. This design enables the model to effectively capture features at different scales and resolutions, leading to improved performance in various image recognition tasks. The use of 1 × 1 filters in this manner is a key feature of the InceptionV3 architecture and has been shown to be effective in improving accuracy and reducing computational complexity [30]. The FEAPAS model utilizes two concurrent InceptionV3 networks for feature extraction. The first processes the upper part of the detected face, while the second processes the entire detected face. The features extracted from both branches are then concatenated and fed into a dense layer with 1024 neurons, followed by a dropout layer with a probability of 0.25. The resulting features are then fed into the classification layer, as illustrated in Figure 3. This approach enables FEAPAS to capture detailed information from both the upper and whole face, important for accurately recognizing and assessing pain. The use of two concurrent InceptionV3 branches with different inputs allows the model to capture different types of features and improve the robustness of the system. Overall, this methodology represents an effective approach for pain assessment using deep Convolutional Neural Networks.
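The description above maps naturally onto the Keras functional API, which the authors report using. The sketch below is only an illustration of that structure, not the published implementation: the 150 × 150 input size, ImageNet weights, average pooling, and ReLU activation in the dense layer are assumptions.

```python
from tensorflow.keras import Input, Model, layers
from tensorflow.keras.applications import InceptionV3

def inception_branch(name, input_shape=(150, 150, 3)):
    # A pre-trained InceptionV3 feature extractor (ImageNet weights assumed),
    # re-wrapped so the two branches get unique names in the combined model.
    base = InceptionV3(include_top=False, weights="imagenet",
                       input_shape=input_shape, pooling="avg")
    return Model(base.inputs, base.outputs, name=name)

def build_concurrent_classifier(input_shape=(150, 150, 3), num_classes=4):
    upper_face = Input(shape=input_shape, name="upper_face")
    whole_face = Input(shape=input_shape, name="whole_face")

    # Two concurrent branches: upper part of the face and the whole face.
    features = layers.Concatenate()([
        inception_branch("inception_upper", input_shape)(upper_face),
        inception_branch("inception_whole", input_shape)(whole_face),
    ])
    x = layers.Dense(1024, activation="relu")(features)   # 1024-neuron dense layer
    x = layers.Dropout(0.25)(x)                            # dropout with p = 0.25
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model([upper_face, whole_face], outputs, name="feapas_style_classifier")
```

Re-wrapping each InceptionV3 instance in its own named Model keeps the two branches distinct when they are combined into one network.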

3.2.2. Sparse Autoencoders SAE

The developed SAE in this study consists of an encoder followed by a decoder. The SAE includes a total of six layers: one input, four hidden, and one output. The input and output layers each contain 150 neurons, which match the shape of the data. The encoder is composed of two dense layers, each followed by a LeakyReLU activation layer. LeakyReLU has been shown to work well in autoencoders, as it helps to prevent the issue of “dead neurons”, which can occur when ReLU activation outputs zero for a given input, leading to zero gradients during backpropagation and effectively stopping learning. The use of LeakyReLU can improve the stability and performance of the network during training. The decoder is developed in a similar but reverse way to the encoder and includes a Sigmoid activation function in the output layer. The goal is to train our SAE to reconstruct the upper part of the face from input images without the help of the face detection step used in other methods.
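As a rough Keras sketch of this encoder–decoder (an illustration rather than the published network): the paper specifies 150-neuron input and output layers matching the data shape, but so that the sketch composes with the image-level pipeline of Section 3.2.3, the full 150 × 150 × 3 image is flattened here instead; the hidden widths (1024, 256) and the L1 activity penalty used to encourage sparsity are assumptions.

```python
from tensorflow.keras import Input, Model, layers, regularizers

IMG_SHAPE = (150, 150, 3)

def build_sae(img_shape=IMG_SHAPE, code_dim=256):
    """Sketch of the SAE: dense encoder/decoder with LeakyReLU activations,
    an assumed L1 sparsity penalty on the code, and a Sigmoid output."""
    flat_dim = img_shape[0] * img_shape[1] * img_shape[2]
    inputs = Input(shape=img_shape)
    x = layers.Flatten()(inputs)

    # Encoder: two dense layers, each followed by LeakyReLU.
    x = layers.LeakyReLU()(layers.Dense(1024)(x))
    code = layers.LeakyReLU()(
        layers.Dense(code_dim, activity_regularizer=regularizers.l1(1e-5))(x))

    # Decoder: mirrors the encoder, with a Sigmoid activation on the output layer.
    x = layers.LeakyReLU()(layers.Dense(1024)(code))
    x = layers.Dense(flat_dim, activation="sigmoid")(x)
    outputs = layers.Reshape(img_shape)(x)

    sae = Model(inputs, outputs, name="sae")
    sae.compile(optimizer="adam", loss="mse")   # MSE reconstruction loss
    return sae
```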
In general, an autoencoder with input $I = \{x_1, x_2, \ldots, x_n\}$, hidden representation $R_g = \{h_1, h_2, \ldots, h_i\}$, and output $R_f = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n\}$ can be represented as in Equation (1):
$$\tilde{x}_i = f(g(x_i)) \tag{1}$$
The encoder function $g$ maps $x_i \in I$ to $h_i \in R_g$, as shown in Equation (2):
$$h_i = g(x_i) \tag{2}$$
Equation (3) shows how the decoder function $f$ maps $h_i \in R_g$ back to the reconstruction $\tilde{x}_i$:
$$\tilde{x}_i = f(h_i) \tag{3}$$
Training the autoencoder aims to minimize $\Delta$, the difference between the input and the output of the autoencoder, as shown in Equation (4):
$$\min_{f,g} \; \Delta(x_i, \tilde{x}_i) \tag{4}$$
The mean squared error (MSE) is commonly used as the loss function for autoencoders. With $M$ denoting the number of observations in the training dataset, the MSE measures the distance between the input data and the reconstructed data, as shown in Equation (5) [29,30]:
$$L_{MSE} = \frac{1}{M} \sum_{i=1}^{M} (x_i - \tilde{x}_i)^2 \tag{5}$$
To prepare the data for the training and testing phase of SAE, we applied an MTCNN detector on all frontal images in KDEF dataset, then we used the OpenCV library to extract the upper part of the face. Finally, we matched each image of the same person with the same expression but taken from different angles (full left, full right, half left, half right, and straight-on) to the same extracted upper part of the frontal view image. Figure 4 shows an example of our data preparation method.
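A minimal sketch of this preparation step is given below, assuming the `mtcnn` Python package for face detection and OpenCV for cropping; taking the top half of the detected face box as the "upper part" and scaling pixel values to [0, 1] are illustrative assumptions.

```python
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def upper_face_from_frontal(frontal_path, size=(150, 150)):
    """Detect the face in a frontal image and crop its upper half (eyes/eyebrows)."""
    img = cv2.cvtColor(cv2.imread(frontal_path), cv2.COLOR_BGR2RGB)
    x, y, w, h = detector.detect_faces(img)[0]["box"]
    upper = img[y:y + h // 2, x:x + w]            # keep the top half of the face box
    return cv2.resize(upper, size) / 255.0        # scale to [0, 1] for the Sigmoid output

def make_pairs(pose_paths, frontal_path, size=(150, 150)):
    """Pair every pose of one person/expression with the frontal upper-face target."""
    target = upper_face_from_frontal(frontal_path, size)
    pairs = []
    for path in pose_paths:   # full left/right, half left/right, and straight-on views
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        pairs.append((cv2.resize(img, size) / 255.0, target))
    return pairs              # (input image, target upper face) pairs for SAE training
```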
The SAE was trained for 100 epochs on the prepared KDEF dataset using the Adaptive Moment Estimation (ADAM) optimizer [31] and a batch size of 46. The validation loss was 0.036, as shown in Figure 5.
An illustration of the SAE input and output is presented in Figure 6. The SAE is designed to process an input image in various poses and generate the upper portion of the face in that image. The output images produced by the SAE led to excellent performance in the SAFEPA model, exhibiting promising levels of accuracy. Note that the blurriness of the output does not impact the network’s ability to assess pain.

3.2.3. The Sparse Autoencoders for Facial Expressions-Based Pain Assessment SAFEPA

The SAFEPA model takes a 150 × 150 colored image as input and utilizes the SAE to reconstruct a colored image of the same size, which contains the upper part of the face if the input image contains a face in any pose. We did not consider the case where the input image contains no face, since all datasets used had faces in all of their samples. This approach enables the SAFEPA model to work on all frames without crashing or malfunctioning when the input image shows a full left or right profile. In contrast, other systems that rely solely on face detection algorithms may fail to detect a face, resulting in no input to the classifier and subsequent system failure. After obtaining the reconstructed upper part of the face and the input image, the SAFEPA model feeds them into two concurrent InceptionV3 models. The resulting outputs from the InceptionV3 models are concatenated into a single feature vector, which is then passed through a dense layer with 1024 neurons, followed by a dropout layer with a probability of 0.25. The dropout layer is used to prevent overfitting, a common problem in deep learning models. Finally, the features are fed into a fully connected layer for classification. The overall structure of the SAFEPA model is illustrated in Figure 7.
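Under the same assumptions as the earlier sketches, the SAE and the two-branch classifier can be wired together roughly as follows, with the SAE frozen after its reconstruction training (an assumption); `build_sae` and `build_concurrent_classifier` refer to the sketches in Section 3.2.1 and Section 3.2.2.

```python
from tensorflow.keras import Input, Model

def build_safepa(sae, num_classes=4, input_shape=(150, 150, 3)):
    """Sketch of the SAFEPA wiring: the raw image feeds one branch directly
    and passes through the SAE to supply the upper-face branch."""
    image = Input(shape=input_shape, name="input_image")
    sae.trainable = False                     # keep the learned reconstruction fixed
    upper_face = sae(image)                   # reconstructed upper part of the face
    classifier = build_concurrent_classifier(input_shape, num_classes)
    return Model(image, classifier([upper_face, image]), name="safepa")
```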
The SAFEPA monitoring system operates by reading online video frames in real time and utilizing the SAFEPA classifier for pain detection. When a video frame is processed, the SAE reconstructs a colored image as explained above. The processed frame and the SAE output are subsequently fed into the InceptionV3 classifier. If the classifier output is not “No Pain”, an alarm is activated, and critical data such as the time, frame, and pain level are recorded. Figure 8 shows a high-level flow chart of the automatic pain assessment system SAFEPA. It is important to note that the SAFEPA model was trained and tested on datasets that contain faces in all of their samples.
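A minimal sketch of this monitoring loop is shown below, assuming `model` is the trained SAFEPA classifier from the sketch above, the alarm is reduced to an appended event record, and frames are preprocessed as before (resize to 150 × 150, scale to [0, 1]).

```python
import time
import cv2
import numpy as np

PAIN_LEVELS = ["No Pain", "Low Pain", "Moderate Pain", "Severe Pain"]

def monitor(model, source=0):
    """Read frames from a camera index or video file and log any detected pain."""
    cap, events, frame_idx = cv2.VideoCapture(source), [], 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        x = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), (150, 150)) / 255.0
        probs = model.predict(x[np.newaxis], verbose=0)[0]
        label = PAIN_LEVELS[int(np.argmax(probs))]
        if label != "No Pain":                 # alarm: record time, frame, and pain level
            events.append({"time": time.time(), "frame": frame_idx, "level": label})
        frame_idx += 1
    cap.release()
    return events
```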

3.3. The Evaluation

CNNs Model’s performance is often measured with accuracy, precision, recall, and an F1-Score which are calculated from the confusion matrix (CM) [32]. Accuracy is the ratio of the number of samples correctly classified by the classifier to the total number of samples in the testing data as in Equation (6).
Accuracy = T P   +   T N T P   +   T N   +   F P   +   F N
In the equations, TP, TN, FP, and FN refer to true positives (observation is predicted positive and is actually positive), true negatives (observation is predicted negative and is actually negative), false positives (observation is predicted positive and is actually negative), and false negatives (observation is predicted negative and is actually positive), respectively. Whereas accuracy shows the gap between the real values and the predicted values, precision deals with the fraction of positive predictions, as in Equation (7), and recall deals with the fraction of actual positives, as in Equation (8):
$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$
The F1-Score combines precision and recall in a single value, as in Equation (9). It reflects a model’s balanced ability to capture positive cases (recall) while being accurate with the cases it captures (precision):
$$\text{F1-Score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{9}$$
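For reference, these quantities can be computed per class directly from a confusion matrix; in the multi-class setting used here, accuracy reduces to the trace of the matrix divided by its total.

```python
import numpy as np

def metrics_from_cm(cm):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp                  # predicted as the class but actually not
    fn = cm.sum(axis=1) - tp                  # of the class but predicted otherwise
    accuracy = tp.sum() / cm.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1
```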
The performance evaluation of the SAFEPA system was conducted using various datasets, as follows:
1. UNBC-McMaster Dataset
A 10-fold cross-validation accuracy was measured on a subset of the UNBC-McMaster dataset, consisting of 6000 samples. The obtained results were compared with recently published models designed for pain assessment, as described in references [3,22,23]. Subsequently, the confusion matrix and the Receiver Operating Characteristic (ROC) curve were generated to evaluate the performance of SAFEPA’s best performing model, achieved through 10-fold cross-validation.
2. KDEF Dataset—Frontal View Subset
A 10-fold cross-validation accuracy was measured on a subset of the KDEF dataset, focusing specifically on the frontal view. This subset comprised a total of 980 samples. The obtained results were compared with four recently published models for Facial Expression Recognition (FER) mentioned in references [17,24,25,26].
3. KDEF Dataset—Full and Half Left/Right Poses
The evaluation was performed on a subset of the KDEF dataset, which included full and half left/right poses. This subset contained a total of 3920 samples. The results obtained were compared with the outcomes of models mentioned in reference [17].
4. KDEF Dataset—Entire Dataset
The evaluation was further performed on the entire KDEF dataset, consisting of 4900 samples. The results obtained were compared with the outcomes of models mentioned in reference [17].
5. Analysis Metrics on KDEF Subset (Frontal View)
The evaluation involved analyzing a confusion matrix, precision, recall, F1-Score, and accuracy. This analysis was conducted using a training dataset of 70% and a testing dataset of 30% on the KDEF subset with 980 samples.
6. Evaluation on BioVid Dataset
To provide a comprehensive evaluation of the SAFEPA system’s performance, accuracy and processing time were measured. The system was tested on the entire unseen BioVid dataset. Processing time was considered a critical factor, especially in real-world pain assessment scenarios where efficient and prompt processing is vital for timely pain management.
These evaluations and measurements collectively provide a thorough assessment of the SAFEPA system’s performance across different datasets and scenarios.

4. Experimentation and Results

To implement the deep learning experiments, Anaconda 5.3.1 [33], the Keras library [34], the OpenCV library [35], and the Python 3.7.3 programming language were used. All experiments were run on the Hercules cluster in our PDS lab (pds.ucdenver.edu) on a Tesla P100-SXM2 GPU [36].
To test the SAFEPA system, we used an Intel Core i7-8750 CPU (2.20 GHz, 8 GB RAM), with a Windows 10 64-bit operating system.
This study describes three main experiments that were conducted to evaluate the proposed SAFEPA system.
  • In Experiment 1, SAFEPA and three models called VGG16, ResNet50, and Ensemble described in [17] were trained and tested on a subset of the UNBC-McMaster dataset to measure their performance on a pain assessment task and compare them to the recent study in [3,22,23].
  • In Experiment 2, the SAFEPA model was extended to recognize seven facial expressions on a subset of KDEF containing only frontal facial images and compare their performance with four recent studies in [17,24,25,26]. As another but related experiment, FEAPAS was separately trained and tested on the JAFFE dataset (gray scaled frontal face images) and its results were compared with the state-of-the-art models: VGG16, ResNet50, Ensemble, and HOG, which are described in [17] and [25], respectively. All models were run on both KDEF and JAFFE datasets.
  • To assess the model’s ability to generalize, experiment 3 was conducted by running the SAFEPA system which is described in Figure 8 on the BioVid dataset that consists of 6900 videos with various poses and measured the accuracy and average processing time.
  • Subsequently, we proceeded to conduct ablation tests as a follow-up to the three experiments.

4.1. Experiment 1: How Well SAFEPA Performs in Pain Assessment with Frontal and Partial Views

In Experiment 1, we conducted a rigorous evaluation of SAFEPA’s pain level recognition capabilities. Leveraging a subset of the UNBC-McMaster dataset containing 6000 samples, we utilized the same training and testing process as described in [9]. Specifically, we trained SAFEPA for 20 epochs using the SGD optimizer and a batch size of 32 and measured 10-fold cross-validation for each epoch.
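A hedged sketch of this cross-validation protocol is shown below; the use of scikit-learn's StratifiedKFold, the sparse categorical cross-entropy loss, and collapsing the per-epoch measurements into a single fit per fold are simplifying assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(build_model, X, y, folds=10, epochs=20, batch_size=32):
    """10-fold cross-validation sketch: SGD, batch size 32, 20 epochs per fold."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(folds, shuffle=True,
                                               random_state=0).split(X, y):
        model = build_model()
        model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  batch_size=batch_size, verbose=0)
        scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0)[1])
    return float(np.mean(scores))             # mean accuracy over the folds
```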
In addition, we trained the three models VGG16, ResNet50, and Ensemble from [17] for 500 epochs using the same parameters as mentioned in [17]. Specifically, the features extracted by VGG16 were classified with 50 neurons, the features extracted by ResNet50 with 15 neurons, and the concatenated features extracted by VGG16 and ResNet50 with 50 neurons. These models were run on images of size 300 × 300, and the experiment was repeated for size 150 × 150 to match the input size of the SAFEPA models. Once again, we measured 10-fold cross-validation for each epoch. After running the models on the two image sizes, the best model of each type was selected based on its performance. The results in Table 2 show that SAFEPA demonstrated the highest accuracy of 98.93% for 10-fold cross-validation.
Figure 9 presents the confusion matrix for the best performing model achieved through 10-fold cross-validation of the SAFEPA on the UNBC-McMaster dataset. This matrix provides valuable insights into the model’s performance in classifying four pain levels: ‘No Pain’, ‘Low Pain’, ‘Moderate Pain’, and ‘Severe Pain’. Upon examination, it is observed that the model exhibits a high accuracy in pain classification. Specifically, there is one misclassification between the ‘No Pain’ and ‘Low Pain’ classes and one additional misclassification between the ‘Low Pain’ and ‘No Pain’ classes. Notably, two misclassifications occur between the ‘Moderate Pain’ and the ‘Low Pain’ classes, reflecting a challenge in accurately distinguishing between these categories. However, it is noteworthy that the model demonstrates exceptional performance for the ‘Severe Pain’ class.
Figure 10 presents the Receiver Operating Characteristic (ROC) curve for the best performing model achieved through 10-fold cross-validation of the SAFEPA on the UNBC-McMaster dataset. The ROC curve provides a comprehensive visualization of the model’s performance in distinguishing between different pain levels.

4.2. Experiment 2: How Well SAFEPA Performs in the Facial Expression Recognition Task

In Experiment 2, we explore how well the SAFEPA model can be extended to recognize facial expressions. We first provide a comparison and analysis of all models for the case of FER, restricting our datasets to full frontal face images. Although such a restriction limits our ability to show the full potential of SAFEPA, it demonstrates that SAFEPA is competitive with the state-of-the-art models for FER even in these restricted cases. We then present results showing SAFEPA’s ability to outperform the models in [17] when various facial poses are present, utilizing a subset of the KDEF dataset (consisting of 3920 samples) that excludes the frontal view, as well as the complete KDEF dataset (4900 samples). To do this, we trained SAFEPA on a subset of the KDEF dataset consisting of 980 samples with frontal faces and an input size of 150 × 150. The model was trained using the Stochastic Gradient Descent (SGD) optimizer with a batch size of 32 for 50 epochs. The 10-fold cross-validation results show that the SAFEPA model achieved a competitive accuracy of 97.94%. These results are particularly noteworthy when compared to those obtained in [17,24,26], where VGG16 and Ensemble were used to extract features from the same subsets of KDEF but with an image size of 300 × 300 and trained for 500 epochs. The results of the experiment are summarized in Table 3.
We also sought to evaluate the performance of the SAFEPA and models described in [17] on facial expression recognition (FER) under different poses, including full left and right profiles. To execute this, we trained the models in [17] on a subset of the KDEF dataset (consisting of 3920 samples) that excludes the frontal view, as well as the complete KDEF dataset (4900 samples) using the same approach as described in Experiment 1, but without the face detection step, which is not applicable for full left and right profiles.
In contrast, the SAFEPA model was trained specifically for FER on the same subset of 3920 samples of KDEF and on the whole KDEF dataset, utilizing an ADAM optimizer and a batch size of 32 for 75 epochs. The input size was set to 150 × 150.
For 10-fold cross-validation, the SAFEPA achieved the highest performance with a 95.51% accuracy on the subset of 3920 samples of KDEF and 94.29% on the entire KDEF outperforming the other models.
We conducted a follow-up experiment on the subset of KDEF with 980 samples, using 70% of the data for training and 30% for testing as in [25], with the same parameters as Experiment 2. SAFEPA achieved an accuracy of 84.19%, which is superior to the accuracy obtained by the model presented in [25], which employed HOG for feature extraction and SVM for classification. The results are presented in Table 4.
The performance of the SAFEPA model was evaluated on a 30% test set from the subset of KDEF (980) dataset. The data were split using a split function, resulting in a total of 291 samples for testing. The evaluation process included the confusion matrix, precision, recall, and F1-Score. The results of the SAFEPA model are presented in Figure 11. Table 5 indicates that SAFEPA achieved high accuracy in predicting the ‘Happy’ emotion with 100.00%. However, the ‘Sad’ emotion was the least accurately predicted by SAFEPA model with an accuracy of 53.85%. Notably, the overall accuracy of SAFEPA model on the test set was higher than that of the model designed for the FER task, as described in [25]. These results demonstrate the effectiveness of the proposed model in facial emotion recognition on the subset of KDEF (980) dataset.
Furthermore, the FEAPAS model was trained on the JAFFE dataset using an ADAM optimizer and a 150 × 150 input size and achieved a remarkable accuracy of 96.80% in 10-fold cross-validation, a highly competitive result compared to the Ensemble classifier of VGG16 and ResNet50 used in [17], which achieved an accuracy of 96.40%. Moreover, when trained on 70% training data and 30% testing data using the RMSProp optimizer, FEAPAS achieved an accuracy of 78.13%, outperforming the model in [25], which achieved an accuracy of 76.19%. However, it is important to note that the SAFEPA model cannot be applied to grayscale images, as it was specifically trained on colored images; hence, no result is provided for SAFEPA on the JAFFE dataset.

4.3. Experiment 3: Test in Real-Time: How Well the SAFEPA System Performs with a New Pain Assessment Dataset

In Experiment 3, we tested the SAFEPA system on unseen data: the BioVid dataset, which was collected for pain assessment and contains 6900 videos. Since the BioVid Heat dataset only offers video-level labels, it does not provide frame-level information. To address this, using the SAFEPA system, which is described in Figure 8 and operates on a frame-by-frame basis, we selected the highest-level prediction from all predictions made for the frames belonging to each video. The accuracy of the SAFEPA system on the entire unseen BioVid dataset was 33.28%, which is an improvement over the results reported in [37,38]. In [37], the authors proposed facial activity descriptors to detect pain and estimate its intensity. Their model was applied to the BioVid dataset, and for leave-one-subject-out cross-validation to recognize five levels of pain, an accuracy of 30.80% was achieved. In [38], Bourou and his colleagues proposed a feature selection and inter-subject variability modeling approach for video-based pain level assessment. They utilized lasso regression to select the most informative features from the extracted geometric and color-based features and then modeled inter-subject variability using a generalized linear mixed effects probit model. To evaluate their approach, the authors extracted a set of features from the facial regions of each video frame and trained a regression model to predict pain intensity [38]. They reported an accuracy of 27.13% based on 10-fold cross-validation [38].
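The video-level decision rule can be sketched as follows, assuming `model` is the trained frame-level SAFEPA classifier, the same preprocessing as in Section 3.2.3, and class indices ordered from no pain (0) up to the highest pain level.

```python
import cv2
import numpy as np

def predict_video_level(model, video_path):
    """Run the frame-level classifier over a clip and keep the highest pain level."""
    cap, best = cv2.VideoCapture(video_path), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), (150, 150)) / 255.0
        level = int(np.argmax(model.predict(x[np.newaxis], verbose=0)[0]))
        best = max(best, level)               # keep the highest-level prediction
    cap.release()
    return best
```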
However, the SAFEPA’s accuracy is slightly lower than the accuracy of 37.42% reported in [39].
In [39], Xiang and his colleagues proposed a face analysis CNN combined with an LSTM network for pain assessment. They trained and tested their model on selected data belonging to 30 participants but do not report which subset of the dataset was used for their result. Table 6 presents the performance of the SAFEPA model on the entire BioVid dataset, reporting accuracy and processing time alongside the accuracy scores obtained by three other studies (including each method’s main characteristics) conducted on the entire dataset or a subset of it. Notably, the previous studies did not take processing time into account and, in contrast with the SAFEPA results, used various validation methods rather than testing on the entire dataset.
The average processing time for the SAFEPA system is 17.82 s per video. It is important to note that each video in the BioVid dataset lasts for 5 s. Thus, the SAFEPA can efficiently process each video in the dataset within a reasonable time, indicating its potential for real-world application.

4.4. Ablation Test

We conducted a decomposition of the SAFEPA into its fundamental models, as depicted in Figure 12. The fundamental model comprises a single branch of InceptionV3, which is subsequently followed by dense and dropout layers. In contrast, the FEAPAS consists of two branches of InceptionV3, where the outputs of these branches are concatenated and forwarded to dense and dropout layers. As an extension to the FEAPAS, the SAFEPA introduces the inclusion of SAE into the first branch.
To evaluate the performance of each model, we utilized the UNBC-McMaster dataset for pain assessment, along with three sets of KDEF. The KDEF dataset encompassed three distinct variations: (1) only frontal view, comprising 980 samples, (2) full and half left and right poses, amounting to 3920 samples, and (3) the complete dataset encompassing 4900 samples for facial expression recognition (FER). Table 7 shows the accuracy for each case.
Table 7 does not provide the results for the FEAPAS in the last two cases of the KDEF dataset, as the FEAPAS relies on MTCNN which is unable to handle full left/right profile samples. Consequently, the FEAPAS cannot be applied to these specific samples within the KDEF dataset. However, while the FEAPAS exhibits satisfactory performance with front view samples in both the UNBC-McMaster and KDEF datasets, it fails to deliver successful outcomes in other scenarios.
Overall, the findings presented in Table 7 demonstrate that all models perform optimally when confronted with frontal view samples in both the UNBC-McMaster and KDEF datasets. In the case of full left and right profile samples, the FEAPAS encounters limitations and the base model’s performance experiences a substantial decline. In contrast, the SAFEPA operates effectively without any complications and demonstrates superior performance, surpassing the base model’s capabilities.

5. Conclusions and Future Works

The combination of sparse autoencoders (SAE) and concurrent CNNs, as implemented in the SAFEPA system, has demonstrated remarkable efficacy in overcoming the limitations of face detection algorithms that can hinder model performance in certain poses. By leveraging the SAE to reconstruct the upper part of the face, the proposed model can operate on different poses without relying on a face detection algorithm. Our experiments, which utilized the KDEF dataset for facial expression recognition and the UNBC-McMaster and BioVid datasets for pain assessment, have validated the efficacy and versatility of the SAFEPA system. Notably, the SAFEPA model achieved a 98.93% accuracy in 10-fold cross-validation for pain assessment recognition and an 84.06% accuracy for unseen subjects when applied to a subset of the UNBC-McMaster dataset. Moreover, its competitive performance compared to state-of-the-art studies in facial expression recognition confirms its efficiency in handling such tasks. Importantly, when other systems relying on face detection algorithms like the FEAPAS failed to work properly in samples with full left and right poses, the SAFEPA model was able to work smoothly and achieve high accuracy. Finally, the performance of the SAFEPA system on the BioVid dataset, as demonstrated by its accuracy and processing time, highlights its generality and potential for real-world applications. These findings underscore the potential of the SAFEPA to accurately recognize pain assessment and facial expressions, process video data efficiently, and benefit various domains such as healthcare, education, and entertainment. Furthermore, considering the significant potential of augmented reality [40], our work holds promising implications in this area. Our future plan is to explore additional scenarios, such as when only partial facial information is available or when the input images may include non-facial images. Additionally, we aim to investigate the integration of voice analysis as an auxiliary component to enhance pain detection capabilities.

Author Contributions

Conceptualization, T.A. and G.A.; methodology, T.A.; software, T.A.; validation, T.A. and G.A.; formal analysis, T.A.; investigation, T.A.; resources, T.A. and G.A.; data curation, T.A.; writing—original draft preparation, T.A.; writing—review and editing, G.A.; visualization, T.A.; PhD supervision, G.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

All utilized datasets in this study have been obtained with the necessary permissions and are publicly available. We have obtained explicit permission to use these datasets in our research.

Informed Consent Statement

In this study, we utilized four distinct datasets, each contributing valuable insights to our research on “SAFEPA: An expandable multi-pose facial expressions pain assessment Method”. Prior to their inclusion, appropriate consent was obtained from the respective dataset providers to ensure compliance with ethical guidelines and data usage permissions. The datasets used in this study include the UNBC-McMaster shoulder pain expression archive, the KDEF dataset, the JAFFE dataset, and the BioVid dataset. Permission was granted to use image AK064 from the UNBC-McMaster shoulder pain expression archive, with copyright notice © Jeffrey Cohn, and image AF01 from the KDEF dataset with the corresponding image ID.

Data Availability Statement

Data were obtained from [11,15,16,28] and are available at http://www.pitt.edu/~emotion/um-spread.htm, accessed on 9 November 2021, https://www.kdef.se, accessed on 13 July 2022, NIT—BioVid Heat Pain Database (ovgu.de), accessed on 27 September 2022, and https://zenodo.org/record/3430156, accessed on 7 November 2022, with the permission of [11,15,16,28].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Taggart, S.; Skylas, K.; Brannelly, A.; Fairbrother, G.; Knapp, M.; Gullick, J. Using a Clinical Judgement Model to Understand the Impact of Validated Pain Assessment Tools for Burn Clinicians and Adult Patients in the ICU: A Multi-Methods Study. Burns 2021, 47, 110–126. [Google Scholar] [CrossRef] [PubMed]
  2. Lalloo, C.; Kumbhare, D.; Stinson, J.N.; Henry, J.L. Pain-QuILT: Clinical Feasibility of a Web-Based Visual Pain Assessment Tool in Adults with Chronic Pain. J. Med. Internet Res. 2014, 16, e127. [Google Scholar] [CrossRef] [PubMed]
  3. Semwal, A.; Londhe, N.D. ECCNet: An Ensemble of Compact Convolution Neural Network for Pain Severity Assessment from Face images. In Proceedings of the 2021 11th International Conference on Cloud Computing, Data Science & Engineering, Noida, India, 28–29 January 2021; IEEE: New York, NY, USA, 2021. [Google Scholar] [CrossRef]
  4. Lints-Martindale, A.; Hadjistavropoulos, T.; Lix, L.M.; Thorpe, L. A Comparative Investigation of Observational Pain Assessment Tools for Older Adults with Dementia. Clin. J. Pain 2012, 28, 226–237. [Google Scholar] [CrossRef]
  5. Salekin, S.; Zamzmi, G.; Goldgof, D.; Kasturi, R.; Ho, T.; Sun, Y. Multimodal Spatio-Temporal Deep Learning Approach for Neonatal Postoperative Pain Assessment. Comput. Biol. Med. 2020, 129, 104150. [Google Scholar] [CrossRef]
  6. Semwal, A.; Londhe, N.D. Computer aided pain detection and intensity estimation using compact CNN based fusion network. Appl. Soft Comput. 2021, 112, 107780. [Google Scholar] [CrossRef]
  7. Rudovic, O.; Pavlovic, V.; Pantic, M. Automatic Pain Intensity Estimation with Heteroscedastic Conditional Ordinal Random Fields. In Advances in Visual Computing; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar] [CrossRef]
  8. Qazi, A.S.; Farooq, M.S.; Rustam, F.; Villar, M.G.; Rodríguez, C.L.; Ashraf, I. Emotion Detection Using Facial Expression Involving Occlusions and Tilt. Appl. Sci. 2022, 12, 11797. [Google Scholar] [CrossRef]
  9. Alghamdi, T.; Alaghband, G. Facial Expressions Based Automatic Pain Assessment System. Appl. Sci. 2022, 12, 6423. [Google Scholar] [CrossRef]
  10. Chen, Z.; Ansari, R.; Wilkie, D. Automated Pain Detection from Facial Expressions using FACS: A Review. arXiv 2018, arXiv:1811.07988. [Google Scholar]
  11. Lucey, P.; Cohn, J.F.; Prkachin, K.M.; Solomon, P.E.; Matthews, I. Painful data: The UNBC-McMaster shoulder pain expression archive database. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–23 March 2011. [Google Scholar]
  12. Görgel, P.; Simsek, A. Face recognition via Deep Stacked Denoising Sparse Autoencoders (DSDSA). Appl. Math. Comput. 2019, 355, 325–342. [Google Scholar] [CrossRef]
  13. Yu, J.; Zheng, X.; Liu, J. Stacked Convolutional Sparse Denoising Auto-Encoder for Identification of Defect Patterns in Semiconductor Wafer Map. Comput. Ind. 2019, 109, 121–133. [Google Scholar] [CrossRef]
  14. Abdolahnejad, M.; Liu, P.X. A Deep Autoencoder with Novel Adaptive Resolution Reconstruction Loss for Disentanglement of Concepts in Face Images. IEEE Trans. Instrum. Meas. 2022, 71, 5008813. [Google Scholar] [CrossRef]
  15. Lundqvist, D.; Flykt, A.; Öhman, A. The Karolinska Directed Emotional Faces; KDEF, CD ROM from Department of Clinical Neuroscience, Psychology section; Karolinska Institutet: Solna, Sweden, 1998; ISBN 91-630-7164-9. [Google Scholar]
  16. Walter, S.; Gruss, S.; Ehleiter, H.; Tan, J.; Traue, H.C.; Crawcour, S.; Werner, P.; Al-Hamadi, A.; Andrade, A.O. The Biovid Heat Pain Database Data for the Advancement and Systematic Validation of an Automated Pain Recognition System. In Proceedings of the 2013 IEEE International Conference on Cybernetics (CYBCO), Lausanne, Switzerland, 13–15 June 2013. [Google Scholar] [CrossRef]
  17. Bentoumi, M.; Daoud, M.; Benaouali, M.; Ahmed, A.T. Improvement of Emotion Recognition from Facial Images Using Deep Learning and Early Stopping Cross Validation. Multimed. Tools Appl. 2022, 81, 29887–29917. [Google Scholar] [CrossRef]
  18. Dharanya, V.; Raj, A.N.J.; Gopi, V.P. Facial Expression Recognition through Person-Wise Regeneration of Expressions Using Auxiliary Classifier Generative Adversarial Network (AC-GAN) based model. J. Vis. Commun. Image Represent. 2021, 77, 103110. [Google Scholar] [CrossRef]
  19. Rodriguez, P.; Cucurull, G.; Gonalez, J.; Gonfaus, J.; Nasrollahi, K.; Moeslund, T.; Roca, F. Deep pain: Exploiting long short-term memory networks for facial expression classification. IEEE Trans. Cybern. 2017, 52, 3314–3324. [Google Scholar] [CrossRef] [Green Version]
  20. Al-Qerem, A. An Efficient Machine-Learning Model Based on Data Augmentation for Pain Intensity Recognition. Egypt. Inform. J. 2020, 21, 241–257. [Google Scholar] [CrossRef]
  21. Kharghanian, R.; Peiravi, A.; Moradi, F.; Iosifidis, A. Pain Detection Using Batch Normalized Discriminant Restricted Boltzmann Machine Layers. J. Vis. Commun. Image Represent. 2021, 76, 103062. [Google Scholar] [CrossRef]
  22. Bargshady, G.; Zhou, X.; Deo, R.C.; Soar, J.; Whittaker, F.; Wang, H. Ensemble Neural Network Approach Detecting Pain Intensity from Facial Expressions. Artif. Intell. Med. 2020, 109, 101954. [Google Scholar] [CrossRef] [PubMed]
  23. Semwal, A.; Londhe, N.D. MVFNet: A multi-view fusion network for pain intensity assessment in unconstrained environment. Biomed. Signal Process. Control 2021, 67, 102537. [Google Scholar] [CrossRef]
  24. Barra, S.; Hossain, S.; Pero, C.; Umer, S. A Facial Expression Recognition Approach for Social IoT Frameworks. Big Data Res. 2022, 30, 100353. [Google Scholar] [CrossRef]
  25. Eng, S.K.; Ali, H.; Cheah, A.Y.; Chong, Y.F. Facial Expression Recognition in JAFFE and KDEF Datasets Using Histogram of Oriented Gradients and Support Vector Machine. IOP Conf. Ser. Mater. Sci. Eng. 2019, 705, 12031. [Google Scholar] [CrossRef]
  26. Yaddaden, Y. An efficient Facial Expression Recognition System with Appearance-Based Fused Descriptors. Intell. Syst. Appl. 2023, 17, 200166. [Google Scholar] [CrossRef]
  27. Kumar, N.; Kumar, A.S.; Prasad, G.; Shah, M.A. Automatic Facial Expression Recognition Combining Texture and Shape Features from Prominent Facial Regions. IET Image Process. 2023, 17, 1111–1125. [Google Scholar] [CrossRef]
  28. Lyons, M.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding Facial Expressions with Gabor Wavelets. In Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 200–205. [Google Scholar] [CrossRef] [Green Version]
  29. Jena, B.; Nayak, G.K.; Saxena, S. Convolutional Neural Network and Its Pretrained Models for Image Classification and Object Detection: A Survey. Concurr. Comput. Pr. Exp. 2021, 34, e6767. [Google Scholar] [CrossRef]
  30. Li, Y.; Liu, L. Image quality classification algorithm based on InceptionV3 and SVM. MATEC Web Conf. 2019, 277, 02036. [Google Scholar] [CrossRef]
  31. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  32. Chen, D.; Lu, Y.; Hsu, C.-Y. Measurement Invariance Investigation for Performance of Deep Learning Architectures. IEEE Access 2022, 10, 78070–78087. [Google Scholar] [CrossRef]
  33. Anaconda|The World’s Most Popular Data Science Platform. Available online: https://www.anaconda.com (accessed on 12 February 2023).
  34. Keras. The Python Deep Learning API. Available online: https://keras.io (accessed on 26 December 2018).
  35. OpenCV 4.4.0—OpenCV. Available online: https://opencv.org/opencv-4-4-0/ (accessed on 12 February 2023).
  36. Parallel Distributed Systems Lab—PDS Lab. PDS Laboratory. Available online: Ucdenver.edu (accessed on 4 February 2023).
  37. Werner, P.; Al-Hamadi, A.; Limbrecht-Ecklundt, K.; Walter, S.; Gruss, S.; Traue, H.C. Automatic Pain Assessment with Facial Activity Descriptors. IEEE Trans. Affect. Comput. 2016, 8, 286–299. [Google Scholar] [CrossRef]
  38. Bourou, D.; Pampouchidou, A.; Tsiknakis, M.; Marias, K.; Simos, P. Video-based Pain Level Assessment: Feature Selection and Inter-Subject Variability Modeling. In Proceedings of the 2018 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, Greece, 4–6 July 2018; IEEE: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
  39. Xiang, X.; Wang, F.; Tan, Y.; Yuille, A.L. Imbalanced Regression for Intensity Series of Pain Expression from Videos by Regularizing Spatio-Temporal Face Nets. Pattern Recognit. Lett. 2022, 163, 152–158. [Google Scholar] [CrossRef]
40. Minaee, S.; Liang, X.; Yan, S. Modern Augmented Reality: Applications, Trends, and Future Directions. arXiv 2022, arXiv:2202.09450. [Google Scholar]
Figure 1. Samples of KDEF dataset with different facial expressions from different angles. Images (AF01DIFL, AF01DIFR, AF01DIHL, AF01DIHR, AF01DIS, AF01HAFL, AF01HAFR, AF01HAHL, AF01HAHR, AF01HAS, AF01NEFL, AF01NEFR, AF01NEHL, AF01NEHR, AF01NES, AF01SAFL, AF01SAFR, AF01SAHL, AF01SAHR, and AF01SAS) from KDEF dataset. Reprinted/adapted with permission from Ref. [15].
Figure 2. Samples of the UNBC-McMaster shoulder pain expression archive dataset with different levels of pain. Reprinted/adapted with permission from Ref. [11]. Copyright Jeffrey Cohn.
Figure 3. FEAPAS structure.
Figure 4. Data preparation for training and testing SAE. Images (AF01AFFL, AF01AFFR, AF01AFHL, AF01AFHR, and AF01AFS) from KDEF dataset. Reprinted/adapted with permission from Ref. [15].
Figure 5. Training and validation loss for SAE.
Figure 6. SAE’s ability to generate the upper part of the face from input images regardless of their pose. Image (AF01NES) from KDEF dataset. Reprinted/adapted with permission from Ref. [15].
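To make the SAE training and reconstruction steps illustrated in Figures 4–6 concrete, the following is a minimal sketch, not the authors' exact architecture, of a sparse autoencoder that maps a full-face image to its upper half. It assumes Keras [34], 150 × 150 RGB inputs, 150 × 75 upper-face targets, hypothetical layer sizes, and an L1 activity penalty on the bottleneck to encourage sparsity.

# Minimal sketch (hypothetical layer sizes, not the published architecture):
# a sparse autoencoder reconstructing the upper half of a face from a full-face image.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_sparse_autoencoder(input_shape=(150, 150, 3)):
    inputs = keras.Input(shape=input_shape)
    x = layers.Flatten()(inputs)
    # L1 activity regularization on the bottleneck encourages sparse codes.
    code = layers.Dense(256, activation="relu",
                        activity_regularizer=regularizers.l1(1e-5))(x)
    # The decoder outputs only the upper half of the face (75 x 150 x 3).
    out = layers.Dense(75 * 150 * 3, activation="sigmoid")(code)
    outputs = layers.Reshape((75, 150, 3))(out)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Hypothetical usage: x_full holds full-face images and y_upper their upper halves,
# both scaled to [0, 1]; the validation split yields curves of the kind in Figure 5.
# sae = build_sparse_autoencoder()
# history = sae.fit(x_full, y_upper, epochs=50, batch_size=32, validation_split=0.2)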
Figure 7. The structure of the proposed model SAFEPA.
Figure 8. SAFEPA flow chart.
Figure 9. Confusion matrix for the best performing model: 10-fold cross-validation of SAFEPA on UNBC-McMaster dataset.
Figure 10. The ROC curve for the best performing model: 10-fold cross-validation of SAFEPA on UNBC-McMaster dataset.
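Figures 9 and 10 summarize the per-class behavior of the best 10-fold model on the four pain levels. For readers who want to reproduce a plot of this kind, a minimal one-vs-rest ROC sketch is shown below; it assumes scikit-learn, integer pain-level labels in y_true, and per-class softmax scores in y_score, and is not the authors' evaluation code.

# Illustrative sketch: one-vs-rest ROC curves for a 4-class pain-level classifier.
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def one_vs_rest_roc(y_true, y_score, n_classes=4):
    # Binarize the integer labels so each class gets its own ROC curve.
    y_bin = label_binarize(y_true, classes=list(range(n_classes)))
    curves = {}
    for c in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, c], y_score[:, c])
        curves[c] = (fpr, tpr, auc(fpr, tpr))  # per-class ROC points and AUC
    return curves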
Figure 11. Confusion matrix for SAFEPA on the 30% test split of a 980-sample subset of the KDEF dataset.
Figure 12. SAFEPA components.
Table 1. Summary of the datasets used.

Dataset | Samples | Participants | Brief Description | Format | Number of Classes | Role in Our Work
JAFFE [28] | 213 | 10 | Front view | Gray-scale images | 7 facial expressions | Train and test FEAPAS on FER.
KDEF [15] | 4900 | 70 | Five different poses, including full left/right profiles | Color images | 7 facial expressions | Train and test SAE; train and test Basic, FEAPAS, SAFEPA, and the models in [17] on FER, with and without full left/right profiles.
UNBC-McMaster [11] | 200 sequences with 48,398 labeled frames | 25 | Very imbalanced (images captured from a frontal perspective or partial view) | Sequences of color frames | 16 pain levels, though studies often use fewer classes | Train and test Basic, SAFEPA, and the models in [17] for PA *.
BioVid [16] | 6900 labeled videos | 69 | Free range of head motions | Color videos | 5 pain levels | Unseen test of SAFEPA for PA *.
* PA is an abbreviation for Pain Assessment.
Table 2. Comparison of 10-fold cross-validation accuracies obtained by running the models in [17] and SAFEPA, alongside three recent pain-assessment studies, all on a subset of the UNBC-McMaster dataset.

The Model | Input Size | Accuracy %
VGG16 [17] | 300 × 300 | 52.57
VGG16 [17] | 150 × 150 | 55.25
ResNet50 [17] | 300 × 300 | 51.06
ResNet50 [17] | 150 × 150 | 44.29
Ensemble [17] | 300 × 300 | 75.23
Ensemble [17] | 150 × 150 | 75.52
Semwal and Londhe [3] | 128 × 128 | 94.00
Bargshady, et al. [22] | 224 × 224 | 90.50
Semwal and Londhe [23] | 128 × 128 | 96.00
SAFEPA | 150 × 150 | 98.93
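For context, the accuracies above are means over 10 folds. A minimal sketch of such an evaluation loop is given below; it assumes scikit-learn's KFold, NumPy arrays X and y, and a hypothetical build_model() that returns a freshly compiled Keras classifier reporting accuracy, and is not the authors' exact protocol.

# Illustrative 10-fold cross-validation loop (hypothetical build_model, data already loaded).
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(build_model, X, y, n_splits=10):
    accs = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=42).split(X):
        model = build_model()  # new model per fold to avoid information leakage
        model.fit(X[train_idx], y[train_idx], epochs=20, batch_size=32, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        accs.append(acc)
    return float(np.mean(accs))  # mean accuracy across the 10 folds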
Table 3. The 10-fold cross-validation accuracies for FER on three different subsets of the KDEF dataset: values reported in [24,26] and values obtained by running SAFEPA and the models in [17].

The Model | Input Size | Accuracy % KDEF (980) | Accuracy % KDEF (3920) | Accuracy % KDEF (4900)
VGG16 [17] | 300 × 300 | 98.78 | 73.64 | 74.28
Ensemble [17] | 300 × 300 | 98.47 | 82.88 | 92.95
Barra, et al. [24] | Not available | 82.71 | Not available | Not available
Yaddaden [26] | 128 × 128 | 88.88 | Not available | Not available
SAFEPA | 150 × 150 | 97.94 | 95.51 | 94.29
Table 4. Performance of the HOG-based model in [25] and SAFEPA for FER with a 70% training/30% testing split of a 980-sample subset of the KDEF dataset.

Metric | HOG [25] | SAFEPA
Accuracy | 80.95% | 84.19%
Precision | Not available | 85.69%
Recall | Not available | 83.39%
F1-Score | Not available | 83.54%
Table 5. Prediction detail of the SAFEPA model for 7 emotions on frontal-face samples of the KDEF dataset.

Emotion | Samples | Correct Predictions | Incorrect Predictions | Percentage %
Afraid | 40 | 29 | 11 | 72.50
Angry | 36 | 31 | 5 | 86.11
Disgusted | 52 | 50 | 2 | 96.15
Happy | 41 | 41 | 0 | 100.00
Neutral | 46 | 42 | 4 | 91.30
Sad | 39 | 21 | 18 | 53.85
Surprised | 37 | 31 | 6 | 83.78
Total | 291 | 245 | 46 | 84.19
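The per-class percentages above follow directly from correct predictions divided by class samples (e.g., Afraid: 29/40 = 72.50%). A short sketch that derives such a breakdown from predictions is shown below; it assumes integer-encoded labels and scikit-learn's confusion_matrix, and is intended only as an illustration.

# Sketch: per-class counts and percentages (as in Table 5) from true/predicted labels.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_breakdown(y_true, y_pred, class_names):
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    for i, name in enumerate(class_names):
        samples = cm[i].sum()   # all samples whose true label is this class
        correct = cm[i, i]      # samples of this class predicted correctly
        pct = 100.0 * correct / samples if samples else 0.0
        print(f"{name}: {samples} samples, {correct} correct, {pct:.2f}%")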
Table 6. Performance of the SAFEPA system on the pain assessment task when evaluated on the BioVid dataset.

The Model | Percentage of Used Data | Uses Face Detection | Cross-Validation | Accuracy % | Processing Time (s)
Werner, et al. [37] | 100% | Yes | Leave-one-out | 30.80 | Not available
Bourou, et al. [38] | 100% | Yes | 10-fold | 27.13 | Not available
Xiang, et al. [39] | 33.71% (unknown selection method) | Yes | k-fold | 37.42 | Not available
SAFEPA | 100% | No | Testing all the data | 33.28 | 17.82
Table 7. The performance of SAFEPA components.

The Model | UNBC-McMaster (10-Fold) | UNBC-McMaster (Unseen Subject) | KDEF 980-Frontal (10-Fold) | KDEF 3920-Excluding Frontal (10-Fold) | KDEF 4900-Entire Dataset (10-Fold)
Basic | 98.67% | 82.69% | 98.35% | 92.02% | 91.74%
FEAPAS | 99.10% | 90.56% | 98.66% | Not available | Not available
SAFEPA | 98.93% | 84.06% | 97.94% | 95.51% | 94.29%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
