Proceeding Paper

Facial Beauty Prediction Using an Ensemble of Deep Convolutional Neural Networks †

by Djamel Eddine Boukhari 1,*, Ali Chemsa 1, Abdelmalik Taleb-Ahmed 2, Riadh Ajgou 1 and Mohamed taher Bouzaher 3

1 LGEERE Laboratory, Department of Electrical Engineering, University of El Oued, El-Oued 39000, Algeria
2 Institut d’Electronique de Microélectronique et de Nanotechnologie (IEMN), UMR 8520, Université Polytechnique Hauts-de-France, Université de Lille, CNRS, 59313 Valenciennes, France
3 Scientific and Technical Research Centre for Arid Areas (CRSTRA), Biskra 07000, Algeria
* Author to whom correspondence should be addressed.
Presented at the 4th International Electronic Conference on Applied Sciences, 27 October–10 November 2023; Available online: https://asec2023.sciforum.net/.
Eng. Proc. 2023, 56(1), 125; https://doi.org/10.3390/ASEC2023-15400
Published: 27 October 2023
(This article belongs to the Proceedings of The 4th International Electronic Conference on Applied Sciences)

Abstract
The topic of facial beauty analysis has emerged as a crucial and fascinating subject of human culture. With various applications and significant attention from researchers, recent studies have investigated the relationship between facial features and age, emotions, and other factors using multidisciplinary approaches. Facial beauty prediction is a significant visual recognition problem in the assessment of facial attractiveness consistent with human perception. Overcoming the challenges associated with facial beauty prediction requires considerable effort due to the field’s novelty and lack of resources. In this vein, deep learning methods have recently demonstrated remarkable abilities in feature representation and analysis. Accordingly, this paper proposes an ensemble of pre-trained convolutional neural network models to predict facial beauty scores. The ensemble comprises three separate deep convolutional neural networks, each with a distinct structural representation: two built from the pre-trained InceptionV3 and MobileNetV2 models, and a new simple network based on Convolutional Neural Networks (CNNs) designed for the facial beauty prediction problem. On the SCUT-FBP5500 benchmark dataset, the ensemble achieved a Pearson coefficient of 0.9350, demonstrating that this ensemble of deep networks yields facial beauty predictions closer to human evaluation than conventional methods. Finally, potential directions are suggested for future research on facial beauty prediction.

1. Introduction

The human face holds a distinct importance in our social interactions, and the pursuit of beauty, particularly facial beauty, is an enduring and ubiquitous feature of human nature. In recent years, there has been a significant increase in the demand for aesthetic surgery, underscoring the importance of a nuanced understanding of beauty in medical settings [1]. Remarkably, the exploration of physical beauty in humans has a storied history dating back over 4000 years, demonstrating the enduring relevance of this topic [2]. The importance of physical beauty in the face has been studied for hundreds of years, and its influence on social decisions such as partner choices and hiring decisions is well documented [3]. The perception of facial attractiveness is considered a highly desirable physical trait, with philosophers, artists, and scientists attempting to understand the secrets of beauty [4].
Facial beauty prediction is an emerging topic that is receiving increasing attention from researchers and users alike, particularly in the field of facial recognition and understanding [5]. Beauty is viewed as a form of information in computer-based face analysis, and it is linked to how people perceive attractiveness. In the field of psychology, several theories have been established on how people observe facial attractiveness. However, studying face attractiveness using computers is a relatively recent research area, with limited resources and few articles published on this subject. Several works have focused on analysing the irregular features of facial attractiveness [6].
The analysis of facial attractiveness presents two main challenges. First, the complexity of human perception and the wide variety of facial features make it difficult to build robust and effective models for evaluating beauty. Secondly, many face reference databases are primarily configured for face recognition problems and are not suitable for attractiveness prediction [7]. Therefore, most facial beauty studies focus on designing facial beauty descriptors [8].
In recent years, most research on facial beauty prediction has been based on deep learning methods [9]. The development of deep learning architecture has been driven by the strength and adaptability of these algorithms, particularly convolutional neural networks (CNNs) [10]. These algorithms offer a novel perspective on the facial beauty prediction problem and have shown promising results for several computer vision applications, such as face recognition, object identification, semantic segmentation, image classification, biomedical analysis, captioning, and biometrics [11]. Deep CNNs (DCNNs) in particular perform markedly better [12]. Accordingly, this study proposes an ensemble of three separate deep convolutional neural networks for the facial beauty prediction (FBP) problem. The contributions of this paper are as follows:
  • The investigation of the effectiveness of conventional transfer learning techniques for facial beauty prediction.
  • We provide an ensemble regression for facial attractiveness evaluation that combines the predicted scores of networks based on InceptionV3 and MobileNetV2 with a newly proposed simple network based on Convolutional Neural Networks, each trained with its own loss function.
  • The efficiency of the suggested approach is demonstrated on the specialized FBP dataset SCUT-FBP5500. The encouraging results show the benefit of merging the assessments of several predictors in the proposed ensemble DCNN regression model, whose predictions align closely with the ground truth of the dataset. We have made our scripts and pre-processed pictures publicly available at (https://github.com/DjameleddineBoukhari/ENCNN, accessed on 28 January 2023).
The structure of this paper is as follows: Section 2 reviews related research on facial attractiveness prediction. Section 3 explains the selection process of the architectures used. Section 4 describes the experimental setup on the SCUT-FBP5500 dataset, Section 5 presents the findings and performance assessments, and Section 6 concludes the paper.

2. Convolutional Neural Network Architectures for FBP

The initial concept of neural networks was inspired by mimicking the human nervous system. Convolutional neural networks are a further refinement of this concept, and their arrival has been good news for computer vision [13].
Numerous techniques based on deep neural networks (DNN) have been developed for FBP. One of the most popular CNN architectures, ResNet [14], has been utilized in a number of computer vision applications.
K. Cao et al. [5] used residual-in-residual (RIR) groups to build a deeper network and introduced a combined spatial-wise and channel-wise attention mechanism for better feature comprehension. The authors of [15] presented the facial beauty database SCUT-FBP5500 with two evaluation protocols (five-fold cross-validation with an 80–20% split, and a 60–40% split) and tested three CNN architectures, namely AlexNet [10], ResNet-18 [14], and ResNeXt-50 [14]. In [16], the R3CNN architecture is proposed to integrate relative ranking into regression to improve FBP performance; it can be flexibly implemented using existing CNNs as a backbone network and provides better results on the SCUT-FBP [17] and SCUT-FBP5500 [15] datasets.

3. Methodology

In order to predict facial beauty, this work builds an ensemble of trained models. Our proposed EN-CNN architecture combines three pre-trained models, and their estimates are fused to create a final prediction of facial beauty [18]. Transfer learning is used to adapt the pre-trained weights so that the models can perform the regression task. For facial beauty, an ensemble of such trained models achieves greater performance. Consequently, in this study, we transfer the weights of a set of three powerful pre-trained CNN models. The following subsections present the pre-trained deep CNN models as well as the proposed ensemble learning [19].

3.1. Pre-Trained InceptionV3

The InceptionV3 CNN [20] was introduced by Google. Its architecture updates the InceptionV1 model and incorporates numerous kernel types at the same level. Instead of using large 7 × 7 and 5 × 5 filters, InceptionV3 factorizes them into smaller asymmetric filters such as 1 × 7 and 7 × 1. Furthermore, a bottleneck of 1 × 1 convolutions is used, improving feature representation while reducing computation. Starting from the input, three distinct convolutional branches with 3 × 3 or 5 × 5 filter sizes perform parallel computations, and their outputs are concatenated into a single layer, known as the output layer.
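To illustrate why such factorization helps, the short sketch below compares the weight count of a full 7 × 7 convolution with its 1 × 7 followed by 7 × 1 factorization. The channel counts are illustrative assumptions, not values taken from the paper.

```python
def conv_params(kh, kw, in_ch, out_ch):
    # weights only (biases ignored): kernel_h * kernel_w * in_channels * out_channels
    return kh * kw * in_ch * out_ch

# A single 7x7 convolution versus its factorized 1x7 + 7x1 equivalent,
# with 192 input and 192 output channels (illustrative figures).
full = conv_params(7, 7, 192, 192)
factored = conv_params(1, 7, 192, 192) + conv_params(7, 1, 192, 192)

print(full, factored, factored / full)  # the factorized pair keeps 14/49 of the weights
```

The ratio 14/49 ≈ 0.29 shows the roughly 3.5× reduction in parameters that motivates this design choice.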

3.2. Pre-Trained MobileNetV2

A lightweight CNN model called MobileNet [21] is built on inverted residuals and a linear bottleneck, which provide quick connections between thin layers. It is a low-latency model that uses a small amount of power; therefore, it can be used to manage limited hardware resources. MobileNet’s key benefit is the trade-off it makes between several factors, including latency, accuracy, and resolution. In MobileNet, feature maps are generated using point-wise convolutional kernels and depthwise separable convolutional (DSC) kernels. In MobileNet [21], DSC first filters the input image’s spatial dimensions using depth-wise 2-D kernel filters. The depth-wise filter has a size of Dk × Dk × 1, which is significantly smaller than the size of the input images [22].
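A rough sketch of the multiply-add savings of depthwise separable convolution over a standard convolution, using the standard cost formulas. The channel counts and feature-map size below are illustrative assumptions, not values from the paper.

```python
def standard_conv_cost(dk, m, n, df):
    # multiply-adds for a standard convolution: Dk*Dk*M*N*Df*Df
    return dk * dk * m * n * df * df

def dsc_cost(dk, m, n, df):
    # depthwise pass (Dk*Dk*M*Df*Df) plus pointwise 1x1 pass (M*N*Df*Df)
    return dk * dk * m * df * df + m * n * df * df

# e.g. a 3x3 kernel, 32 input channels, 64 output channels, 112x112 feature map
std = standard_conv_cost(3, 32, 64, 112)
dsc = dsc_cost(3, 32, 64, 112)

print(dsc / std)  # equals 1/N + 1/Dk^2, roughly an 8x saving here
```

This 1/N + 1/Dk² ratio is what makes MobileNet suitable for limited hardware resources.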

3.3. S-CNNs Network

In order to predict facial beauty, this work also builds simple CNNs. Our proposed S-CNN architecture is constructed from several convolution layers and one fully connected layer at the end. Within each convolution layer, a 2D convolution is carried out, followed by ReLU activation. To demonstrate the effectiveness of the present S-CNNs, we leverage recent progress in neural architecture search to develop a new family of MixConv-based models. Our neural architecture is an ensemble of simple CNN models, whose contribution resides in the method of mixing layers with kernel sizes of 3 × 3 and 5 × 5, as illustrated in Figure 1.
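The paper does not detail the exact channel split of the mixed 3 × 3 / 5 × 5 layers; the sketch below assumes a MixConv-style even split of output channels across kernel groups, to show the parameter saving relative to a uniform 5 × 5 layer. All channel counts are illustrative assumptions.

```python
def mixconv_params(kernel_sizes, in_ch, out_ch):
    # MixConv-style layer: output channels split evenly across kernel groups,
    # each group applying its own kernel size to all input channels
    per_group = out_ch // len(kernel_sizes)
    return sum(k * k * in_ch * per_group for k in kernel_sizes)

# A uniform 5x5 layer versus a mixed 3x3/5x5 layer, 64 -> 64 channels
uniform_5x5 = 5 * 5 * 64 * 64
mixed = mixconv_params([3, 5], 64, 64)

print(uniform_5x5, mixed)  # the mixed layer needs fewer weights
```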

3.4. The Proposed EN-CNNs

The proposed ensemble of deep CNNs (EN-CNNs) architecture combines the three previously trained models (InceptionV3, MobileNetV2, and S-CNN), which act as base predictors of facial beauty in the automated scoring system. The details of the proposed EN-CNNs are depicted in Figure 2.
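The paper does not state the exact fusion rule beyond combining the branch predictions; a simple element-wise mean over hypothetical branch scores serves as a minimal sketch of the ensemble step.

```python
# hypothetical predicted beauty scores from the three branches for a batch of faces
inception_scores = [3.1, 2.4, 4.2]
mobilenet_scores = [3.3, 2.2, 4.0]
scnn_scores      = [3.2, 2.6, 4.1]

def ensemble_average(*branch_scores):
    # element-wise mean of the per-branch predictions
    return [sum(s) / len(s) for s in zip(*branch_scores)]

final = ensemble_average(inception_scores, mobilenet_scores, scnn_scores)
```

Averaging smooths out the individual branches' errors, which is the usual rationale for ensemble regression.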

4. Experiment

This section describes the experiments and assessment findings from the models (EN-CNN) used in this study. The SCUT-FBP5500 dataset [15] is used for network training. Our network is trained for 200 iterations with a batch size of b = 25. The Adam optimizer updates the parameters with a learning rate of lr = 1e−6, and the mean squared error (MSE) was selected as the loss function.
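As a sketch of what a single Adam parameter update with lr = 1e−6 on an MSE loss looks like, consider one scalar weight and one toy sample (not the paper's network; the values are illustrative assumptions):

```python
# Adam hyper-parameters: the learning rate matches the paper, the rest are Adam defaults
lr, b1, b2, eps = 1e-6, 0.9, 0.999, 1e-8
w, m, v, t = 0.5, 0.0, 0.0, 0   # weight and Adam moment accumulators

x, y_true = 1.0, 3.0            # toy sample for the model y = w * x
grad = 2 * (w * x - y_true) * x # gradient of the squared error (w*x - y)^2 w.r.t. w

# one Adam step: update biased moments, correct the bias, then move the weight
t += 1
m = b1 * m + (1 - b1) * grad
v = b2 * v + (1 - b2) * grad ** 2
m_hat = m / (1 - b1 ** t)
v_hat = v / (1 - b2 ** t)
w -= lr * m_hat / (v_hat ** 0.5 + eps)
```

With such a small learning rate, each step moves the weight by about 1e−6, which is why many iterations are needed.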

The SCUT-FBP5500 Dataset

The standard SCUT-FBP5500 dataset [15] is used in this study; it comprises 5500 frontal face images at a resolution of 350 × 350 pixels with various attributes, including race (Asian/Caucasian), gender (female/male), and age (15–60) [23,24].
As shown in Figure 3, the ground-truth rating for each face in the dataset is the average of all evaluations given on a scale from 1 to 5 by 60 raters. This enables the use of various computational models with various facial attractiveness prediction paradigms. The SCUT-FBP5500 dataset can be partitioned by race and gender into four subsets: 2000 Asian females (AF), 2000 Asian males (AM), 750 Caucasian females (CF), and 750 Caucasian males (CM) [25].
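Since each ground-truth label is simply the mean of the 60 raters' 1–5 scores, it can be computed as below (randomly generated ratings stand in for the real ones):

```python
import random

random.seed(0)
# 60 hypothetical integer ratings on the 1-5 scale for a single face
ratings = [random.randint(1, 5) for _ in range(60)]

# the ground-truth attractiveness score is the mean of all raters' scores
ground_truth = sum(ratings) / len(ratings)
```

Averaging many raters reduces individual subjectivity, which is what makes regression against these labels meaningful.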

5. Results and Discussion

Automatic face attractiveness estimation is achieved using ensemble DCNN regression. The experimental findings from the suggested EN-CNN are presented in this section. We carry out comparisons against a range of techniques, including geometric feature-based and deep learning-based methods such as AlexNet, ResNet-18, ResNeXt-50, CNN–SCA, R3CNN, and Semi-supervised. MAE, RMSE, and PC are chosen as the metrics. Table 1 shows the performance comparisons on the SCUT-FBP5500 facial beauty prediction dataset, which verify the capacity of EN-CNN under the 80–20% split protocol.
Typically, the quantity of parameters imposes a limit on the performance improvement. The present model performs better than the other models (AlexNet, ResNet-18, ResNeXt-50, CNN–SCA, R3CNN, Semi-supervised, etc.). It uses an ensemble of deep CNNs (EN-CNNs) architecture over the three models. Our EN-CNNs hold 26.99 M parameters in total: InceptionV3 with 22.85 M parameters, MobileNetV2 with 2.91 M parameters, and our S-CNN network with 1.23 M parameters and 224.16 MFlops. For comparison, CNN–SCA has 6.75 M parameters and 34.25 BFlops, ResNeXt-50 has 25.03 M parameters and 5.56 BFlops, and AlexNet has 62.38 M parameters and 1.5 BFlops. This comparison reveals that our network is more efficient than those of the cited works and confirms that the proposed EN-CNNs network played a crucial role in outperforming the state-of-the-art methods. In Figure 4, we present the comparisons of the predicted scores.
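The three reported metrics can be computed from predicted and ground-truth score vectors as follows (the vectors below are illustrative, not the paper's outputs):

```python
import math

def mae(pred, true):
    # mean absolute error: lower is better
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    # root mean squared error: penalizes large errors more than MAE
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def pearson(pred, true):
    # Pearson correlation (PC): agreement in ranking/trend, 1.0 is perfect
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)

# illustrative scores for four faces
pred = [3.2, 2.4, 4.1, 1.9]
true = [3.0, 2.5, 4.3, 2.0]
```

MAE and RMSE measure absolute error, while PC (the headline 0.9350 figure) measures how well the predicted ranking tracks human judgment.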

6. Conclusions

In this paper, an ensemble of deep CNNs for facial beauty prediction is proposed. We present an ensemble regression for facial beauty estimation by studying the effect of standard transfer learning approaches on the facial beauty prediction problem and by combining the predicted scores of a three-branch network (InceptionV3, MobileNetV2, and S-CNN) trained with dedicated loss functions. We describe and optimize a set of hyper-parameters for the pre-trained models to score facial beauty. The experimental results show that the resulting ensemble (EN-CNN) outperforms previous CNN baseline approaches as well as several works in the available literature, improving the assessment’s congruence with human judgment. From this perspective, we propose to expand the scope of the database and to improve the network using different architectures such as Transformer and ResNeSt.

Author Contributions

D.E.B.: the proposition of the present method, design, and writing a draft of the manuscript; A.C. and A.T.-A.: interpretation, revision and proofreading; R.A.: formal analysis; M.t.B.: concepts, data analysis, and discussion of the results. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Directorate General for Scientific Research and Technological Development (DGRSDT), Ministry of Higher Education and Scientific Research, Algeria.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The SCUT-FBP5500 dataset analyzed during the current study is available in the GitHub repository at https://github.com/HCIILAB/SCUT-FBP5500-Database-Release (accessed on 15 June 2023).

Acknowledgments

The authors express their appreciation to the technical reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, D.; Chen, F.; Xu, Y. Computer Models for Facial Beauty Analysis; Springer International Publishing: Cham, Switzerland, 2016. [Google Scholar]
  2. Fan, J.; Chau, K.; Wan, X.; Zhai, L.; Lau, E. Prediction of facial attractiveness from facial proportions. Pattern Recognit. 2012, 45, 2326–2334. [Google Scholar] [CrossRef]
  3. Helen, K.; Keith, O. Ranking facial attractiveness. Eur. J. Orthod. 2005, 27, 340–348. [Google Scholar]
  4. Gan, J.; Xie, X.; Zhai, Y.; He, G.; Mai, C.; Luo, H. Facial Beauty Prediction Fusing Transfer Learning and Broad Learning System. Soft Comput. 2022, 27, 13391–13404. [Google Scholar] [CrossRef]
  5. Cao, K.; Choi, K.-N.; Jung, H.; Duan, L. Deep learning for facial beauty prediction. Information 2020, 11, 391. [Google Scholar] [CrossRef]
  6. Saeed, J.N.; Abdulazeez, A.M. Facial beauty prediction and analysis based on deep convolutional neural network: A review. J. Soft Comput. Data Min. 2021, 2, 1–12. [Google Scholar] [CrossRef]
  7. Gray, D.; Yu, K.; Xu, W.; Gong, Y. Predicting facial beauty without landmarks. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  8. Gan, J.; Xiang, L.; Zhai, Y.; Mai, C.; He, G.; Zeng, J.; Bai, Z.; Labati, R.D.; Piuri, V.; Scotti, F. 2M BeautyNet: Facial beauty prediction based on multi-task transfer learning. IEEE Access 2020, 8, 20245–20256. [Google Scholar] [CrossRef]
  9. Diao, H.; Hao, Y.; Xu, S.; Li, G. Implementation of Lightweight Convolutional Neural Networks via Layer-Wise Differentiable Compression. Sensors 2021, 21, 3464. [Google Scholar] [CrossRef] [PubMed]
  10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  11. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  12. Szegedy, C.; Liu, W.; Jia, Y. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  13. Gan, J.; Jiang, K.; Tan, H.; He, G. Facial beauty prediction based on lighted deep convolution neural network with feature extraction strengthened. Chin. J. Electron. 2020, 29, 312–321. [Google Scholar] [CrossRef]
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  15. Liang, L.; Lin, L.; Jin, L.; Xie, D.; Li, M. SCUT-FBP5500: A diverse benchmark dataset for multi-paradigm facial beauty prediction. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; IEEE: Piscataway, NJ, USA, 2018. [Google Scholar]
  16. Lin, L.; Liang, L.; Jin, L. Regression guided by relative ranking using convolutional neural network (R3CNN) for facial beauty prediction. IEEE Trans. Affect. Comput. 2019, 13, 122–134. [Google Scholar] [CrossRef]
  17. Xie, D.; Liang, L.; Jin, L.; Xu, J.; Li, M. Scut-fbp: A benchmark dataset for facial beauty perception. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Kowloon Tong, Hong Kong, 9–12 October 2015; IEEE: Piscataway, NJ, USA, 2015. [Google Scholar]
  18. Lou, G.; Shi, H. Face image recognition based on convolutional neural network. China Commun. 2020, 17, 117–124. [Google Scholar] [CrossRef]
  19. Peng, S.; Huang, H.; Chen, W.; Zhang, L.; Fang, W. More trainable inception-ResNet for face recognition. Neurocomputing 2020, 411, 9–19. [Google Scholar] [CrossRef]
  20. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
  21. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  22. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [Google Scholar] [CrossRef] [PubMed]
  23. Dornaika, F.; Moujahid, A. Multi-View Graph Fusion for Semi-Supervised Learning: Application to Image-Based Face Beauty Prediction. Algorithms 2022, 15, 207. [Google Scholar] [CrossRef]
  24. Lebedeva, I.; Ying, F.; Guo, Y. Personalized facial beauty assessment: A meta-learning approach. Vis. Comput. 2022, 39, 1095–1107. [Google Scholar] [CrossRef]
  25. Boukhari, D.; Chemsa, A.; Ajgou, R.; Bouzaher, M. An Ensemble of Deep Convolutional Neural Networks Models for Facial Beauty Prediction. J. Adv. Comput. Intell. Intell. Inf. 2023, 27, 1209–1215. [Google Scholar] [CrossRef]
  26. Bougourzi, F.; Dornaika, F.; Taleb-Ahmed, A. Deep learning based face beauty prediction via dynamic robust losses and ensemble regression. Knowl.-Based Syst. 2022, 242, 108246. [Google Scholar] [CrossRef]
  27. Zhang, P.; Liu, Y. NAS4FBP: Facial Beauty Prediction Based on Neural Architecture Search. In Artificial Neural Networks and Machine Learning—ICANN 2022, Lecture Notes in Computer Science, Proceedings of ICANN 2022: 31st International Conference on Artificial Neural Networks, Bristol, UK, 6–9 September 2022; Springer: Cham, Switzerland, 2022; Volume 13531, pp. 225–236. [Google Scholar]
Figure 1. The architecture of S-CNNs network.
Figure 2. The proposed deep CNN ensemble networks (EN-CNNs).
Figure 3. Images of various facial features and beauty ratings from the SCUT-FBP5500 benchmark dataset.
Figure 4. Comparisons of the ground-truth and predicted scores given by EN-CNNs.
Table 1. Performance comparisons of the SCUT-FBP5500 dataset.
Methods              | Pre-Training | MAE ↓  | RMSE ↓ | PC ↑
AlexNet [15]         | ImageNet     | 0.2651 | 0.3481 | 0.8634
ResNet-18 [15]       | ImageNet     | 0.2419 | 0.3166 | 0.8900
ResNeXt-50 [15]      | ImageNet     | 0.2291 | 0.3017 | 0.8997
CNN–SCA [5]          | ImageNet     | 0.2287 | 0.3014 | 0.9003
R3CNN [16]           | ImageNet     | 0.2120 | 0.2800 | 0.9142
Semi-supervised [23] | VGGFace2     | 0.2210 | 0.2870 | 0.9113
CNN-ER [26]          | VGGFace2     | 0.2009 | 0.2650 | 0.9250
NAS4FBP Net [27]     | ImageNet     | 0.1939 | 0.2579 | 0.9275
EN-CNN (Ours)        | ImageNet     | 0.1933 | 0.2482 | 0.9350
