Fusion Vision Transformers and Convolutional Neural Networks for Facial Beauty Predictions
Published:
04 December 2024
by MDPI
in The 5th International Electronic Conference on Applied Sciences
session Computing and Artificial Intelligence
Abstract:
We aimed to develop a system that analyzes face images and predicts how attractive humans will rate them. This is a challenging task because the perception of beauty is subjective and shaped by cultural background. Facial beauty prediction (FBP) is a significant visual recognition problem concerned with assessing facial attractiveness in a way that is consistent with human perception. Deep learning methods have recently demonstrated remarkable capabilities for feature representation and analysis; in particular, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are powerful tools for image analysis. CNNs learn to identify features associated with attractiveness and use them to predict beauty scores for unseen faces. This paper proposes a new fusion ViT-CNN network that combines the strengths of ViTs and CNNs to improve both performance and efficiency in predicting beauty scores. The approach takes pre-trained MobileNetV3, DenseNet121, and InceptionV3 models, combines them with ViTs, and fine-tunes the resulting network to predict facial beauty; it also provides insight into how the model leverages the strengths of both architectures. Tested on the SCUT-FBP5500 benchmark, the fusion ViT-CNN network achieved a Pearson correlation coefficient of 0.9480, indicating that its facial beauty predictions align more closely with human evaluations than traditional methods for assessing facial attractiveness.
Keywords: Vision Transformers, Convolutional Neural Networks, Facial beauty prediction, Deep learning.
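A minimal sketch of the kind of fusion described in the abstract is given below, assuming PyTorch with torchvision 0.13+ for the pre-trained weights API: it pools features from a pre-trained MobileNetV3 backbone, concatenates them with the class-token embedding of a pre-trained ViT-B/16, and regresses a single beauty score. The choice of vit_b_16, the 256-unit fusion head, and the dropout rate are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch (not the paper's exact model): fuse features from a
# pre-trained CNN backbone with a pre-trained ViT and regress a beauty score.
import torch
import torch.nn as nn
from torchvision import models

class FusionViTCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN branch: MobileNetV3 feature extractor (DenseNet121 or
        # InceptionV3 could be swapped in the same way).
        cnn = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.DEFAULT)
        self.cnn_features = cnn.features          # convolutional feature maps
        self.cnn_pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        cnn_dim = 960                             # channels of the last conv stage

        # ViT branch: use the class-token embedding as a global descriptor.
        vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
        vit.heads = nn.Identity()                 # drop the classification head
        self.vit = vit
        vit_dim = 768

        # Fusion + regression head (layer sizes are assumptions for illustration).
        self.head = nn.Sequential(
            nn.Linear(cnn_dim + vit_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 1),                    # predicted beauty score
        )

    def forward(self, x):
        # x: batch of 224x224 RGB images, the input size both branches expect.
        cnn_feat = self.cnn_pool(self.cnn_features(x)).flatten(1)
        vit_feat = self.vit(x)
        return self.head(torch.cat([cnn_feat, vit_feat], dim=1)).squeeze(1)

model = FusionViTCNN()
scores = model(torch.randn(2, 3, 224, 224))       # two dummy faces -> two scores
```

At evaluation time, the Pearson correlation between predicted scores and human ratings (the metric reported above) could be computed with, for example, scipy.stats.pearsonr.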