Fusion Vision Transformers and Convolutional Neural Networks for Facial Beauty Predictions
1  University of Eloued, PO Box 789, 39000, El Oued, Algeria
Academic Editor: Eugenio Vocaturo

Abstract:

We aimed to develop a system that analyzes faces and predicts how attractive humans will find them. This is a complex task because the perception of beauty is subjective and influenced by cultural background. Facial beauty prediction (FBP) is a significant visual recognition problem: assessing facial attractiveness in a way that is consistent with human perception. Deep learning methods have recently demonstrated a strong ability for feature representation and analysis; in particular, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are powerful tools for image analysis. CNNs learn to identify features associated with attractiveness and use them to predict beauty scores for new faces. This paper proposes a new fusion ViTs–CNN network that combines the strengths of ViTs and CNNs to improve performance and efficiency in predicting beauty scores. The approach takes pre-trained MobileNetV3, DenseNet121 and InceptionV3 models, combines them with ViTs, and fine-tunes them to predict facial beauty. It can also provide insights into how the models leverage the strengths of both architectures. When tested on the SCUT-FBP5500 dataset, the ViTs–CNN network achieved a Pearson coefficient of 0.9480. This indicates that the predictions of the fusion ViTs–CNN network are closer to human evaluation than those of traditional methods for assessing facial attractiveness.

Keywords: Vision Transformers, Convolutional Neural Networks, Facial beauty prediction, Deep learning.
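
The Pearson coefficient reported in the abstract measures how closely predicted beauty scores track the human ratings linearly. As a minimal sketch of how such a value could be computed, the function below uses only the standard library; the name `pearson` and the sample scores are illustrative assumptions, not the authors' evaluation code or data.

```python
import math


def pearson(predicted, human):
    """Pearson correlation between predicted scores and human ratings.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n

    # Sum of products of deviations (unnormalized covariance).
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)

    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant input")

    return cov / math.sqrt(var_p * var_h)


if __name__ == "__main__":
    # Made-up scores, used only to show the call.
    model_scores = [2.1, 3.4, 3.9, 2.8, 4.5]
    human_scores = [2.0, 3.5, 4.1, 2.5, 4.4]
    print(f"Pearson coefficient: {pearson(model_scores, human_scores):.4f}")
```

A value near 1 means the predictions rise and fall with the human ratings; the coefficient does not by itself show that the predicted scores match the ratings in absolute value.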