Fusion Vision Transformers and Convolutional Neural Networks for Facial Beauty Predictions
Published:
04 December 2024
by MDPI
in The 5th International Electronic Conference on Applied Sciences
session Computing and Artificial Intelligence
Abstract:
We aimed to develop a system that analyzes face images and predicts how attractive humans will rate them. This is a challenging task because the perception of beauty is subjective and shaped by cultural background. Facial beauty prediction (FBP) is a significant visual recognition problem concerned with assessing facial attractiveness in a way that is consistent with human perception. Deep learning methods have recently demonstrated remarkable capabilities for feature representation and analysis; in particular, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are powerful tools for image analysis. CNNs learn to identify features associated with attractiveness and use them to predict beauty scores for unseen faces. This paper proposes a new fusion ViT-CNN network that combines the strengths of ViTs and CNNs to improve both performance and efficiency in predicting beauty scores. The approach takes pre-trained MobileNetV3, DenseNet121, and InceptionV3 models, combines them with ViTs, and fine-tunes the resulting network to predict facial beauty; it also provides insight into how the model leverages the strengths of both architectures. Tested on the SCUT-FBP5500 benchmark, the fusion ViT-CNN network achieved a Pearson correlation coefficient of 0.9480, indicating that its facial beauty predictions align more closely with human evaluations than traditional methods for assessing facial attractiveness.
Keywords: Vision Transformers, Convolutional Neural Networks, Facial beauty prediction, Deep learning.
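A minimal sketch of the kind of fusion described in the abstract is given below, assuming PyTorch with torchvision 0.13+ for the pre-trained weights API: it pools features from a pre-trained MobileNetV3 backbone, concatenates them with the class-token embedding of a pre-trained ViT-B/16, and regresses a single beauty score. The choice of vit_b_16, the 256-unit fusion head, and the dropout rate are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch (not the paper's exact model): fuse features from a
# pre-trained CNN backbone with a pre-trained ViT and regress a beauty score.
import torch
import torch.nn as nn
from torchvision import models

class FusionViTCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN branch: MobileNetV3 feature extractor (DenseNet121 or
        # InceptionV3 could be swapped in the same way).
        cnn = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.DEFAULT)
        self.cnn_features = cnn.features          # convolutional feature maps
        self.cnn_pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        cnn_dim = 960                             # channels of the last conv stage

        # ViT branch: use the class-token embedding as a global descriptor.
        vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
        vit.heads = nn.Identity()                 # drop the classification head
        self.vit = vit
        vit_dim = 768

        # Fusion + regression head (layer sizes are assumptions for illustration).
        self.head = nn.Sequential(
            nn.Linear(cnn_dim + vit_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 1),                    # predicted beauty score
        )

    def forward(self, x):
        # x: batch of 224x224 RGB images, the input size both branches expect.
        cnn_feat = self.cnn_pool(self.cnn_features(x)).flatten(1)
        vit_feat = self.vit(x)
        return self.head(torch.cat([cnn_feat, vit_feat], dim=1)).squeeze(1)

model = FusionViTCNN()
scores = model(torch.randn(2, 3, 224, 224))       # two dummy faces -> two scores
```

At evaluation time, the Pearson correlation between predicted scores and human ratings (the metric reported above) could be computed with, for example, scipy.stats.pearsonr.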