A Robust Approach for Emotional Assessment: The Employment of Power-Normalized Cepstral Coefficient and Stacked Classifiers
1  Department of Engineering and Geology, University G. D’Annunzio of Chieti-Pescara, 65127 Pescara, Italy
Academic Editor: Andrea Cataldo

Abstract:

Introduction: Emotional assessment has become a primary focus in multiple fields because it can capture the real-time status of individuals. One of the traits most affected by emotional status is the voice, a signal that carries a great deal of information. Affective computing based on voice recordings is important in fields ranging from healthcare to human–computer interaction. By analyzing vocal cues, affective computing systems can detect emotional states, providing critical insights into a person's mental health and well-being. This study aims to develop a machine learning (ML) approach for emotion recognition from vocal recordings.

Methods: Emotion classification was performed using audio recordings from the EMOVO dataset (EMOVO Corpus: an Italian Emotional Speech Database), comprising syntactically neutral phrases spoken by six actors across seven emotions: neutral, disgust, anger, surprise, fear, sadness, and joy. The approach began with audio preprocessing, in which a set of 20 power-normalized cepstral coefficients (PNCCs) was extracted. Crucially, the training and testing sets were split so that each emotion class was equally represented in both, keeping their compositions balanced and bolstering the reliability of the proposed model. Subsequently, a stacked ML model was employed, comprising k-nearest neighbors (kNN) and support vector machine (SVM) as base models, augmented with Extreme Gradient Boosting (XGBoost).
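The class-balanced split described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_split`, the 20% test fraction, the random seed, and the placeholder label list (10 hypothetical recordings per emotion) are all assumptions chosen only for illustration.

```python
import random
from collections import defaultdict

# The seven emotion classes named in the abstract.
EMOTIONS = ["neutral", "disgust", "anger", "surprise", "fear", "sadness", "joy"]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so every class has the same test share.

    Hypothetical helper: the test fraction and seed are assumptions,
    not values reported in the abstract.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of each class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Placeholder labels: 10 hypothetical recordings per emotion.
labels = [emotion for emotion in EMOTIONS for _ in range(10)]
train_idx, test_idx = stratified_split(labels)
```

With these placeholder labels, each emotion contributes the same number of recordings to the training set and to the testing set, so neither set over-represents any class.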

Results: The model achieved accuracies of 87% during training and 81% during testing, showing robustness and promise for novel and diverse applications. The methodology emphasized maintaining balanced class distributions in the predictions, which helps limit bias and overfitting.

Conclusion: This comprehensive approach integrated advanced features and a systematic classification strategy, contributing significantly to the advancement of emotion analysis in audio data, and fostering the development of more intuitive, responsive, and human-centered technology solutions.

Keywords: Machine Learning; Speech Analysis; Cepstral Coefficients; Emotional Assessment