Effect of Data Collection and Environment on Machine Learning Performance in Screening Dysphonia
Department of Communication Sciences and Disorders, University of Iowa, Iowa City, Iowa, USA
Academic Editor: Andrea Cataldo

Abstract:

Objectives/Introduction: Machine learning (ML) is a promising tool for assessing voice quality and dysphonia. Several public datasets containing recordings of both normal and pathological voices are available online. Since ML benefits from larger datasets, combining these available datasets could enhance ML performance. However, the varying environmental conditions under which these recordings were collected may impact ML accuracy, and the extent of this impact is unclear. This work aims to investigate how different data collection procedures affect ML efficacy in screening dysphonia.

Methods: Two datasets were considered. The first dataset included voice samples from 198 participants: 148 individuals with voice disorders and 50 vocally normal subjects. The second, the publicly available Perceptual Voice Qualities Database (PVQD), included 276 subjects: 187 patients with voice problems and 89 without vocal issues. Acoustic measurements (including perturbation, noise, cepstral, and spectral measures) were estimated from the recordings using MATLAB scripts and Praat software. These measurements were derived from two types of speech production: a sustained vowel /a:/ and running speech. Several ML models were trained on the extracted acoustic features from each recording and evaluated in terms of accuracy, sensitivity, specificity, and Receiver Operating Characteristic (ROC) curves to compare how each dataset, collected under different procedures, affected dysphonic voice classification.
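As a rough illustration of the classification step described above, the following Python sketch trains a classifier on a table of pre-extracted acoustic features and reports accuracy, sensitivity, specificity, and ROC AUC. The scikit-learn pipeline, the SVM model choice, and the file and column names (acoustic_features.csv, label) are assumptions made for illustration; the study itself used MATLAB scripts and Praat for feature extraction, and the abstract does not specify which classifiers were trained.

```python
# Illustrative sketch (not the authors' MATLAB/Praat pipeline): train a
# classifier on pre-extracted acoustic features and report accuracy,
# sensitivity, specificity, and ROC AUC. File and column names are
# hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Hypothetical table: one row per recording, acoustic features plus a
# binary label (1 = dysphonic, 0 = vocally normal).
df = pd.read_csv("acoustic_features.csv")
X = df.drop(columns=["label"]).values
y = df["label"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize features and fit an SVM with probability estimates for ROC.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

print(f"accuracy    {accuracy_score(y_test, y_pred):.3f}")
print(f"sensitivity {sensitivity:.3f}")
print(f"specificity {specificity:.3f}")
print(f"ROC AUC     {roc_auc_score(y_test, y_score):.3f}")
```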

Results and Conclusions: Accurate acoustic metrics were generated from both datasets. Using these measurements, ML models were successfully trained and evaluated to classify dysphonic versus non-dysphonic speakers. The comparative analysis revealed differences in classification accuracy across models when they were trained on each dataset separately and on the combined data. Identifying which ML models are robust, and which are sensitive, to changes in the data collection environment helps in selecting appropriate models for tasks that involve datasets collected under varying procedures. This outcome is an important step toward more reliable and effective ML tools for screening voice disorders.
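The cross-dataset comparison can be sketched as a simple train-on-one, test-on-the-other protocol with a pooled-data baseline. The random forest model, file names, and evaluation choices below are hypothetical and only illustrate one way to probe how sensitive a model is to the recording environment; they are not the protocol reported in the paper.

```python
# Illustrative cross-corpus protocol (assumed, not taken from the paper):
# train on one dataset's features, test on the other, and compare with a
# model trained on the pooled data. Feature files are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

def load(path):
    df = pd.read_csv(path)
    return df.drop(columns=["label"]).values, df["label"].values

X_a, y_a = load("dataset_a_features.csv")   # e.g., in-house recordings
X_b, y_b = load("dataset_b_features.csv")   # e.g., PVQD recordings

def cross_corpus_auc(X_tr, y_tr, X_te, y_te):
    # Fit on one corpus, score on the other: a drop in AUC relative to the
    # pooled baseline suggests sensitivity to collection conditions.
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print("train A -> test B AUC:", cross_corpus_auc(X_a, y_a, X_b, y_b))
print("train B -> test A AUC:", cross_corpus_auc(X_b, y_b, X_a, y_a))

# Pooled data: 5-fold cross-validated AUC for models that see both
# recording environments during training.
X_all, y_all = np.vstack([X_a, X_b]), np.concatenate([y_a, y_b])
clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("pooled 5-fold AUC:",
      cross_val_score(clf, X_all, y_all, cv=5, scoring="roc_auc").mean())
```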

Keywords: Voice Disorders; Voice Assessment; Machine Learning; Speech Acoustics
