Objectives/Introduction: In acoustic voice assessment, recordings are typically collected in diverse environments with varying levels of noise and reverberation. These room acoustics are known to affect the quality of recordings and of acoustic analysis, but their impact on advanced tools such as machine learning remains poorly understood. This paper investigates how different room acoustics, particularly reverberation, influence machine learning performance in assessing voice quality and dysphonia.
Methods: This retrospective study used voice recordings of sustained /a:/ samples from 193 subjects (145 with voice disorders and 48 without vocal problems). The recordings were modified in Audacity to add different levels of reverberation and noise, simulating various room acoustic environments. Using a MATLAB script and Praat, we extracted acoustic measurements (temporal- and spectral-based metrics) from the original and corrupted recordings. Various machine learning models were then trained on the generated acoustic features. The models were evaluated for accuracy, sensitivity, and specificity to compare their performance in detecting voice disorders on recordings before and after the reverberation and noise effects were added.
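The corruption step described above can be sketched in Python as convolution with a synthetic exponentially decaying impulse response (for reverberation) plus additive white noise at a target SNR. This is a minimal illustration of the general technique, not the study's Audacity processing chain; the `rt60` and `snr_db` parameters and the test tone are illustrative assumptions.

```python
import numpy as np

def add_reverb(signal, fs, rt60=0.5):
    # Synthetic impulse response: white noise shaped by an exponential
    # decay reaching roughly -60 dB at rt60 seconds (illustrative model).
    t = np.arange(int(rt60 * fs)) / fs
    ir = np.random.randn(len(t)) * np.exp(-6.9 * t / rt60)
    wet = np.convolve(signal, ir)[: len(signal)]
    return wet / np.max(np.abs(wet))  # normalize peak amplitude

def add_noise(signal, snr_db):
    # Scale white noise so the signal-to-noise ratio equals snr_db (in dB).
    sig_power = np.mean(signal ** 2)
    noise = np.random.randn(len(signal))
    noise *= np.sqrt(sig_power / (10 ** (snr_db / 10)) / np.mean(noise ** 2))
    return signal + noise

fs = 16000
# One-second 220 Hz tone as a stand-in for a sustained /a:/ sample.
tone = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
corrupted = add_noise(add_reverb(tone, fs, rt60=0.8), snr_db=20.0)
```

In practice the same pair of functions would be applied at several `rt60` and `snr_db` settings to generate the family of corrupted datasets.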
Results and Conclusions: The recordings were successfully mixed with varying levels of reverberation and noise, yielding a diverse collection of datasets. Machine learning models were trained and evaluated on these datasets to classify normal and pathological voices under different noise and reverberation conditions. Comparison of the models demonstrated that higher levels of reverberation and noise degrade classification performance. Identifying the acceptable room acoustic conditions under which machine learning models produce reliable results helps to optimize and standardize environmental conditions for data collection, ensuring accurate voice assessment outcomes.
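The evaluation described above (training a classifier on acoustic features and reporting accuracy, sensitivity, and specificity) can be sketched as follows. The feature matrix here is synthetic stand-in data with hypothetical dimensions; the specific model (a random forest) and the four-feature layout are assumptions for illustration, not the study's actual models or features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical feature matrix: one row per recording, one column per
# acoustic measure (e.g. temporal- and spectral-based metrics).
# Labels: 0 = normal (48 subjects), 1 = pathological (145 subjects).
X_normal = rng.normal(0.0, 1.0, size=(48, 4))
X_path = rng.normal(1.0, 1.0, size=(145, 4))
X = np.vstack([X_normal, X_path])
y = np.array([0] * 48 + [1] * 145)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate for pathological voices
specificity = tn / (tn + fp)   # true-negative rate for normal voices
```

Repeating this evaluation on each corrupted dataset (per reverberation/noise level) and comparing the three metrics against the clean-recording baseline is one way to quantify the degradation the study reports.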