After an earthquake or a building collapse, recovering victims is a challenging task. In such cases, recovery methods must prioritize fast detection and localization of victims. Human speech is one signal that can be exploited in rescue operations. This work discusses the application of Voice Activity Detection (VAD) techniques for detecting human speech and discriminating it from noise. VAD is performed on three spectral parameters of the signal: flux, roll-off, and centroid. Using each of the three parameters and their combinations, the success rate of the VAD algorithm is tested on a set of audio samples containing studio-recorded speech, outdoor speech recordings with background noise, and pure noise signals from different sources. The change of the signal parameters over time was plotted in separate graphs. For further processing, the information from the change of speech properties over time had to be reduced to a small set of parameters. Our new approach compresses the audio signal to the average values of its positive and negative peaks. The research progresses from manual threshold selection to a machine-learning-based linear discriminant method, and a comparative study was made to find the best-performing method for speech detection. In cross-validation tests based on the linear discriminant analysis model, flux and centroid individually displayed the highest success rates for all categories of test samples, with recognition rates of 78% to 83%. Stability was further improved by combining these two parameters, which increased the rate to 88%.
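The three spectral parameters and the peak-averaging step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption: the function names are hypothetical, the roll-off fraction of 0.85 is a common convention rather than the authors' value, flux is taken as the Euclidean distance between consecutive normalised spectra, and "positive and negative peaks" is read as local maxima and minima of a per-frame feature track. A naive DFT keeps the example dependency-free; a real implementation would use an FFT library.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mag, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mag)) / total


def spectral_rolloff(mag, sample_rate, n, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the magnitude lies.

    fraction=0.85 is an assumed convention, not taken from the abstract.
    """
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k * sample_rate / n
    return (len(mag) - 1) * sample_rate / n


def spectral_flux(mag, prev_mag):
    """Euclidean distance between consecutive normalised spectra (assumed form)."""
    def normalise(v):
        s = sum(v)
        return [x / s for x in v] if s else list(v)

    a, b = normalise(mag), normalise(prev_mag)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def peak_averages(track):
    """Reduce a feature track to (mean of local maxima, mean of local minima).

    Assumed reading of "average values of positive and negative peaks";
    endpoints are never counted as peaks. Returns None for an empty group.
    """
    inner = range(1, len(track) - 1)
    maxima = [track[i] for i in inner if track[i - 1] < track[i] > track[i + 1]]
    minima = [track[i] for i in inner if track[i - 1] > track[i] < track[i + 1]]

    def mean(v):
        return sum(v) / len(v) if v else None

    return mean(maxima), mean(minima)


if __name__ == "__main__":
    # Illustrative input only: a 1 kHz tone at 8 kHz, two 64-sample frames.
    rate, n = 8000, 64
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(2 * n)]
    first = magnitude_spectrum(tone[:n])
    second = magnitude_spectrum(tone[n:])
    print("centroid (Hz):", spectral_centroid(second, rate, n))
    print("roll-off (Hz):", spectral_rolloff(second, rate, n))
    print("flux:", spectral_flux(second, first))
```

In a pipeline of the kind the abstract describes, each recording would yield one feature track per parameter, `peak_averages` would reduce each track to two numbers, and those numbers would then be passed to a threshold rule or a linear discriminant classifier.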
Finding Earthquake Victims by Voice Detection Techniques
Published: 01 November 2021 by MDPI in 8th International Electronic Conference on Sensors and Applications, session Sensing for Robotics and Automation
Keywords: Voice Activity Detection, Noise separation, Audio signal processing