Comparison of machine learning algorithms for processing of original data of electronic nose for analysis of biological samples of humans and animals
* 1 , 2 , 2 , 3
1  Voronezh State University of Engineering Technologies, Voronezh, Voronezhskaya Oblast’, Russia
2  Voronezh State University of Engineering Technologies
3  National Research University Higher School of Economics
Academic Editor: Giovanni Neri

https://doi.org/10.3390/CSAC2021-10465 (registering DOI)
Abstract:

The goal of the study was to build a mathematical model that classifies biosamples, with small errors, into groups corresponding to clinical diagnoses using the original output data of a mass-sensitive sensor array. Nasal secretions from humans and animals were investigated. One hundred forty-four calves underwent clinical and laboratory examination and were divided into three groups according to the health of their respiratory organs. A sample of nasal secretion was taken from each calf. The gas phase over the samples was measured using an array of 8 mass-sensitive sensors with solid-state nanostructured coatings in an open detection cell. During the sorption and desorption of volatile substances released from the samples, the sensor responses were recorded in software and then processed by unsupervised, semi-supervised, and supervised machine learning methods. In total, 50 algorithms for processing sensor data were studied, including t-SNE, the self-learning model DBSCAN, the Yarowsky algorithm, BOSSVS, SAXVSM, LearningShapelets, and MultivariateClassifier. The semi-supervised model based on the Yarowsky algorithm classified reliably: every sample received a confidence gap of more than 0.5, and for the majority of samples the gap was no less than 0.9. In addition, nonlinear transformations of the original sensor data were used to obtain the simplest two-dimensional manifold on which all data points lie separately: Locally Linear Embedding, Local Tangent Space Alignment, Hessian Eigenmapping, Modified Locally Linear Embedding, Isomap, Multi-dimensional Scaling, Spectral Embedding, and t-distributed Stochastic Neighbor Embedding. The supervised machine learning models that use Dynamic Time Warping as the measure of similarity between two time series and the k-NN algorithm for classification achieved a classification accuracy of 0.83. Recommendations on applying the different machine learning algorithms, depending on the diagnostic task, were formulated.
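
To illustrate the supervised approach mentioned above, the following is a minimal sketch of k-NN classification with a Dynamic Time Warping distance. It is not the authors' implementation, which the abstract does not describe. The function names (dtw_distance, knn_dtw_predict), the group labels, and the synthetic curves are all hypothetical and stand in for real sensor response curves.

```python
import numpy as np
from collections import Counter


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D series.

    Uses the absolute difference as the local cost and no warping window
    (both are assumptions made for this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # repeat b[j-1]
                                 cost[i, j - 1],      # repeat a[i-1]
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]


def knn_dtw_predict(train_series, train_labels, query, k=1):
    """Label a query series by majority vote among its k DTW-nearest neighbours."""
    dists = [dtw_distance(s, query) for s in train_series]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Synthetic placeholder curves, NOT data from the study.
t = np.linspace(0.0, 1.0, 30)
train_series = [np.sin(np.pi * t), 1.1 * np.sin(np.pi * t), t, 0.9 * t]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

# A time-warped copy of the first curve shape.
query = np.sin(np.pi * t ** 1.2)
predicted = knn_dtw_predict(train_series, train_labels, query, k=1)
print(predicted)
```

Because DTW aligns the two series before comparing them, the time-warped query stays close to the curves of the same shape even though its peak is shifted. For a multi-sensor array, one common option (again an assumption here) is to sum the per-sensor DTW distances before voting.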

Keywords: sensor array; machine learning; transformation of original data; classification; diagnostics; nasal secretion

 
 