Data-centric Performance Improvement Strategies for Few-shot Classification of Chemical Sensor Data
* 1 , 1 , 2 , 1 , 1
1  Fraunhofer Institute for Integrated Circuits IIS
2  University of Erlangen-Nuremberg (Friedrich-Alexander-Universität Erlangen-Nürnberg)
Academic Editor: Stefano Mariani


Metal-oxide (MOX) sensors offer a low-cost way to detect volatile organic compound (VOC) mixtures. However, their operation involves time-consuming heating cycles, which slows both data collection and classification. This work introduces a few-shot learning (FSL) approach that promotes rapid classification. In this approach, a model trained on several base classes is fine-tuned to recognize a novel class using a small number (n = 5, 25, 50, and 75) of randomly selected novel-class measurements (shots). The dataset comprises MOX sensor measurements of four different juices (apple, orange, blackcurrant, and multivitamin) and air, collected over 10-minute phases using a pulse heater signal. While a high average accuracy of 82.46% is obtained for 5-class classification using 75 shots, the model’s performance depends on the juice type. One-shot validation showed that not all measurements within a phase are representative, so shots must be selected carefully to achieve high classification accuracy. Error analysis revealed that some measurements are contaminated by the previously measured juice, a characteristic of MOX sensor data that is often overlooked and is equivalent to mislabelling. Three strategies are adopted to overcome this: (E1) fine-tuning after dropping initial/final measurements, (E2) fine-tuning after dropping the first half of each phase, and (E3) pretraining with data from the second half of each phase. Results show that each of the strategies performs best for a specific number of shots. E3 results in the highest performance for 5-shot learning (accuracy 63.69%), whereas E2 yields the best results for 25-/50-shot learning (accuracies 79%/87.1%) and E3 predicts best for 75-shot learning (accuracy 88.6%). Error analysis also showed that, for all strategies, more than 50% of air misclassifications resulted from contamination, but E2 was affected the least.
This work demonstrates how strongly data quality can affect prediction performance, especially for FSL methods, and that a data-centric approach can improve results.
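
The shot-selection idea behind the E1 and E2 strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `filter_phase` and `select_shots`, the strategy labels, and the `trim` value of 2 are assumptions made for illustration, and each phase is assumed to be a time-ordered list of measurements. E3, which changes the pretraining data rather than the shots, is not covered.

```python
import random


def filter_phase(phase, strategy="none", trim=2):
    """Return the measurements of one phase that survive filtering.

    phase    -- measurements of a single phase, ordered in time
    strategy -- "none", "trim_edges" (E1-like) or "second_half" (E2-like);
                these labels are hypothetical, not taken from the paper
    trim     -- measurements dropped at each end for "trim_edges"
                (assumed value, the abstract does not state it)
    """
    if strategy == "none":
        return list(phase)
    if strategy == "trim_edges":
        # Drop the first and last `trim` measurements of the phase.
        return list(phase[trim:len(phase) - trim])
    if strategy == "second_half":
        # Drop the first half, where carry-over from the previous
        # sample is assumed to be strongest.
        return list(phase[len(phase) // 2:])
    raise ValueError(f"unknown strategy: {strategy!r}")


def select_shots(phases, n_shots, strategy="none", trim=2, seed=0):
    """Randomly draw n_shots measurements from the filtered phases."""
    pool = [m for phase in phases for m in filter_phase(phase, strategy, trim)]
    if n_shots > len(pool):
        raise ValueError("not enough measurements left after filtering")
    # A seeded generator keeps the random draw reproducible.
    return random.Random(seed).sample(pool, n_shots)


# Two toy phases of ten measurements each, represented by their indices.
phases = [list(range(10)), list(range(10, 20))]
shots = select_shots(phases, n_shots=5, strategy="second_half")
```

With the toy phases above, `shots` contains five distinct indices drawn only from the second half of each phase (5 to 9 and 15 to 19), so fine-tuning would never see the early, potentially contaminated measurements.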

Keywords: Artificial olfaction; Metal-oxide sensors; Few-shot classification; Convolutional neural networks; Data-centric