Missing data imputation using machine learning techniques applied to IoT air quality sensors: A case study in Amazonia
University of Acre, Rio Branco, Acre, Brazil
Academic Editor: Eugenio Vocaturo

Abstract:

Poor air quality is a serious problem in the Amazon. Air pollution in the region harms public health, causing thousands of premature deaths, and it severely damages the environment. Monitoring emissions is crucial for two reasons: it supports enforcement of laws that restrict these emissions, and it helps prevent fires and their devastating consequences. For this reason, an air quality monitoring network has been deployed in the Amazon region, and several sensors are currently distributed throughout the state of Acre, Brazil. However, many sensors have significant data gaps, and in some cases more than 80% of the data is lost. These gaps are caused by power failures, internet connection problems and device defects, and they compromise the consistency and accuracy of air quality measurements. This paper investigates imputation techniques for estimating the missing values in data collected by these sensors from January 1, 2020 to December 31, 2023. We selected simple imputation techniques (Mean, Median) and techniques based on machine learning (MICE, KNN and MissForest). In the experiments, missing values were randomly introduced into the complete dataset at rates from 10% to 50%. The techniques were then compared using Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and the coefficient of determination (R²). The results showed that advanced techniques such as KNN and MICE outperform the simpler techniques, with lower MSE and RMSE and higher R². Even in the most critical case (50% missing data), KNN achieved an MSE of 0.0013 and an R² of 0.85, and MICE achieved an MSE of 0.0013 and an R² of 0.93. Both therefore stand out as effective methods for data imputation.
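The evaluation protocol summarized above can be sketched in plain Python. The steps are: mask a complete dataset at random, impute the masked cells, then score the imputation with MSE = (1/n) Σ (yᵢ − ŷᵢ)², RMSE = √MSE and R² = 1 − SS_res / SS_tot. This is a minimal illustration under stated assumptions, not the authors' code.

The assumptions are:
- The data are synthetic, correlated columns standing in for sensor readings already scaled to [0, 1].
- Masking is uniform at random over all cells.
- The metrics are pooled over every masked cell.
- KNN uses a nan-Euclidean distance, in the spirit of scikit-learn's `KNNImputer`.

All function names are hypothetical. MICE and MissForest are omitted for brevity. scikit-learn's `KNNImputer` and `IterativeImputer` (a MICE-inspired estimator) are the usual library-backed options.

```python
import math
import random
import statistics


def mask_at_random(data, rate, seed=0):
    """Copy `data` (a list of equal-length rows) and set a `rate` fraction of
    all cells to None, chosen uniformly at random. Returns (masked_copy, cells)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = rng.sample(cells, int(rate * len(cells)))
    out = [list(row) for row in data]
    for i, j in masked:
        out[i][j] = None
    return out, masked


def impute_column_stat(data, stat):
    """Simple imputation: fill each None with a per-column statistic
    (e.g. statistics.fmean or statistics.median) of the observed values.
    Assumes every column keeps at least one observed value."""
    fills = [stat([v for v in col if v is not None]) for col in zip(*data)]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in data]


def knn_impute(data, k=5):
    """KNN imputation: fill each None with the mean of that column over the k
    nearest rows that observe it. Distance is Euclidean over coordinates present
    in both rows, scaled up to the full row length (nan-Euclidean)."""
    n_cols = len(data[0])
    col_means = [statistics.fmean(v for v in col if v is not None) for col in zip(*data)]
    out = [list(row) for row in data]
    for i, row in enumerate(data):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common coordinates, so no defined distance
                d2 = sum((row[c] - other[c]) ** 2 for c in shared) * n_cols / len(shared)
                candidates.append((math.sqrt(d2), other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [value for _, value in candidates[:k]]
            # Fall back to the column mean when no donor row has a defined distance.
            out[i][j] = statistics.fmean(nearest) if nearest else col_means[j]
    return out


def evaluate(true, imputed, masked):
    """MSE, RMSE and R² computed only over the artificially masked cells."""
    y = [true[i][j] for i, j in masked]
    y_hat = [imputed[i][j] for i, j in masked]
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    y_mean = statistics.fmean(y)
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    mse = ss_res / len(y)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}


if __name__ == "__main__":
    # Hypothetical synthetic data: three correlated columns in [0, 1] standing in
    # for scaled sensor readings (not the study's data).
    rng = random.Random(42)
    complete = []
    for _ in range(200):
        t = rng.random()
        complete.append([t, 0.9 * t + 0.1 * rng.random(), 0.7 * t + 0.3 * rng.random()])
    for rate in (0.1, 0.3, 0.5):
        incomplete, masked = mask_at_random(complete, rate, seed=1)
        for name, imputed in (
            ("Mean", impute_column_stat(incomplete, statistics.fmean)),
            ("Median", impute_column_stat(incomplete, statistics.median)),
            ("KNN", knn_impute(incomplete, k=5)),
        ):
            s = evaluate(complete, imputed, masked)
            print(f"{rate:.0%} {name}: MSE={s['MSE']:.4f} RMSE={s['RMSE']:.4f} R2={s['R2']:.2f}")
```

Scores are computed only on the cells that were deliberately masked. Observed cells are copied through unchanged, so including them would inflate R² for every method. KNN can exploit correlations between columns, whereas Mean and Median ignore them. On correlated data like this, KNN therefore tends to score better, which is consistent with the direction of the reported results.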

Keywords: Missing data; Data imputation; Air quality; Machine learning
