Air pollution is a growing concern in urban areas, and fine particulate matter poses significant risks to public health. Fine particulate matter (PM2.5) is defined as particles with a diameter of 2.5 micrometers or less. Emissions from the combustion of gasoline, oil, diesel fuel, and wood produce much of the PM2.5 pollution found in outdoor air.
This work explores, for the first time, the use of machine learning techniques to predict PM2.5 concentrations in Tashkent, Uzbekistan. The primary goal is to develop robust predictive models that accurately estimate PM2.5 concentrations from environmental and temporal factors. Open-source air quality datasets from ten automated air quality monitoring stations were used, and additional features, such as weather conditions and seasonal trends, were engineered to improve model accuracy. A hypothesis-driven approach was adopted to test the relevance of these features and assess their impact on model performance. The study employed a range of regression models, starting with linear regression and progressing to more sophisticated ensemble methods such as Random Forest and Gradient Boosting.
The performance of these models was evaluated using the R² metric, with a focus on balancing accuracy and model interpretability.
Our results demonstrate the potential of machine learning for addressing urban air quality challenges and pave the way for informed environmental decision-making in Tashkent and similar urban contexts.
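As an illustration of the R² evaluation mentioned above, the following minimal sketch computes the coefficient of determination for a mean-only baseline and for a one-feature linear fit. It is not the pipeline used in this study: the function names `r2_score` and `fit_simple_linear`, the single temperature feature, and all numeric values are hypothetical and synthetic, chosen only to show how the metric behaves. For brevity the score is computed on the same points used for fitting, whereas a real evaluation would typically use held-out data.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic, made-up values for illustration only (not measurements from Tashkent).
temperature_c = [-2.0, 0.0, 3.0, 8.0, 15.0, 22.0, 28.0, 31.0]
pm25 = [95.0, 88.0, 70.0, 52.0, 35.0, 24.0, 20.0, 18.0]

# Baseline: always predict the mean concentration.
baseline_pred = [mean(pm25)] * len(pm25)

# Linear model: one weather feature as the predictor.
slope, intercept = fit_simple_linear(temperature_c, pm25)
linear_pred = [slope * t + intercept for t in temperature_c]

r2_baseline = r2_score(pm25, baseline_pred)
r2_linear = r2_score(pm25, linear_pred)

print(f"baseline R^2: {r2_baseline:.3f}")
print(f"linear   R^2: {r2_linear:.3f}")
```

A mean-only predictor scores an R² of zero by construction, so any model worth keeping should score above it; a perfect predictor scores one. Comparing each model against this reference is one simple way to weigh accuracy gains from more complex models against the interpretability of simpler ones.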