Forecasting PM2.5 Concentrations with Machine Learning: Accuracy, Efficiency, and Public Health Implications

Kyriakos Ovaliadis ¹; Spyridon Mitropoulos ²,*; Vassilios Tsiantos ³; Ioannis Christakis ⁴

¹ Department of Electrical Engineering, Eastern Macedonia and Thrace Institute of Technology, Agios Loukas, 654 04 Kavala, Greece
² Department of Surveying and Geoinformatics Engineering, University of West Attica, 28, Ag. Spyridonos Str., 12243 Egaleo, Greece
³ Department of Physics, School of Sciences, Democritus University of Thrace, Kavala Campus, 65404 Kavala, Greece
⁴ Department of Electrical and Electronics Engineering, School of Engineering, Ancient Olive Grove Campus, University of West Attica, Athens - Egaleo, GR-12244, Greece

Academic Editor: Lucia Billeci

Published: 03 December 2025 by MDPI in The 6th International Electronic Conference on Applied Sciences session Computing and Artificial Intelligence

Abstract:

Air quality is a major concern, especially in large cities. Among air pollutants, particulate matter (PM), and PM2.5 in particular, poses serious health risks, especially to individuals with respiratory conditions. Accurate forecasting of PM levels is crucial for warning vulnerable populations and reducing exposure. Machine learning models can predict PM concentrations from historical data and meteorological conditions such as temperature and humidity, and such predictions can support timely public health interventions and environmental policy decisions. Selecting a machine learning model for time series forecasting requires a careful balance between predictive accuracy and computational efficiency. This study evaluates several widely used models, including Random Forest (RF), Long Short-Term Memory (LSTM), Convolutional Neural Network-LSTM (CNN-LSTM), and Extreme Gradient Boosting (XGB), for time series forecasting of PM concentrations.
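As an illustration of how historical PM readings and meteorological conditions can be turned into model inputs, the following minimal sketch builds a lagged feature matrix. It is not the pipeline used in this study: the function name `make_lagged_features`, the window length `n_lags`, and the choice to use only the most recent temperature and humidity values are assumptions made for the example.

```python
import numpy as np

def make_lagged_features(pm, temperature, humidity, n_lags=3):
    """Build (X, y) for one-step-ahead forecasting.

    Each row of X holds the previous `n_lags` PM readings followed by the
    most recent temperature and humidity; y holds the next PM reading.
    Hypothetical helper for illustration, not the authors' preprocessing.
    """
    pm = np.asarray(pm, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    humidity = np.asarray(humidity, dtype=float)
    if not (len(pm) == len(temperature) == len(humidity)):
        raise ValueError("all input series must have the same length")
    if len(pm) <= n_lags:
        raise ValueError("series must be longer than n_lags")

    rows, targets = [], []
    for t in range(n_lags, len(pm)):
        past_pm = pm[t - n_lags:t]                      # PM at t-n_lags .. t-1
        weather = [temperature[t - 1], humidity[t - 1]]  # conditions at t-1
        rows.append(np.concatenate([past_pm, weather]))
        targets.append(pm[t])                           # value to predict
    return np.array(rows), np.array(targets)
```

The resulting `X` and `y` could be passed to any tabular regressor, for example a random forest or a gradient boosting model; sequence models such as LSTM would typically need the lagged window reshaped into a three-dimensional array instead.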

Performance is assessed using three key error metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Scaled Error (MASE). Additionally, the computational demands and development complexity of each model are analyzed.
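The three metrics can be computed as in the sketch below, which uses the standard definitions. The MASE scaling shown here (the in-sample mean absolute error of a naive forecast with lag `m`) is the common convention; the abstract does not state which scaling the study used, so that choice and the function names are assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as the target."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mase(y_true, y_pred, y_train, m=1):
    """Mean Absolute Scaled Error.

    Scales the forecast MAE by the in-sample MAE of a naive forecast that
    repeats the value from `m` steps earlier (m=1 for non-seasonal data).
    Values below 1 mean the model beats that naive baseline on average.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    if scale == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")
    return float(np.mean(np.abs(y_true - y_pred)) / scale)
```

Because MASE is scale-free, it allows comparison across monitoring sites with different typical PM levels, whereas MSE and RMSE depend on the magnitude of the measured concentrations.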

The results characterize the trade-off between accuracy and efficiency for each model. They show that a good compromise between the two can be achieved, and that a prediction model with satisfactory predictive performance can be implemented.

Keywords: Particulate Matter; Air Quality; Time Series Forecasting; Machine Learning; Model Evaluation; Computational Efficiency
