Air quality is a major concern today, especially in large cities. Among air pollutants, particulate matter (PM), and PM2.5 in particular, poses serious health risks to people with respiratory conditions. Accurate forecasting of PM levels is therefore crucial for warning vulnerable populations and reducing their exposure. Machine learning models can effectively predict PM concentrations from historical measurements and meteorological conditions such as temperature and humidity, and such predictions can support timely public health interventions and environmental policy decisions. Selecting the most suitable machine learning model for time series forecasting requires balancing predictive accuracy against computational efficiency. This study evaluates several widely used models, including Random Forest (RF), Long Short-Term Memory (LSTM), a hybrid Convolutional Neural Network-LSTM (CNN-LSTM), and Extreme Gradient Boosting (XGB), for forecasting PM concentrations.
Performance is assessed using three key error metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Scaled Error (MASE). Additionally, the computational demands and development complexity of each model are analyzed.
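For reference, the three error metrics can be computed as in the sketch below. It is a minimal illustration, not the study's evaluation code: the function names are hypothetical, MASE follows its standard definition (forecast mean absolute error divided by the in-sample mean absolute error of a naive forecast on the training series), and the non-seasonal lag `m = 1` is an assumption, since the seasonal period used in the study is not stated here.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned point by point.
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Squared Error: average of the squared forecast errors."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: square root of MSE, in the units of the data."""
    return math.sqrt(mse(actual, predicted))


def mase(actual: Sequence[float], predicted: Sequence[float],
         train: Sequence[float], m: int = 1) -> float:
    """Mean Absolute Scaled Error.

    The forecast MAE is divided by the in-sample MAE of a naive forecast
    y[t] = y[t - m] on the training series (m = 1 is an assumed,
    non-seasonal default). Values below 1 beat the naive baseline.
    """
    _check(actual, predicted)
    if len(train) <= m:
        raise ValueError("training series must be longer than m")
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    naive_mae = sum(abs(train[t] - train[t - m])
                    for t in range(m, len(train))) / (len(train) - m)
    if naive_mae == 0:
        raise ValueError("naive forecast has zero in-sample error; MASE is undefined")
    return mae / naive_mae


if __name__ == "__main__":
    # Made-up PM2.5 values for illustration only, not data from the study.
    train = [10.0, 12.0, 11.0, 13.0, 12.0]
    actual = [14.0, 13.0]
    predicted = [13.0, 15.0]
    print(mse(actual, predicted))          # 2.5
    print(rmse(actual, predicted))         # about 1.581
    print(mase(actual, predicted, train))  # 1.0
```

Because MASE is scale-free, it allows accuracy to be compared across stations or pollutants with different concentration ranges, whereas MSE and RMSE are expressed in squared and original concentration units, respectively.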
The results characterize the trade-offs among the evaluated models and show that a good compromise between accuracy and efficiency can be achieved, so that a prediction model with satisfactory predictive performance can be implemented in practice.
