A Two-Stage Random Forest Analysis Framework for Electricity Price Forecasting and Spike Driver Interpretation

Wei Lu; Jay Wang; Caiyi Song

Abstract:

Introduction:

Electricity price forecasting is critical for energy trading, grid dispatch, and risk management in power markets. However, traditional forecasting models often prioritise accuracy and cannot explain what drives their predictions, especially in key scenarios such as price spikes. This study addresses that gap for the Australian National Electricity Market (NEM), which shows high volatility and pronounced price spikes. It proposes a two-stage integrated framework that combines time-series feature engineering with interpretability analysis, aiming to give decision-makers in this market both high predictive accuracy and an interpretable account of the predictions.

Methods:

This study develops a systematic two-stage random forest analysis framework and validates it empirically on historical operational data from the Australian NEM. The first stage is dynamic feature engineering. It constructs a multi-dimensional pool of more than one hundred candidate features: original variables, lagged features (1–24 hours, 48 hours, 168 hours), rolling statistics (mean, standard deviation, extreme values), time indicators (hour, day of week, month, etc.), interaction terms, differences and rates of change, and statistical features. The second stage is feature selection and modelling. Feature importance is first evaluated from the increase in the random forest's out-of-bag error, and the key features are selected as either those that cumulatively account for more than 95% of total importance or the top 30 features. A final random forest model is then trained on the selected features. To further enhance interpretability, the framework integrates SHapley Additive exPlanations (SHAP) analysis, in which feature contribution values are obtained through a simplified calculation, with particular attention to identifying the driving factors during electricity price spike periods in the Australian market.
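
The selection rule in the second stage can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_features`, the feature names, and the importance values are hypothetical, and the sketch assumes one reading of the rule, namely that features are added in descending order of importance until they exceed 95% of total importance, with the selection capped at 30 features.

```python
def select_features(importances, cum_threshold=0.95, max_features=30):
    """Pick features by cumulative importance, capped at max_features.

    importances: dict mapping feature name -> non-negative importance score
    (for example, the increase in out-of-bag error for each feature).
    Assumption: the 95% rule and the top-30 rule are combined by stopping
    at whichever limit is reached first.
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ranked)
    if total <= 0:
        raise ValueError("total importance must be positive")

    selected = []
    running = 0.0
    for name, value in ranked:
        if len(selected) >= max_features:
            break  # cap reached before the cumulative threshold
        selected.append(name)
        running += value
        if running / total > cum_threshold:
            break  # selected features exceed the cumulative threshold
    return selected


# Hypothetical importance scores, for illustration only.
demo_importances = {
    "price_lag_1": 0.50,
    "price_x_demand": 0.30,
    "price_roll_mean_24": 0.17,
    "hour": 0.02,
    "month": 0.01,
}
print(select_features(demo_importances))
```

In this toy example the three largest scores already exceed 95% of the total, so the two time indicators are dropped. With a flat importance profile the cap of 30 would bind instead.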

Results:

On the validation set, the proposed framework achieves a root mean square error (RMSE) of 10.1626, a mean absolute error (MAE) of 3.8179, and a mean absolute percentage error (MAPE) of 33.36%. Compared with the baseline model using only the original features, both RMSE and MAPE are significantly reduced. Compared with the model using all features without selection, RMSE improves by 1.01% and MAPE by 12.56%, indicating that feature selection enhances the model’s generalization capability and robustness. Interpretability analysis further reveals that interaction features (such as price–demand interaction terms), price rolling statistics, and key original variables contribute most to the predictions. For the 209 price spike moments identified in the Australian market, SHAP analysis identifies the main factors driving spike formation, providing a quantitative basis for understanding extreme price volatility in this market.
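
For reference, the three error metrics reported above follow standard definitions, sketched below on toy numbers that are not the study's data. Two points are assumptions rather than details given in the abstract: that zero-valued actual prices are skipped when computing MAPE, and that the reported improvements are relative reductions with respect to the comparison model. All function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: observations with a zero actual value are skipped, because
    the percentage error is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def relative_improvement(reference, candidate):
    """Percentage reduction of an error metric relative to a reference model."""
    return 100.0 * (reference - candidate) / reference


# Toy values for illustration only.
toy_actual = [50.0, 100.0, 200.0]
toy_predicted = [40.0, 110.0, 200.0]
print(rmse(toy_actual, toy_predicted))
print(mae(toy_actual, toy_predicted))
print(mape(toy_actual, toy_predicted))
```

Because MAPE divides by the actual price, it is dominated by periods with prices near zero, which is one reason RMSE and MAPE can respond differently to the same change in the model.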

Conclusion:

The proposed framework, which integrates time-series feature engineering with interpretability analysis, improves the accuracy of price forecasting for the Australian electricity market through its two-stage feature selection mechanism and makes model decisions transparent and interpretable through SHAP. It can identify the key causes of electricity price spikes in this market, providing market operators, traders, and policymakers with an analytical tool that combines predictive performance with mechanistic explanation.