Model Capacity Alignment under Data Scarcity: Tree-Based Ensembles versus Deep Recurrent Networks for Daily Solar Radiation Forecasting in West Africa
1  DAIM, University of Hull, Hull HU6 7RX, United Kingdom
Academic Editor: El Manaa Barhoumi

Abstract:

While deep learning architectures dominate the contemporary solar forecasting literature, their effectiveness in limited-data regimes remains insufficiently examined. This study investigates model capacity alignment under data scarcity using daily irradiance datasets from Nigeria, Ghana, and Senegal, each comprising approximately 700 observations from September 2021 to November 2023. With only ~569 training samples per country after an 80/20 chronological split, the data represent a small-sample regime typical of emerging measurement infrastructure in Sub-Saharan Africa. A controlled comparison is conducted between tree-based ensemble models (Random Forest, XGBoost) and deep recurrent architectures (LSTM, CNN-LSTM) under identical experimental settings. The methodology incorporates 81 engineered temporal features, including multi-day lags, rolling statistics, and seasonal encodings. Results indicate that gradient-boosted ensembles achieve the best performance, with R² reaching 0.9777 and RMSE as low as 7.41 kWh/m² in Nigeria, whereas the recurrent architectures remain near baseline performance (R² ≈ 0.05). These findings are interpreted in terms of the bias-variance trade-off and the ratio of model parameters to training samples. Forecast accuracy also follows a geographic gradient (Nigeria > Ghana > Senegal) that is linked to atmospheric persistence. The evidence indicates that model capacity must match dataset scale: in small-sample regimes, structured feature engineering combined with tree ensembles yields robust generalization, whereas high-capacity deep learning models exhibit convergence collapse. These outcomes support context-aware model selection for solar energy forecasting in regions where historical observations remain limited.
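The abstract describes the pipeline only at a high level. The sketch below, written in Python with only the standard library, illustrates three ideas it names: a chronological 80/20 split (no shuffling), lag, rolling-statistic, and seasonal features built only from past observations, and the parameter-to-sample ratio behind the capacity argument. It is a minimal sketch under stated assumptions, not the authors' implementation: the lag set, window lengths, 365.25-day seasonal period, the 64-unit LSTM layer, and all function names are hypothetical, and the sketch produces 10 features rather than the study's 81.

```python
import math
from typing import Dict, List, Sequence, Tuple


def chronological_split(n: int, train_frac: float = 0.8) -> Tuple[range, range]:
    """Return train/test index ranges without shuffling: the earliest
    train_frac of the days train the model, the remaining days test it."""
    cut = int(n * train_frac)
    return range(0, cut), range(cut, n)


def make_features(
    values: Sequence[float],
    day_of_year: Sequence[int],
    lags: Sequence[int] = (1, 2, 3, 7),   # hypothetical lag set
    windows: Sequence[int] = (3, 7),      # hypothetical rolling windows
) -> Tuple[List[Dict[str, float]], List[float]]:
    """Build one feature row per day using only values before that day.

    Days without enough history for the longest lag or window are dropped,
    so the output is shorter than the input.
    """
    start = max(max(lags), max(windows))
    rows: List[Dict[str, float]] = []
    targets: List[float] = []
    for t in range(start, len(values)):
        row: Dict[str, float] = {}
        for k in lags:
            row[f"lag_{k}"] = values[t - k]
        for w in windows:
            past = values[t - w:t]  # excludes day t, so the target never leaks in
            mean = sum(past) / w
            row[f"roll_mean_{w}"] = mean
            row[f"roll_std_{w}"] = math.sqrt(sum((x - mean) ** 2 for x in past) / w)
        # Cyclical seasonal encoding (assumed 365.25-day period).
        angle = 2 * math.pi * day_of_year[t] / 365.25
        row["doy_sin"] = math.sin(angle)
        row["doy_cos"] = math.cos(angle)
        rows.append(row)
        targets.append(values[t])
    return rows, targets


def lstm_param_count(input_dim: int, units: int) -> int:
    """Trainable weights in one LSTM layer with one bias vector per gate
    (the Keras convention): 4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)


if __name__ == "__main__":
    # Synthetic placeholder series, NOT the study's data.
    n_days = 20
    vals = [200 + 10 * math.sin(i / 3) for i in range(n_days)]
    doy = [244 + i for i in range(n_days)]  # day-of-year starting near 1 September
    X, y = make_features(vals, doy)
    train_idx, test_idx = chronological_split(len(X))
    print(len(X), len(train_idx), len(test_idx))  # 13 10 3
    # Weights per training sample for a hypothetical 64-unit LSTM on 81 features.
    print(round(lstm_param_count(81, 64) / 569, 1))  # 65.7
```

Under these assumptions, a single 64-unit LSTM layer fed 81 features already has 37,376 trainable weights, roughly 66 per training sample at the abstract's ~569 samples. PyTorch's `nn.LSTM` keeps two bias vectors per gate, which adds another `4 * units` weights to the count.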

Keywords: Solar forecasting; data scarcity; ensemble learning; XGBoost; LSTM; West Africa; renewable energy; machine learning

 
 