While deep learning architectures dominate contemporary solar forecasting literature, their effectiveness under limited data regimes remains insufficiently examined. This study investigates model capacity alignment under data scarcity using daily irradiance datasets from Nigeria, Ghana, and Senegal, encompassing approximately 700 observations from September 2021 to November 2023. With only ~569 training samples per country following an 80/20 chronological split, the dataset represents a small-sample regime typical of emerging measurement infrastructures in Sub-Saharan Africa. A controlled comparative analysis is conducted between tree-based ensemble models (Random Forest, XGBoost) and deep recurrent architectures (LSTM, CNN-LSTM) under identical experimental settings. The methodology incorporates 81 engineered temporal features, including multi-day lags, rolling statistics, and seasonal encodings. Results indicate that gradient-boosted ensembles achieve superior performance, with R² reaching 0.9777 and RMSE as low as 7.41 kWh/m² in Nigeria, whereas recurrent architectures remain near baseline performance (R² ≈ 0.05). These findings are interpreted through bias-variance trade-offs and parameter-to-sample scaling, revealing that forecast accuracy follows a geographic gradient (Nigeria > Ghana > Senegal) linked to atmospheric persistence. The evidence demonstrates that model capacity must align with dataset scale; in small-sample regimes, structured feature engineering combined with ensemble trees yields robust generalization, while high-capacity deep learning models exhibit convergence collapse. Such outcomes support the necessity of context-aware model selection for solar energy forecasting in regions where historical observations remain limited.
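As an illustration only, the feature construction and chronological split described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the lag set `(1, 2, 3, 7)`, the rolling windows `(3, 7)`, and the synthetic series are all hypothetical, and the sketch produces 10 features rather than the 81 reported. The helper `lstm_param_count` uses the standard single-bias formula for one LSTM layer to make the parameter-to-sample argument concrete; the hidden size passed to it is likewise an assumption.

```python
import math


def seasonal_encoding(day_of_year, period=365.25):
    """Map day of year onto a circle so late December and early January are close."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


def build_features(values, days_of_year, lags=(1, 2, 3, 7), windows=(3, 7)):
    """Build one feature row per day from strictly earlier observations.

    The lag and window choices are illustrative assumptions.
    """
    start = max(max(lags), max(windows))  # first index with a full history
    X, y = [], []
    for t in range(start, len(values)):
        row = [values[t - k] for k in lags]  # multi-day lags
        for w in windows:
            past = values[t - w:t]  # window ends at t-1, so the target never leaks
            mean = sum(past) / w
            std = math.sqrt(sum((v - mean) ** 2 for v in past) / w)
            row += [mean, std]  # rolling statistics
        row += list(seasonal_encoding(days_of_year[t]))  # seasonal encoding
        X.append(row)
        y.append(values[t])
    return X, y


def chronological_split(X, y, train_frac=0.8):
    """Split without shuffling: the earliest rows train, the latest rows test."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def lstm_param_count(input_dim, hidden_units):
    """Trainable weights in one LSTM layer (four gates, one bias vector per gate)."""
    return 4 * (hidden_units * (hidden_units + input_dim) + hidden_units)


# Synthetic stand-in for roughly 700 daily irradiance observations (not real data).
n_days = 700
days = [(d % 365) + 1 for d in range(n_days)]
series = [5.0 + math.sin(2.0 * math.pi * d / 365.25) for d in range(n_days)]

X, y = build_features(series, days)
X_train, y_train, X_test, y_test = chronological_split(X, y)

# Parameter-to-sample ratio for a hypothetical 64-unit LSTM on 81 input features.
ratio = lstm_param_count(81, 64) / 569
```

Under these assumptions, a single 64-unit LSTM layer on 81 inputs already carries 37,376 trainable weights, roughly 66 per training sample at 569 samples, which is the kind of mismatch the parameter-to-sample argument refers to. A tree ensemble fitted on `X_train` has no comparable fixed parameter budget, since its effective capacity is bounded by tree depth and the number of estimators.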
Model Capacity Alignment under Data Scarcity: Tree-Based Ensembles versus Deep Recurrent Networks for Daily Solar Radiation Forecasting in West Africa
Published: 22 June 2026 by MDPI in The 1st International Online Conference on Inventions, session Energy system analysis and modelling
Abstract:
Keywords: Solar forecasting; data scarcity; ensemble learning; XGBoost; LSTM; West Africa; renewable energy; machine learning