This study explores a regression-based non-destructive testing framework for estimating electrical fault severity in grid-connected photovoltaic (PV) systems using machine learning. It is based on an open-access dataset generated from a 250 kW PV plant modeled in MATLAB/Simulink, which includes inverter control, distributed string current sensing, and environmental variability. Unlike conventional diagnostic approaches that formulate PV fault detection as a classification problem, the proposed methodology estimates fault resistance as a continuous severity indicator under string fault, string-to-ground fault, and string-to-string fault conditions. The input space consists of non-invasive electrical and environmental descriptors, namely distributed string currents, total plant current, DC voltage, DC power, irradiance, and temperature, together with engineered range-based features extracted from inter-sensor current deviations. Three supervised regression models are implemented and compared: Random Forest, XGBoost, and an Artificial Neural Network. The methodological workflow comprises data preprocessing, feature engineering, train/test partitioning, hyperparameter tuning, and model evaluation using R², RMSE, MAE, and residual trend analysis. The results indicate that the regression-based formulation can capture the relationship between the measured electrical variables and fault-related operating conditions. Among the evaluated models, Random Forest achieved the best predictive performance, with an R² of 0.898 on the test set, indicating a strong capacity to estimate the proposed continuous fault indicator. XGBoost also performed competitively, reaching an R² of 0.835, which supports the suitability of ensemble learning methods for modeling nonlinear patterns in PV fault behavior.
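The feature-engineering and evaluation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the range-based feature is assumed to be the spread (maximum minus minimum) of the string currents within one sample, "residual trend analysis" is illustrated as the least-squares slope of residuals against the true target, and all function names are hypothetical. R², RMSE, and MAE follow their standard definitions.

```python
import math
from typing import Dict, List, Sequence


def current_range_feature(string_currents: Sequence[float]) -> float:
    """Spread of the string currents measured in one sample (hypothetical feature).

    Assumption: "range-based features extracted from inter-sensor current
    deviations" are approximated here as max(I) - min(I) across string sensors.
    """
    if len(string_currents) == 0:
        raise ValueError("at least one string current is required")
    return max(string_currents) - min(string_currents)


def _residuals(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Residuals defined as true value minus prediction."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return [t - p for t, p in zip(y_true, y_pred)]


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE, and MAE using their standard definitions."""
    residuals = _residuals(y_true, y_pred)
    n = len(residuals)
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


def residual_trend_slope(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Least-squares slope of residuals (y_true - y_pred) against y_true.

    Assumption: residual trend analysis is illustrated as checking whether errors
    drift with the target value; a slope near 0 indicates no systematic trend.
    """
    residuals = _residuals(y_true, y_pred)
    n = len(residuals)
    mean_t = sum(y_true) / n
    mean_r = sum(residuals) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(y_true, residuals))
    var = sum((t - mean_t) ** 2 for t in y_true)
    if var == 0:
        raise ValueError("y_true must contain at least two distinct values")
    return cov / var
```

For example, with hypothetical targets `[1.0, 2.0, 3.0, 4.0]` and predictions `[1.0, 2.0, 3.0, 5.0]`, `regression_metrics` gives R² = 0.8, RMSE = 0.5, and MAE = 0.25, and `residual_trend_slope` returns -0.3 (up to floating-point rounding); the negative slope flags that the single error is an over-prediction at the largest target.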
These results suggest that regression-based fault scoring can complement conventional fault classification by providing a continuous decision-support indicator for condition monitoring and maintenance prioritization. Although the dataset is simulation-based, the selected variables correspond to quantities commonly measured in monitored PV plants, which supports the potential application of the proposed framework to real-world photovoltaic systems, subject to experimental validation.
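One way a continuous estimate can complement a classifier is to rank assets for maintenance while still deriving discrete labels when needed. The sketch below is illustrative only: it assumes that a lower estimated fault resistance indicates a more severe fault, and the string identifiers, resistance values, and band thresholds are hypothetical rather than taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical severity bands in ohms (upper bound exclusive); not from the study.
# Assumption: a lower estimated fault resistance means a more severe fault.
SEVERITY_BANDS: List[Tuple[float, str]] = [
    (10.0, "critical"),
    (100.0, "high"),
    (1000.0, "moderate"),
]


def severity_label(resistance_ohm: float) -> str:
    """Map a continuous resistance estimate to a discrete severity label."""
    for upper_bound, label in SEVERITY_BANDS:
        if resistance_ohm < upper_bound:
            return label
    return "low"


def maintenance_queue(estimates: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Rank assets from most to least severe (ascending estimated resistance)."""
    ranked = sorted(estimates.items(), key=lambda item: item[1])
    return [(asset, r, severity_label(r)) for asset, r in ranked]


if __name__ == "__main__":
    # Hypothetical model outputs (ohms) for four strings.
    predicted = {"S1": 850.0, "S2": 4.5, "S3": 5000.0, "S4": 60.0}
    for asset, r, label in maintenance_queue(predicted):
        print(f"{asset}: {r:g} ohm -> {label}")
```

Ranking on the continuous value rather than on the labels preserves the ordering of assets within the same band, which a purely categorical output would not provide.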
Regression-based estimation of fault detection in photovoltaic devices using Machine Learning
Published: 26 June 2026 by MDPI
Conference: The 1st International Online Conference on Non-Destructive Testing
Session: Artificial Intelligence and Machine Learning for NDT
Keywords: Machine learning, solar energy, fault prediction, predictive model.
