Comparison of Statistical and Machine Learning models for pipe failure modeling in Water Distribution Networks (WDN)

Mónica Giraldo; Juan Rodríguez

doi:10.3390/ECWS-4-06441

Abstract:

Pipe failures in Water Distribution Networks (WDN) can incur economic, environmental and social costs. Statistical and Machine Learning (ML) models play a critical role in planning and decision support for WDN management. Failure models can provide valuable information for prioritizing system rehabilitation, even where data are scarce (as is often the case in developing countries). This study compares several statistical and ML pipe failure models to help practitioners select a model suited to their needs.

Three statistical models (Linear, Poisson and Evolutionary Polynomial Regression) were used to predict pipe failures, with pipe diameter, age and length as explanatory variables. K-means clustering was applied to improve the performance of the statistical models. Performance was measured with the coefficient of determination (R²) and the root mean square error (RMSE). Four ML approaches (Gradient Boosted Tree (GBT), Bayes, Support Vector Machine and Artificial Neural Networks (ANNs)) were compared in predicting individual pipe failure rates, using pipe attributes together with environmental and operational variables as inputs. The performance of the ML models was evaluated using confusion matrices and receiver operating characteristic (ROC) curves. The proposed approach was applied to a WDN in Bogotá (Colombia).
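As an illustration of the two performance indicators named above, the following minimal sketch computes R² and RMSE from paired observed and predicted failure counts. The function names and the sample numbers are hypothetical and are not taken from the study; the formulas are the standard definitions.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical failure counts for four pipe groups (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

A higher R² and a lower RMSE both indicate a closer fit, which is why the two indicators are usually read together.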

The results showed that cluster-based prediction reduces the prediction error of pipe failures. All the statistical models performed acceptably (R² between 0.695 and 0.927, with RMSE between 45 and 22, for the test sample). Among the ML models, all methods except the ANNs showed acceptable performance. The GBT approach yielded the best-performing classifier (79.41% correct predictions in the test sample). This model was used to calculate the failure rate of individual pipes for rehabilitation planning. Furthermore, a sensitivity analysis of the GBT model with respect to its input variables was performed to provide information on its generalization capability.
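A percentage of correct predictions such as the one reported above is the accuracy read off a confusion matrix. The sketch below shows that calculation for a binary fail / no-fail classifier, together with the single (false positive rate, true positive rate) point that one decision threshold contributes to a ROC curve. The labels are invented for illustration and the helper names are hypothetical; this is not the study's data or code.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = failure)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(cm):
    """Share of correct predictions: (TP + TN) / total."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


def roc_point(cm):
    """(false positive rate, true positive rate) for one decision threshold."""
    fpr = cm["fp"] / (cm["fp"] + cm["tn"])
    tpr = cm["tp"] / (cm["tp"] + cm["fn"])
    return fpr, tpr


# Hypothetical labels for eight pipes (1 = failed, 0 = did not fail).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

cm = confusion_matrix(actual, predicted)
print(cm)
print(f"accuracy = {accuracy(cm):.2%}")
print("ROC point (FPR, TPR) =", roc_point(cm))
```

Sweeping the classifier's decision threshold and collecting one such point per threshold traces out the full ROC curve.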