Unsupervised Timely Detection of Leaks in Water Network DMAs using a Robust Regression Ensemble Method
1  Pattern Recognition Lab, Technical Faculty, FAU Erlangen-Nuremberg, Martensstr. 3, Erlangen 91058, Germany
2  SBU Analytics and Services, Diehl Metering GmbH, Donaustr. 120, Nuremberg 90451, Germany
3  Department of Data Science (DDS), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany
Academic Editor: Abbas Roozbahani

Abstract:

Leakage in water distribution networks (WDNs) threatens water conservation and supply reliability, making timely leak detection essential for effective localization and remediation. However, labeled leak events associated with measurements at the district metered area (DMA) level are scarce, which hinders the use of supervised learning methods. Unsupervised anomaly detection methods that rely on a single machine learning model and a single anomaly score are label-independent, but they often strike a poor balance between sensitivity and specificity, resulting in false alarms or delayed detections. This study proposes a robust approach, based on an ensemble of regression models, for detecting newly emerging leaks at the DMA level, even when background leakage is present in the broader network. We estimate DMA-wise water supply patterns from net consumption using four regression models (Random Forest, Support Vector Regression, XGBoost, and Multi-Layer Perceptron) trained on one year of hourly smart meter data. The training data were preprocessed to correct for newly emerging leaks while preserving the effects of steady background leakage. For each regression model, discrepancies between predicted and actual supply are evaluated using Pearson’s correlation, the Z-score, and Kendall’s tau, and the three results are combined by majority voting into a per-model leak decision. The per-model decisions are then aggregated into a confidence score using weights proportional to each regression model’s prediction accuracy. A leakage event is confirmed when this confidence score exceeds a threshold, which is optimized by varying it within a predefined range and selecting the value that maximizes classification accuracy, assessed via the area under the ROC curve (AUC). The proposed method detects leaks within 8–12 hours of onset. It achieved 90% accuracy on simulated leak scenarios of varying severity and 98% on historical leaks from a Danish utility.
Compared to an Isolation Forest baseline, the method improves accuracy by 31% on simulated and 39% on real-world data. These results highlight the potential of smart meter-driven ensemble analytics for rapid and reliable leak detection, supporting global water sustainability.
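
The two-stage decision described above (majority voting over the three anomaly tests for each regression model, followed by accuracy-weighted aggregation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-test boolean flags, the accuracy values, the normalization of the weights to sum to one, and the default threshold of 0.5 are all assumptions made for illustration.

```python
from typing import Dict, List


def majority_vote(flags: List[bool]) -> bool:
    """Return True when more than half of the anomaly tests flag a leak."""
    return sum(flags) > len(flags) / 2


def leak_confidence(votes: Dict[str, List[bool]],
                    accuracy: Dict[str, float]) -> float:
    """Accuracy-weighted share of regression models that vote 'leak'.

    Weights are proportional to each model's prediction accuracy and are
    normalized here to sum to 1 (an assumption of this sketch).
    """
    total = sum(accuracy[model] for model in votes)
    return sum(accuracy[model] / total
               for model, flags in votes.items()
               if majority_vote(flags))


def is_leak(votes: Dict[str, List[bool]],
            accuracy: Dict[str, float],
            threshold: float = 0.5) -> bool:
    """Confirm a leak when the confidence score exceeds the threshold."""
    return leak_confidence(votes, accuracy) > threshold


# Hypothetical flags per model, ordered as (Pearson, Z-score, Kendall's tau).
votes = {
    "random_forest": [True, True, False],   # 2 of 3 tests -> leak
    "svr":           [False, False, True],  # 1 of 3 tests -> no leak
    "xgboost":       [True, True, True],    # 3 of 3 tests -> leak
    "mlp":           [False, True, False],  # 1 of 3 tests -> no leak
}

# Hypothetical prediction accuracies, used only as relative weights.
accuracy = {"random_forest": 0.9, "svr": 0.7, "xgboost": 0.9, "mlp": 0.5}

confidence = leak_confidence(votes, accuracy)  # (0.9 + 0.9) / 3.0, about 0.6
print(f"confidence={confidence:.2f}, leak={is_leak(votes, accuracy)}")
```

In this example two of the four models vote for a leak, but because they carry the largest weights the confidence score is about 0.6, which exceeds the assumed threshold of 0.5. With equal weights the same votes would give exactly 0.5 and fall short of it.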

Keywords: Leak Detection; Anomaly Detection; Unsupervised; Machine Learning; Ensemble; Real time detection; Urban water management; Water Distribution Networks; DMA; Non-revenue Water loss

 
 