As urban populations grow worldwide, the security of water supply is gaining importance. Water scarcity, accelerated by climate change, places additional stress on water supply infrastructure. Water consumption data transmitted by smart water meters form the foundation of advanced data analysis, such as water end-use classification, which can improve the resilience of water supply. Accurately categorizing smart water meter data into end-uses such as toilets, showers or dishwashers is challenging, and with large amounts of high-resolution data it cannot feasibly be performed manually. Machine-learning (ML) approaches offer several benefits for this task, including real-time capability, scalability and generalizability.
State-of-the-art methods for identifying residential water end-uses include both unsupervised and supervised approaches. However, a comprehensive comparison of unsupervised and supervised techniques is still missing. In this study, we aim to evaluate various ML techniques for water end-use classification quantitatively. Furthermore, we derive general implications for the design and conduct of ML-based experiments for water end-use classification. For these purposes, a stochastic water consumption simulation tool that closely models real-world water consumption patterns is applied to generate residential data. Subsequently, unsupervised clustering methods, such as dynamic time warping, k-means, DBSCAN, OPTICS and Hough transform, are compared to supervised methods based on support vector machines (SVM).
The quantitative results demonstrate that supervised approaches are capable of classifying common residential end-uses (toilet, shower, faucet, dishwasher, washing machine, bathtub and mixed water-uses) with accuracies up to 0.99, whereas unsupervised methods fail to detect those consumption categories. The major implications drawn from the quantitative results are two-fold. First, clustering is not suitable for separating end-use categories. Second, accurate labels are therefore essential for the end-use classification of water events, and crowdsourcing and citizen science approaches offer feasible ways to obtain them.
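Comparing clustering output with classifier output on a common scale requires some way of scoring cluster assignments against ground-truth end-use labels. The following minimal Python sketch illustrates one common option, majority-vote mapping (cluster purity); it is not necessarily the scoring procedure used in this study. The function names `majority_vote_accuracy` and `accuracy`, as well as the toy labels, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(cluster_ids, true_labels):
    """Score a clustering against ground-truth end-use labels.

    Each cluster is assigned the most frequent true label among its
    members; the result is the fraction of events whose true label
    matches the label assigned to their cluster (cluster purity).
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one event is required")

    # Count the true labels that fall into each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1

    # Events carrying their cluster's majority label count as correct.
    correct = sum(counts.most_common(1)[0][1] for counts in members.values())
    return correct / len(true_labels)


def accuracy(predicted, true_labels):
    """Plain classification accuracy for a supervised model's predictions."""
    if len(predicted) != len(true_labels):
        raise ValueError("predicted and true_labels must have equal length")
    if not true_labels:
        raise ValueError("at least one event is required")
    hits = sum(p == t for p, t in zip(predicted, true_labels))
    return hits / len(true_labels)


# Toy data (hypothetical, not taken from the study).
truth = ["toilet", "toilet", "shower", "shower", "faucet", "faucet"]
clusters = [0, 0, 0, 1, 1, 1]            # e.g. output of a clustering method
predictions = ["toilet", "toilet", "shower", "shower", "faucet", "toilet"]

print(f"{majority_vote_accuracy(clusters, truth):.2f}")  # 0.67
print(f"{accuracy(predictions, truth):.2f}")             # 0.83
```

Majority-vote scoring is optimistic by construction: it rises as the number of clusters grows, so a low score from a clustering method is a comparatively strong indication that the clusters do not align with end-use categories.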