Machine learning models have shown promising results for streamflow prediction. However, they are often difficult to interpret because they learn the relationship between inputs and outputs rather than being based on physical principles. This work therefore evaluates two aspects: first, the advantages and drawbacks of methods for assessing the influence of the variables within a machine learning model for short-term streamflow prediction; and second, whether that influence has an acceptable physical interpretation. To this end, a Random Forest model was trained with data from the Upper Ter Catchment (Spain), using several accumulated precipitation and streamflow variables as inputs. The influence of the variables is evaluated globally, by intervals, and for individual predictions. The mean decrease in accuracy, the mean decrease in node impurity, and the mean SHAP (Shapley Additive Explanations) value attribute the greatest influence to a similar group of variables, whereas the Tornado method shows discrepancies with respect to them. The partial dependence results suggest that this method cannot accurately portray the influence of the variables in poorly represented value ranges. In contrast, ALE (Accumulated Local Effects) acceptably captures the influence of the variables across the whole data range. The Shapley and SHAP values indicate that precipitation accumulated over less than six hours becomes the most important variable during the rising limb of the hydrograph, which is consistent with the functioning of the catchment. During the falling limb, the streamflow values closest to the prediction horizon are the most influential, suggesting that they govern the shape of the limb. Machine learning models for streamflow prediction can therefore capture physically plausible relationships between input variables and outputs, which may contribute to a better description of the catchment system. However, the methods used to represent the influence of the variables must be selected carefully to avoid misleading interpretations.
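One known reason partial dependence can mislead where data are sparse is that it evaluates the model at input combinations that never occur in the training data, whereas first-order ALE only accumulates prediction differences computed inside intervals of observed data. The sketch below illustrates this contrast; it is not the study's code. It assumes two strongly correlated inputs (loosely standing in for overlapping precipitation accumulations) and replaces the trained Random Forest with a hypothetical closed-form `predict` function whose true effect of `x0` is known (slope 2 along the data). The functions `ale_1d` and `pdp_1d` are illustrative names, and the quadratic term deliberately exaggerates what a Random Forest would do off the data.

```python
import numpy as np


def predict(X):
    # Hypothetical toy model (not a Random Forest): along the data x1 ~ x0,
    # so the quadratic term is near zero there and x0 acts with slope 2.
    return 2.0 * X[:, 0] + 10.0 * (X[:, 1] - X[:, 0]) ** 2


def pdp_1d(predict, X, j, grid):
    """Partial dependence: force feature j to each grid value for ALL rows."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v  # creates (x0, x1) pairs that may never occur in the data
        out.append(predict(Xv).mean())
    out = np.array(out)
    return out - out.mean()  # center for comparison


def ale_1d(predict, X, j, n_bins=20):
    """First-order ALE for a numeric feature using quantile bins."""
    x = X[:, j]
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_int = len(edges) - 1
    # Bin k covers (edges[k], edges[k+1]]; the minimum falls into bin 0.
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_int - 1)
    lo, hi = X.copy(), X.copy()
    lo[:, j] = edges[idx]
    hi[:, j] = edges[idx + 1]
    diffs = predict(hi) - predict(lo)  # local effect, other features kept as observed
    n_k = np.bincount(idx, minlength=n_int)
    sums = np.bincount(idx, weights=diffs, minlength=n_int)
    mean_effect = np.where(n_k > 0, sums / np.maximum(n_k, 1), 0.0)
    ale = np.concatenate([[0.0], np.cumsum(mean_effect)])  # value at each edge
    centers = (ale[:-1] + ale[1:]) / 2.0
    ale -= np.sum(centers * n_k) / n_k.sum()  # data-weighted centering
    return edges, ale


rng = np.random.default_rng(0)
n = 2000
x0 = rng.uniform(0.0, 1.0, n)
x1 = x0 + rng.normal(0.0, 0.05, n)  # x1 strongly correlated with x0
X = np.column_stack([x0, x1])

edges, ale = ale_1d(predict, X, 0)
grid = np.linspace(0.0, 1.0, 21)
pdp = pdp_1d(predict, X, 0, grid)

# Quadratic fits: PDP picks up spurious curvature from unobserved combinations.
ale_fit = np.polyfit(edges, ale, 2)
pdp_fit = np.polyfit(grid, pdp, 2)
ale_slope = np.polyfit(edges, ale, 1)[0]
print("ALE quadratic coefficients:", ale_fit)
print("PDP quadratic coefficients:", pdp_fit)
print("ALE linear slope:", ale_slope)
```

In this setup the partial dependence curve is dominated by the quadratic term, while the ALE curve stays close to the linear effect the model actually has along the data. Quantile bins are used so every ALE interval contains observations.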
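The Shapley values used to explain individual predictions can be computed exactly for a handful of inputs by enumerating every coalition of features. The sketch below is an illustration under stated assumptions, not the study's procedure: it uses the interventional value function (mean prediction over a background sample after fixing the coalition's features to the explained instance), and the feature names `P_6h`, `P_24h`, `Q_0`, the toy `predict` function and the synthetic background data are all hypothetical. SHAP implementations typically rely on faster estimators or tree-specific algorithms rather than this brute-force enumeration, whose cost grows as 2^p.

```python
import itertools
import math

import numpy as np


def shapley_values(predict, x, background):
    """Exact interventional Shapley values of predict at the single point x."""
    p = x.shape[0]
    cache = {}

    def value(S):
        # Mean prediction with the features in S fixed to x, others from background.
        key = frozenset(S)
        if key not in cache:
            Z = background.copy()
            if key:
                cols = sorted(key)
                Z[:, cols] = x[cols]
            cache[key] = predict(Z).mean()
        return cache[key]

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            w = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi


# Hypothetical model on P_6h, P_24h (accumulated precipitation) and Q_0 (current streamflow).
def predict(X):
    p6, p24, q0 = X[:, 0], X[:, 1], X[:, 2]
    return 0.8 * q0 + 1.5 * p6 + 0.3 * p24 + 0.05 * p6 * q0


rng = np.random.default_rng(1)
background = np.column_stack([
    rng.gamma(0.5, 4.0, 200),      # hypothetical P_6h sample
    rng.gamma(1.0, 8.0, 200),      # hypothetical P_24h sample
    rng.lognormal(1.0, 0.5, 200),  # hypothetical Q_0 sample
])
x = np.array([20.0, 35.0, 4.0])  # one hypothetical storm-like instance

phi = shapley_values(predict, x, background)
for name, v in zip(["P_6h", "P_24h", "Q_0"], phi):
    print(f"{name}: {v:+.2f}")
gap = predict(x[None, :])[0] - predict(background).mean()
print("sum of Shapley values:", phi.sum(), "prediction minus baseline:", gap)
```

By construction the values add up to the difference between the explained prediction and the mean background prediction, which is what allows a single forecast to be decomposed into per-variable contributions.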
Interpretation of variables in a machine learning model for short-term streamflow prediction
Published: 11 October 2024
by MDPI
in The 8th International Electronic Conference on Water Sciences
session Numerical and Experimental Methods, Data Analyses, Digital Twin, IoT Machine Learning and AI in Water Sciences
Keywords: Influence of the variables; model interpretation; machine learning; streamflow prediction