Prediction of Tropical Monsoon Hydrology Using Gridded Meteorological Products Over the Cau River Basin in Vietnam

Gridded meteorological products are generated from different data sources and methods, so their suitability as input to hydrological models varies from region to region. Variables such as temperature and precipitation should therefore be evaluated before these products are applied. To improve knowledge of this matter, the potential of two climate reanalysis products (CRPs), the China Meteorological Assimilation Driving Datasets for the SWAT model (CMADS) and the Climate Forecast System Reanalysis (CFSR), is compared for the first time with ground-based meteorological data for the period 2008-2013 over the Cau River Basin (CRB) in northern Vietnam. Statistical indicators and the Soil and Water Assessment Tool (SWAT) model are used to investigate the hydrological performance of the CRPs against 17 rain gauges across the CRB. The results show a strong correlation between the reanalysis temperatures of both CMADS and CFSR and ground observations (correlation coefficient, CC, from 0.92 to 0.97). The differences were clear for precipitation: CFSR overestimated precipitation (by about 88%) at both daily and monthly scales, whereas CMADS showed only slight deviations, mainly in high terrain. The flow simulation results also show that CMADS-SWAT (R2 > 0.75 and NSE > 0.78) is more accurate than CFSR-SWAT at the monthly scale. Assessing the potential of CRPs, especially CMADS, offers a quick alternative data source for water resource research and management in basins with similar hydro-meteorological conditions.


Introduction
Accurate and complete weather information is an essential input to hydrological models, flood forecasting and climate change studies, and provides scientific guidance for water resources management [1], [2]. Satellite-based estimates, with their wide coverage and high resolution, have become potential data sources for simulating flow, especially in areas where surface observations are limited [3]. For use with the Soil and Water Assessment Tool (SWAT) [4], commonly used climate products derived from satellite and reanalysis data include the NCEP Climate Forecast System Reanalysis (CFSR) [4] and the China Meteorological Assimilation Driving Datasets for the SWAT model (CMADS). Previous studies have shown encouraging performance of these products at different catchment scales [2], [8]. To the best of our knowledge, however, the applicability of these two products has not yet been confirmed outside China.
In this study, the performance and reliability of the CFSR and CMADS reanalysis products (hereinafter collectively referred to as Climate Reanalysis Products, CRPs) are evaluated for the first time for hydrological applications in a river basin in Vietnam. The Cau River Basin (CRB) was selected, and the study proceeds in three steps: (1) compare and evaluate the temperature and precipitation estimates of CFSR and CMADS against ground-based meteorological station (GMS) data using statistical indicators at temporal and spatial scales; (2) analyze how well the products capture extreme temperature and precipitation events over the CRB; (3) use the SWAT model to evaluate the products as input data for hydrological modeling from 2008 to 2013.

Study area and data
The Cau River Basin (21.07°N-22.18°N; 105.28°E-106.08°E) originates in the high mountains in the northwest of Bac Kan province and covers an area of nearly 6,300 km². The river flows from north to south; its river network has a total length of about 1,602 km, of which the main stream is 290 km long. The average annual temperature in the CRB is about 22 °C, but for 3 months (December to February) the average monthly temperature is below 20 °C. The mean annual precipitation is about 1,680 mm, of which the rainy season (May to October) accounts for more than 80% and the dry season for the rest. Mountains over 1,000 m high in the north, northeast and southwest cause climatic variation with elevation and ridge orientation (Figure 1).

Evaluation indicators
Within the CRB, only 4 meteorological stations record temperature (maximum and minimum), whereas CMADS and CFSR provide relatively many grid points; temperature validation was therefore conducted only at these climate stations. Precipitation data were collected from 13 stations for 2008-2013 at the catchment scale. For each GMS station, a point-to-pixel assessment was applied, using the closest CRP grid point as the reference for validation.
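The point-to-pixel matching described above can be sketched as follows. This is a minimal illustration, assuming each grid point is given as a hypothetical (id, latitude, longitude) tuple and that "closest" means the smallest great-circle (haversine) distance; the study does not state which distance measure was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_grid_point(station, grid_points):
    """Return the grid point (id, lat, lon) closest to a station given as (lat, lon)."""
    lat, lon = station
    return min(grid_points, key=lambda g: haversine_km(lat, lon, g[1], g[2]))


# Hypothetical grid points near the CRB (ids and coordinates are illustrative only).
grid = [("g1", 21.50, 105.50), ("g2", 21.75, 105.75), ("g3", 22.00, 106.00)]
print(nearest_grid_point((21.60, 105.80), grid))  # ('g2', 21.75, 105.75)
```

With regular grids a simple nearest-index lookup would also work; the distance-based search is shown because it does not depend on grid spacing.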
To assess the quality of CRP temperature and precipitation estimates, the following indicators were used: (i) four continuous statistical indicators, namely the correlation coefficient (CC), mean absolute error (MAE), root mean square error (RMSE) and percentage bias (PBIAS); (ii) three categorical indicators for precipitation events, namely the probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI). The formulas, units, value ranges and interpretation of these indicators were synthesized from other studies [2], [7], [8].
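The four continuous indicators can be computed as in the following sketch (plain Python, no external libraries). It also returns the mean bias error (MBE) used in the temperature validation. The PBIAS sign convention here is an assumption chosen to match the precipitation results reported in this paper, where negative values denote underestimation (e.g. -16.64% for CMADS) and positive values overestimation; the function and key names are illustrative.

```python
import math


def continuous_metrics(obs, est):
    """Compare an estimated series (e.g. a CRP grid point) with observations (e.g. a GMS station).

    Returns CC, MAE, RMSE, MBE and PBIAS. PBIAS > 0 means the estimate is too high.
    CC is undefined if either series is constant.
    """
    if len(obs) != len(est) or len(obs) < 2:
        raise ValueError("obs and est must be paired series of equal length >= 2")
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    errors = [e - o for o, e in zip(obs, est)]  # estimate minus observation
    return {
        "CC": cov / math.sqrt(var_o * var_e),  # Pearson correlation
        "MAE": sum(abs(d) for d in errors) / n,
        "RMSE": math.sqrt(sum(d * d for d in errors) / n),
        "MBE": sum(errors) / n,
        "PBIAS": 100.0 * sum(errors) / sum(obs),
    }


# Toy example: the estimate is 1 unit too high on every day.
print(continuous_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

The toy series are perfectly correlated yet biased, which is why CC is always read together with the error and bias indicators.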
To assess how well the CRPs capture extreme weather events, this study selected indices recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI) [9] and by the 2016 Circular of the Ministry of Natural Resources and Environment of Vietnam regulating the techniques and procedures for forecasting dangerous hydro-meteorological phenomena. For precipitation, the selected indices are the annual numbers of days with precipitation ≥ 10 mm, ≥ 50 mm and ≥ 100 mm (R10mm, R50mm, R100mm). For temperature, the selected indices are the annual number of cold days (daily average temperature Tav ≤ 15 °C), the number of hot days (daily maximum temperature between 37 °C and 39 °C, Tmax37-39) and the number of severely hot days (daily maximum temperature ≥ 39 °C, Tmax39), all for the 2008-2013 period. The performance of the flow simulations in the CRB was assessed with the NSE, R2 and PBIAS statistics, using the ranking criteria for daily and monthly scales in [10].
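A minimal sketch of how the annual extreme-event counts could be derived from daily series at one station or grid point. The record layout (date, precipitation, Tav, Tmax) and the index names are hypothetical, and assigning exactly 39 °C to the Tmax39 class rather than Tmax37-39 is our assumption, since the text gives touching bounds.

```python
from collections import defaultdict
from datetime import date

INDEX_NAMES = ("R10mm", "R50mm", "R100mm", "cold_days", "Tmax37_39", "Tmax39")


def annual_extreme_counts(records):
    """Count extreme-event days per year.

    records: iterable of (date, precip_mm, tav_c, tmax_c); use None for missing values.
    Returns {year: {index_name: number_of_days}}.
    """
    counts = defaultdict(lambda: dict.fromkeys(INDEX_NAMES, 0))
    for day, precip, tav, tmax in records:
        tally = counts[day.year]
        if precip is not None:
            # Thresholds are cumulative: a 120 mm day counts for R10mm, R50mm and R100mm.
            for name, threshold in (("R10mm", 10.0), ("R50mm", 50.0), ("R100mm", 100.0)):
                if precip >= threshold:
                    tally[name] += 1
        if tav is not None and tav <= 15.0:
            tally["cold_days"] += 1  # cold day: Tav <= 15 °C
        if tmax is not None:
            if tmax >= 39.0:
                tally["Tmax39"] += 1  # severely hot day: Tmax >= 39 °C
            elif tmax >= 37.0:
                tally["Tmax37_39"] += 1  # hot day: 37 <= Tmax < 39 °C
    return dict(counts)


# Hypothetical daily records for illustration only.
records = [
    (date(2010, 5, 20), 12.0, 30.0, 38.0),
    (date(2010, 5, 21), 120.0, 28.0, 39.5),
    (date(2010, 1, 10), 0.0, 14.0, 18.0),
    (date(2011, 1, 5), None, 15.0, None),
]
print(annual_extreme_counts(records))
```

Summing Tmax37_39 and Tmax39 gives the number of hot days with Tmax ≥ 37 °C used in the comparisons.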

CFSR and CMADS temperature validation using GMS data
The mean, CC, MAE, RMSE and mean bias error (MBE) were used to evaluate the accuracy of daily and monthly temperature data. Table 2 shows that the CMADS and CFSR temperatures at all stations correlate strongly with the GMS data: the mean CC is 0.92 for CFSR and 0.97 for CMADS. At the same time, the MBE values (Tmin from 1.0 to 2.81 °C and Tmax from 0.95 to 2.59 °C) and RMSE values (Tmax from 1.39 to 2.47 °C and Tmin from 1.27 to 1.48 °C) indicate only small differences in temperature between the CRPs and GMS.

CFSR and CMADS Precipitation validation using GMS data
At temporal scales
Table 3 presents the continuous statistical indicators for precipitation at daily, monthly and seasonal time scales over the CRB. At the daily scale, CMADS tends to underestimate precipitation (PBIAS of -16.64%), while CFSR overestimates it (PBIAS of 99.2%). Accordingly, the MAE values also differ considerably: 8.01 mm/day for CFSR and 5.7 mm/day for CMADS. As expected, the same pattern appears at the monthly scale, where the MAE and RMSE of CFSR are larger than those of CMADS. Because daily errors partly cancel out when aggregated, accuracy is higher at the monthly scale (CC of 0.82 for CFSR and 0.84 for CMADS), but the relative ranking of the two products does not change. Overall, CMADS is more accurate than CFSR and agrees relatively well with observed precipitation at the monthly time step.
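The point that daily errors partly cancel when aggregated can be illustrated with a toy example: if an estimate places a rain event on the wrong day of the same month, the daily MAE is large but the monthly totals agree. The sketch below uses hypothetical values and helper names; it is not the paper's data or code.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum daily (date, mm) pairs into {(year, month): total_mm}."""
    totals = defaultdict(float)
    for day, mm in daily:
        totals[(day.year, day.month)] += mm
    return dict(totals)


def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical: 20 mm observed on 1 July, but estimated one day late.
obs = [(date(2010, 7, 1), 20.0), (date(2010, 7, 2), 0.0)]
est = [(date(2010, 7, 1), 0.0), (date(2010, 7, 2), 20.0)]

daily_mae = mae([mm for _, mm in obs], [mm for _, mm in est])
obs_m, est_m = monthly_totals(obs), monthly_totals(est)
monthly_mae = mae([obs_m[k] for k in sorted(obs_m)], [est_m[k] for k in sorted(obs_m)])
print(daily_mae, monthly_mae)  # 20.0 0.0
```

Timing errors cancel in this way, but a systematic volume bias such as the CFSR overestimation does not, which is consistent with the ranking being unchanged at the monthly scale.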
The PBIAS values show that CFSR performs worst during the rainy season (May to October). For 2008-2013, the observed (GMS) rainfall was split 16% and 84% between the dry and rainy seasons, while the corresponding values for CMADS were 11% and 89%. CMADS underestimates dry-season precipitation relative to the gauge observations, with a PBIAS of -40.9% (Table 3). This may be related to errors of the CMORPH satellite data used in CMADS in detecting rainfall below 4 mm [6].
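The dry/rainy-season split can be computed from daily totals as in this sketch, which assumes the rainy season is May to October as defined for the CRB; the input values and function name are hypothetical.

```python
from datetime import date

RAINY_MONTHS = range(5, 11)  # May (5) to October (10)


def seasonal_shares(daily):
    """Return (dry %, rainy %) of total precipitation from (date, mm) pairs."""
    total = sum(mm for _, mm in daily)
    if total == 0:
        raise ValueError("total precipitation is zero")
    rainy = sum(mm for day, mm in daily if day.month in RAINY_MONTHS)
    return 100.0 * (total - rainy) / total, 100.0 * rainy / total


# Hypothetical daily totals: January and November fall in the dry season.
daily = [(date(2010, 1, 15), 10.0), (date(2010, 7, 15), 84.0), (date(2010, 11, 3), 6.0)]
print(seasonal_shares(daily))
```

Applying the same function to each dataset, and the continuous metrics to the dry-season and rainy-season subsets separately, gives the seasonal comparison discussed above.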

Accuracy of rainfall events detection
A threshold of 0.1 mm/day was used to distinguish rain from no-rain days [7], and the POD, FAR and CSI indices were used to evaluate the ability of the CRPs to detect precipitation. The average POD of CFSR is 0.98, indicating that it captures nearly all daily rain events. However, its average FAR is 0.72 (ranging from 0.56 to 0.74), meaning that only about 30% of the rain days estimated by CFSR are correct. In contrast, CMADS is more balanced, with POD and FAR of 0.6 and 0.2, respectively, and a CSI of 43%. Overall, CMADS is more accurate in estimating rainfall events, while CFSR excels at detecting rain but should still be validated against rain gauge data.
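The categorical scores can be computed from a contingency table of hits, misses and false alarms, as sketched below. Treating a day as rainy when precipitation is at least 0.1 mm (>= rather than >) is our assumption, as is returning NaN when a score's denominator is zero; the names are illustrative.

```python
import math


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero (score undefined)."""
    return num / den if den else math.nan


def categorical_scores(obs, est, threshold=0.1):
    """POD, FAR and CSI for daily rain detection at a given threshold (mm/day)."""
    hits = misses = false_alarms = 0
    for o, e in zip(obs, est):
        obs_rain, est_rain = o >= threshold, e >= threshold
        if obs_rain and est_rain:
            hits += 1
        elif obs_rain:
            misses += 1  # rain at the gauge that the product did not detect
        elif est_rain:
            false_alarms += 1  # rain in the product but not at the gauge
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "CSI": _ratio(hits, hits + misses + false_alarms),
    }


# Hypothetical daily values (mm): 2 hits, 1 miss, 1 false alarm, 1 correct dry day.
obs = [0.0, 0.5, 2.0, 0.0, 3.0]
est = [0.2, 0.4, 0.0, 0.0, 5.0]
print(categorical_scores(obs, est))
```

A product that reports rain on almost every day, as CFSR appears to, achieves a high POD but also a high FAR, which is why the three scores are read together.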

Evaluation of the ability to capture extreme weather events
From 2008 to 2013, CFSR recorded a total of 252 hot days (Tmax ≥ 37 °C), more than CMADS (118 days) and GMS (117 days). This difference likely reflects how differently the datasets represent the underlying surface. For example, CFSR gives a very high number of hot days at Dinh Hoa station (32 days with Tmax37-39 and 35 days with Tmax39), possibly because its coarse grid cells cover low mountainous areas on sheltered slopes that heat strongly in summer. Bac Ninh station, meanwhile, lies in a plain with extensive industrial and construction activity, so the gridded maximum temperature there is biased by the underlying surface.
For cold days (Tav ≤ 15 °C), only CMADS is compared with GMS because this information is not available from the CFSR data. Both datasets show the same spatial pattern: the number of cold days decreases from higher-latitude, mountainous terrain (Bac Kan, Dinh Hoa) to lower-latitude, flat terrain (Thai Nguyen, Bac Ninh). The number of cold days is relatively high in both CMADS and GMS, at 17.2 and 19.3 days/year, respectively, reflecting the strong influence of the winter monsoon on the CRB. Cold surges not only lower temperatures in the area (below 20 °C from December to February) but also coincide with very little rainfall. Extreme temperature events are therefore closely linked to precipitation, which directly affects the flow distribution in the CRB.
Visual inspection of the temporal distributions shows that CFSR has more hot days (Tmax ≥ 37 °C) than CMADS and GMS (Figure 2). Notably, all three datasets show an unusual increase in the number of hot days in 2010 (except the GMS data for Tmax39). According to available records, northern Vietnam experienced its longest heatwave in 27 years in May and June 2010. These findings suggest that, although the CRPs show errors for maximum temperature events, once calibrated against ground-observed temperature they can provide an additional viable source for predicting and capturing extreme events at temporal and spatial scales.
The ability to capture extreme heavy rainfall was assessed using the average values of the corresponding grid points/stations over 2008-2013. CFSR produced 609 R10mm days, far more than GMS (265 days) and CMADS (224 days) at the coincident stations. For R50mm, GMS and CFSR both recorded 49 days, compared with 34 days for CMADS. For R100mm, GMS recorded 9 days, while CFSR and CMADS each recorded 5 days, indicating that the CRPs tend to underestimate heavy rainfall (CMADS at both the 50 mm and 100 mm thresholds, CFSR at 100 mm). Although heavy rain (> 50 mm) occurs infrequently (2.24%), it contributes up to 37% of the total annual rainfall, as it is associated with typical summer rains and the tropical storms that affect this area.
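The frequency and rainfall contribution of heavy-rain days can be computed as below. Using >= 50 mm (matching R50mm) is an assumption, since the text writes the class as > 50 mm, and the daily values and function name are hypothetical. As a rough arithmetic check, 49 days out of the 2,192 days in 2008-2013 is about 2.24%, which appears consistent with the frequency quoted above, although the text does not state how that figure was derived.

```python
def heavy_rain_stats(daily_mm, threshold=50.0):
    """Return (% of days at/above threshold, % of total rainfall from those days)."""
    if not daily_mm or sum(daily_mm) == 0:
        raise ValueError("need a non-empty series with some rainfall")
    heavy = [p for p in daily_mm if p >= threshold]
    frequency = 100.0 * len(heavy) / len(daily_mm)
    contribution = 100.0 * sum(heavy) / sum(daily_mm)
    return frequency, contribution


print(heavy_rain_stats([0.0, 10.0, 60.0, 30.0]))  # (25.0, 60.0)
print(round(100.0 * 49 / 2192, 2))  # 2.24
```

The toy series shows the same pattern as the CRB data on a small scale: one day in four supplies most of the rainfall.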

Hydrological performance evaluation in the Cau River Basin
The statistical indices (R2, NSE and PBIAS) for the SWAT simulations driven by GMS, CFSR and CMADS data over 2009-2013 are summarized in Table 4. Overall, the GMS-driven SWAT model performed best at both daily and monthly scales during the calibration and validation periods. The streamflow simulated with GMS data at Gia Bay station is rated "good", with NSE > 0.67 and R2 > 0.77. The CMADS-driven model tends to underestimate the observed flow, with PBIAS between 17.35% and 18.95%; however, with R2 > 0.84 and NSE > 0.73 it is still rated "satisfactory" at the monthly scale. Finally, CFSR data led to a considerable overestimation of observed streamflow throughout the simulation period (PBIAS of -38%) and tends to capture the streamflow peaks (Figure 3). In general, the CFSR-driven model is not suitable for flow simulation in the CRB, as its R2 and NSE values are rated "unsatisfactory" under the given criteria [10]. These results are consistent with studies in the Asian monsoon region that found CFSR data unsuitable for flow simulation [10], [11]. It is difficult for estimated products such as CFSR to capture climatic conditions accurately in regions with very complex climates, such as northern Vietnam (a tropical monsoon climate with cold winters). Furthermore, differences in catchment area and topography (including elevation and ridge orientation) also affect the retrieval algorithms, the interpolation and the model parameters. CMADS integrates CMORPH satellite data with observations from automatic weather stations in the region through interpolation, so it can be widely used with improved accuracy within China. Compared with published studies, our results indicate that the performance of this dataset still needs to be validated in other areas within its coverage.
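The three flow statistics can be sketched as follows. This sketch uses the PBIAS sign convention common in SWAT evaluation guidelines (observed minus simulated, so positive values mean underestimation), which is consistent with the positive CMADS and negative CFSR values reported above; it is the opposite of the sign used for the precipitation comparison earlier in this paper. R2 is taken here as the squared Pearson correlation, and the function name is illustrative.

```python
def flow_metrics(obs, sim):
    """NSE, R2 and PBIAS for observed vs simulated streamflow (same units, same time steps)."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must be paired series of equal length >= 2")
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return {
        # Nash-Sutcliffe efficiency: 1 is perfect, <= 0 is no better than the observed mean.
        "NSE": 1.0 - ss_res / ss_tot,
        # Coefficient of determination as the squared Pearson correlation.
        "R2": cov ** 2 / (ss_tot * var_s),
        # Positive PBIAS = the model underestimates the observed flow.
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
    }


# Toy example: simulated flow is 1 unit too high at every time step.
print(flow_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

In the toy example R2 is perfect while NSE is only about 0.2 and PBIAS is -40% (overestimation), which illustrates why the three statistics are evaluated together against the ranking criteria.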
Overall, the results indicate that the CMADS-driven model can perform well provided that its input data are validated against gauge observations.

Conclusions
This study evaluated the usefulness and suitability of climate reanalysis products. The CMADS and CFSR temperature datasets both performed well against GMS data and are promising substitutes in areas with few observation stations. The validation of CRP rainfall and the SWAT flow simulations for the CRB show that CMADS is more suitable, whereas CFSR data should be evaluated before being applied in hydrological studies under similar conditions. The respective advantages and disadvantages of the CFSR, CMADS and GMS data suggest that local knowledge and information are also very valuable in hydro-meteorological research to avoid misinterpreting gridded climate products.