A proposed robust approach for calculating the Standardized Evapotranspiration Deficit Index (SEDI) at the global scale

This study proposes a new methodology for calculating the Standardized Evapotranspiration Deficit Index (SEDI) at the global scale, using the log-logistic distribution to fit the evapotranspiration deficit (ED). The SEDI has been proposed recently to quantify drought severity based on the difference between actual evapotranspiration (ET) and the atmospheric evaporative demand (AED). Our findings demonstrate that, regardless of the AED dataset used for calculations, a log-logistic distribution is needed to fit the ED time series.


Introduction
Different studies have demonstrated the importance of the atmospheric evaporative demand (AED) in triggering drought or intensifying drought severity [1,2]. For these reasons, several drought indices use the AED in their formulations. In fact, it has been suggested that the AED may be the single most useful variable to quantify drought severity [3]. Accordingly, drought indices based only on AED have been recently formulated under the premise that AED anomalies are strongly connected with precipitation, soil moisture and actual evapotranspiration (ET) anomalies [4,5]. Kim and Rhee [6] proposed the Standardized Evapotranspiration Deficit Index (SEDI) using ET data estimated based on the Bouchet hypothesis. Here, we follow the same nomenclature to refer to a standardized drought index based on the ED. Overall, the overriding objective of this study is to find a robust probability distribution to fit the ED series worldwide to calculate the SEDI.

Data
We used the actual evapotranspiration (ET) estimates from the Global Land Evaporation Amsterdam Model (GLEAM) v3a [7]. GLEAM datasets are openly available at the global scale, at daily temporal resolution and 0.25° spatial resolution, for the period 1980-2016 (https://www.gleam.eu). Here, we resampled the data to monthly, 0.5° resolution. AED was also obtained from GLEAM v3a, which estimates potential evapotranspiration (Ep) with the Priestley and Taylor formulation; this estimate is primarily sensitive to incoming radiation and air temperature. Here, these Ep estimates are used as a proxy of the AED.

Methods
Recall that we define the evapotranspiration deficit (ED) as ET - AED. We tested eight probability distributions (Generalized Extreme Value, Log-logistic, Log-normal, Pearson III, Generalized Pareto, Weibull, Normal, and Exponential) to transform ED values to a standardized normal variable (SEDI). The parameters of the different distributions were calculated using the method of unbiased Probability Weighted Moments (UB-PWMs), following Hosking [8]. Calculations were made independently for each monthly ED series to take into account the strong seasonality of ED in the majority of the world's climates. Once the monthly ED series were fitted to a probability distribution, the cumulative probabilities of the ED values were obtained and transformed to standardized units (SEDI). To cope with zero values, we followed the approach proposed by Stagge et al. [9] for calculating the Standardized Precipitation Index (SPI), which is based on the 'centre of mass' of the zero distribution rather than the maximum probability.
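The fitting and standardization steps described above can be sketched as follows for the log-logistic case. This is a minimal illustration, not the implementation used in the study: it assumes a three-parameter log-logistic distribution with CDF F(x) = [1 + (α/(x − γ))^β]^(-1), estimates its parameters from the unbiased PWMs w_s = E[X(1 − F)^s], and treats a series as having no solution when |β| ≤ 1. The function names, the clamping constant `eps`, and the synthetic ED values are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def unbiased_pwms(values, orders=(0, 1, 2)):
    """Unbiased estimates of w_s = E[X (1 - F(X))^s] from a sample."""
    x = sorted(values)  # ascending order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    pwms = []
    for s in orders:
        # The weight of x_(j) is C(n - j, s) / C(n - 1, s), for j = 1..n.
        total = sum(math.comb(n - j, s) * xj for j, xj in enumerate(x, start=1))
        pwms.append(total / (n * math.comb(n - 1, s)))
    return pwms


def fit_log_logistic(values):
    """Return (alpha, beta, gamma) = (scale, shape, origin) from the PWMs."""
    w0, w1, w2 = unbiased_pwms(values)
    denom = 6 * w1 - w0 - 6 * w2
    if denom == 0:
        raise ValueError("no solution: shape parameter is undefined")
    beta = (2 * w1 - w0) / denom
    if abs(beta) <= 1:
        # Assumption of this sketch: such series are treated as not fittable.
        raise ValueError("no solution: |beta| <= 1")
    g = math.gamma(1 + 1 / beta) * math.gamma(1 - 1 / beta)
    alpha = (w0 - 2 * w1) * beta / g
    gamma = w0 - alpha * g
    return alpha, beta, gamma


def log_logistic_cdf(x, alpha, beta, gamma):
    """CDF F(x) = 1 / (1 + ((x - gamma) / alpha) ** (-beta))."""
    ratio = (x - gamma) / alpha
    if ratio <= 0:
        # x lies beyond the bound gamma: lower bound if beta > 0, upper if beta < 0.
        return 0.0 if beta > 0 else 1.0
    t = -beta * math.log(ratio)
    return 0.0 if t > 700 else 1.0 / (1.0 + math.exp(t))


def sedi(values, eps=1e-6):
    """Standardize one calendar-month ED series into SEDI values."""
    alpha, beta, gamma = fit_log_logistic(values)
    std_normal = NormalDist()
    out = []
    for x in values:
        p = log_logistic_cdf(x, alpha, beta, gamma)
        p = min(max(p, eps), 1 - eps)  # keep the normal quantile finite
        out.append(std_normal.inv_cdf(p))
    return out


# Synthetic, negatively skewed ED values (ET - AED <= 0) for one calendar month.
rng = random.Random(42)
ed_month = [-(5.0 + 30.0 * (u / (1.0 - u)) ** 0.125)
            for u in (rng.random() for _ in range(37))]
z = sedi(ed_month)
print("fitted (alpha, beta, gamma):", fit_log_logistic(ed_month))
print("SEDI mean and standard deviation:", mean(z), stdev(z))
```

In this parameterization β equals the reciprocal of the sample L-skewness, so a negatively skewed series yields negative β and α, with γ acting as an upper bound. A global calculation would repeat `sedi` for each grid cell and each of the twelve calendar months.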
To assess the performance and robustness of the eight probability distributions used for the calculation of the SEDI, we first calculated the percentage of monthly ED series that could not be fitted by each of them, and distributions with high percentages were discarded. With the remaining distributions, we tested the normality of the resulting SEDI series at the global scale with the Shapiro-Wilk (SW) test. A rejection threshold of p < 0.05 (corresponding to a 95% confidence level) was used to discriminate between the SEDI series that follow a standard normal variable and those that do not. We also analyzed the frequencies of high and low SEDI values obtained by the different probability distributions and compared the associated return periods.
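The normality screening described above can be illustrated with a short sketch that counts how often the Shapiro-Wilk test rejects normality across a set of series. It assumes NumPy and SciPy are available and uses `scipy.stats.shapiro`; the helper name `rejection_percentage`, the number of series, and the series length of 35 values are illustrative choices, and the input data are synthetic rather than SEDI series from this study.

```python
import numpy as np
from scipy import stats


def rejection_percentage(series_list, alpha=0.05):
    """Percentage of series for which Shapiro-Wilk rejects normality."""
    rejected = 0
    for series in series_list:
        _, p_value = stats.shapiro(series)
        if p_value < alpha:
            rejected += 1
    return 100.0 * rejected / len(series_list)


# Synthetic stand-ins: 200 series of 35 values each (hypothetical sizes).
rng = np.random.default_rng(0)
n_series, n_values = 200, 35
normal_like = rng.standard_normal((n_series, n_values))
skewed = rng.exponential(size=(n_series, n_values))

print("rejected (normal input), %:", rejection_percentage(normal_like))
print("rejected (skewed input), %:", rejection_percentage(skewed))
```

Series drawn from a normal distribution should be rejected at a rate close to the nominal 5%, whereas strongly skewed series are rejected far more often; the same count, applied per distribution, gives the kind of percentage reported in Table 2.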

Results
Table 1 shows the percentage of monthly series for which the SEDI could not be calculated for each of the eight probability distributions used for standardization. The log-normal and Weibull distributions showed a markedly high percentage of series with no solution for the SEDI, suggesting that they are the least suited for SEDI calculation, so they were removed from further analyses. The remaining six distributions showed smaller percentages of cases for which no solution could be found, with Normal and Exponential being slightly better. The Shapiro-Wilk normality test applied to the SEDI series computed using the six remaining distributions indicated a poor performance of the Generalized Pareto, Normal and Exponential distributions, which had large percentages of monthly series for which the null hypothesis of normality was rejected (Table 2). The remaining three distributions had a lower percentage of rejections, with the log-logistic distribution having the lowest overall. We applied an additional analysis to assess the suitability of the log-logistic distribution for calculating the SEDI in comparison to the generalized extreme value (GEV) and Pearson-III distributions.
Figure 1 shows the relationship between the return periods and raw SEDI values obtained using log-logistic and GEV distributions (a), and log-logistic and Pearson-III distributions (b). The SEDI values obtained with GEV and Pearson-III distributions show more extreme values in both the lower and higher tails than those obtained with the log-logistic. This translates to higher return periods and more extreme SEDI values with GEV and Pearson-III distributions in comparison to the log-logistic. The frequencies of high and low SEDI events obtained with the GEV are unrealistically high for a sample of 35 cases. Figure 2 shows the frequency of values below -2.58 standard deviations (which corresponds to a return period of about 1 in 200 years) in each time series. As expected, the majority of series do not show values below the threshold, but lower percentages dominate for the log-logistic distribution. The SEDI series obtained with GEV and Pearson-III distributions show a higher percentage of very extreme values. Given the short sample used here, it is unlikely to find such a high frequency of SEDI cases corresponding to a return period higher than 200 years.
GEV, Pearson-III and log-logistic distributions provided solutions for the SEDI over most of the world, and provided SEDI series that most frequently followed a standard normal distribution. In this study we found that the Pearson-III distribution yielded a higher number of SEDI series that did not follow a normal distribution compared to the log-logistic distribution. Moreover, the Pearson-III distribution tended to overestimate the frequency of extreme SEDI values recorded at the upper and lower tails of the distribution. Based on our results, we recommend the use of the log-logistic distribution to fit monthly ED series at the global scale and obtain the SEDI.
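The return-period argument for the -2.58 threshold can be checked with a few lines of arithmetic. The sketch below assumes that SEDI values follow a standard normal distribution and that the 35 cases in a series are independent, which is a simplification; the variable names are illustrative.

```python
from statistics import NormalDist

threshold = -2.58  # SEDI threshold, in standard deviations
n_cases = 35       # sample size quoted in the text

# Probability that a standard normal value falls below the threshold.
p_event = NormalDist().cdf(threshold)

# Return period expressed as "1 event in N cases".
return_period = 1.0 / p_event

# Expected number of events, and chance of at least one, in a series
# of n_cases values (independence between cases is assumed).
expected_events = n_cases * p_event
p_at_least_one = 1.0 - (1.0 - p_event) ** n_cases

print(f"P(SEDI < {threshold}) = {p_event:.5f}")
print(f"return period = 1 in {return_period:.0f}")
print(f"expected events in {n_cases} cases = {expected_events:.3f}")
print(f"P(at least one event) = {p_at_least_one:.3f}")
```

Under these assumptions the probability of falling below the threshold is roughly 0.005, so fewer than one event is expected in a series of 35 cases, and series with several such values point to overestimated tails.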

Figure 1. Global relationship between SEDI values (and return periods, expressed as 1 event in a number of cases) obtained from the GEV (a) and Pearson-III (b) distributions and those obtained from the log-logistic distribution. Colors represent the density of points (dark red being the highest).

Figure 2.

Table 1. Percentage of monthly time series of ED with no fitting solution using different probability distributions.

Table 2. Percentage of monthly SEDI series calculated using the different probability distributions for which the null hypothesis of normality was rejected by the Shapiro-Wilk test at a significance level of p = 0.05.