Knowing the distribution of measured data is important in applied statistics for choosing an appropriate statistical analysis. Different statistics are used to test a general null hypothesis (H0) that the data follow a specific distribution. Shannon’s entropy (H1) is introduced here as a statistic and evaluated against the Anderson-Darling (AD), Kolmogorov-Smirnov (KS), Cramér-von Mises (CM), Kuiper V (KV), and Watson U2 (WU) statistics.
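For reference, the five classical statistics can be computed from the sorted probability-integral-transform values u(i) = F(x(i)) of a sample under a fully specified distribution F. The sketch below uses the standard textbook formulas and is an illustration, not the authors' implementation. The function name `edf_statistics`, the clamping constant `eps`, and the normal CDF in the usage line are assumptions made here. Shannon’s entropy statistic (H1) is left out because its exact definition is not given in this summary, and p-values are not computed, since they depend on sample size and on whether the distribution parameters were estimated from the data.

```python
import math
from statistics import NormalDist
from typing import Callable, Dict, Sequence


def edf_statistics(sample: Sequence[float],
                   cdf: Callable[[float], float]) -> Dict[str, float]:
    """Return AD, KS, CM, KV and WU statistics for a fully specified CDF."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Sorted probability-integral-transform values u_(1) <= ... <= u_(n).
    eps = 1e-12  # assumed guard so log() never receives 0
    u = sorted(min(max(cdf(x), eps), 1.0 - eps) for x in sample)

    # Kolmogorov-Smirnov one-sided distances (0-based index i is rank i + 1).
    d_plus = max((i + 1) / n - v for i, v in enumerate(u))
    d_minus = max(v - i / n for i, v in enumerate(u))

    # Cramér-von Mises W^2.
    w2 = 1.0 / (12 * n) + sum((v - (2 * i + 1) / (2 * n)) ** 2
                              for i, v in enumerate(u))

    # Anderson-Darling A^2 pairs u_(i) with u_(n+1-i).
    a2 = -n - sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                  for i in range(n)) / n

    mean_u = sum(u) / n
    return {
        "AD": a2,
        "KS": max(d_plus, d_minus),
        "CM": w2,
        "KV": d_plus + d_minus,              # Kuiper V = D+ + D-
        "WU": w2 - n * (mean_u - 0.5) ** 2,  # Watson U^2
    }


if __name__ == "__main__":
    # Hypothetical sample tested against a standard normal distribution.
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
    print(edf_statistics(data, NormalDist(mu=0.0, sigma=1.0).cdf))
```

Passing the CDF as a callable keeps the statistics independent of the candidate distribution (error function, generalized extreme value, normal, lognormal), which only has to supply F.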
A contingency table was constructed from four continuous distributions (error function, generalized extreme value, normal, and lognormal), six statistics (including Shannon’s entropy), and fifty datasets of active chemical compounds with sample sizes from 14 to 1714. For each dataset, Fisher's combined probability test was applied to obtain an overall p-value from the different tests bearing on the same null hypothesis. Two scenarios were analyzed: one without Shannon’s entropy (Scenario 1: AD & KS & CM & KV & WU) and one with it (Scenario 2: AD & KS & CM & KV & WU & H1).
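Fisher's method combines k p-values through X² = -2 Σ ln p_i, which under H0 (and formally assuming independent tests) follows a chi-squared distribution with 2k degrees of freedom, so Scenario 1 uses k = 5 and Scenario 2 uses k = 6. A minimal sketch, assuming the per-test p-values are already available and lie in (0, 1], uses the closed-form chi-squared survival function for even degrees of freedom, P(X² > x) = e^(-x/2) Σ_{j=0}^{k-1} (x/2)^j / j!. The function name, the p-values, and the 0.05 significance level in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(p_values: Sequence[float]) -> float:
    """Combine k p-values with Fisher's method (chi-squared, 2k d.f.)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)  # Fisher's X^2 statistic
    half = x / 2.0
    # Survival function of chi-squared with 2k d.f. (finite Poisson sum).
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values for AD, KS, CM, KV, WU, then H1.
    scenario_1 = [0.04, 0.20, 0.08, 0.01, 0.03]
    scenario_2 = scenario_1 + [0.95]
    alpha = 0.05  # assumed significance level
    for name, ps in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
        p = fisher_combined_pvalue(ps)
        print(f"{name}: combined p = {p:.4f}, reject H0: {p < alpha}")
```

Adding a sixth test with a large p-value adds only a small term to X² while raising the degrees of freedom by two, which is one way the combined test can reject less often in Scenario 2 than in Scenario 1.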
One hundred and sixty-eight cases (rows) were valid and included in the analysis. The number of H0 rejections per individual statistic varied from 0 to 14:
Distribution                 AD  KS  CM  KV  WU  H1  Scenario 1  Scenario 2
Error function                1   3   2  10   8   0          10          10
Generalized extreme value     2   1   0   9   7   0           9           9
Lognormal                     0   3   0  14  12   1          16          14
Normal                        1   5   1  12  11   0          12          12
Shannon’s entropy (H1) was the statistic with the smallest number of rejections. For the error function, generalized extreme value, and normal distributions, the combined test gave identical results with (Scenario 2) and without (Scenario 1) Shannon’s statistic. For the lognormal distribution, including Shannon’s statistic decreased the number of rejections from 16 to 14.
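The column totals of the table of rejection counts make this comparison explicit. The short script below only re-adds the reported counts (transcribed by hand; the variable names are illustrative) and introduces no new data.

```python
# Rejection counts transcribed from the table (rows: distributions).
COLUMNS = ["AD", "KS", "CM", "KV", "WU", "H1", "Scenario 1", "Scenario 2"]
COUNTS = {
    "Error function": [1, 3, 2, 10, 8, 0, 10, 10],
    "Generalized extreme value": [2, 1, 0, 9, 7, 0, 9, 9],
    "Lognormal": [0, 3, 0, 14, 12, 1, 16, 14],
    "Normal": [1, 5, 1, 12, 11, 0, 12, 12],
}

# Total rejections per column, summed over the four distributions.
totals = {col: sum(row[j] for row in COUNTS.values())
          for j, col in enumerate(COLUMNS)}

# Individual statistic with the fewest rejections overall.
single = {k: v for k, v in totals.items() if not k.startswith("Scenario")}
fewest = min(single, key=single.get)

# Distributions whose combined-test result changes when H1 is added.
changed = [d for d, row in COUNTS.items() if row[6] != row[7]]

if __name__ == "__main__":
    print(totals)
    print("Fewest rejections:", fewest)
    print("Scenario 1 vs 2 differ for:", changed)
```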