Shannon’s entropy usage as statistic
* 1, 2 , * 3
1  Technical University of Cluj-Napoca, Department of Physics and Chemistry, Muncii Bvd. 103-105, 400641 Cluj-Napoca, Romania
2  Babeş-Bolyai University, Institute for Doctoral Studies, Kogălniceanu Street no. 1, 400084 Cluj-Napoca, Romania
3  Iuliu Haţieganu University of Medicine and Pharmacy, Department of Medical Informatics and Biostatistics, Louis Pasteur Street no. 6, 400349 Cluj-Napoca, Romania

Abstract:

The distribution of measured data is important in applied statistics for conducting an appropriate statistical analysis. Different statistics are used to assess a general null hypothesis (H0): the data follow a specific distribution. Shannon’s entropy (H1) is introduced as a statistic, and it was evaluated in comparison with the Anderson-Darling (AD), Kolmogorov-Smirnov (KS), Cramér-von Mises (CM), Kuiper V (KV), and Watson U2 (WU) statistics.
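As an illustration of how one of the comparison statistics is obtained, the following minimal sketch computes the Kolmogorov-Smirnov statistic, i.e. the largest distance between the empirical distribution function of a sample and a hypothesized cumulative distribution function. The function names `normal_cdf` and `ks_statistic` and the sample values are hypothetical and chosen only for this example; the abstract does not state how the statistics were actually computed.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)| for a hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical sample, tested against a standard normal distribution (assumed parameters).
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
print(ks_statistic(sample, lambda x: normal_cdf(x, 0.0, 1.0)))
```

The statistic on its own is not a decision: it still has to be converted to a p-value (or compared with a critical value) for the given sample size before H0 is rejected or retained.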

A contingency table containing four continuous distributions (error function, generalized extreme value, normal, and lognormal), six statistics (including Shannon’s entropy as a statistic), and fifty datasets of active chemical compounds, with sample sizes from 14 to 1714, was constructed. Fisher's combined probability test was applied to obtain, for each dataset, the overall p-value from the different tests bearing upon the same null hypothesis. Two scenarios were analyzed: one without (Scenario 1: AD & KS & CM & KV & WU) and one with (Scenario 2: AD & KS & CM & KV & WU & H1) the inclusion of Shannon’s entropy as a statistic.
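Fisher's combined probability test can be sketched as follows. Under H0, the statistic X = -2 Σ ln(p_i) computed from k p-values follows a chi-square distribution with 2k degrees of freedom; the method formally assumes the p-values are independent. Because the degrees of freedom are even, the upper-tail probability has a closed form and needs only the standard library. The function name `fisher_combined_pvalue` and the p-values below are hypothetical, not taken from the study.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine k p-values into one overall p-value with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Fisher's statistic is X = -2 * sum(ln p_i); we work with X / 2 directly.
    half_x = -sum(math.log(p) for p in p_values)
    # Chi-square survival function for 2k degrees of freedom:
    # P(X' >= X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values for one dataset and one distribution.
scenario_1 = [0.40, 0.25, 0.55, 0.03, 0.06]   # AD, KS, CM, KV, WU
scenario_2 = scenario_1 + [0.80]              # ... plus H1
print(fisher_combined_pvalue(scenario_1))
print(fisher_combined_pvalue(scenario_2))
```

Adding a sixth p-value changes both the statistic and the degrees of freedom, which is why the overall decision for a dataset can differ between the two scenarios.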

One hundred and sixty-eight rows of cases were valid and included in the analysis. The number of H0 rejections per individual statistic varied from 0 to 14:

Distribution                AD   KS   CM   KV   WU   H1   Scenario 1   Scenario 2
Error function               1    3    2   10    8    0           10           10
Generalized extreme value    2    1    0    9    7    0            9            9
Lognormal                    0    3    0   14   12    1           16           14
Normal                       1    5    1   12   11    0           12           12

Shannon’s entropy (H1) was the statistic with the smallest number of rejections. For the error function, generalized extreme value, and normal distributions, the overall combined test gave the same number of rejections whether Shannon’s statistic was included (Scenario 2) or not (Scenario 1). For the lognormal distribution, including Shannon’s statistic decreased the number of rejections from 16 to 14.

Keywords: distribution; Shannon’s entropy; statistic