Shannon’s entropy usage as statistic

Lorentz Jäntschi; Sorana Bolboacă

doi:10.3390/ecea-2-B001



Lorentz Jäntschi ¹,² *, Sorana D. Bolboacă ³ *

¹ Technical University of Cluj-Napoca, Department of Physics and Chemistry, Muncii Bvd. 103-105, 400641 Cluj-Napoca, Romania
² Babeş-Bolyai University, Institute for Doctoral Studies, Kogălniceanu Street no. 1, 400084 Cluj-Napoca, Romania
³ Iuliu Haţieganu University of Medicine and Pharmacy, Department of Medical Informatics and Biostatistics, Louis Pasteur Street no. 6, 400349 Cluj-Napoca, Romania

Published: 13 November 2015 by MDPI in 2nd International Electronic Conference on Entropy and Its Applications session Information Theory

https://doi.org/10.3390/ecea-2-B001

Abstract:

The distribution of measured data is important in applied statistics for conducting an appropriate statistical analysis. Different statistics are used to assess a general null hypothesis (H₀): the data follow a specific distribution. Shannon’s entropy (H1) is introduced as a statistic and evaluated in comparison with the Anderson-Darling (AD), Kolmogorov-Smirnov (KS), Cramér-von Mises (CM), Kuiper V (KV), and Watson U² (WU) statistics.
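For orientation, the five comparison statistics can be sketched from their standard textbook definitions on the sorted values u_i = F(x_i), where F is the hypothesized cumulative distribution function. This is a minimal sketch and not the authors' implementation: the function names edf_statistics and normal_cdf and the sample values are hypothetical, and the H1 statistic is left out because the abstract does not give its formula.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution (parameters assumed known, not fitted)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def edf_statistics(sample, cdf):
    """Standard EDF goodness-of-fit statistics for a fully specified CDF.

    Assumes every cdf(x) lies strictly between 0 and 1; otherwise the
    logarithms in the Anderson-Darling term are undefined.
    """
    # The CDF is monotone, so sorting u is the same as sorting the sample.
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    ks = max(d_plus, d_minus)                      # Kolmogorov-Smirnov D
    kv = d_plus + d_minus                          # Kuiper V
    cm = 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )                                              # Cramér-von Mises W²
    wu = cm - n * (sum(u) / n - 0.5) ** 2          # Watson U²
    ad = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    ) / n                                          # Anderson-Darling A²
    return {"AD": ad, "KS": ks, "CM": cm, "KV": kv, "WU": wu}

# Hypothetical sample, tested against a standard normal distribution.
example = edf_statistics([-1.2, -0.4, 0.1, 0.3, 0.9, 1.7], normal_cdf)
print(example)
```

Turning each statistic into a p-value requires its own reference distribution (or a Monte Carlo simulation), which this sketch does not attempt.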

A contingency table containing four continuous distributions (error function, generalized extreme value, normal, and lognormal), six statistics (including Shannon’s entropy as a statistic), and fifty datasets of active chemical compounds, with sample sizes from 14 to 1714, was constructed. Fisher's combined probability test was applied to obtain, for each dataset, the overall p-value from the different tests bearing upon the same null hypothesis. Two scenarios were analyzed: one without Shannon’s entropy as a statistic (Scenario 1: AD & KS & CM & KV & WU) and one with it (Scenario 2: AD & KS & CM & KV & WU & H1).
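Fisher's combined probability test merges k p-values into the statistic X² = −2 Σ ln pᵢ, which is referred to a chi-square distribution with 2k degrees of freedom under the method's assumption of independent tests. The sketch below uses the closed-form chi-square survival function for even degrees of freedom, so it needs only the standard library. The function name fisher_combined_pvalue and the example p-values are hypothetical, not taken from the study.

```python
import math

def fisher_combined_pvalue(p_values):
    """Return (X², combined p-value) for Fisher's combined probability test.

    Assumes every p-value lies in (0, 1].
    """
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    # Survival function of chi-square with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x2 / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x2, math.exp(-half) * total

# Hypothetical p-values standing in for AD, KS, CM, KV, WU (Scenario 1) ...
scenario_1 = [0.21, 0.08, 0.35, 0.03, 0.04]
# ... and the same five plus a hypothetical H1 p-value (Scenario 2).
scenario_2 = scenario_1 + [0.62]

for name, ps in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    x2, p = fisher_combined_pvalue(ps)
    print(name, "X2 =", round(x2, 3), "combined p =", round(p, 4))
```

H₀ is counted as rejected for a dataset when the combined p-value falls below the chosen significance level; the abstract does not state the level that was used.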

One hundred and sixty-eight rows (cases) were valid and included in the analysis. The number of H₀ rejections per statistic varied from 0 to 14:

Distribution               AD  KS  CM  KV  WU  H1  Scenario 1  Scenario 2
Error function              1   3   2  10   8   0          10          10
Generalized extreme value   2   1   0   9   7   0           9           9
Lognormal                   0   3   0  14  12   1          16          14
Normal                      1   5   1  12  11   0          12          12

Shannon’s entropy (H1) was the statistic with the smallest number of rejections. The overall combined test gave identical results for the error function, generalized extreme value, and normal distributions whether Shannon’s statistic was included (Scenario 2) or not (Scenario 1). For the lognormal distribution, including Shannon’s statistic decreased the number of rejections from 16 to 14.

Keywords: distribution; Shannon’s entropy; statistic
