Problems with first-order infotheoretic measures on short sequences

Published: 16 December 2021 by MDPI in The 1st International Electronic Conference on Information, session Information Systems and Applications
https://doi.org/10.3390/IECI2021-12073

Abstract:
Shannon entropy (H) and mutual information (MI) are, in practice, estimated using first-order statistics because doing so is easy and convenient. However, first-order estimates of H and MI can be grossly incorrect, as we demonstrate with three carefully designed short, linearly independent sequences X1, X2 and X3. These are constructed to take values from the set {0, +1, -1} such that the time stamps of the zeros coincide exactly across the sequences. The sequences are linearly independent and their pairwise correlation coefficients are zero. X2 and X3 are cyclic permutations of each other, while X1 is unrelated to them apart from having its zeros at the same locations. The estimated pairwise first-order MI values turn out to be identical for all three pairs, demonstrating the inability of first-order MI to capture the additional mutual dependence between X2 and X3. After puncturing the zeros from all the sequences (removing those values along with their time stamps), the first-order MIs turn out to be zero for every pair, wrongly implying that the sequences are independent, whereas X2 and X3 remain cyclic permutations of each other and are hence completely dependent. Compression-complexity measures, such as the Effort-To-Compress (ETC) complexity measure, correctly capture the nonlinear dependence in this case. First-order estimation of H and MI is thus fraught with danger in practical applications, especially at short data lengths; in such situations, it is preferable to employ compression-complexity measures.

Keywords: Shannon entropy; mutual information; short sequences; compression-complexity; effort-to-compress
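
The failure mode described in the abstract can be reproduced with a short plugin-estimator sketch. The sequences X1, X2 and X3 below are hypothetical, constructed here to satisfy the stated properties (values in {0, +1, -1}, coinciding zeros, zero pairwise correlation, X2 a cyclic shift of X3); they are illustrative and not necessarily the sequences used in the paper.

```python
import numpy as np
from collections import Counter

def plugin_entropy(x):
    """First-order (plugin) Shannon entropy in bits."""
    n = len(x)
    return -sum((c / n) * np.log2(c / n) for c in Counter(x).values())

def plugin_mi(x, y):
    """First-order (plugin) MI in bits: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return plugin_entropy(x) + plugin_entropy(y) - plugin_entropy(list(zip(x, y)))

# Hypothetical sequences with the properties stated in the abstract:
# values in {0, +1, -1}, zeros at identical time stamps, X2 a cyclic
# shift of X3 (hence completely dependent on it), X1 unrelated to both.
X3 = [+1, 0, +1, 0, -1, 0, -1, 0, +1, 0, +1, 0, -1, 0, -1, 0]
X2 = X3[2:] + X3[:2]   # cyclic shift by 2
X1 = [+1, 0, +1, 0, +1, 0, +1, 0, -1, 0, -1, 0, -1, 0, -1, 0]

for name, a, b in [("X1,X2", X1, X2), ("X1,X3", X1, X3), ("X2,X3", X2, X3)]:
    r = np.corrcoef(a, b)[0, 1]
    print(f"{name}: corr = {r:+.2f}, plugin MI = {plugin_mi(a, b):.3f} bits")

# Puncture the zeros (same time stamps in every sequence). Plugin MI
# now comes out as 0 bits for all pairs, although X2 and X3 remain
# cyclic shifts of each other.
keep = [t for t, v in enumerate(X3) if v != 0]
P1, P2, P3 = ([X[t] for t in keep] for X in (X1, X2, X3))
for name, a, b in [("X1,X2", P1, P2), ("X1,X3", P1, P3), ("X2,X3", P2, P3)]:
    print(f"punctured {name}: plugin MI = {plugin_mi(a, b):.3f} bits")
```

With these particular sequences, every pair has correlation 0.00 and an identical plugin MI of 1.000 bits before puncturing (the coinciding zeros alone account for that value), and 0.000 bits for every pair after puncturing, despite X2 and X3 being fully dependent.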
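For the compression-complexity side, the sketch below assumes a simplified NSRPS-based ETC (pair-counting and tie-breaking conventions differ across published implementations), and the joint-ETC comparison is an illustrative construction rather than the paper's exact analysis. The idea: a pair of sequences that share structure, such as the punctured P2 and P3, should need far fewer pair-substitution steps when compressed jointly than an unrelated pair.

```python
from collections import Counter

def etc(seq):
    """Effort-To-Compress via NSRPS, simplified: repeatedly replace the
    most frequent adjacent pair with a new symbol until the sequence is
    constant or has length 1. ETC = number of substitution steps."""
    s = list(seq)
    nxt = max(s) + 1          # symbols are assumed to be small ints
    steps = 0
    while len(s) > 1 and len(set(s)) > 1:
        (a, b), _ = Counter(zip(s, s[1:])).most_common(1)[0]
        out, i = [], 0
        while i < len(s):     # left-to-right, non-overlapping substitution
            if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                out.append(nxt)
                i += 2
            else:
                out.append(s[i])
                i += 1
        s, nxt, steps = out, nxt + 1, steps + 1
    return steps

def joint_etc(x, y):
    """ETC of the jointly symbolized sequence z_t = (x_t, y_t)."""
    code = {sym: k for k, sym in enumerate(sorted(set(zip(x, y))))}
    return etc([code[sym] for sym in zip(x, y)])

# Punctured sequences from the previous sketch, remapped to {0, 1}:
P1 = [1, 1, 1, 1, 0, 0, 0, 0]
P2 = [1, 0, 0, 1, 1, 0, 0, 1]   # cyclic shift of P3
P3 = [1, 1, 0, 0, 1, 1, 0, 0]

print("ETC(P1) =", etc(P1), " ETC(P2) =", etc(P2), " ETC(P3) =", etc(P3))
for name, a, b in [("P1,P2", P1, P2), ("P1,P3", P1, P3), ("P2,P3", P2, P3)]:
    print(f"joint ETC({name}) = {joint_etc(a, b)}")
```

Under this sketch, the jointly symbolized (P2, P3) sequence is periodic (the cyclic shift lines the two sequences up into a repeating pattern of symbol pairs) and should collapse in markedly fewer substitution steps than the other two pairs, exposing exactly the dependence that the zero first-order MI misses.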