A Fast Multivariate Symmetrical Uncertainty based heuristic for high dimensional feature selection

Miguel Garcia Torres; Federico Divina; Francisco A. Gómez Vela; José Luis Vázquez Noguera

doi:10.3390/Entropy2021-09864



Miguel Garcia Torres ¹,*, Federico Divina ¹, Francisco A. Gómez Vela ¹, José Luis Vázquez Noguera ²,³

¹ Universidad Pablo de Olavide
² Universidad Nacional de Asunción
³ Universidad Americana

Published: 05 May 2021 by MDPI in Entropy 2021: The Scientific Tool of the 21st Century, session Entropy in Multidisciplinary Applications

https://doi.org/10.3390/Entropy2021-09864

Abstract:

In classification tasks, increasing the number of dimensions of a dataset makes the learning process harder. In this context, feature selection usually makes it possible to induce simpler classifier models while maintaining accuracy. However, some factors, such as the presence of irrelevant and redundant features, make the feature selection process challenging. Symmetrical Uncertainty (SU) is an entropy-based measure widely used to identify subsets of useful features for the classification task. However, SU is a bivariate measure and therefore ignores possible dependencies among more than two features. To overcome this issue, SU has been extended to the multivariate case. This extension, called Multivariate Symmetrical Uncertainty (MSU), is time-consuming and may become impractical when evaluating larger subsets of features during the search. In this work we propose an MSU-based Feature Selection (MSUFS) heuristic to address feature selection on high-dimensional data. To design MSUFS, the concept of Approximate Markov Blanket is redefined to take the MSU measure into account. The performance of MSUFS is tested on high-dimensional datasets from different domains, and its results were compared with popular and competitive techniques. Results show that MSUFS is capable of identifying possible correlations and interactions among features and therefore achieves competitive results. Finally, the proposed strategy is also applied to a case study on melanoma skin cancer.
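As an illustration of why a bivariate measure can miss interactions, the following minimal sketch computes SU for two discrete variables and a multivariate generalization for several. It is not the authors' implementation. The SU formula used is the standard one, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The multivariate form, MSU = n/(n-1) · (1 − H(X₁,…,Xₙ) / Σ H(Xᵢ)), is an assumption here, chosen because it reduces to SU for n = 2; the abstract itself does not state the formula. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def symmetrical_uncertainty(x, y):
    """Standard bivariate SU: 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(x, y)
    return 2.0 * mutual_info / (hx + hy)


def multivariate_su(*columns):
    """Assumed multivariate form: n/(n-1) * (1 - H(joint) / sum of marginals)."""
    n = len(columns)
    total = sum(entropy(c) for c in columns)
    if n < 2 or total == 0:
        return 0.0
    return (n / (n - 1)) * (1.0 - entropy(*columns) / total)


# XOR example: the label z depends on x and y jointly, not on either alone.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]

su_xz = symmetrical_uncertainty(x, z)
su_yz = symmetrical_uncertainty(y, z)
msu_xyz = multivariate_su(x, y, z)
```

In this XOR example each feature has zero SU with the label on its own, while the assumed multivariate measure over all three variables evaluates to 0.5, so only the multivariate view reveals the dependency. The joint entropy is estimated over all value combinations, whose number can grow exponentially with the subset size, which is consistent with the abstract's remark that MSU becomes costly for larger subsets.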

Acknowledgments: This work is partially supported by the research project PINV15-0257 from CONACyT-Paraguay. The authors also thank the Andalusian Scientific Computer Science Centre (CICA) for allowing the use of its computing infrastructure.

Keywords: Feature selection; high dimensionality; classification
