Classifying dengue cases using CatPCA in combination with the MSU correlation
* 1 , 2 , 1 , 1 , 1
1  Universidad Nacional de Asunción, Paraguay
2  Universidad Pablo de Olavide, Spain

Abstract:

Dengue is a mosquito-borne viral infection and a leading cause of serious illness and death among children and adults in many countries across the world. In Paraguay, dengue incidence has been increasing, especially in urban areas, and the disease has become both endemic and epidemic in the last few years.

This work seeks to understand which factors are responsible for the epidemic and hemorrhagic varieties of dengue. Because the collected data are of mixed type (nominal and continuous), Categorical Principal Components Analysis (CatPCA) is adopted as a first tool. However, interpreting CatPCA output can be challenging, partly because the same variable may contribute to several of the principal components.

Multivariate Symmetrical Uncertainty (MSU), an entropy-based similarity measure, quantifies correlation among several variables at once and detects both linear and nonlinear associations. In this work, the MSU measure is used in combination with CatPCA to gain greater insight into the relevance of each variable.
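To make the measure concrete, the following is a minimal sketch of MSU for categorical columns. It assumes the commonly cited definition MSU(X_1..X_n) = n/(n-1) * [1 - H(X_1..X_n) / (H(X_1) + ... + H(X_n))], where H is Shannon entropy; the function names `entropy` and `msu` are illustrative and are not taken from this work. Continuous variables would need to be discretized first, a step not shown here.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def msu(*columns):
    """Multivariate Symmetrical Uncertainty of n >= 2 equal-length columns.

    Assumed definition: n/(n-1) * (1 - H(joint) / sum of marginal entropies).
    The value is 0 when the columns are jointly independent and 1 when each
    column fully determines the others.
    """
    n = len(columns)
    if n < 2:
        raise ValueError("msu needs at least two columns")
    joint = entropy(list(zip(*columns)))          # entropy of the joint rows
    marginals = sum(entropy(list(col)) for col in columns)
    if marginals == 0:                            # all columns constant
        return 0.0
    return (n / (n - 1)) * (1 - joint / marginals)


# Toy columns (illustrative values, not study data)
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
identical = msu(x, x)      # a column compared with itself
independent = msu(x, y)    # two independent columns
```

With two variables this expression reduces to the ordinary symmetrical uncertainty, 2 * I(X;Y) / (H(X) + H(Y)).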

We apply the two techniques in successive stages, using nationwide data collected by the country's Sanitary Surveillance Department from nearly 200,000 suspected and confirmed cases over 5 years. The first few runs of CatPCA help to discard the less relevant attributes. A subsequent run of CatPCA provides principal components that account for a high percentage of the total variance. Working with the attribute sets identified by CatPCA, MSU finds $n$-way interactions and correlations, and groups those attributes for further selection. Attributes can be separated into disjoint groups at this stage, which makes the groupings easier to interpret, including those that contain the key linear and nonlinear correlations.
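The value of looking for $n$-way interactions can be illustrated with a small sketch. It is not the grouping procedure used in this work: the attributes `a`, `b`, `c`, the toy class, and the helper `score_subsets` are all hypothetical, and the MSU definition is the same assumed one, n/(n-1) * [1 - H(joint) / sum of marginal entropies]. The class is built as the exclusive-or of `a` and `b`, so no single attribute correlates with it, yet the pair does.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows):
    """Shannon entropy, in bits, of a sequence of hashable outcomes."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def msu(*columns):
    """Assumed MSU definition: n/(n-1) * (1 - H(joint) / sum H(marginals))."""
    n = len(columns)
    joint = entropy(list(zip(*columns)))
    marginals = sum(entropy(list(col)) for col in columns)
    return 0.0 if marginals == 0 else (n / (n - 1)) * (1 - joint / marginals)


# Hypothetical binary attributes; c is unrelated noise.
data = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 0, 1, 1, 1, 1],
}
# Toy class: exclusive-or of a and b (a purely nonlinear, 3-way relation).
label = [x ^ y for x, y in zip(data["a"], data["b"])]


def score_subsets(data, label, size):
    """MSU of every attribute subset of the given size together with the class."""
    return {
        names: msu(*(data[name] for name in names), label)
        for names in combinations(sorted(data), size)
    }


single = score_subsets(data, label, 1)   # each attribute alone vs. the class
pairs = score_subsets(data, label, 2)    # each attribute pair vs. the class
best_pair = max(pairs, key=pairs.get)

for names, score in {**single, **pairs}.items():
    print(names, round(score, 3))
```

In this toy case every single attribute scores zero against the class, while the pair `("a", "b")` stands out, which is the kind of grouped contribution that a one-variable-at-a-time reading would miss.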

The outcomes of this combined approach are better than those of CatPCA alone: it identifies both individual and grouped variables that contribute to the behavior of the class.

Keywords: multivariate correlation; multivariate sample size; entropic measures; principal components analysis; dengue fever