Prediction models are gaining importance in many areas, such as medicine, meteorology, finance and toxicology. In this context, the response variable is commonly binomially distributed, and hence logistic regression is a commonly used modelling approach. Although it is not recommended from a statistical point of view because of the loss of information and power, the categorisation of continuous variables is common practice in the development of prediction models. However, there are no unified criteria for selecting the cut points in the categorisation process. To provide valid cut points whenever a categorisation is going to be performed, we have developed a methodology to categorise continuous variables in a logistic regression model based on the maximisation of the AUC. This methodology has been implemented in an R package called CatPredi, a set of R functions that allows the user to categorise a continuous predictor variable in a univariate or multiple logistic regression model. For a chosen number of cut points, it provides their optimal location, fits the prediction model with the categorised predictor variable and returns the estimated and bias-corrected discriminative ability index (AUC) for this model. Additionally, it allows two categorisation proposals with different numbers of cut points to be compared, and thus the optimal number of cut points to be selected.
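The idea of choosing cut points by maximising the AUC can be illustrated with a minimal Python sketch. It is not the CatPredi implementation or its API: the function names are hypothetical, the search is a plain exhaustive search over midpoints between observed values (an assumption made for simplicity), only the univariate case is covered, and no bias correction of the AUC is attempted. It relies on the fact that, in a logistic regression with a single categorical predictor, the fitted probability in each category equals the observed event rate in that category, so no model fitting routine is needed.

```python
from itertools import combinations


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def categorise(x, cuts):
    """Map each value to the index of the interval defined by the cut points."""
    return [sum(v > c for c in cuts) for v in x]


def categorised_auc(x, y, cuts):
    """AUC of a univariate logistic model with x categorised at `cuts`.

    With one categorical predictor, the fitted probabilities are the
    observed event rates per category, which are used here as scores.
    """
    cats = categorise(x, cuts)
    totals, events = {}, {}
    for c, yi in zip(cats, y):
        totals[c] = totals.get(c, 0) + 1
        events[c] = events.get(c, 0) + yi
    scores = [events[c] / totals[c] for c in cats]
    return auc(scores, y)


def best_cut_points(x, y, k):
    """Exhaustive search for the k cut points with the highest AUC.

    Candidates are midpoints between consecutive distinct values of x
    (an illustrative choice, not taken from the package).
    """
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = max(combinations(candidates, k),
               key=lambda cuts: categorised_auc(x, y, cuts))
    return best, categorised_auc(x, y, best)


# Toy data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
cuts, value = best_cut_points(x, y, k=1)
print(cuts, round(value, 3))  # (5.5,) 0.917
```

Calling `best_cut_points` with different values of `k` and comparing the returned AUCs mirrors, in a very simplified way, the comparison between different numbers of cut points described above. The exhaustive search grows combinatorially with `k` and the number of distinct values, so it is only practical for small examples.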
Categorisation of continuous variables in a logistic regression model using the R package CatPredi
Published: 04 December 2015 by MDPI in MOL2NET'15, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 1st ed. congress USEDAT-01: USA-Europe Data Analysis Training Congress, Cambridge, UK-Bilbao, Spain-Miami, USA, 2015
Keywords: categorisation, R package, prediction model