Model selection based on the principle of parsimony - Is the principle of parsimony the key for selection of the design space model or the key to Pandora's box?
* 1 , 2 , 1
1  Department of Pharmaceutics and Analytical Chemistry, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
2  Quality and Technology, Department of Food Science, Faculty of LIFE Sciences, University of Copenhagen, Denmark

Abstract: A typical approach to establishing a Quality by Design (QbD) design space is to base it on data collected through a Design of Experiments (DoE), followed by ANOVA model building. Though the ANOVA model is useful for describing mathematically how critical quality parameters and quality attributes are linked, the model often contains higher-order interaction terms, which makes interpretation and understanding difficult. The GEneralized Multiplicative ANOVA (GEMANOVA) model, on the other hand, assumes the presence of higher-order interaction terms from the beginning, and typically yields a model that is intuitively easier to understand than the additive ANOVA models. The comparison between these two alternatives can be based on the principle of parsimony, or Occam's razor, which dates back to the 14th century. In short, the principle of parsimony states that among multiple models with equal predictive performance, the model that uses the fewest parameters should be preferred. Though this principle might be useful from a mathematical point of view as an impartial selection tool for the final model, in practice the data analyst often prioritizes the model that is easier to understand over the solution with the fewest parameters. In the present study, two different models (ANOVA and GEMANOVA) were compared on the same data set from a DoE. Applying different model selection criteria, such as the principle of parsimony and ease of model interpretation, understanding and visualization, the study discusses the strengths and limitations of each criterion in selecting a suitable model from a QbD design space perspective.
Keywords: ANOVA, GEMANOVA, parsimony, multiway data analysis
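
The parsimony rule described in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in the study. It assumes a full factorial design with categorical factors, counts free parameters for an additive ANOVA model with all effects up to a chosen interaction order, and counts them for a multiplicative model with one loading vector per factor per component, which is how GEMANOVA terms are usually written. The function names, the tolerance rule for "equal predictive performance" and the prediction errors in the usage lines are all hypothetical.

```python
from itertools import combinations
from math import prod


def anova_param_count(levels, max_order):
    """Free parameters of an additive ANOVA model containing the grand mean
    and every main effect and interaction up to `max_order`.
    `levels` holds the number of levels of each factor."""
    total = 1  # grand mean
    for order in range(1, max_order + 1):
        for combo in combinations(levels, order):
            # an effect among these factors has prod(levels - 1) free parameters
            total += prod(n - 1 for n in combo)
    return total


def gemanova_param_count(levels, n_components=1):
    """Free parameters of a multiplicative model with one loading vector per
    factor in each component (assumed form: y_ijk ~ sum_f a_if * b_jf * c_kf).
    The scale of all but one vector per component is arbitrary, which removes
    len(levels) - 1 parameters per component."""
    return n_components * (sum(levels) - (len(levels) - 1))


def most_parsimonious(candidates, tolerance):
    """Among candidates whose prediction error is within `tolerance` of the
    best one, return the candidate with the fewest parameters.
    Each candidate is a tuple (name, prediction_error, n_params)."""
    best_error = min(error for _, error, _ in candidates)
    eligible = [c for c in candidates if c[1] - best_error <= tolerance]
    return min(eligible, key=lambda c: c[2])


# Hypothetical three-factor design with three levels per factor.
levels = (3, 3, 3)
candidates = [
    # prediction errors are invented for illustration only
    ("ANOVA, up to 3-way interaction", 0.10, anova_param_count(levels, 3)),
    ("GEMANOVA, one component", 0.11, gemanova_param_count(levels, 1)),
]
chosen = most_parsimonious(candidates, tolerance=0.02)
print(chosen[0], "with", chosen[2], "parameters")
```

In this invented case the saturated ANOVA model needs 27 parameters and the one-component multiplicative model needs 7, so the rule selects the latter once both errors fall within the tolerance. The sketch captures only the parameter-count criterion; ease of interpretation and visualization, which the abstract also weighs, are not quantified here.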