New tool useful for drug discovery validated through benchmark datasets

¹ Unidad de Toxicología Experimental, Universidad de Ciencias Médicas de Villa Clara, Santa Clara, Villa Clara, CP 50200, Cuba
² Bioinformatic Research in Systems & Computer Engineering, Carleton University, Ottawa, Canada
³ Department of Computer Sciences, Faculty of Informatics, Camaguey University, Camaguey City, 74650, Camaguey, Cuba
⁴ Departamento de Química, Universidade Federal de Lavras, CP 3037, 37200-000, Lavras, MG, Brazil
⁵ Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Cartagena de Indias, Bolívar, Colombia
⁶ Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P. O. Box 22085, 46071 Valencia, Spain

Published: 16 March 2018 by MDPI in MOL2NET'18, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 4th ed. congress USEDAT-04: USA-Europe Data Analysis Training Program Workshop, Cambridge, UK-Bilbao, Spain-Miami, USA, 2018

https://doi.org/10.3390/mol2net-04-05132

Abstract:

Atomic Weighted Vectors (AWVs) are vectors that contain the codified information of molecular structures, to which a set of Aggregation Operators (AOs) can be applied to calculate total and local molecular descriptors (MDs). This article presents an exploratory study of a new tool for drug discovery, applied to different datasets, such as the DRAGON and Sutherland’s datasets, and compares it with other well-known approaches. To evaluate the performance of the tool, several statistical analyses and QSAR/QSPR experiments were performed. Variability analyses, by means of an information theory-based algorithm, are used to quantify the information content of the AWVs obtained from the tool. Principal Components Analysis (PCA) is used to analyze the orthogonality of these descriptors, and shows that the new MDs from AWVs provide information different from that codified by DRAGON descriptors (0-2D). QSAR models are obtained for each of Sutherland’s datasets, according to the original division into training/test sets, by means of Multiple Linear Regression with Genetic Algorithm (MLR-GA). These models have been validated and compare favorably with several previously published approaches that use the same benchmark datasets. The results suggest that, despite its simplicity, this tool is a useful strategy for QSAR/QSPR studies.
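As an illustration of the idea behind AWVs and AOs, the following minimal Python sketch aggregates a vector of per-atom values into total and local descriptors. It is not the tool's implementation: the function names `aggregate` and `local_descriptor`, the choice of operators, and the example vector (Pauling electronegativities for the heavy atoms C, C, O of ethanol) are all assumptions made only for this example.

```python
import math

# Hypothetical atomic weighted vector: one property value per atom.
# Assumption: Pauling electronegativities for the heavy atoms of ethanol (C, C, O).
awv = [2.55, 2.55, 3.44]


def aggregate(vector):
    """Apply a few simple aggregation operators to an atomic vector.

    Each operator collapses the per-atom values into one number,
    which plays the role of a total molecular descriptor.
    """
    n = len(vector)
    return {
        "sum": sum(vector),
        "mean": sum(vector) / n,
        "min": min(vector),
        "max": max(vector),
        "euclidean_norm": math.sqrt(sum(x * x for x in vector)),
    }


def local_descriptor(vector, atom_indices, operator=sum):
    """Aggregate only the entries of selected atoms (a local descriptor)."""
    return operator([vector[i] for i in atom_indices])


total = aggregate(awv)
oxygen_only = local_descriptor(awv, [2])   # restrict to the oxygen atom
carbons_max = local_descriptor(awv, [0, 1], operator=max)
```

Here `total` holds one value per operator for the whole molecule, while `local_descriptor` restricts the aggregation to a chosen subset of atoms (for example, heteroatoms or a fragment).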

Keywords: Aggregation; Atomic weighted vector; Multiple linear regression; Operator; Principal components analysis; QSAR; Variability
