Predicting Proteasome Inhibition using Atomic Weighted Vector and Machine Learning
* 1, 2 , 3 , 3 , 3 , 4 , 5
1  Unidad de Toxicología Experimental, Universidad de Ciencias Médicas de Villa Clara, Santa Clara, Villa Clara, CP 50200, Cuba
2  Bioinformatic Research in Systems & Computer Engineering, Carleton University, Ottawa, Canada
3  Department of Computer Sciences, Faculty of Informatics, Camaguey University, Camaguey City, 74650, Camaguey, Cuba
4  Departamento de Química, Universidade Federal de Lavras, CP 3037, 37200-000, Lavras, MG, Brazil
5  Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P. O. Box 22085, 46071 Valencia, Spain


The Ubiquitin/Proteasome System (UPS) is a highly regulated mechanism of intracellular protein degradation and turnover. Through the concerted action of a series of enzymes, proteins are marked for proteasomal degradation by being linked to the polypeptide co-factor ubiquitin. The UPS participates in a wide array of biological functions, such as antigen presentation, regulation of gene transcription and the cell cycle, and activation of NF-κB. Previous studies have applied QSAR methods and machine learning to proteasome inhibition (EC50, µmol/L), for example in predicting proteasome inhibition, in predicting multi-target inhibitors of the UPP, and in predicting protein contact maps. Following this idea, we applied a new tool for computing molecular descriptors (MDs) to model proteasome inhibition EC50 (µmol/L), combining these novel MDs with different learning algorithms in quantitative structure-activity relationship (QSAR) studies. In the present research, we use the Atomic Weighted Vector (AWV) descriptors as attributes to build QSAR models of this dataset, and we compare a set of machine learning (ML) techniques on this problem: Linear Regression (LR), Multiple Linear Regression (MLR), Decision Tree (DT), Regression Tree (RT), Random Forest (RF), M5P, K-nearest neighbors (IBk or kNN), Multi-Layer Perceptron (MLP), Best-first search (BF) and Genetic Algorithm (GA). The figure shows the R² values of the ML-QSAR models obtained with ten-fold cross-validation for 258 compounds. The results indicate that AWVs are an important tool for modeling proteasome inhibitory activity regardless of the ML algorithm used. They suggest that the AWV-based MDs are suitable for codifying important structural information of the molecules and thus constitute an interesting alternative for building effective models to predict EC50 (µmol/L) values.
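
To illustrate how a ten-fold cross-validated R² of the kind reported above can be computed, the following is a minimal sketch in plain Python. It is not the authors' workflow: the descriptor matrix `X` and the activity values `y` are synthetic placeholders (the AWV descriptors and the measured EC50 values are not reproduced here), and the helpers `knn_predict` and `cross_validated_r2` are hypothetical names. A simple k-nearest-neighbors regressor stands in for the learners listed in the abstract.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_validated_r2(X, y, n_folds=10, k=3, seed=0):
    """Pool out-of-fold predictions over n_folds and return R^2 = 1 - SS_res / SS_tot."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into n_folds disjoint folds.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    preds = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        # Each compound is predicted only by a model that never saw it.
        for i in fold:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data (assumption): 258 "compounds", 4 made-up descriptors,
# and a noisy linear response standing in for an activity value.
rng = random.Random(42)
X = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(258)]
y = [2.0 * row[0] - row[1] + 0.5 * row[2] + rng.gauss(0, 0.05) for row in X]

r2 = cross_validated_r2(X, y)
print(f"10-fold cross-validated R^2 (kNN, k=3): {r2:.3f}")
```

Pooling the out-of-fold predictions before computing R² is one common convention; averaging per-fold R² values is another, and the two can differ slightly on small datasets.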

Keywords: Atomic Weighted Vector; Genetic Algorithm; Molecular Descriptor; Machine Learning; Ubiquitin/Proteasome System