A multi-target in silico model for anti-haematological cancer drug discovery
1  Department for Life Quality Studies, Alma Mater Studiorum - University of Bologna, corso d'Augusto 237, 47921 Rimini, Italy
2  LAQV@REQUIMTE/Faculty of Sciences, University of Porto, Rua do Campo Alegre, s/n, 4169-007 Porto, Portugal
Academic Editor: Humbert G. Díaz

Abstract:

Haematological cancers are a heterogeneous family of tumours that manifest as clonal expansions of a single cell at different stages of haemopoietic development. They are divided into leukaemias, lymphomas and myelomas, each comprising a wide range of subtypes. They are often called "liquid tumours" because, unlike "solid tumours", they do not form nodules or masses. Because of this peculiarity, they cannot be surgically removed, so chemotherapy is the mainstay of their treatment. Over the last 15 years, biological and chemical research has produced an enormous volume of digitized data, most of which is freely available in open-access databases (PubChem, ChEMBL, etc.). Through these resources, modern drug discovery has entered the Big Data era, which has transformed the way chemical data are derived and used in research. Pivotal in this change has been the incorporation of artificial intelligence approaches, such as machine learning and deep learning algorithms, which have been successfully employed in Computer-Aided Drug Design (CADD). By combining Big Data, CADD and artificial intelligence, it is possible to build computational models capable of predicting a specific biological activity. Such models can be employed for virtual screening, which identifies new molecules with the desired activity and excludes those lacking it and/or showing adverse side effects; the screening thus acts as a filter, so that only the most promising molecules proceed to experimental testing. This in silico strategy could be effectively adopted in the search for new anticancer drugs, making the drug discovery process faster, more affordable and more sustainable.

The purpose of this study was to create a multi-target Quantitative Structure-Activity Relationship (mt-QSAR) classification model, based on machine learning techniques, for predicting cytotoxic drugs simultaneously active against leukaemia, lymphoma and myeloma cell lines. Specifically, a dataset of about 11,000 molecules tested against 39 cell lines was extracted from the ChEMBL database. Although the data came from a single database, the bioactivity assays originated from different experiments performed by various research groups, which was sufficient to ensure wide experimental diversity. Anti-tumour activity was reported as IC50 (the concentration that inhibits cell viability by 50%), and a cutoff of 1 µM was chosen to discriminate active from inactive molecules. For each molecule, a set of 2D descriptors was calculated with the alvaDesc software.
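The active/inactive labelling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each IC50 value comes with a unit string and is converted to nM before comparison. The helper names (`to_nm`, `label_activity`) and the unit table are hypothetical, and labelling a value of exactly 1 µM as inactive is an assumption, since the abstract does not state how the boundary is handled.

```python
# Minimal sketch of the active/inactive labelling step.
# Assumptions (not from the study): IC50 values carry a unit string, and a value
# exactly at the 1 µM cutoff is labelled inactive.

UNIT_TO_NM = {"nM": 1.0, "uM": 1_000.0, "µM": 1_000.0, "mM": 1_000_000.0}
CUTOFF_NM = 1_000.0  # 1 µM, the cutoff stated in the abstract, expressed in nM


def to_nm(value: float, unit: str) -> float:
    """Convert an IC50 value to nM using the hypothetical unit table above."""
    if unit not in UNIT_TO_NM:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * UNIT_TO_NM[unit]


def label_activity(ic50_nm: float, cutoff_nm: float = CUTOFF_NM) -> int:
    """Return 1 (active) if IC50 is strictly below the cutoff, otherwise 0 (inactive)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 1 if ic50_nm < cutoff_nm else 0


if __name__ == "__main__":
    print(label_activity(to_nm(0.25, "uM")))  # 250 nM, below the cutoff -> 1
    print(label_activity(to_nm(12.0, "uM")))  # 12 µM, above the cutoff -> 0
```

A strict `<` comparison keeps the rule unambiguous; if the boundary were meant to be inclusive, only the comparison operator would change.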

The resulting dataset was submitted to the QSAR-Co-X software, which implements the Box-Jenkins moving average approach, allowing several experimental assay conditions to be incorporated into a single model that predicts simultaneous activity. This approach made it possible to discriminate the behaviour of molecules depending on the cell line, the type of assay and the time point used to investigate cytotoxic activity. QSAR-Co-X was also used to compare machine learning techniques, and Random Forest yielded the best mt-QSAR model.
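As commonly described for mt-QSAR models of this kind, the Box-Jenkins moving average replaces each descriptor D_i with its deviation from the mean of that descriptor over compounds tested under the same experimental condition c_j, i.e. ΔD_i(c_j) = D_i - ⟨D_i⟩(c_j), computed once per condition element (here cell line, assay type and time point). The sketch below assumes the mean is taken over active training compounds only; the exact variant and any scaling options applied by QSAR-Co-X may differ. The record layout and the function names `condition_means` and `deviation` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical record layout, e.g.
# {"D": 2.1, "label": 1, "cell_line": "A", "assay": "X", "time": "72h"}
Record = Dict[str, object]


def condition_means(train: List[Record], element: str, descriptor: str = "D") -> Dict[object, float]:
    """<D>(c_j): mean descriptor value over ACTIVE training records for each value c_j
    of one condition element (e.g. every cell line). Averaging over actives only is an
    assumption about the variant used."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for rec in train:
        if rec["label"] == 1:
            groups[rec[element]].append(float(rec[descriptor]))
    return {cond: mean(vals) for cond, vals in groups.items()}


def deviation(rec: Record, element: str, means: Dict[object, float], descriptor: str = "D") -> float:
    """Delta D(c_j) = D - <D>(c_j) for the condition under which this record was measured."""
    cond = rec[element]
    if cond not in means:
        raise ValueError(f"no active training compound for {element}={cond!r}")
    return float(rec[descriptor]) - means[cond]


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    train = [
        {"D": 2.0, "label": 1, "cell_line": "A"},
        {"D": 4.0, "label": 1, "cell_line": "A"},
        {"D": 9.0, "label": 0, "cell_line": "A"},
        {"D": 1.0, "label": 1, "cell_line": "B"},
    ]
    m = condition_means(train, "cell_line")
    print(m)                                     # {'A': 3.0, 'B': 1.0}
    print(deviation(train[2], "cell_line", m))   # 9.0 - 3.0 = 6.0
```

Computing the means on the training set only, and reusing them for test and validation records, avoids leaking label information from the evaluation sets into the descriptors.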

The resulting model has good predictive capability, with an accuracy (Acc) greater than 88% and a Matthews correlation coefficient (MCC) greater than 0.83 on both the test set and the validation set.
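For reference, both metrics follow from the confusion matrix of a binary classifier: Acc = (TP + TN) / (TP + TN + FP + FN) and MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The plain-Python sketch below implements these standard definitions; returning 0.0 when a marginal total is zero is a common convention rather than something stated in the abstract, and the labels in the example are toy values, not results from the study. Libraries such as scikit-learn offer equivalent functions (`accuracy_score`, `matthews_corrcoef`).

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, with 1 = active and 0 = inactive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correctly classified samples."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient in [-1, 1]; 0.0 if any marginal total is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]  # toy labels only
    y_pred = [1, 1, 0, 0, 0, 1]
    print(accuracy(y_true, y_pred))  # 4 of 6 correct
    print(mcc(y_true, y_pred))       # (2*2 - 1*1) / sqrt(3*3*3*3) = 1/3
```

MCC is reported alongside accuracy because it stays informative when active and inactive classes are unbalanced, a situation in which accuracy alone can look deceptively high.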

Keywords: Big Data; mt-QSAR; machine learning; leukaemia, lymphoma and myeloma cell lines; in vitro anticancer activity