Combinatorial Perturbation-Theory Machine Learning (CPTML) Models for Curation of Metabolic Reaction Networks
1  Universidad Regional Amazónica Ikiam, Parroquia Muyuna km 7 vía Alto Tena, 150150, Tena-Napo, Ecuador
2  Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
3  Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58102, USA
4  Department of Systems and Computer Engineering, Carleton University, K1S 5B6, Ottawa, ON, Canada
5  Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58102, USA
6  IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
7  Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940, Leioa, Spain
Academic Editor: Humbert G. Díaz

Abstract:

Metabolic Reaction Networks (MRNs) are complex networks formed by the thousands of chemical reactions or transformations (links) of metabolites (nodes) in a living organism. An essential goal of chemical biology is to check the connectivity (structure) of the MRN models proposed for new microorganisms with promising features. In principle, this check can be done by hand (manual curation). However, the large number of possible combinations of node pairs (possible metabolic reactions) makes this a difficult task. In this study, we combined Combinatorial, Perturbation Theory, and Machine Learning approaches to seek a CPTML model for the MRNs of more than 40 organisms compiled by Barabási's group. First, we used a novel type of node index, termed Markov linear indices f_k, to quantify the local structure of a very large set of nodes in each MRN. Next, we computed CPT operators for more than 150,000 combinations of MRN query and reference nodes. Finally, we fed these CPT operators into several ML algorithms. The linear CPTML model obtained with the Linear Discriminant Analysis (LDA) algorithm distinguishes nodes (metabolites) with a correct reaction assignment from nodes with an incorrect one, with accuracy, specificity, and sensitivity values ranging from 85 to 100% in both the training and external validation data series. The three best non-linear models, each with accuracy above 97.5%, were PTML models based on the Bayesian network, J48 decision tree, and Random Forest algorithms. This work opens the door to the study of MRNs from multiple organisms using PTML models. The new CPTML model could also be a useful tool in biotechnology for determining the structure of the MRNs of new species.
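
The first two steps of the workflow (node indices, then perturbation operators) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the Markov linear index is taken as f_k = P^k w, where P is the row-normalized adjacency matrix and w is a vector of node weights (here, node degree), and the perturbation operator is taken as the difference between the f_k value of a query node and the mean f_k value of a set of reference nodes. The function names and the four-node toy network are hypothetical and are not the study's data or code.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix into a Markov transition matrix."""
    P = []
    for row in adj:
        total = sum(row)
        P.append([x / total if total else 0.0 for x in row])
    return P


def mat_vec(P, v):
    """Multiply matrix P by column vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def markov_linear_indices(adj, weights, k_max):
    """Return f[k][i] for k = 0..k_max, assuming f_0 = weights and f_k = P f_(k-1)."""
    P = transition_matrix(adj)
    f = [list(weights)]
    for _ in range(k_max):
        f.append(mat_vec(P, f[-1]))
    return f


def perturbation_operator(f_k, query, reference_nodes):
    """Assumed operator: deviation of the query node from the reference mean."""
    ref_mean = sum(f_k[j] for j in reference_nodes) / len(reference_nodes)
    return f_k[query] - ref_mean


# Hypothetical toy network: a path of four metabolites 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
weights = [sum(row) for row in adj]  # assumption: node degree as node weight
f = markov_linear_indices(adj, weights, k_max=2)

# Query node 0 compared against reference nodes 1 and 2 at order k = 1.
delta_f1 = perturbation_operator(f[1], query=0, reference_nodes=[1, 2])
```

In a pipeline of this shape, values such as `delta_f1`, computed over many query and reference combinations and several orders k, would form the input features of the classifiers named in the abstract (LDA, Bayesian networks, J48, Random Forest).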

Keywords: Barabási's group; Machine learning; Markov linear indices; Metabolic Networks; Perturbation-Theory