Machine Learning Predicted Pathway Annotations Greatly Improve Pathway Enrichment Analysis of Metabolomics Datasets
1  University of Kentucky, Lexington, USA
Academic Editor: Reza Salek

Abstract:

Introduction: In metabolomics data analysis, annotation enrichment analysis (AEA) is a method for detecting over-representation (enrichment) of annotations among metabolite features that show appreciable covariance across an analytical dataset. Enriched annotations, especially enriched pathway annotations, provide biological insight into and interpretation of the experimental groups represented in the dataset. AEA applied only to pathway annotations is often called pathway enrichment analysis (PEA). Several knowledgebases associate pathway annotations with metabolites, including the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and MetaCyc. However, these three knowledgebases combined have pathway annotations for fewer than 19,000 metabolites, which is a small fraction of detectable metabolites. For most metabolomics datasets, only 30% to 40% of the detected metabolites have pathway annotations, which greatly limits the ability of PEA to detect enriched pathways.

Methods: In this work, we developed an extreme classification model, trained on a dataset constructed from a combination of KEGG, Reactome, and MetaCyc, that predicts metabolic pathways from the chemical structure of a metabolite.

Results: Our model achieved a Matthews correlation coefficient (MCC) of 0.9036 ± 0.0033. Using predicted pathway annotations, we demonstrate a sizable improvement in PEA results for over 150 experimental datasets downloaded from Metabolomics Workbench.

Conclusions: Our results confirm that predicted pathways can significantly improve PEA for metabolomics experimental datasets. The improvement to PEA demonstrates the accuracy of the predictions of our machine learning model.

Keywords: Machine Learning; Deep Learning; Neural Networks; Multi-layer Perceptron; Pathways; Metabolites; Pathway Annotation Enrichment Analysis; Metabolomics
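
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that over-representation is assessed with a one-sided hypergeometric test (a common choice for enrichment analysis; the abstract does not state which test was used) and computes the MCC from binary confusion-matrix counts. The function names, parameter names, and the example numbers are all hypothetical.

```python
from math import comb, sqrt


def enrichment_p_value(population: int, annotated: int, selected: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: metabolites in the dataset that carry any pathway annotation
    annotated:  metabolites in the population annotated with the pathway of interest
    selected:   metabolites flagged as changing (the "interesting" subset)
    hits:       selected metabolites that are annotated with the pathway
    ASSUMPTION: the hypergeometric test is used here only for illustration.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical numbers: 200 annotated metabolites, 20 in the pathway,
    # 30 flagged as changing, 8 of those in the pathway.
    print(enrichment_p_value(200, 20, 30, 8))
    print(matthews_corrcoef(tp=90, fp=10, tn=95, fn=5))
```

Under this sketch, adding predicted annotations enlarges `population` and can change `annotated` and `hits`, which is one way extra pathway coverage can alter which pathways reach significance.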

 
 