Introduction: In metabolomics data analysis, annotation enrichment analysis (AEA) is a method for detecting over-representation (enrichment) of annotations among metabolite features that show appreciable covariance across an analytical dataset. Enriched annotations, especially enriched pathway annotations, provide biological insight into, and interpretation of, the experimental groups represented in the dataset. AEA applied only to pathway annotations is often called pathway enrichment analysis (PEA). Several knowledgebases associate pathway annotations with metabolites, including the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and MetaCyc. However, these three knowledgebases combined provide pathway annotations for fewer than 19,000 metabolites, a small fraction of all detectable metabolites. In most metabolomics datasets, only 30% to 40% of the detected metabolites have pathway annotations, which greatly limits the ability of PEA to detect enriched pathways.
Methods: In this work, we developed an extreme classification model, trained on a dataset constructed by combining KEGG, Reactome, and MetaCyc, that predicts metabolic pathways from a metabolite's chemical structure.
Results: Our model achieved a Matthews correlation coefficient (MCC) of 0.9036 ± 0.0033. Using predicted pathway annotations, we demonstrate a sizable improvement in PEA results for over 150 experimental datasets downloaded from Metabolomics Workbench.
Conclusions: Our results confirm that using predicted pathways can significantly improve PEA for experimental metabolomics datasets. This improvement in PEA also demonstrates the accuracy of our machine learning model's predictions.
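The two quantities the abstract relies on, the MCC used to score the model and the over-representation test behind PEA, can be made concrete with a short sketch. The Python below is illustrative only and rests on two assumptions not stated in the abstract: (1) MCC is computed over the flattened binary metabolite-by-pathway label matrix, and (2) enrichment is scored with a one-sided hypergeometric test, a common choice for over-representation analysis. The function names `mcc_multilabel` and `hypergeom_enrichment_p` are hypothetical, and the selection of covarying or significant features is assumed to have happened beforehand.

```python
from math import comb, sqrt


def mcc_multilabel(y_true, y_pred):
    """Matthews correlation coefficient over a flattened binary
    metabolite-by-pathway label matrix (an assumed scoring scheme;
    the abstract does not state how labels were aggregated)."""
    tp = fp = tn = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # pathway correctly predicted
            elif p:
                fp += 1  # pathway predicted but not annotated
            elif t:
                fn += 1  # annotated pathway missed
            else:
                tn += 1  # pathway correctly left out
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def hypergeom_enrichment_p(n_total, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap): the chance of
    seeing at least n_overlap pathway members when n_selected metabolites
    are drawn at random from n_total annotated metabolites, n_pathway of
    which carry the pathway annotation."""
    upper = min(n_pathway, n_selected)
    hits = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_total, n_selected)
```

For example, a 2 × 3 label matrix in which the model recovers 2 of 3 true annotations with no false positives gives `mcc_multilabel([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 0]])` = 1/√2 ≈ 0.707. Using exact integer binomials via `math.comb` keeps the sketch dependency-free; for large backgrounds, `scipy.stats.hypergeom.sf(n_overlap - 1, n_total, n_pathway, n_selected)` computes the same upper tail. Note that `n_total`, `n_pathway`, and `n_overlap` only count annotated metabolites, which is why low annotation coverage constrains what PEA can detect.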
Machine Learning-Predicted Pathway Annotations Greatly Improve Pathway Enrichment Analysis of Metabolomics Datasets
Published: 10 October 2025
by MDPI
in The 4th International Electronic Conference on Metabolomics
session Advanced Metabolomics and Data Analysis Approaches
Keywords: Machine Learning; Deep Learning; Neural Networks; Multi-layer Perceptron; Pathways; Metabolites; Pathway Annotation Enrichment Analysis; Metabolomics
