Microbes and plants produce a gold mine of chemically diverse, high-value molecules such as antibiotics. However, the chemical structures of many natural products (NPs) remain unknown, which hampers medicinal applications. A key challenge in natural product discovery is the metabolome complexity of natural extracts, for which mass spectrometry data need to be linked to chemical structures. Nevertheless, many NPs share molecular substructures and form structurally related molecular families (MFs), which has inspired metabolome mining tools that exploit these biochemical relationships.
Here, we introduce a workflow that combines two existing metabolome mining tools to discover MFs, subfamilies, and subtle structural differences between family members. Whereas tandem mass spectral Molecular Networking (1) efficiently groups natural products into molecular families, MS2LDA (2) discovers substructures that aid in the further recognition of subfamilies and shared modifications. Furthermore, through the combined use of Network Annotation Propagation (3) and ClassyFire (4), we can automatically perform MF chemical classifications. Unexpected MF classifications could represent novel chemical scaffolds, thereby guiding follow-up prioritization efforts towards unknown chemistry. Recognition of the smaller building blocks (substructures) that form the basis of molecular families also accelerates data analysis, especially in cases where hardly any reference MS/MS spectra or candidate structures from structural databases are available.
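As a rough illustration of two steps in this workflow, the sketch below groups spectra into families and then assigns each family a consensus chemical class. It is a minimal sketch under stated assumptions, not the implementation used by Molecular Networking, Network Annotation Propagation, or ClassyFire: plain cosine similarity on binned peaks stands in for the spectral similarity scoring, connected components stand in for network families, and a simple majority vote stands in for the MF classification. All function names, the 0.7 threshold, the toy spectra, and the class labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two spectra given as {binned m/z: intensity}."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def molecular_families(spectra, threshold=0.7):
    """Link spectra scoring >= threshold and return the connected components."""
    parent = {name: name for name in spectra}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(spectra, 2):
        if cosine(spectra[u], spectra[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for name in spectra:
        groups[find(name)].add(name)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def consensus_class(family, node_classes):
    """Majority-vote class label and its support among annotated members."""
    labels = [node_classes[n] for n in family if n in node_classes]
    if not labels:
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy data: four binned MS/MS spectra and per-node class labels
# (assumed to come from classifying candidate structures; "s4" is unannotated).
spectra = {
    "s1": {100: 1.0, 150: 0.5, 200: 0.2},
    "s2": {100: 0.9, 150: 0.6, 210: 0.1},
    "s3": {300: 1.0, 350: 0.4},
    "s4": {300: 0.8, 350: 0.5, 400: 0.1},
}
node_classes = {"s1": "Triterpenoids", "s2": "Triterpenoids", "s3": "Oligopeptides"}

families = molecular_families(spectra)
for family in families:
    print(family, consensus_class(family, node_classes))
```

In this toy setup, a family whose consensus label has low support, or whose label is unexpected for the organism, would be a candidate for follow-up, mirroring the prioritization idea described above.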
We demonstrate how our integrative workflow discovers dozens of MFs in large-scale metabolomics studies of plant and bacterial extracts. For example, Rhamnaceae plants contained triterpenoid chemistries in which several distinct phenolic acid modifications (e.g., vanillate, protocatechuate) were readily recognized. Furthermore, a previously unannotated tryptophan-based MF was uncovered in marine Streptomyces extracts. In Photorhabdus and Xenorhabdus strains, following leads from Dereplicator (5), a software tool for identifying peptidic natural products, a xenoamicin-based peptidic MF was deciphered. Mass2Motifs (the substructure patterns discovered by MS2LDA) for both the peptidic ring and tail were easily annotated, highlighting ring-related modifications. Our workflow accelerates NP discovery through MF and substructure annotations and classifications on an unprecedentedly large scale, which will aid future integration with genome mining workflows. Finally, the workflow's applications extend beyond the natural products field into nutritional, clinical, and exposome metabolomics.
References:
- Wang, M. et al., “Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking”. Nat. Biotechnol. 34(8):828-837, 2016.
- Van der Hooft, J.J.J. et al., “Topic modeling for untargeted substructure exploration in metabolomics”. Proc. Natl. Acad. Sci. USA 113(48):13738-13743, 2016.
- da Silva, R.R. et al., “Propagating annotations of molecular networks using in silico fragmentation”. PLoS Comput. Biol. 14(4):e1006089, 2018.
- Djoumbou Feunang, Y. et al., “ClassyFire: automated chemical classification with a comprehensive, computable taxonomy”. J. Cheminform. 8(1):61, 2016.
- Mohimani, H. et al., “Dereplication of peptidic natural products through database search of mass spectra”. Nat. Chem. Biol. 13(1):30-37, 2017.