In untargeted MS-based metabolomics studies, the proportion of detected features that correspond to unknown or unidentifiable compounds can often exceed 90%. Given that the proper identification of a true unknown can take many months or even years of work, it is little wonder that few investigators are willing to undertake the task of rigorously identifying these unknowns. While experimental techniques such as suspect screening can lead to the occasional “lucky” hit, a more rapid and robust approach to unknown identification is needed. In this presentation I will introduce the concept of in silico metabolomics, a computational approach to unknown identification that combines the extensive knowledge of known compounds with the existing knowledge of how compounds are chemically or biologically transformed. In silico metabolomics fundamentally requires a large collection of known structures. Over the past 10 years we have created a number of compound databases that catalogue known compounds, including human metabolites (HMDB), food constituents (FooDB), drugs (DrugBank), plant products (PhytoBank) and contaminants (ContaminantDB). We have also developed a software package called BioTransformer that uses expert knowledge combined with machine learning to accurately predict the biological and chemical transformations that known compounds may undergo in humans and in the environment. This software has been used to create a database called BioTransformerDB, consisting of several million “biologically feasible” structures. By exploiting several in-house tools for accurate MS/MS and NMR spectral prediction, we have been able to calculate the MS/MS and NMR spectra for all of the compounds in BioTransformerDB. Using these newly developed software tools and resources for in silico metabolomics, I will show how unknown compounds may be identified from untargeted MS studies.
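The final matching step described above, comparing an observed MS/MS spectrum against a library of predicted spectra, can be illustrated with a minimal sketch. This is not the speaker's software: the function names, the binned cosine score, the 10 ppm precursor tolerance, and the three library entries (with made-up m/z values and intensities) are all assumptions chosen for illustration only.

```python
import math

def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks onto integer bins, summing intensity per bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(query_precursor_mz, query_peaks, library, ppm_tol=10.0):
    """Keep library entries within ppm_tol of the precursor, then rank by score."""
    hits = []
    for entry in library:
        ppm_error = (abs(entry["precursor_mz"] - query_precursor_mz)
                     / query_precursor_mz * 1e6)
        if ppm_error > ppm_tol:
            continue  # precursor mass too far off; skip spectral comparison
        score = cosine_similarity(query_peaks, entry["peaks"])
        hits.append((entry["name"], score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical predicted-spectrum library; all values are invented.
library = [
    {"name": "candidate_A", "precursor_mz": 181.0707,
     "peaks": [(163.06, 100.0), (145.05, 40.0), (119.05, 20.0)]},
    {"name": "candidate_B", "precursor_mz": 181.0710,
     "peaks": [(135.04, 100.0), (107.05, 60.0)]},
    {"name": "candidate_C", "precursor_mz": 255.2324,
     "peaks": [(163.06, 100.0)]},
]

# Hypothetical observed spectrum of an unknown feature.
query_peaks = [(163.06, 90.0), (145.05, 45.0), (119.05, 15.0)]
hits = rank_candidates(181.0708, query_peaks, library)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this toy case the precursor filter removes candidate_C, and candidate_A ranks above candidate_B because its predicted fragments overlap with the observed peaks. A real workflow would also need to handle adducts, collision energies, and score thresholds, none of which are modelled here.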
A video of the keynote presentation by Dr. David S. Wishart can be found at:
https://www.youtube.com/watch?v=CAU_cWPtNHQ&feature=youtu.be