Tackling noise and redundancy in LC-MS metabolomics data using an AI-powered analysis workflow
1 , * 2
1  Clarity Bio Systems India Pvt. Ltd., Mumbai, India.
2  Indian Institute of Technology Bombay, Clarity Bio Systems India Pvt. Ltd., Mumbai, India
Academic Editor: Shuhai Lin

Abstract:

Analysis of large-scale data acquired from untargeted high-resolution LC-MS metabolomics experiments is time-consuming and requires significant manual effort because of the large data volume, high noise levels, and data redundancy. Most existing tools depend on user-defined signal-to-noise thresholds and often lack intuitive graphical interfaces, making them less accessible to users without programming expertise. To address these challenges, we developed a novel, end-to-end LC-MS data analysis algorithm featuring an AI-powered preprocessing pipeline and a user-friendly interface. Our approach integrates data quality checks, automated peak detection and alignment, and statistical analysis, while maintaining platform-agnostic compatibility and supporting large datasets.

The algorithm uses a one-dimensional convolutional neural network (CNN) classifier to distinguish true peaks from noise, significantly reducing false positives. A separate encoder–decoder CNN architecture segments peaks by predicting their boundaries, from which the area under the curve is computed. These models were trained on over 10,000 synthetic and 5,000 human-annotated experimental data points, achieving over 97% classification accuracy and a mean intersection over union (IoU) of 0.92 for peak boundary segmentation. The method achieves over 80% data reduction, lowering computation costs and manual workload.
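For readers unfamiliar with the two segmentation quantities mentioned here, the following is a minimal sketch of how an IoU between a predicted and an annotated peak boundary, and a peak area, might be computed. It is not the authors' implementation: the function names `interval_iou` and `peak_area`, the representation of a peak as a (start, end) retention-time interval, and the use of trapezoidal integration are all assumptions made for illustration.

```python
def interval_iou(pred, truth):
    """Intersection over union of two 1-D intervals given as (start, end).

    Assumption: a peak boundary is represented as a retention-time interval.
    """
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


def peak_area(times, intensities, start, end):
    """Trapezoidal area under the points whose time lies in [start, end].

    Assumption: trapezoidal integration without baseline subtraction.
    """
    points = [(t, y) for t, y in zip(times, intensities) if start <= t <= end]
    return sum(
        (t1 - t0) * (y0 + y1) / 2.0
        for (t0, y0), (t1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical chromatogram trace, for illustration only.
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    intensities = [0.0, 2.0, 4.0, 2.0, 0.0]
    print(interval_iou((1.0, 3.0), (2.0, 4.0)))
    print(peak_area(times, intensities, 1.0, 3.0))
```

A mean IoU, as reported in the abstract, would then be the average of `interval_iou` over all annotated peaks in an evaluation set.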

When benchmarked against open-source data analysis tools such as MZmine, XCMS, and MS-DIAL, our algorithm demonstrated superior quantification accuracy and reproducibility. Although it detected fewer total features, these included a higher number of true features (929, compared to 857 for MS-DIAL and 866 for XCMS), with lower coefficients of variation across replicates. Overall, the AI-powered algorithm combines deep learning with scalable cloud computing to offer a powerful, accessible solution for robust, automated analysis of metabolomics data. Our algorithm facilitates faster identification of hidden patterns, biomarkers, and pathways, and enables data mining of public-domain metabolomics datasets.
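The reproducibility comparison rests on the coefficient of variation (CV) of each feature across replicate injections. As a minimal sketch, assuming the CV is computed as the sample standard deviation divided by the mean and expressed as a percentage (the abstract does not state the exact formula), it could look like this; the function name and the example peak areas are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Percent CV of one feature's peak areas across replicates.

    Assumption: sample standard deviation (n - 1) divided by the mean.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0


if __name__ == "__main__":
    # Hypothetical peak areas of a single feature in three replicates.
    replicate_areas = [100.0, 110.0, 90.0]
    print(coefficient_of_variation(replicate_areas))
```

A lower CV for the same feature indicates more consistent quantification across replicates.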

The algorithm can be freely accessed at https://msone.claritybiosystems.com and is capable of analyzing DDA, DIA, and ion mobility data.

Keywords: Metabolomics, Artificial Intelligence, Noise and Redundancy, LC-MS

 
 