Analysis of large-scale data acquired from untargeted high-resolution LC-MS metabolomics experiments is time-consuming and requires significant manual effort because of the large data volume, high noise levels, and data redundancy. Most existing tools depend on user-defined signal-to-noise thresholds and often lack intuitive graphical interfaces, making them less accessible to users without programming expertise. To address these challenges, we developed a novel, end-to-end LC-MS data analysis algorithm featuring an AI-powered preprocessing pipeline and a user-friendly interface. Our approach integrates data quality checks, automated peak detection and alignment, and statistical analysis, while remaining platform-agnostic and supporting large datasets.
The algorithm uses a one-dimensional convolutional neural network (CNN) classifier to distinguish true peaks from noise, significantly reducing false positives. A separate encoder–decoder CNN segments each peak, and the area under the curve is computed from the resulting boundaries. These models were trained on over 10,000 synthetic and 5,000 human-annotated experimental data points, achieving over 97% classification accuracy and a mean intersection over union (IoU) of 0.92 for peak boundary segmentation. The method achieves over 80% data reduction, lowering computation costs and manual workload.
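The two quantities reported for the segmentation step, boundary overlap and peak area, can be illustrated with a minimal Python sketch. It assumes that a peak is described by a (start, end) retention-time interval and that the area is obtained by trapezoidal integration between those boundaries; the function names `interval_iou` and `peak_area` are hypothetical, and the exact definitions used in this work may differ.

```python
def interval_iou(pred, true):
    """Intersection over union of two 1-D intervals given as (start, end).

    Assumption: peak boundaries are compared as retention-time intervals.
    """
    inter = max(0.0, min(pred[1], true[1]) - max(pred[0], true[0]))
    union = (pred[1] - pred[0]) + (true[1] - true[0]) - inter
    return inter / union if union > 0 else 0.0


def peak_area(rt, intensity, start, end):
    """Trapezoidal area under the trace for points with start <= rt <= end.

    Assumption: rt is sorted in ascending order and no baseline is subtracted.
    """
    points = [(t, y) for t, y in zip(rt, intensity) if start <= t <= end]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Illustrative values only, not data from this study.
iou = interval_iou((0.0, 2.0), (1.0, 2.0))                 # 0.5
area = peak_area([0, 1, 2, 3, 4], [0, 2, 4, 2, 0], 1, 3)   # 6.0
```

A mean IoU over a test set would then be the average of `interval_iou` across all predicted and annotated boundary pairs.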
When benchmarked against available open-source data analysis tools such as MZmine, XCMS, and MS-DIAL, our algorithm demonstrated superior quantification accuracy and reproducibility. Although it detected fewer total features, these included more true features: 929, compared with 857 for MS-DIAL and 866 for XCMS, with lower coefficients of variation across replicates. Overall, the AI-powered algorithm combines deep learning with scalable cloud computing to offer a powerful, accessible solution for robust, automated analysis of metabolomics data. Our algorithm facilitates faster identification of hidden patterns, biomarkers, and pathways, and enables data mining of public-domain metabolomics datasets.
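Reproducibility across replicates is summarized here by the coefficient of variation (CV). A minimal Python sketch follows, assuming the CV is the sample standard deviation of a feature's peak areas divided by their mean and expressed as a percentage; the function name and the replicate values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(areas):
    """Percent CV of one feature's peak areas across replicate injections.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    if len(areas) < 2:
        raise ValueError("at least two replicates are required")
    m = mean(areas)
    if m == 0:
        raise ValueError("mean area is zero; CV is undefined")
    return 100.0 * stdev(areas) / m


# Illustrative replicate areas for a single feature.
cv = coefficient_of_variation([90.0, 100.0, 110.0])  # 10.0
```

A lower CV for the same feature indicates that its quantification varies less from one replicate to the next.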
The algorithm is freely accessible at https://msone.claritybiosystems.com and can analyze data-dependent acquisition (DDA), data-independent acquisition (DIA), and ion mobility data.