Antimicrobial peptides (AMPs), a class of innate immune molecules, have received significant attention for their capacity to combat a broad spectrum of pathogens, including viruses, bacteria, and fungi. Computational prediction makes it possible to identify AMPs in large datasets efficiently and accurately. Recent years have witnessed wide application of computational methods, especially machine learning and deep learning, for discovering and engineering AMPs. However, existing methods rely only on compositional, physicochemical, and structural features of peptide sequences, which cannot fully capture the information in AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC), in addition to conventional sequence-based features, for AMP prediction. With this new feature set, SAMP captures residue patterns such as sorting signals around both the N-terminus and the C-terminus, while also retaining sequence order information from the middle peptide fragments. Benchmarking tests on balanced and imbalanced datasets from different species demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred, in terms of accuracy, sensitivity, specificity, and AUC. We further incorporate the ensemble RP architecture so that SAMP scales to large-scale AMP screening and achieves a further performance improvement over models without RP. To broaden the impact of SAMP, we have developed a Python package, which is freely and publicly available at https://github.com/wan-mlab/SAMP.
SAMP: An Accurate Ensemble Model Based on Proportionalized Split Amino Acid Composition for Identifying Antimicrobial Peptides
Published: 12 October 2023 by MDPI in Antimicrobial Peptides: Yesterday, Today and Tomorrow, session Database, design and prediction of antimicrobial peptides
https://doi.org/10.3390/APD20symposium-14949 (registering DOI)
Keywords: Antimicrobial peptide prediction; Proportionalized split amino acid composition; Support vector machines; Random projection; Ensemble learning
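The abstract describes PSAAC as capturing residue patterns near the N-terminus and C-terminus while keeping information from the middle fragment. The following is a minimal sketch of that idea, not the SAMP package's implementation: the function names (`composition`, `psaac_sketch`), the parameters `n_frac` and `c_frac`, and the 25% terminal fractions are all assumptions made for illustration, since the abstract does not state the exact split proportions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(fragment):
    """Relative frequency of each standard amino acid in a fragment."""
    counts = Counter(fragment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # Empty fragment (or only non-standard residues): all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def psaac_sketch(sequence, n_frac=0.25, c_frac=0.25):
    """Hypothetical PSAAC-style features (assumed split proportions).

    The sequence is split into an N-terminal, a middle, and a C-terminal
    fragment whose lengths are proportional to the sequence length, and
    the amino acid composition of each fragment is concatenated into a
    60-dimensional vector.
    """
    seq = sequence.upper()
    length = len(seq)
    n_len = max(1, round(length * n_frac))  # at least one residue
    c_len = max(1, round(length * c_frac))
    n_term = seq[:n_len]
    c_term = seq[length - c_len:]
    middle = seq[n_len:length - c_len]  # empty for very short peptides
    return composition(n_term) + composition(middle) + composition(c_term)


features = psaac_sketch("AAAACCCCCCCCGGGG")
print(len(features))  # 60
```

Because the fragment lengths scale with the peptide length, short and long peptides yield vectors of the same size, which is what allows them to be fed to a fixed-input classifier. In a full pipeline these features would be combined with the conventional sequence-based features and passed to the ensemble RP model; that step is not shown here.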