SAMP: An Accurate Ensemble Model Based on Proportionalized Split Amino Acid Composition for Identifying Antimicrobial Peptides
1  Department of Biostatistics, Harvard School of Public Health
2  Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center
3  Department of Pathology and Microbiology, University of Nebraska Medical Center
Academic Editor: Guangshun Wang (registering DOI)

Antimicrobial peptides (AMPs), a class of innate immune molecules, have received significant attention for their capacity to combat a broad spectrum of pathogens, including viruses, bacteria, and fungi. Computational prediction makes it possible to identify AMPs in large datasets efficiently and accurately. Recent years have seen wide application of computational methods, especially machine learning and deep learning, to the discovery and engineering of AMPs. However, existing methods rely only on compositional, physicochemical, and structural properties of peptide sequences, which cannot fully capture the information carried by AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature, called proportionalized split amino acid composition (PSAAC), in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns, such as sorting signals, around both the N-terminus and the C-terminus, while also retaining sequence order information from the middle peptide fragments. Benchmarking tests on balanced and imbalanced datasets from different species demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred, in terms of accuracy, sensitivity, specificity, and AUC. We further incorporate an ensemble RP architecture into the model, which makes SAMP scalable to large-scale AMP screening and further improves performance compared with models without RP. To enhance the impact of SAMP, we have developed a Python package for it, which is freely and publicly available at

Keywords: Antimicrobial peptide prediction; Proportionalized split amino acid composition; Support vector machines; Random projection; Ensemble learning
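
To make the feature idea in the abstract concrete, the following is a minimal sketch of a split amino acid composition whose fragment lengths scale with peptide length, followed by a single Gaussian random projection of the resulting vector. It is not the SAMP package's implementation: the function names (`composition`, `split_composition`, `random_projection`) and the split proportions `n_frac` and `c_frac` are assumptions chosen for illustration, and the abstract does not state the proportions, the classifier settings, or the ensemble size used by SAMP.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(fragment):
    """Fraction of each standard amino acid in a fragment (zeros if empty)."""
    n = len(fragment)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [fragment.count(aa) / n for aa in AMINO_ACIDS]


def split_composition(seq, n_frac=0.25, c_frac=0.25):
    """Concatenate compositions of N-terminal, middle, and C-terminal fragments.

    Fragment lengths are proportional to the peptide length.
    n_frac and c_frac are hypothetical values, not taken from the paper.
    """
    seq = seq.upper()
    n_len = max(1, round(len(seq) * n_frac))
    c_len = max(1, round(len(seq) * c_frac))
    n_term = seq[:n_len]
    c_term = seq[len(seq) - c_len:]
    middle = seq[n_len:len(seq) - c_len]
    return composition(n_term) + composition(middle) + composition(c_term)


def random_projection(vectors, out_dim, seed=0):
    """Project feature vectors to out_dim dimensions with a Gaussian random matrix."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    scale = 1.0 / out_dim ** 0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, v)) for row in matrix]
            for v in vectors]


features = split_composition("KKLLAAGG")       # 3 fragments x 20 residues
projected = random_projection([features], out_dim=10)
print(len(features), len(projected[0]))        # prints: 60 10
```

In an ensemble setting, one would typically repeat the projection with different seeds, train one classifier (for example a support vector machine) per projected copy, and combine their votes; that step is omitted here because the abstract gives no details about it.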