Antimicrobial peptides (AMPs) have emerged as a promising approach to antibiotic development. In contrast to traditional chemical-based antibiotics, AMPs act through a "physical" mechanism: certain AMPs physically disrupt the bacterial cell membrane, killing the cell. However, the interaction between AMPs and normal cells must also be considered. AMPs that kill all types of cells indiscriminately cannot be used as pharmaceuticals, because they would also disrupt normal physiological functions in the body.
The primary goal of this study is to reduce the hemolysis caused by synthesized AMP sequences. Computational methods are used to identify candidate AMP sequences because they are cost-effective compared with actually synthesizing the sequences. Screening out sequences likely to induce hemolysis at an early stage is therefore advantageous.
To this end, several ensemble classification models were built on the DBAASP dataset to determine whether a peptide sequence would induce a given degree of hemolysis at a specified peptide concentration. These models integrate diverse machine learning techniques, including support vector machines, random forests, AdaBoost, multilayer perceptrons, k-nearest neighbors, and XGBoost. Overall, the models achieve accuracies of approximately 0.82, 0.8, and 0.81 in predicting whether a peptide sequence is hemolytic under the 10%, 20%, and 40% hemolysis thresholds, respectively.
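The classification task described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the "at or above" comparison, and the use of hard majority voting to combine base classifiers are all assumptions made for illustration, since the text does not specify how labels were derived or how the models were combined. The example measurement and votes are hypothetical values, not data from DBAASP.

```python
from collections import Counter
from typing import Dict, Sequence

# Hemolysis thresholds (in percent) named in the text.
THRESHOLDS = (10.0, 20.0, 40.0)


def label_hemolytic(hemolysis_percent: float, threshold: float) -> int:
    """Return 1 if the measured hemolysis reaches the threshold, else 0.

    Assumption: "at or above the threshold" counts as hemolytic; the text
    does not say whether the comparison is strict.
    """
    return 1 if hemolysis_percent >= threshold else 0


def labels_for_all_thresholds(hemolysis_percent: float) -> Dict[float, int]:
    """Binarize one hemolysis measurement under each of the three thresholds."""
    return {t: label_hemolytic(hemolysis_percent, t) for t in THRESHOLDS}


def majority_vote(predictions: Sequence[int]) -> int:
    """Combine binary predictions from several base classifiers.

    Assumption: hard majority voting, with ties resolved toward 1
    (hemolytic) as the more cautious screening outcome.
    """
    votes = Counter(predictions)
    return 1 if votes[1] >= votes[0] else 0


# Hypothetical measurement: a peptide causing 25% hemolysis.
example_labels = labels_for_all_thresholds(25.0)

# Hypothetical outputs of six base classifiers for one peptide.
example_vote = majority_vote([1, 0, 1, 1, 0, 1])
```

Under these assumptions, one measurement yields a separate binary label per threshold, which is why a separate model (and a separate accuracy figure) exists for each of the 10%, 20%, and 40% settings.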