The rising incidence of fungal infections has prompted the exploration of novel therapeutic avenues, with antimicrobial peptides (AMPs) emerging as promising candidates for antifungal therapies. Various computational methodologies, including template-based approaches, docking simulations, alignment methods, and machine learning techniques, have been harnessed for predicting and designing antifungal peptides (AFPs).
In this study, we developed an artificial neural network (ANN)-based deep-learning model that predicts the antifungal activity of peptides from their amino acid sequences. Trained on a diverse dataset of experimentally validated AFPs, the model both predicts antifungal activity and supports the design of new peptides with high in silico predicted efficacy.
The positive dataset comprised 1478 unique AFPs from the Antifp, UniProt, and APD3 databases. The negative dataset contained an equal number of sequences, combining random UniProt sequences not annotated as AFPs or AMPs with randomly generated sequences. The data were split 80/20 into training and test sets. Peptide sequences were one-hot encoded into a format suitable for neural network (NN) analysis, which allowed the use of convolutional neural network (CNN) and long short-term memory (LSTM) layers. LSTM layers capture sequential dependencies effectively, which has made them a common choice for ANN-based peptide prediction. Our model architectures combined CNN and LSTM layers with varying numbers of LSTM units. Dropout layers were added to reduce overfitting, and dense layers were used to extract key features from the peptide sequences, enabling the detection of subtle patterns associated with antifungal activity. Model performance was evaluated by accuracy [(true positives + true negatives) / all predictions], sensitivity [SE = true positives / (true positives + false negatives)], specificity [SP = true negatives / (true negatives + false positives)], and the area under the ROC (receiver operating characteristic) curve. Models were built using the Keras framework.
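The following is a minimal plain-Python sketch of the one-hot encoding and the evaluation metrics described above. It rests on several assumptions, none of which is stated in the abstract:

* The 20 canonical amino acids are encoded in alphabetical one-letter order.
* Sequences are right-padded with all-zero rows to a hypothetical maximum length `MAX_LEN = 40`.
* The decision threshold is 0.5.

The ROC curve area is computed through its Mann-Whitney interpretation rather than by tracing the curve.

```python
# Sketch only. Assumed (not from the source): alphabet order, zero-row padding,
# MAX_LEN = 40, and a 0.5 decision threshold.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues (assumed order)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MAX_LEN = 40  # hypothetical padding length


def one_hot(seq, max_len=MAX_LEN):
    """Encode one peptide as a max_len x 20 matrix (list of lists of 0/1)."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than {max_len}: {seq}")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq.upper()):
        matrix[pos][AA_INDEX[aa]] = 1  # raises KeyError for non-canonical residues
    return matrix  # rows beyond len(seq) stay all-zero (padding)


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = AFP, 0 = non-AFP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, scores, threshold=0.5):
    """Accuracy, SE = TP/(TP+FN) and SP = TN/(TN+FP) at a given threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """ROC curve area as the probability that a random positive outscores a
    random negative (ties count 0.5). O(P*N), adequate for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `metrics([1, 1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1, 0.7])` yields accuracy 0.6, SE 2/3 and SP 0.5. `roc_auc` on the same inputs gives 5/6. For Keras, the encoded peptides would typically be stacked into an array of shape (n_peptides, 40, 20), for example with `numpy.array`, before being passed to a `Conv1D`/`LSTM` stack.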
The best-performing model featured a single CNN layer followed by LSTM, dense, and dropout layers. Similar architectures have been explored in previous AFP studies. The notable difference here is that dropout, with a distinct rate for each layer, was applied to all LSTM and dense layers. The model achieved an accuracy of 92.46%, SE of 92.98%, SP of 91.95%, and ROC curve area of 0.98. We then used the model generatively, as a screening filter. Random 5- to 40-mer sequences were generated iteratively and scored by the model, and the iterations stopped once 1000 sequences with predicted values above a threshold of 0.99 had been found. This required 178,534 iterations (approx. 3 h) on a T4 GPU. The five sequences with the highest predicted values were NPTALKKLHKAR, CTRRPCIA, KSCNVNCACVR, TKCCGVMKAVNGPCYCW, and ECKCYPSCPVRHKY.
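The generate-and-screen procedure can be sketched as follows. In this sketch, `predict_fn` is a hypothetical stand-in for the trained model, for example a wrapper around its `predict` method that returns one probability per sequence. The toy scorer `toy_predict`, which returns the fraction of K/R residues, exists only so the sketch runs; it is not the authors' model.

The abstract does not state whether each iteration scored a single sequence or whether duplicate sequences were discarded. This sketch assumes both: each iteration scores one candidate, and only distinct sequences are kept.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def random_peptide(rng, min_len=5, max_len=40):
    """Draw a random peptide whose length is uniform in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)  # randint is inclusive on both ends
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def screen(predict_fn, n_hits=1000, threshold=0.99, max_iter=1_000_000, seed=0):
    """Generate random peptides until n_hits distinct ones score above threshold.

    predict_fn: hypothetical callable mapping a list of sequences to a list of
    probabilities. Returns (hits sorted by score, descending; iterations used).
    """
    rng = random.Random(seed)
    hits = {}
    for it in range(1, max_iter + 1):
        seq = random_peptide(rng)
        score = predict_fn([seq])[0]  # one candidate scored per iteration
        if score > threshold and seq not in hits:
            hits[seq] = score
            if len(hits) >= n_hits:
                ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
                return ranked, it
    raise RuntimeError(f"only {len(hits)} hits after {max_iter} iterations")


def toy_predict(seqs):
    """Toy stand-in scorer (NOT the trained model): fraction of K/R residues."""
    return [(s.count("K") + s.count("R")) / len(s) for s in seqs]


if __name__ == "__main__":
    top, iterations = screen(toy_predict, n_hits=5, threshold=0.3, seed=1)
    print(iterations, top)
```

Scoring candidates in batches, rather than making one model call per sequence, would likely shorten the runtime on a GPU. The trade-off is that an "iteration" would then count batches instead of individual sequences.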
As next steps, we will synthesize the identified sequences and evaluate them against various fungal species. We also plan to develop a publicly available online application that uses our model to predict and design AFPs. These results underscore the potential of ANN-based approaches to predict the biological activity of peptides solely from their amino acid sequences, with significant implications for the tailored design of novel AFPs.