Machine Learning Analysis Suggests Relative Protein Abundance is Weakly Correlated with Snake Venom Toxicity
1 Laboratory of AI and Biomedical – Informatics (LAB-I), Faculty of Medical Sciences (FMS), Mohammed VI Polytechnic University (UM6P), Ben Guerir, Morocco
2 Department of Human Genetics, University of Texas Rio Grande Valley, Brownsville, TX, USA
Academic Editor: R. Manjunatha Kini

https://doi.org/10.3390/IECT2023-14785
Abstract:
Snakebite is a neglected public health issue in many tropical and subtropical countries. Each year, about 5.4 million snakebites occur, resulting in up to 138,000 deaths and over 400,000 amputations and other permanent disabilities. Bites can cause severe neurotoxic, hemorrhagic and myotoxic damage, the extent of which depends on the toxicity and venom composition of the snake species. Predicting toxicity from venom composition could therefore greatly improve diagnosis and antivenom treatment, saving lives and limbs. Herein, we investigate the potential of Machine Learning (ML) in venomics by training several models to predict the median lethal dose (LD50) from venom composition. The analysis covered 130 snake species (15% of all venomous species). Five ML models, namely Support Vector Machine (SVM), Multilayer Perceptron (MLP), Linear Regression, Decision Tree and Random Forest, and four ensemble learning methods, namely Stacking, Voting, Bagging and AdaBoost, were trained to predict LD50 from relative protein abundance. Although data from 14 proteins and enzymes were combined, the results showed an overall weak correlation between model predictions and LD50 (correlations from 0.48 to 0.55, R² from 0.16 to 0.23), even when only the highly significant protein and enzyme families were considered: snake venom metalloproteinases (SVMP), snake venom serine proteases (SVSP), three-finger toxins (3FTx) and phospholipases A2 (PLA2). These results challenge the assumption that relative protein abundance is the main driver of toxicity. They suggest that toxicity is a multifactorial phenomenon influenced by other biological aspects, such as protein 3D structure and potential binding sites. This in turn highlights the need for high-quality, multimodal venomics databases that combine toxicity with biological factors such as protein structure and metabolic data, to better understand the nature of snake venom toxicity.
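The evaluation described above (out-of-sample LD50 predictions scored by a correlation coefficient and R²) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: a ridge-regularized least-squares fit stands in for the Linear Regression model (the small penalty `lam` keeps the normal equations solvable, because relative abundances that sum to 100% are collinear with the intercept), 5-fold cross-validation is an assumed protocol, the correlation is assumed to be Pearson's, and `make_synthetic` generates hypothetical random compositions rather than real venom profiles. The helper names (`fit_ridge`, `kfold_predictions`, `make_synthetic`) are illustrative only; libraries such as scikit-learn provide the SVM, MLP, tree and ensemble regressors named in the abstract.

```python
import random
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(X, y, lam=1e-3):
    """Hypothetical stand-in for Linear Regression: ridge with an unpenalized intercept."""
    Xa = [[1.0] + list(row) for row in X]  # prepend intercept column
    p = len(Xa[0])
    a = [[sum(r[i] * r[j] for r in Xa) + (lam if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xa, y)) for i in range(p)]
    return solve(a, b)


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def kfold_predictions(X, y, k=5, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = fit_ridge([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(w, [X[i] for i in test])):
            preds[i] = p
    return preds


def make_synthetic(n=130, k=14, noise=0.1, seed=42):
    """Hypothetical data: random compositions and a noisy target (NOT real venom data)."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        raw = [rng.random() for _ in range(k)]
        total = sum(raw)
        X.append([v / total for v in raw])  # relative abundances summing to 1
    y = [2 * r[0] - r[1] + 3 * r[2] + r[3] + rng.gauss(0, noise) for r in X]
    return X, y


X, y = make_synthetic()
preds = kfold_predictions(X, y, k=5)
r = pearson_r(y, preds)
print(f"Pearson r = {r:.2f}, R^2 = {r2_score(y, preds):.2f}")
```

Reporting both metrics is informative because r² equals the R² of the best affine rescaling of the predictions, while R² scores the raw predictions; R² can therefore never exceed r². A correlation of 0.55, for instance, caps R² at about 0.30.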
Keywords: Machine Learning in Venom, LD50 Prediction, Venomics regression, ML Snake Venom, AI in Venomics

 
 