SU-QMI: A Feature Selection Method Based on Graph Theory for Prediction of Antimicrobial Resistance in Gram-Negative Bacteria
1  Department of Immunobiology and Bioinformatics Research, National Marrow Donor Program, Minneapolis, Minnesota, USA
2  School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA
3  Paul G. Allen School for Global Animal Health, Washington State University, Pullman, Washington, USA
4  Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, Washington, USA

Abstract:

Machine learning can serve as an alternative to similarity-based algorithms such as BLAST when the latter fail to identify highly dissimilar antimicrobial resistance (AMR) genes in bacteria; however, accurate prediction depends on determining the most informative characteristics, known as features, for AMR. In this paper, we introduce a feature selection algorithm called symmetrical uncertainty-qualitative mutual information (SU-QMI), which selects features based on estimates of their relevance, redundancy, and interdependency. We combine the concepts of symmetrical uncertainty and qualitative mutual information with graph theory to derive a feature selection method for identifying putative AMR genes in Gram-negative bacteria. First, we extract physicochemical, evolutionary, and structural features from protein sequences of five genera of Gram-negative bacteria (Acinetobacter, Klebsiella, Campylobacter, Salmonella, and Escherichia) belonging to three resistance classes: acetyltransferase (aac), beta-lactamase (bla), and dihydrofolate reductase (dfr). Our SU-QMI algorithm is then used to find the best subset of features, and a support vector machine (SVM) model is trained for AMR prediction using this feature subset. We evaluate performance on an independent set of protein sequences from three other Gram-negative bacterial genera (Pseudomonas, Vibrio, and Enterobacter) and achieve prediction accuracy ranging from 88% to 100%. By comparison, BLASTp requires a similarity threshold as low as 53% to produce comparable classification results. Our results thus indicate the effectiveness of the SU-QMI method for selecting the best protein features for AMR prediction in Gram-negative bacteria.
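
To make the relevance criterion named in the abstract more concrete, the following is a minimal sketch of the standard symmetrical uncertainty measure, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), for discrete-valued features. It is not the SU-QMI algorithm itself: the qualitative mutual information term and the graph-theoretic selection step are not described in enough detail here to reproduce, so they are omitted. The function names (`entropy`, `symmetrical_uncertainty`, `rank_by_relevance`) and the toy data are hypothetical, and continuous protein features are assumed to have been discretized beforehand.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); lies in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is no information to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_by_relevance(features, labels):
    """Rank feature names by SU with the class label, highest first.

    `features` maps a (hypothetical) feature name to its list of
    discretized values, one value per protein sequence.
    """
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data (illustrative only): 1 = AMR protein, 0 = non-AMR protein.
    labels = [1, 1, 0, 0]
    features = {
        "f_informative": [1, 1, 0, 0],  # mirrors the label exactly
        "f_noise": [0, 1, 0, 1],        # statistically independent of the label
    }
    for name, score in rank_by_relevance(features, labels):
        print(f"{name}: SU = {score:.2f}")
```

In this toy case the feature that mirrors the label scores 1 and the independent feature scores 0. A ranking like this captures relevance only; the redundancy and interdependency among features that the abstract mentions would need additional pairwise terms.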

Keywords: Antimicrobial Resistance; BLASTp; Feature Selection; Machine Learning; Qualitative Mutual Information; Symmetrical Uncertainty