SU-QMI: A Feature Selection Method Based on Graph Theory for Prediction of Antimicrobial Resistance in Gram-Negative Bacteria
1  Department of Immunobiology and Bioinformatics Research, National Marrow Donor Program, Minneapolis, Minnesota, USA
2  School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA
3  Paul G. Allen School for Global Animal Health, Washington State University, Pullman, Washington, USA
4  Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, Washington, USA


Machine learning can be used as an alternative to similarity algorithms such as BLAST when the latter fail to identify highly dissimilar antimicrobial resistance (AMR) genes in bacteria; however, determining the most informative characteristics, known as features, for AMR is essential to obtain accurate predictions. In this paper we introduce a feature selection algorithm called symmetrical uncertainty-qualitative mutual information (SU-QMI), which selects features based on estimates of their relevance, redundancy, and interdependency. We use the concepts of symmetrical uncertainty and qualitative mutual information, in addition to graph theory, to derive a feature selection method for identifying putative AMR genes in Gram-negative bacteria. First, we extract physicochemical, evolutionary, and structural features from the protein sequences of five genera of Gram-negative bacteria (Acinetobacter, Klebsiella, Campylobacter, Salmonella, and Escherichia) which confer resistance to acetyltransferase (aac), beta-lactamase (bla), and dihydrofolate reductase (dfr). Our SU-QMI algorithm is then used to find the best subset of features, and a support vector machine (SVM) model is trained for AMR prediction using this feature subset. We evaluate the performance using an independent set of protein sequences from three Gram-negative bacterial genera (Pseudomonas, Vibrio, and Enterobacter) and achieve prediction accuracy ranging from 88% to 100%. Compared to the SU-QMI method, BLASTp requires similarity as low as 53% for comparable classification results. Thus, our results indicate the effectiveness of the SU-QMI method for selecting the best protein features for AMR prediction in Gram-negative bacteria.
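
To make the relevance measure concrete, the following is a minimal sketch of symmetrical uncertainty in its standard textbook form for discrete variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is not the SU-QMI algorithm itself: the qualitative mutual information term and the graph-theoretic selection step are not reproduced here, and the function names and toy data are illustrative assumptions rather than anything taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard symmetrical uncertainty: 2 * I(X; Y) / (H(X) + H(Y)).

    Ranges from 0 (independent) to 1 (each variable determines the other).
    Continuous features would need to be discretized first (an assumption
    of this sketch, not a statement about the paper's preprocessing).
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; define SU as 0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: 1 = AMR, 0 = non-AMR.
labels = [1, 1, 0, 0]
feature_a = ["hi", "hi", "lo", "lo"]  # perfectly tracks the label
feature_b = ["hi", "lo", "hi", "lo"]  # independent of the label

su_a = symmetrical_uncertainty(feature_a, labels)
su_b = symmetrical_uncertainty(feature_b, labels)
```

In this toy case the feature that tracks the class label scores higher than the independent one, which is the sense in which a filter method can rank features by relevance before a classifier such as an SVM is trained on the selected subset.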

Keywords: Antimicrobial Resistance; BLASTp; Feature Selection; Machine Learning; Qualitative Mutual Information; Symmetrical Uncertainty