HRGPred: Prediction of herbicide resistant genes with k-mer nucleotide compositional features and support vector machine

Sci Rep. 2019 Jan 28;9(1):778. doi: 10.1038/s41598-018-37309-9.

Abstract

Herbicide resistance (HR) is a major concern for agricultural producers as well as environmentalists. Resistance to commonly used herbicides is conferred by mutation(s) in the genes encoding herbicide target sites/proteins (GETS). Identifying these genes through wet-lab experiments is time-consuming and expensive. This study therefore proposes a supervised learning-based computational model, the first of its kind, for predicting seven classes of GETS. The cDNA sequences of the genes were first transformed into numeric features based on their k-mer compositions and then supplied as input to a support vector machine (SVM). In the proposed SVM-based model, prediction occurs in two stages: a binary classifier in the first stage discriminates genes that confer herbicide resistance from other genes, and a multi-class classifier in the second stage assigns each gene predicted as herbicide resistant in the first stage to one of the seven resistance classes. Overall classification accuracies were ~89% and >97% for the binary and multi-class classifications, respectively. The proposed model achieved higher accuracy than the homology-based algorithms, viz. BLAST and Hidden Markov Model. In addition, the model achieved ~87% accuracy when tested on an independent dataset. An online prediction server, HRGPred (http://cabgrid.res.in:8080/hrgpred), has also been established to facilitate the prediction of GETS by the scientific community.
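
The k-mer feature step and the two-stage decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the value of k, the normalization, or how ambiguous bases are handled, so the default k=3, the relative-frequency normalization, and the skipping of windows containing non-ACGT symbols are all assumptions. The names `kmer_composition` and `two_stage_predict` are hypothetical, and the two classifiers are passed in as plain callables standing in for trained SVMs.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(sequence, k=3):
    """Return relative frequencies of all 4**k k-mers in lexicographic order.

    Assumptions (not taken from the paper): overlapping windows with step 1,
    counts normalized to frequencies, and windows containing symbols outside
    A/C/G/T (for example N) are skipped.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # No valid window: return an all-zero vector of the right length.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def two_stage_predict(features, is_resistant, resistance_class):
    """Apply a binary decision first, then a multi-class decision.

    `is_resistant` and `resistance_class` are hypothetical callables that
    stand in for the trained binary and multi-class classifiers.
    Returns None when the first stage rejects the sequence.
    """
    if not is_resistant(features):
        return None
    return resistance_class(features)


if __name__ == "__main__":
    vector = kmer_composition("ATGGCGTACGTTAGC", k=2)
    print(len(vector))  # one feature per possible 2-mer
```

With k=2 the feature vector has 4**2 = 16 entries, one per dinucleotide. In practice the two callables would be replaced by fitted classifiers, for example the `predict` methods of two SVM models trained on such vectors.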

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Gene Expression Regulation, Plant
  • Herbicide Resistance*
  • Models, Genetic
  • Plant Proteins / genetics*
  • Plants / genetics*
  • Sequence Analysis, DNA
  • Sequence Homology, Nucleic Acid
  • Support Vector Machine

Substances

  • Plant Proteins