Identification of potential driver mutations in glioblastoma using machine learning

Brief Bioinform. 2022 Nov 19;23(6):bbac451. doi: 10.1093/bib/bbac451.

Abstract

Glioblastoma is a fast-growing, aggressive tumor of the brain and spinal cord. Mutations of amino acid residues in target proteins involved in glioblastoma alter protein structure and function and may lead to disease. In this study, we collected a set of 9386 disease-causing (driver) mutations, selected based on recurrence in patient samples and experimental annotation as pathogenic, and 8728 neutral (passenger) mutations. We observed that Arg is highly preferred at the mutant sites of drivers, whereas Met and Ile are preferred in passengers. Inspecting neighboring residues at the mutant sites revealed that the motifs YP, CP and GRH are preferred in drivers, whereas SI, IQ and TVI are dominant in neutral mutations. In addition, we computed other sequence-based features such as conservation scores, Position Specific Scoring Matrices (PSSM) and physicochemical properties, and developed a machine learning-based method, GBMDriver (GlioBlastoma Multiforme Drivers), for distinguishing between driver and passenger mutations. Our method showed an accuracy and AUC of 73.59% and 0.82, respectively, on 10-fold cross-validation and 81.99% and 0.87 on a blind set of 1809 mutants. The tool is available at https://web.iitm.ac.in/bioinfo2/GBMDriver/index.html. We envisage that the present method will help prioritize driver mutations in glioblastoma and assist in identifying therapeutic targets.
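The neighboring-residue analysis mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how motifs were extracted, so the sketch assumes a motif is any contiguous di- or tri-residue window that covers the mutant site. The function names (motifs_at_site, motif_frequencies) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def motifs_at_site(sequence, position, size):
    """Return every contiguous window of length `size` covering `position` (0-based).

    Assumption: a "motif" is a window that includes the mutant site;
    the study's exact definition is not given in the abstract.
    """
    windows = []
    for start in range(position - size + 1, position + 1):
        end = start + size
        if start >= 0 and end <= len(sequence):
            windows.append(sequence[start:end])
    return windows


def motif_frequencies(sites, size):
    """Fraction of mutant sites whose neighborhood contains each motif."""
    counts = Counter()
    for sequence, position in sites:
        # Use a set so a motif is counted at most once per site.
        counts.update(set(motifs_at_site(sequence, position, size)))
    total = len(sites)
    return {motif: n / total for motif, n in counts.items()}


# Toy, made-up sequences and 0-based mutant positions (not data from the study).
drivers = [("MKYPGRHL", 3), ("ACPGRHTT", 2)]
passengers = [("MSIQTVIA", 2), ("LLSIQKTV", 3)]

driver_freq = motif_frequencies(drivers, 2)
passenger_freq = motif_frequencies(passengers, 2)

# Motifs seen in only one class are candidates for class-specific preference.
driver_only = sorted(set(driver_freq) - set(passenger_freq))
print(driver_only)
```

Comparing the two frequency tables on a real dataset would call for a statistical test of enrichment rather than the simple set difference used here.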

Keywords: cancer; driver mutation; glioblastoma; machine learning; motifs; variants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids
  • Glioblastoma* / genetics
  • Humans
  • Machine Learning
  • Mutation
  • Proteins / genetics

Substances

  • Proteins
  • Amino Acids