Identifying the Impact of Inframe Insertions and Deletions on Protein Function in Cancer

J Comput Biol. 2020 May;27(5):786-795. doi: 10.1089/cmb.2018.0192. Epub 2019 Aug 28.

Abstract

Inframe insertion and deletion mutations (indels) are commonly observed in cancer samples, accounting for over 1% of all reported mutations. Few somatic inframe indels have been clinically documented as pathogenic, and at present there are few tools to predict which indels drive cancer development. However, indels are a common feature of hereditary disease, and several tools have been developed to predict the impact of inframe indels on protein function. In this study, we test whether six popular prediction tools can be adapted to identify cancer driver mutations, and then develop a new algorithm (IndelRF) that discriminates between recurrent indels in known cancer genes and indels not associated with disease. IndelRF was developed to identify somatic inframe indels that act as cancer drivers. Using a random forest classifier with 11 features, IndelRF achieved accuracies of 0.995 and 0.968 for insertion and deletion mutations, respectively. Finally, we use IndelRF to classify the inframe indel cancer mutations in the MOKCa database.
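To make the term "inframe" concrete: an indel is inframe when it adds or removes a multiple of three nucleotides, so the downstream reading frame is preserved. The sketch below illustrates that definition only. It is not part of IndelRF, and the function name `classify_indel` and the VCF-style reference/alternate allele strings are assumptions made for illustration. It also assumes the variant lies entirely within coding sequence.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Label a variant from its reference and alternate alleles.

    Hypothetical helper for illustration: assumes VCF-style alleles
    located entirely within coding sequence.
    """
    diff = len(alt) - len(ref)
    if diff == 0:
        # Equal lengths: no net insertion or deletion.
        return "substitution"
    kind = "insertion" if diff > 0 else "deletion"
    # A length change divisible by 3 keeps the reading frame intact.
    frame = "inframe" if abs(diff) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(classify_indel("A", "AGGC"))   # three bases inserted
print(classify_indel("ACTT", "A"))   # three bases deleted
print(classify_indel("ATTCG", "A"))  # four bases deleted
```

Only variants labeled inframe by a check of this kind would fall within the scope of the classifier described in the abstract; frameshift indels are a separate class.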

Keywords: cancer; inframe indels; mutations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Genetic
  • Genome, Human / genetics
  • Humans
  • INDEL Mutation / genetics*
  • Neoplasm Proteins / genetics*
  • Neoplasms / genetics*
  • Neoplasms / pathology
  • Oncogenes / genetics

Substances

  • Neoplasm Proteins