Inframe insertion and deletion mutations (indels) are commonly observed in cancer samples, accounting for over 1% of all reported mutations. Few somatic inframe indels have been clinically documented as pathogenic, and at present there are few tools to predict which indels drive cancer development. However, indels are a common feature of hereditary disease, and several tools have been developed to predict the impact of inframe indels on protein function. In this study, we test whether six popular prediction tools can be adapted to identify cancer driver mutations, and then develop a new algorithm (IndelRF) that discriminates between recurrent indels in known cancer genes and indels not associated with disease. IndelRF was developed to identify somatic inframe indels that act as cancer drivers. Using a random forest classifier with 11 features, IndelRF achieved accuracies of 0.995 and 0.968 for insertion and deletion mutations, respectively. Finally, we use IndelRF to classify the inframe indel cancer mutations in the MOKCa database.
Keywords: cancer; inframe indels; mutations.