DARVIC: Dihedral angle-reliant variant impact classifier for functional prediction of missense VUS

Comput Methods Programs Biomed. 2023 Aug:238:107596. doi: 10.1016/j.cmpb.2023.107596. Epub 2023 May 11.

Abstract

Background: The functional impact of most identified genetic variants remains unknown. Mutations in DNA damage repair genes such as MUTYH, which is involved in repairing A:8-oxoG mismatches caused by reactive oxygen species, can increase the risk of cancer. Mutations in other key genes such as TP53 also pose serious health threats, including cancer risk. Interpreting the functional impact of genetic variants is therefore a pressing issue. Many in silico methods based on different principles have been developed and applied to interpret genetic variants. However, many existing methods tend to overpredict pathogenicity, classifying benign variants as pathogenic. A new approach is needed to address this issue and improve in silico interpretation of genetic variants.

Methods: In this study, we developed another protein structure-based approach, the Dihedral angle-reliant variant impact classifier (DARVIC), to predict the deleterious impact of coding-changing missense variants. DARVIC takes Ramachandran's principle of protein stereochemistry as its theoretical foundation and couples molecular dynamics simulations with the supervised machine learning algorithm XGBoost to determine the functional impact of missense variants on protein structural stability.
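The quantity underlying this approach, a backbone dihedral angle, can be illustrated with a minimal sketch. The abstract does not describe how DARVIC extracts its features, so the function name `dihedral_deg` and the coordinates below are hypothetical; the code only shows the standard geometric definition of a dihedral angle from four atom positions (for phi: C(i-1), N, CA, C; for psi: N, CA, C, N(i+1)).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points.

    Hypothetical helper for illustration only; not taken from DARVIC.
    The angle is measured about the central bond p1 -> p2.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))

    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a planar trans arrangement and a right-angle twist.
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
twist = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

In a molecular dynamics setting, a function like this would be evaluated for every residue in every trajectory frame, giving per-residue distributions of phi and psi over time. How such distributions are turned into classifier inputs is not specified in the abstract.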

Results: We characterized the features of dihedral angles in dynamic protein structures. We also tested DARVIC on MUTYH and TP53 missense variants and obtained satisfactory results in reflecting the functional impacts of the variants on protein structure. DARVIC achieved a balanced accuracy of 84% on a functionally validated MUTYH dataset containing both benign and pathogenic missense variants, higher than other existing in silico methods. In addition, DARVIC predicted 119 (47%) of 254 MUTYH VUS to be deleterious. On a functionally validated TP53 dataset, DARVIC achieved a balanced accuracy of 94%, outperforming other methods and demonstrating its robustness.
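For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, which keeps a classifier from scoring well simply by favoring the majority class. The sketch below is illustrative only: the function name `balanced_accuracy` and the toy labels are made up and are not the study's data. The last lines simply recompute the reported VUS fraction from the counts given above.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Labels: 1 = pathogenic/deleterious, 0 = benign.
    Hypothetical helper for illustration; not the authors' code.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants recovered
    specificity = tn / (tn + fp)  # fraction of benign variants recovered
    return (sensitivity + specificity) / 2


# Toy labels, invented for this example only.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_score = balanced_accuracy(toy_true, toy_pred)

# Fraction of MUTYH VUS predicted deleterious, from the counts in the abstract.
vus_fraction = 119 / 254
vus_percent = round(vus_fraction * 100)
```

Because specificity enters with the same weight as sensitivity, a method that overpredicts pathogenicity for benign variants is penalized directly, which is the failure mode the Background section highlights.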

Conclusion: DARVIC provides a valuable tool for predicting the functional impacts of missense variants based on their effects on protein structural stability and motion. In its current state, DARVIC performed well in predicting the functional impact of missense variants in both MUTYH and TP53. We expect it to have high potential for predicting the functional impact of missense variants in other genes.

Keywords: Dihedral angles; Human MutY homolog (MUTYH); Missense variants; Molecular dynamics simulations (MD simulations); Protein structure; Ramachandran principle; Tumor protein 53 (TP53).

MeSH terms

  • Algorithms
  • Humans
  • Mutation, Missense*
  • Neoplasms*