Purpose: Disease-specific pathogenic variant prediction tools that differentiate pathogenic variants from benign have been improved through disease specificity recently. However, they have not been evaluated on disease-specific pathogenic variants compared with other diseases, which would help to prioritize disease-specific variants from several genes or novel genes. Thus, we hypothesize that features of pathogenic variants alone would provide a better model.
Methods: We developed an eye disease-specific variant prioritization tool (eyeVarP), which applied the random forest algorithm to the data set of pathogenic variants of eye diseases and other diseases. We also developed the VarP tool and generalized pipeline to filter missense and insertion-deletion variants and predict their pathogenicity from exome or genome sequencing data, thus we provide a complete computational procedure.
Results: eyeVarP outperformed pan disease-specific tools in identifying eye disease-specific pathogenic variants under the top 10. VarP outperformed 12 pathogenicity prediction tools with an accuracy of 95% in correctly identifying the pathogenicity of missense and insertion-deletion variants. The complete pipeline would help to develop disease-specific tools for other genetic disorders.
Conclusion: eyeVarP performs better in identifying eye disease-specific pathogenic variants using pathogenic variant features and gene features. Implementing such complete computational procedure would significantly improve the clinical variant interpretation for specific diseases.
Keywords: Eye disease; Machine learning; Pathogenic variants; Variant filtering; Variant prioritization.
Copyright © 2023 American College of Medical Genetics and Genomics. Published by Elsevier Inc. All rights reserved.