Machine learning-based reclassification of germline variants of unknown significance: The RENOVO algorithm

Valentina Favalli; Giulia Tini; Emanuele Bonetti; Gianluca Vozza; Alessandro Guida; Sara Gandini; Pier Giuseppe Pelicci; Luca Mazzarella

doi:10.1016/j.ajhg.2021.03.010

Am J Hum Genet. 2021 Apr 1;108(4):682-695. doi: 10.1016/j.ajhg.2021.03.010. Epub 2021 Mar 23.

Authors

Valentina Favalli¹, Giulia Tini¹, Emanuele Bonetti¹, Gianluca Vozza¹, Alessandro Guida², Sara Gandini¹, Pier Giuseppe Pelicci¹, Luca Mazzarella³

Affiliations

¹ Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy.
² Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy; Biomedical Translational Imaging Centre, Nova Scotia Health Authority and IWK Health Centre, Halifax, NS B3K 6R8, Canada.
³ Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy. Electronic address: luca.mazzarella@ieo.it.

Abstract

The increasing scope of genetic testing enabled by next-generation sequencing (NGS) has dramatically increased the number of genetic variants that must be interpreted as pathogenic or benign for adequate patient management. Still, the interpretation process often fails to deliver a clear classification, resulting in either variants of unknown significance (VUSs) or variants with conflicting interpretations of pathogenicity (CIP). These represent a major clinical problem because they provide no useful information for decision-making, leaving a large fraction of genetically determined disease undertreated. We developed a machine learning (random forest)-based tool, RENOVO, that classifies variants as pathogenic or benign on the basis of publicly available information and provides a pathogenicity likelihood score (PLS). Using the same feature classes recommended by guidelines, we trained RENOVO on established pathogenic/benign variants in ClinVar (training set accuracy = 99%) and tested its performance on variants whose interpretation has changed over time (test set accuracy = 95%). We further validated the algorithm on additional datasets, including unreported variants validated either through expert consensus (ENIGMA) or through laboratory-based functional techniques (on BRCA1/2 and SCN5A). On all datasets, RENOVO outperformed existing automated interpretation tools. On the basis of these validation metrics, we assigned a defined PLS to all existing ClinVar VUSs and proposed a reclassification for 67% of them with >90% estimated precision. RENOVO provides a validated tool to reduce the fraction of uninterpreted or misinterpreted variants, tackling an area of unmet need in modern clinical genetics.
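The final step described in the abstract, using validation metrics to turn a PLS into a reclassification at a chosen estimated precision, can be illustrated with a minimal sketch. It assumes every variant already carries a PLS in [0, 1] (for instance, a random forest's predicted probability of pathogenicity) and that a labelled validation set is available. The function names, the 0.9 target and the toy scores are hypothetical and are not RENOVO's actual procedure or data.

```python
from typing import Optional, Sequence

# Validation labels: 1 = pathogenic, 0 = benign (hypothetical encoding).


def pathogenic_threshold(scores: Sequence[float], labels: Sequence[int],
                         target: float = 0.9) -> Optional[float]:
    """Lowest PLS cut-off at which 'pathogenic' calls (PLS >= cut-off)
    still reach `target` precision on the validation set; None if none does."""
    best = None
    for t in sorted(set(scores), reverse=True):
        called = [y for s, y in zip(scores, labels) if s >= t]
        if sum(called) / len(called) >= target:
            best = t  # a lower cut-off that still meets the target calls more variants
    return best


def benign_threshold(scores: Sequence[float], labels: Sequence[int],
                     target: float = 0.9) -> Optional[float]:
    """Highest PLS cut-off at which 'benign' calls (PLS <= cut-off)
    still reach `target` precision on the validation set; None if none does."""
    best = None
    for t in sorted(set(scores)):
        called = [y for s, y in zip(scores, labels) if s <= t]
        if called.count(0) / len(called) >= target:
            best = t  # a higher cut-off that still meets the target calls more variants
    return best


def reclassify(pls: float, t_path: Optional[float], t_ben: Optional[float]) -> str:
    """Propose a label for a VUS; scores between the two cut-offs remain VUS."""
    if t_path is not None and pls >= t_path:
        return "pathogenic"
    if t_ben is not None and pls <= t_ben:
        return "benign"
    return "VUS"


# Toy validation data, invented for illustration (not from the paper).
val_scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
val_labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

t_path = pathogenic_threshold(val_scores, val_labels)  # 0.70 on this toy data
t_ben = benign_threshold(val_scores, val_labels)       # 0.30 on this toy data

for vus_pls in (0.82, 0.50, 0.15):
    print(vus_pls, reclassify(vus_pls, t_path, t_ben))
```

Sweeping every distinct score keeps the sketch simple at the cost of quadratic time in the validation-set size. Because precision is not monotonic in the cut-off, each loop keeps the most permissive cut-off that meets the target rather than stopping at the first one that fails.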

Keywords: ClinVar; VUS; machine learning; reclassification; variant interpretation.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computer User Training
Datasets as Topic
Genes, BRCA1
Germ-Line Mutation / genetics*
Humans
Machine Learning*
Reproducibility of Results