Unsupervised Clustering of Missense Variants in HNF1A Using Multidimensional Functional Data Aids Clinical Interpretation

Am J Hum Genet. 2020 Oct 1;107(4):670-682. doi: 10.1016/j.ajhg.2020.08.016. Epub 2020 Sep 9.

Abstract

Exome sequencing in diabetes presents a diagnostic challenge because, depending on frequency, functional impact, and genomic and environmental contexts, HNF1A variants can cause maturity-onset diabetes of the young (MODY), increase type 2 diabetes risk, or be benign. A correct diagnosis matters because it informs treatment, progression, and family risk. We describe a multi-dimensional functional dataset of 73 HNF1A missense variants identified in exomes of 12,940 individuals. Our aim was to develop an analytical framework for stratifying variants along the HNF1A phenotypic continuum to facilitate diagnostic interpretation. HNF1A variant function was determined by four different molecular assays. The structure of the multi-dimensional dataset was explored using principal component analysis, k-means, and hierarchical clustering. Weights for tissue-specific isoform expression and functional domain were integrated. Functionally annotated variant subgroups were used to re-evaluate genetic diagnoses in national MODY diagnostic registries. HNF1A variants demonstrated a range of behaviors across the assays. The structure of the multi-parametric data was shaped primarily by transactivation. Using unsupervised learning methods, we obtained high-resolution functional clusters of the variants that separated known causal MODY variants from benign and type 2 diabetes risk variants and led to reclassification of 4% and 9% of HNF1A variants identified in the UK and Norway MODY diagnostic registries, respectively. Our proof-of-principle analyses facilitated informative stratification of HNF1A variants along the continuum, allowing improved evaluation of clinical significance, management, and precision medicine in diabetes clinics. Transcriptional activity appears to be a superior readout, supporting the pursuit of transactivation-centric experimental designs for high-throughput functional screens.
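
The kind of analysis summarized above (standardizing multi-assay measurements, inspecting the leading principal component, and grouping variants with k-means) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the variant names, the assay values, the choice of two clusters, and the farthest-first initialization are hypothetical and are not taken from the study. Hierarchical clustering and the isoform and domain weights are omitted.

```python
from statistics import mean, pstdev

# Synthetic, illustrative values only (percent of wild-type activity).
# Rows are hypothetical variants; columns are four hypothetical assay
# readouts, with column 0 standing in for transactivation.
variants = ["var_A", "var_B", "var_C", "var_D", "var_E", "var_F"]
data = [
    [95, 90, 100, 98],
    [102, 97, 95, 101],
    [88, 93, 99, 96],
    [20, 35, 90, 85],
    [15, 40, 60, 92],
    [30, 25, 95, 70],
]

def standardize(rows):
    """Z-score each column so that no assay dominates by scale alone."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def first_pc(rows, iters=200):
    """Leading principal axis by power iteration (expects centered columns)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next center: the row farthest from all centers chosen so far.
        centers.append(max(rows, key=lambda r: min(sqdist(r, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each row joins its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels  # labels from the last assignment step

z = standardize(data)
pc1 = first_pc(z)          # loadings show which assays drive the main axis
labels = kmeans(z, k=2)    # k=2 is an arbitrary choice for this toy example

print("PC1 loadings:", [round(x, 3) for x in pc1])
for name, label in zip(variants, labels):
    print(name, "-> cluster", label)
```

In this toy setting, the PC1 loadings indicate how strongly each assay contributes to the dominant axis of variation, which is the sense in which a dataset can be said to be shaped primarily by one readout. A real analysis would more likely rely on an established numerical library than on these hand-written routines.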

Keywords: HNF1A; bioinformatics; cluster analysis; diabetes; genetics; monogenic diabetes; protein function; rare variants; type 2 diabetes.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Alleles
  • Child
  • Cluster Analysis
  • Datasets as Topic
  • Diabetes Mellitus, Type 2 / diagnosis
  • Diabetes Mellitus, Type 2 / epidemiology
  • Diabetes Mellitus, Type 2 / genetics*
  • Diabetes Mellitus, Type 2 / pathology
  • Exome Sequencing
  • Female
  • Gene Expression
  • Genetic Predisposition to Disease*
  • Hepatocyte Nuclear Factor 1-alpha / genetics*
  • Humans
  • Male
  • Mutation, Missense*
  • Norway / epidemiology
  • Phenotype
  • Principal Component Analysis
  • Registries*
  • United Kingdom / epidemiology
  • Unsupervised Machine Learning*
  • Young Adult

Substances

  • HNF1A protein, human
  • Hepatocyte Nuclear Factor 1-alpha

Supplementary concepts

  • Mason-Type Diabetes