Integration of Protein Structure and Population-Scale DNA Sequence Data for Disease Gene Discovery and Variant Interpretation

Annu Rev Biomed Data Sci. 2022 Aug 10:5:141-161. doi: 10.1146/annurev-biodatasci-122220-112147. Epub 2022 May 4.

Abstract

The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation.

Keywords: data integration; disease gene discovery; population genetics; protein 3D structure; variant interpretation.

Publication types

  • Review
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Genetic Association Studies
  • Genetic Variation* / genetics
  • Genome, Human* / genetics
  • Humans
  • Phenotype