State-of-the-art genome inference in the human MHC

Int J Biochem Cell Biol. 2021 Feb:131:105882. doi: 10.1016/j.biocel.2020.105882. Epub 2020 Nov 12.

Abstract

The Major Histocompatibility Complex (MHC) on the short arm of chromosome 6 is associated with more diseases than any other region of the genome; it encodes the antigen-presenting Human Leukocyte Antigen (HLA) proteins and is one of the key immunogenetic regions of the genome. Accurate genome inference and interpretation of MHC association signals have traditionally been hampered by the region's uniquely complex features, such as high levels of polymorphism, inter-gene sequence homologies, structural variation, and long-range haplotype structures. Recent algorithmic and technological advances have, however, significantly increased the accessibility of genetic variation in the MHC. These developments include (i) accurate SNP-based HLA type imputation; (ii) genome graph approaches for variation-aware genome inference from next-generation sequencing data; (iii) long-read-based diploid de novo assembly of the MHC; and (iv) cost-effective targeted MHC sequencing methods. Applied to hundreds of thousands of samples in recent years, these technologies have already enabled significant biological discoveries, for example in the field of autoimmune disease genetics. Remaining challenges concern the development of integrated methods that leverage haplotype-resolved de novo assemblies of the MHC to improve MHC genotyping methods for short reads and to construct better reference panels for SNP-based imputation. Improved genome inference in the MHC can contribute crucially to a better genetic and functional understanding of many immune-related phenotypes and diseases.

Keywords: Genome graphs; Human leukocyte antigen; Long-read sequencing; Major histocompatibility complex; Statistical genotype imputation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Alleles
  • Base Sequence
  • Chromosome Mapping / methods*
  • Computational Biology / methods
  • Genetic Heterogeneity
  • Genome, Human / immunology*
  • HLA Antigens / classification
  • HLA Antigens / genetics*
  • HLA Antigens / immunology
  • Haplotypes
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Histocompatibility Testing / methods*
  • Humans
  • Linkage Disequilibrium
  • Major Histocompatibility Complex / genetics*
  • Polymorphism, Single Nucleotide

Substances

  • HLA Antigens