Segmental duplications and their variation in a complete human genome

Science. 2022 Apr;376(6588):eabj6965. doi: 10.1126/science.abj6965. Epub 2022 Apr 1.

Abstract

Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide SD estimate from 5.4% to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • DNA Copy Number Variations*
  • Evolution, Molecular
  • GTPase-Activating Proteins / genetics
  • Gene Duplication*
  • Genome, Human*
  • Humans
  • Polymorphism, Single Nucleotide
  • Proto-Oncogene Proteins / genetics
  • Segmental Duplications, Genomic*

Substances

  • GTPase-Activating Proteins
  • Proto-Oncogene Proteins
  • TBC1D3 protein, human