The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual

G3 (Bethesda). 2023 Mar 9;13(3):jkac321. doi: 10.1093/g3journal/jkac321.

Abstract

We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison of the genes in Han1 and T2T-CHM13 revealed that 235 protein-coding genes differed substantially between the two individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.

Keywords: DNA sequencing; annotation; genome assembly; reference genome; variant calling.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • East Asian People* / genetics
  • Genome, Human*
  • Humans
  • Male
  • Molecular Sequence Annotation
  • Sequence Analysis, DNA