HUPAN: a pan-genome analysis pipeline for human genomes

Genome Biol. 2019 Jul 31;20(1):149. doi: 10.1186/s13059-019-1751-y.

Abstract

The human reference genome remains incomplete, particularly in population-specific and individual-specific regions, which may have important functions. Here, we developed the HUman Pan-genome ANalysis (HUPAN) system to build the human pan-genome. We applied it to 185 deeply sequenced and 90 assembled Han Chinese genomes and detected 29.5 Mb of novel genomic sequence and at least 188 novel protein-coding genes missing from the human reference genome (GRCh38). It can be an important resource for human genome-related biomedical studies, such as cancer genome analysis. HUPAN is freely available at http://cgm.sjtu.edu.cn/hupan/ and https://github.com/SJTU-CGM/HUPAN.

Keywords: Core genome; Genome assembly; Pan-genome; Population-specific variation; Presence-absence variation (PAV).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Asian People / genetics
  • Black People / genetics
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Proteins / genetics
  • Sequence Analysis, DNA
  • Software*

Substances

  • Proteins