KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses

Sci Rep. 2018 Apr 4;8(1):5677. doi: 10.1038/s41598-018-23837-x.

Abstract

High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations, and provides a critical resource that can be used to more accurately identify pathogenic genetic variants. We report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding-SNVs not originally removed from the 1,000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Copy Number Variations*
  • Databases, Genetic
  • Disease / genetics*
  • Female
  • Genetics, Population*
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • INDEL Mutation*
  • Male
  • Mutation
  • Polymorphism, Single Nucleotide*
  • Reference Standards
  • Republic of Korea
  • Sequence Analysis, DNA
  • Whole Genome Sequencing / methods*