An integrated Asian human SNV and indel benchmark established using multiple sequencing methods

Sci Rep. 2020 Jun 17;10(1):9821. doi: 10.1038/s41598-020-66605-6.

Abstract

Sequencing technologies have developed rapidly in recent years, leading to breakthroughs in sequencing-based clinical diagnosis, but an accurate and complete genome variation benchmark is required for further assessment of precision medicine applications. Although the human cell line NA12878 has been successfully developed as a variation benchmark, population-specific variation benchmarks are still lacking. Here, we established an Asian human variation benchmark by constructing and sequencing a stabilized cell line from a Han Chinese volunteer. Using seven different sequencing strategies, we obtained ~3.88 Tb of clean data from different laboratories, aiming for high sequencing depth and accurate variation detection. By combining the variations identified with different sequencing strategies and different analysis pipelines, we identified 3.35 million SNVs and 348.65 thousand indels that were well supported by our sequencing data and passed our strict quality control, and should therefore serve as a high-confidence variation benchmark. In addition, we detected 5,913 high-quality SNVs, 969 of which were novel, located in highly homologous regions and supported by long-range information in both the co-barcoding single tube Long Fragment Read (stLFR) data and the PacBio HiFi CCS data. Furthermore, using the long-read data (stLFR and HiFi CCS), we were able to phase more than 99% of heterozygous SNVs, which extends the benchmark to the haplotype level. Our study provides comprehensive sequencing data as well as an integrated variation benchmark for an Asian-derived cell line, which should be valuable for future sequencing-based clinical development.
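
The idea of combining variations from several pipelines can be illustrated with a minimal sketch. The abstract does not describe the integration rules actually used, so everything here is an assumption made for illustration: the function name `integrate_calls`, the `min_support` threshold, the pipeline names, and the toy variants are all hypothetical. The sketch simply keeps a variant when enough independent call sets report it.

```python
from collections import Counter


def integrate_calls(call_sets, min_support=2):
    """Return variants reported by at least `min_support` call sets.

    `call_sets` maps a (hypothetical) pipeline name to a list of variants,
    each written as a (chrom, pos, ref, alt) tuple. The result maps each
    retained variant to the number of call sets that support it.
    """
    support = Counter()
    for calls in call_sets.values():
        # Use a set so a variant listed twice by one pipeline counts once.
        support.update(set(calls))
    return {variant: n for variant, n in support.items() if n >= min_support}


# Toy data only; these are not variants from the study.
call_sets = {
    "pipeline_a": [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "CT")],
    "pipeline_b": [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "T")],
    "pipeline_c": [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "CT")],
}

consensus = integrate_calls(call_sets)
for variant, n in sorted(consensus.items()):
    print(variant, "supported by", n, "call sets")
```

A simple support threshold like this is only one possible design; a real benchmark would typically also normalize variant representation and apply quality filters before comparing call sets.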

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Asian People / genetics*
  • Benchmarking
  • Genome, Human / genetics
  • Haplotypes
  • High-Throughput Nucleotide Sequencing / standards*
  • Humans
  • INDEL Mutation / genetics*
  • Male
  • Polymorphism, Single Nucleotide / genetics*
  • Reference Standards