Whole-genome sequencing with long reads reveals complex structure and origin of structural variation in human genetic variations and somatic mutations in cancer

Genome Med. 2021 Apr 29;13(1):65. doi: 10.1186/s13073-021-00883-1.

Abstract

Background: Identification of germline variation and somatic mutations is a major issue in human genetics. However, due to the limitations of DNA sequencing technologies and computational algorithms, our understanding of genetic variation and somatic mutations is far from complete.

Methods: In the present study, we performed whole-genome sequencing using long-read sequencing technology (Oxford Nanopore) on 11 Japanese liver cancers and matched normal samples that had previously been sequenced for the International Cancer Genome Consortium (ICGC). We constructed an analysis pipeline for the long-read data and identified germline and somatic structural variations (SVs).

Results: Among polymorphic germline SVs, our analysis identified 8004 insertions, 6389 deletions, 27 inversions, and 32 intra-chromosomal translocations. By comparing to the chimpanzee genome, we correctly inferred events that caused insertions and deletions and found that most insertions were caused by transposons, with Alu the predominant source, while other types of insertions, such as tandem duplications and processed pseudogenes, were rare. We inferred the mechanisms that generated deletions and found that most non-allelic homologous recombination (NAHR) events were caused by recombination errors in SINEs. Analysis of somatic mutations in liver cancers showed that long reads could detect larger numbers of SVs than a previous short-read study and that the mechanisms of cancer SV generation differed from those of germline deletions.
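
To put the germline counts above in proportion, the following minimal sketch tallies the four reported categories and prints each one's share of the total. The counts are taken from the Results paragraph; the dictionary name `germline_sv_counts` and the helper `summarize` are hypothetical names introduced here for illustration and are not part of the authors' CAMPHOR software.

```python
# Germline SV counts as reported in the Results paragraph.
# The variable and function names below are illustrative assumptions,
# not identifiers from the authors' pipeline.
germline_sv_counts = {
    "insertion": 8004,
    "deletion": 6389,
    "inversion": 27,
    "intra-chromosomal translocation": 32,
}


def summarize(counts):
    """Return (total, {sv_type: fraction of total}) for a dict of SV counts."""
    total = sum(counts.values())
    fractions = {sv_type: n / total for sv_type, n in counts.items()}
    return total, fractions


total, fractions = summarize(germline_sv_counts)
print(f"total germline SVs in the four categories: {total}")
for sv_type, frac in fractions.items():
    print(f"{sv_type}: {germline_sv_counts[sv_type]} ({frac:.1%})")
```

The four categories sum to 14,452 events. That total is derived here from the reported counts and is not a figure stated in the abstract. Insertions and deletions together account for nearly all of them.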

Conclusions: Our analysis provides a comprehensive catalog of polymorphic and somatic SVs, as well as their possible causes. Our software is available at https://github.com/afujimoto/CAMPHOR and https://github.com/afujimoto/CAMPHORsomatic .

Keywords: Germline SVs; Long reads; Origin of structural variations (SVs); Somatic SVs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • DNA Methylation / genetics
  • Genome, Human*
  • Genomic Structural Variation*
  • Germ-Line Mutation / genetics
  • Humans
  • INDEL Mutation / genetics
  • Mutation / genetics*
  • Neoplasms / genetics*
  • Promoter Regions, Genetic / genetics
  • Telomerase / genetics
  • Viruses / metabolism
  • Whole Genome Sequencing*

Substances

  • TERT protein, human
  • Telomerase