High-throughput single nucleotide variant discovery in E14 mouse embryonic stem cells provides a new reference genome assembly

Genomics. 2014 Aug;104(2):121-7. doi: 10.1016/j.ygeno.2014.06.007. Epub 2014 Jul 5.

Abstract

Mouse E14 embryonic stem cells (ESCs) are a well-characterized and widely used ESC line, often employed for genome-wide studies involving next-generation sequencing analysis. More than 2×10^9 sequence reads generated on the Illumina platform from the genome of E14 ESCs were used to build a database of about 2.7×10^6 single nucleotide variants (SNVs). The identified variants are enriched in intergenic regions, but several thousand reside in gene exons and regulatory regions, such as promoters, enhancers, splice sites and untranslated regions of RNA, indicating a high probability of an important functional impact on the molecular biology of these cells. We created a new E14 reference genome assembly that increases the number of mapped reads by about 5%. We performed reduced representation bisulfite sequencing (RRBS) on E14 ESCs and, relative to the mm9 reference genome, obtained an increase of about 120,000 called CpGs and avoided about 20,000 wrong CpG calls.
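
The abstract does not describe how the assembly was built, so the following is only a toy sketch of the general idea: substituting known SNVs into a reference sequence and checking which CpG dinucleotides appear or disappear as a result. The function names (`apply_snvs`, `cpg_positions`), the 0-based coordinate convention, and the example sequence and variants are all hypothetical and are not taken from the paper or from GSE53149.

```python
def apply_snvs(reference, snvs):
    """Return a copy of `reference` with each SNV substituted in.

    `snvs` maps a 0-based position to a (ref_base, alt_base) pair.
    This is an illustrative convention, not the authors' data format.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snvs.items():
        # Guard against variants that do not match the reference base.
        if seq[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt_base
    return "".join(seq)


def cpg_positions(sequence):
    """Return the 0-based positions of the C in every CpG dinucleotide."""
    return {i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"}


# Made-up reference fragment and two made-up SNVs.
reference = "ACGTTCATGCAACGT"
snvs = {
    6: ("A", "G"),   # creates a CpG at position 5 (CA -> CG)
    13: ("G", "A"),  # destroys the CpG at position 12 (CG -> CA)
}

personalized = apply_snvs(reference, snvs)

ref_cpgs = cpg_positions(reference)
new_cpgs = cpg_positions(personalized)

gained = new_cpgs - ref_cpgs  # CpGs present only in the personalized sequence
lost = ref_cpgs - new_cpgs    # CpGs the unmodified reference would wrongly imply

print(personalized)
print("gained:", sorted(gained))
print("lost:", sorted(lost))
```

In this toy case, a methylation caller using the unmodified reference would miss the CpG created by the first variant and would report a CpG at a site where the second variant has removed it. This is one plausible way a strain-specific reference could change CpG calls; the figures in the abstract may also reflect other effects, such as the higher number of mapped reads.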

Keywords: E14; ESC; Genome reference; Genotyping; SNV; Sequencing.

MeSH terms

  • Animals
  • Cell Line
  • DNA, Intergenic
  • Databases, Factual
  • Embryonic Stem Cells / metabolism*
  • Genome*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Mice
  • Polymorphism, Single Nucleotide*
  • Regulatory Sequences, Nucleic Acid
  • Reproducibility of Results
  • Sequence Analysis, DNA

Substances

  • DNA, Intergenic

Associated data

  • GEO/GSE53149