novoSNP, a novel computational tool for sequence variation discovery

Genome Res. 2005 Mar;15(3):436-42. doi: 10.1101/gr.2754005.

Abstract

Technological improvements have shifted sequencing from low-throughput, labor-intensive, gel-based systems to high-throughput capillary systems. This has led to broad use of genomic resequencing to identify sequence variations in genes, in regulatory regions, and in extended genomic regions. We describe a software package, novoSNP, that conscientiously discovers single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs) in sequence trace files in a fast, reliable, and user-friendly way. We compared the performance of novoSNP with that of PolyPhred and PolyBayes on two data sets. The first data set comprised 1028 sequence trace files obtained from diagnostic mutation analyses of SCN1A (neuronal voltage-gated sodium channel alpha-subunit type I gene). The second data set comprised 9062 sequence trace files from a genomic resequencing project aimed at constructing a high-density SNP map of MAPT (microtubule-associated protein tau gene). Visual inspection of these data sets had identified 38 sequence variations for SCN1A and 488 for MAPT. novoSNP automatically identified all 38 SCN1A variations, including five INDELs, while for MAPT only 15 of the 488 variations were not correctly marked. PolyPhred detected far fewer SNPs than novoSNP and missed nearly all INDELs. PolyBayes, which was designed for the sequence analysis of cloned templates, detected only a limited number of the variations present in the data sets. In addition to significantly improving the automated detection of sequence variations, both in diagnostic mutation analyses and in SNP discovery projects, novoSNP offers a user-friendly interface for inspecting possible genetic variations.
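The counts reported above can be turned into detection rates with a short calculation. The sketch below is illustrative only: the helper name `detection_rate` is hypothetical, and it assumes that the 15 MAPT variations "not correctly marked" count as missed. It measures agreement with visual inspection and says nothing about false positives, which the abstract does not report.

```python
# Counts are taken from the abstract; the helper name is hypothetical.
def detection_rate(found: int, total: int) -> float:
    """Fraction of visually confirmed variations that were also flagged automatically."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


# SCN1A: all 38 visually confirmed variations were identified.
scn1a_rate = detection_rate(38, 38)

# MAPT: 15 of 488 were not correctly marked (assumed here to mean missed).
mapt_rate = detection_rate(488 - 15, 488)

print(f"SCN1A: {scn1a_rate:.1%}")  # SCN1A: 100.0%
print(f"MAPT:  {mapt_rate:.1%}")   # MAPT:  96.9%
```

Under these assumptions, novoSNP recovered all of the SCN1A variations and roughly 97% of the MAPT variations found by visual inspection.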

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • DNA / genetics
  • Genetic Variation
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Humans
  • Polymorphism, Genetic
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA
  • Software*

Substances

  • DNA