Frame: detection of genomic sequencing errors

Bioinformatics. 1998;14(4):367-71. doi: 10.1093/bioinformatics/14.4.367.

Abstract

Motivation: The underlying error rate for genomic sequencing sometimes results in the introduction of artificial frameshifts and in-frame stop codons into putative protein-encoding genes. Severe errors are then introduced into the inferred transcripts through mis-translation or premature termination.

Results: We describe a system for screening segments of DNA for frameshift and in-frame stop errors in coding regions. The method is based on homology matching using blastx to compare all six reading frames of the query nucleotide sequence against selected protein sequence databases. Fragments of protein matching neighbouring regions of the query DNA are united and extended laterally to define candidate open reading frames, within which frameshifts and stops are identified. Suitable targets include prokaryotic or other intron-free genomic sequence and complementary DNAs. As an example of its use, we report here two frameshifted ORFs that deviate from the original TIGR sequence annotations for the recently released Helicobacter pylori genome.
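
As an illustration only, the screening idea described above can be sketched in a few lines of Python. This is not the Frame implementation: the `Hit` record, the `max_gap` threshold and both function names are hypothetical, the hits are assumed to be already parsed from blastx output and restricted to the forward strand, and the stop codons are those of the standard genetic code. Neighbouring matches to the same protein that fall in different reading frames are flagged as candidate frameshifts, and a candidate ORF is scanned codon by codon for in-frame stops.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hit:
    """Hypothetical, simplified forward-strand blastx match (not a real BLAST record)."""
    subject: str  # identifier of the matched database protein
    q_start: int  # 0-based start on the query DNA (inclusive)
    q_end: int    # 0-based end on the query DNA (exclusive)

    @property
    def frame(self) -> int:
        # Reading frame 0, 1 or 2, taken from the start coordinate.
        return self.q_start % 3


def find_frameshifts(hits: List[Hit], max_gap: int = 30) -> List[Tuple[str, int, int, int, int]]:
    """Flag neighbouring hits to the same protein that lie in different frames.

    max_gap is an assumed tolerance (in nucleotides) for calling two hits neighbours.
    """
    by_subject: Dict[str, List[Hit]] = {}
    for hit in hits:
        by_subject.setdefault(hit.subject, []).append(hit)

    shifts = []
    for subject, group in by_subject.items():
        group.sort(key=lambda h: h.q_start)
        for left, right in zip(group, group[1:]):
            gap = right.q_start - left.q_end
            if abs(gap) <= max_gap and left.frame != right.frame:
                shifts.append((subject, left.q_end, right.q_start, left.frame, right.frame))
    return shifts


# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_inframe_stops(dna: str, start: int, end: int) -> List[int]:
    """Return 0-based positions of stop codons in dna[start:end], read in the frame of start."""
    return [i for i in range(start, end - 2, 3) if dna[i:i + 3].upper() in STOP_CODONS]


if __name__ == "__main__":
    hits = [Hit("P1", 0, 30), Hit("P1", 31, 61)]
    print(find_frameshifts(hits))
    print(find_inframe_stops("ATGAAATAGCCC", 0, 12))
```

In this toy input the two hits to the hypothetical protein `P1` are one nucleotide apart and in frames 0 and 1, so the junction is reported as a candidate frameshift; a production tool would also need to handle the reverse strand, overlapping hits and the consistency of subject coordinates.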

Availability: The tool is accessible via the URL http://www.sander.ebi.ac.uk/frame/.

Contact: brown@ebi.ac.uk.

MeSH terms

  • Amino Acid Sequence
  • Computer Communication Networks
  • DNA / analysis*
  • DNA, Bacterial / analysis
  • Databases, Factual*
  • Genome
  • Information Storage and Retrieval
  • Open Reading Frames
  • Repetitive Sequences, Nucleic Acid*
  • Sequence Homology*
  • Software*
  • Transcription, Genetic*

Substances

  • DNA, Bacterial
  • DNA