Overlapping translation of nucleic acid sequences for bioinformatics applications

Med Hypotheses. 2003 May;60(5):654-9. doi: 10.1016/s0306-9877(03)00008-2.

Abstract

Summary: An alternative method to TblastX has been developed. Nucleic acid sequences in both the database and the query were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with BlastP. Each nucleic acid sequence is thus represented by a single protein-like sequence instead of three 'proteins' in different reading frames, so the 3x3 comparison of TblastX is replaced by a single comparison, giving faster results. Additional advantages are: (1) it can be more sensitive than either blastN or TblastX in detecting weak sequence similarities; (2) codon redundancy is eliminated; (3) sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced; (4) it is insensitive to frame shifts.
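The abstract does not spell out the codon-to-letter mapping, so the following is only a minimal sketch under stated assumptions: it assumes that every overlapping triplet (window step of one nucleotide) is translated with the standard genetic code, and the function names `overlapping_translate` and `frame_translate` are hypothetical. It shows why one OTS can stand in for three reading frames: taking every third letter of the OTS, starting at offset 0, 1 or 2, reproduces the conventional translation of the corresponding forward frame.

```python
"""Sketch of overlapping translation (OTS), ASSUMING the standard genetic code
and a one-nucleotide sliding window; the paper's exact mapping may differ."""

# Standard genetic code; codons enumerated with bases in TCAG order
# (first base outermost). "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def _normalize(nt: str) -> str:
    # Upper-case the input and treat RNA (U) as DNA (T).
    return nt.upper().replace("U", "T")


def overlapping_translate(nt: str) -> str:
    """Hypothetical helper: translate every overlapping triplet into one letter.

    n nucleotides give n - 2 letters; triplets containing non-ACGT symbols
    (e.g. N) are written as 'X'.
    """
    nt = _normalize(nt)
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(len(nt) - 2))


def frame_translate(nt: str, frame: int) -> str:
    """Hypothetical helper: conventional translation of forward frame 0, 1 or 2."""
    nt = _normalize(nt)
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(frame, len(nt) - 2, 3)
    )


if __name__ == "__main__":
    seq = "ATGGCC"
    ots = overlapping_translate(seq)
    print(ots)  # MWGA (ATG, TGG, GGC, GCC)
    # The single OTS interleaves the three forward reading frames.
    for f in range(3):
        assert ots[f::3] == frame_translate(seq, f)
```

Under this assumption each OTS letter depends only on three adjacent nucleotides, so inserting one base changes at most three consecutive letters and the rest of the OTS is the same string shifted by one position; a gapped protein aligner can bridge this with a single one-residue gap (plus at most two mismatches), whereas a frame-by-frame translation switches to a different frame downstream of the insertion.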

Results: BlastP using OTSs detected about two thirds of the blastN and TblastX matches but discovered additional similarities. When blastN and TblastX searches against nucleic acids were compared with blastP searches against OTSs, matches found by both approaches were generally longer with blastP (602 vs. 213 letters, p<0.01), had higher scores (748 vs. 460 bits, p<0.05) and lower E values (3.16E-20 vs. 1.17E+03, p<0.01), but a lower percentage identity (25% vs. 61%, p<0.001). A qualitative evaluation with LALIGN showed that similarities were easier to visualize when OTSs were used instead of nucleic acids. Many extensive sequence similarities became more visible, for example the repeated similarity between the prion protein and a microsatellite in the human insulin gene, and the surprising similarity between the first part of the prion protein coding region and human pro-insulin (34.4% identity plus an additional 17.2% similarity over 238 residues, score >295, expected 4.6e-18 times by chance).
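As general background for the bit scores and E values quoted above (these are the standard Karlin-Altschul statistics reported by BLAST, not formulas given in the paper), a raw alignment score $S$ is normalized to a bit score $S'$, and the expected number of chance hits $E$ depends on the effective query length $m$ and database length $n$; $\lambda$ and $K$ are constants fixed by the scoring matrix and gap penalties:

```latex
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = K\,m\,n\,e^{-\lambda S} = m\,n\,2^{-S'}
```

Because $E$ falls exponentially as the bit score rises, a longer alignment can accumulate a higher bit score, and hence a much smaller E value, even when its percentage identity is lower.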

MeSH terms

  • Amino Acid Sequence
  • Computational Biology*
  • Insulin / chemistry
  • Insulin / genetics
  • Molecular Sequence Data
  • Nucleic Acids / chemistry
  • Nucleic Acids / genetics*
  • Prions / chemistry
  • Prions / genetics
  • Protein Biosynthesis*
  • Sensitivity and Specificity
  • Sequence Homology, Amino Acid

Substances

  • Insulin
  • Nucleic Acids
  • Prions