Inferring an organism-specific optimal threshold for predicting protein coding regions in eukaryotes based on a bootstrapping algorithm

Biotechnol Lett. 2011 May;33(5):889-96. doi: 10.1007/s10529-011-0525-8. Epub 2011 Jan 14.

Abstract

The accuracy of prediction methods based on power spectrum analysis depends on the threshold used to discriminate between protein coding and non-coding sequences in eukaryotic genomes. Because gene structure varies among eukaryotes, it is difficult to determine the best prediction threshold for a given eukaryote from prior biological knowledge alone. To improve the accuracy of prediction methods based on power spectrum analysis, we developed a novel method based on a bootstrap algorithm to infer organism-specific optimal thresholds for eukaryotes. As prior information, our method requires only a few annotated protein coding regions from the organism being studied. With the calculated optimal thresholds, the average prediction accuracy on our test datasets is 81%, an increase of 19% over that obtained using the same empirical threshold P=4 for all datasets. The proposed method is simple, convenient, and easily applied to infer optimal thresholds that can be used to predict coding regions in the genomes of most organisms.
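
The abstract does not spell out the statistic P or the bootstrap procedure, so the following is only an illustrative sketch under stated assumptions. It assumes P is the ratio of the period-3 power of the four base-indicator sequences to the average power over all non-zero frequencies, and it assumes the threshold is taken as a low quantile of P over resampled annotated coding regions. The function names `period3_snr` and `bootstrap_threshold`, the `quantile` parameter, and its default value are hypothetical and are not taken from the paper.

```python
import cmath
import math
import random

BASES = "ACGT"


def period3_snr(seq):
    """Period-3 power divided by the mean power over non-zero frequencies.

    Assumption: this signal-to-noise ratio stands in for the statistic P.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3          # trim so that N/3 is an integer frequency
    if n < 3:
        return 0.0
    seq = seq[:n]
    # exp(-2*pi*i*k/3) for k = 0, 1, 2: the DFT kernel at frequency N/3
    phases = [cmath.exp(-2j * math.pi * r / 3) for r in range(3)]

    signal = 0.0       # total power at frequency N/3
    dc = 0.0           # total power at frequency 0
    total_count = 0    # number of A/C/G/T symbols (others are ignored)
    for base in BASES:
        coeff = 0j
        count = 0
        for i, ch in enumerate(seq):
            if ch == base:
                coeff += phases[i % 3]
                count += 1
        signal += abs(coeff) ** 2
        dc += count ** 2
        total_count += count

    # Parseval: the power summed over all N frequencies equals N * total_count.
    noise = (n * total_count - dc) / (n - 1)
    return signal / noise if noise > 0 else 0.0


def bootstrap_threshold(coding_seqs, n_boot=1000, quantile=0.05, seed=0):
    """Average a low quantile of the scores over bootstrap resamples.

    Assumption: a generic bootstrap estimate, not the authors' exact procedure.
    """
    scores = [period3_snr(s) for s in coding_seqs]
    if not scores:
        raise ValueError("at least one annotated coding region is required")
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample the scores with replacement, keeping the sample size.
        sample = sorted(rng.choice(scores) for _ in scores)
        estimates.append(sample[int(quantile * (len(sample) - 1))])
    return sum(estimates) / len(estimates)
```

Under these assumptions, a candidate region would be called coding when `period3_snr(region)` is at least the value returned by `bootstrap_threshold` for that organism's annotated regions, in place of a fixed cutoff such as P=4.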

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Eukaryota / genetics*
  • Open Reading Frames*