Ballast: Blast post-processing based on locally conserved segments

Bioinformatics. 2000 Sep;16(9):750-9. doi: 10.1093/bioinformatics/16.9.750.

Abstract

Motivation: Blast programs are very efficient at finding relatively strong similarities, but some very distantly related sequences receive a high Expect value and rank low in Blast results. We have developed Ballast, a program to predict local maximum segments (LMSs, i.e. sequence segments conserved relative to their flanking regions) from a single Blast database search and to highlight these divergent homologues. TBlastN database searches can also be processed with the help of information from a joint BlastP search.

Results: We have applied the Ballast algorithm to BlastP searches performed with sequences belonging to well-described dispersed families (aminoacyl-tRNA synthetases; helicases) against the SwissProt 38 database. We show that Ballast is able to build an appropriate conservation profile and that the predicted LMSs are consistent with the signatures and motifs described in the literature. Furthermore, by comparing the Blast, PsiBlast and Ballast results obtained on a well-defined database of structurally related sequences, we show that the LMSs provide a scoring scheme that concentrates distant homologues in the top ranks more effectively than Blast does. Using the graphical user interface available on the Web, specific LMSs may be selected to detect divergent homologues sharing the corresponding properties with the query sequence, without requiring any additional database search.
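The idea of a conservation profile and of segments conserved relative to their flanking regions can be illustrated with a minimal sketch. This is not the published Ballast algorithm, whose details the abstract does not give. It is a simple stand-in under stated assumptions: each Blast hit is reduced to a hypothetical tuple (query start, query end, score), the score is spread evenly over the aligned query positions, and a segment is kept when its mean profile value is at least `min_ratio` times the mean of its flanks. The names `conservation_profile`, `local_maximum_segments`, `flank` and `min_ratio` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical hit layout: (query_start, query_end_exclusive, score), 0-based.
Hsp = Tuple[int, int, float]


def conservation_profile(query_len: int, hsps: List[Hsp]) -> List[float]:
    """Sum, per query position, an even share of each hit's score.

    Assumption: an even per-residue share stands in for whatever
    weighting the real program applies.
    """
    profile = [0.0] * query_len
    for start, end, score in hsps:
        if not 0 <= start < end <= query_len:
            raise ValueError(f"hit ({start}, {end}) lies outside the query")
        share = score / (end - start)
        for pos in range(start, end):
            profile[pos] += share
    return profile


def local_maximum_segments(
    profile: List[float], flank: int = 5, min_ratio: float = 1.5
) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) segments conserved relative to their flanks.

    Step 1: collect runs of positions strictly above the profile mean.
    Step 2: keep a run if its mean is at least min_ratio times the mean
    of the `flank` positions on either side (assumed criterion).
    """
    n = len(profile)
    if n == 0:
        return []
    cutoff = sum(profile) / n

    runs = []
    start = None
    for i in range(n + 1):
        above = i < n and profile[i] > cutoff
        if above and start is None:
            start = i
        elif not above and start is not None:
            runs.append((start, i))
            start = None

    kept = []
    for s, e in runs:
        inside = sum(profile[s:e]) / (e - s)
        flanks = profile[max(0, s - flank):s] + profile[e:e + flank]
        outside = sum(flanks) / len(flanks) if flanks else 0.0
        if inside >= min_ratio * outside:
            kept.append((s, e))
    return kept


if __name__ == "__main__":
    # Toy hits: one full-length weak hit and two short hits over one region.
    hits = [(0, 20, 20.0), (5, 10, 50.0), (6, 9, 30.0)]
    profile = conservation_profile(20, hits)
    print(local_maximum_segments(profile))
```

In this toy input the two short hits pile up over query positions 5 to 9, so that region stands out against a flat background. A real post-processor would read the hits from Blast output rather than from hand-written tuples.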

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence / genetics
  • Animals
  • Caenorhabditis elegans
  • Computational Biology / methods*
  • Conserved Sequence / genetics*
  • Databases, Factual
  • Genomics / methods*
  • Haemophilus influenzae
  • Humans
  • Internet
  • Predictive Value of Tests
  • Proteins / chemistry
  • Proteins / genetics
  • Reproducibility of Results
  • Saccharomyces cerevisiae
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid
  • User-Computer Interface

Substances

  • Proteins