Using Grid technology for computationally intensive applied bioinformatics analyses

In Silico Biol. 2006;6(6):495-504.

Abstract

For several applications and algorithms used in applied bioinformatics, a bottleneck in computational time may arise when they are scaled up to analyse large datasets and databases. Recoding, algorithm modification, or sacrifices in sensitivity and accuracy may be necessary to accommodate the limited computational capacity of single workstations. Grid computing offers an alternative model for solving massive computational problems by parallel execution of existing algorithms and software implementations. We present the implementation of a Grid-aware model for solving computationally intensive bioinformatics analyses, exemplified by a blastp sliding window algorithm for whole proteome sequence similarity analysis, and evaluate its performance in comparison with a local cluster and a single workstation. Our strategy involves temporary installation of the BLAST executable and databases on remote nodes at submission. This accommodates dynamic Grid environments, as it avoids the need for predefined runtime environments (preinstalled software and databases at specific Grid nodes). Importantly, the implementation is generic: the BLAST executable can be replaced by other software tools to support other analyses suitable for parallelisation. This model should be of general interest in applied bioinformatics. Scripts and procedures are freely available from the authors.
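The sliding window idea and its division into independent jobs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the window length, the step size, or how work is partitioned across nodes, so the defaults below (`window=30`, `step=10`) and the round-robin batching are assumptions chosen only for illustration, and the function names `sliding_windows` and `split_into_jobs` are hypothetical. The sketch only prepares query fragments and job batches; it does not run BLAST or submit anything to a Grid.

```python
from typing import Iterator, List, Tuple


def sliding_windows(sequence: str, window: int = 30,
                    step: int = 10) -> Iterator[Tuple[int, str]]:
    """Yield (start_offset, fragment) pairs that cover a protein sequence.

    window and step are illustrative assumptions, not values from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not sequence:
        return
    # A sequence no longer than one window is emitted as a single fragment.
    if len(sequence) <= window:
        yield 0, sequence
        return
    last_start = len(sequence) - window
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + window]
    # If the step did not land exactly on the final window, add it so the
    # C-terminal residues are still covered.
    if last_start % step != 0:
        yield last_start, sequence[last_start:]


def split_into_jobs(items: List, n_jobs: int) -> List[List]:
    """Distribute items round-robin into n_jobs independent batches."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    return [items[i::n_jobs] for i in range(n_jobs)]


if __name__ == "__main__":
    # Toy sequence; a real run would iterate over a whole proteome.
    fragments = list(sliding_windows("MKTAYIAKQRQISFVKSHFSRQ", window=8, step=4))
    for batch_id, batch in enumerate(split_into_jobs(fragments, n_jobs=3)):
        print(batch_id, batch)
```

Because each batch is independent of the others, each could in principle be shipped to a different node together with the executable and database, which is the property the abstract relies on for parallel execution.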

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms
  • Computational Biology / statistics & numerical data*
  • Computer Simulation
  • Databases, Protein
  • Humans
  • Proteins / genetics
  • Proteomics / statistics & numerical data*
  • Sequence Alignment / statistics & numerical data

Substances

  • Proteins