Using Grid technology for computationally intensive applied bioinformatics analyses

Jorge Andrade; Lisa Berglund; Mathias Uhlén; Jacob Odeberg

In Silico Biol. 2006;6(6):495-504.

Authors

Jorge Andrade¹, Lisa Berglund, Mathias Uhlén, Jacob Odeberg

Affiliation

¹ Department of Biotechnology, Royal Institute of Technology (KTH), Stockholm, Sweden.

PMID: 17518760

Abstract

For several applications and algorithms used in applied bioinformatics, computational time can become a bottleneck when they are scaled up to analyse large datasets and databases. Recoding, algorithm modification, or sacrifices in sensitivity and accuracy may be necessary to accommodate the limited computational capacity of single workstations. Grid computing offers an alternative model for solving massive computational problems by parallel execution of existing algorithms and software implementations. We present the implementation of a Grid-aware model for solving computationally intensive bioinformatic analyses, exemplified by a blastp sliding window algorithm for whole proteome sequence similarity analysis, and evaluate its performance in comparison with a local cluster and a single workstation. Our strategy involves temporary installation of the BLAST executable and databases on remote nodes at submission. This accommodates dynamic Grid environments because it avoids the need for predefined runtime environments (preinstalled software and databases at specific Grid nodes). Importantly, the implementation is generic: the BLAST executable can be replaced by other software tools to run other analyses suitable for parallelisation. This model should be of general interest in applied bioinformatics. Scripts and procedures are freely available from the authors.
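To make the sliding window idea concrete, the following is a minimal sketch and not the authors' scripts. It cuts a protein sequence into overlapping fragments, writes them as FASTA records that a tool such as blastp could take as queries, and splits the fragments into batches that could each be submitted as an independent job. The function names, the window length, the step size and the number of jobs are all assumptions chosen for illustration; the abstract does not state them.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, str]  # (start offset in the protein, fragment sequence)


def sliding_windows(seq: str, window: int, step: int = 1) -> Iterator[Window]:
    """Yield every full-length window over seq, advancing by step residues."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # A sequence shorter than the window yields nothing.
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def partition(items: List[Window], n_jobs: int) -> List[List[Window]]:
    """Split items into at most n_jobs contiguous batches of near-equal size."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    size, extra = divmod(len(items), n_jobs)
    batches: List[List[Window]] = []
    pos = 0
    for i in range(n_jobs):
        # The first `extra` batches take one additional item each.
        end = pos + size + (1 if i < extra else 0)
        if end > pos:  # skip empty batches when there are fewer items than jobs
            batches.append(items[pos:end])
        pos = end
    return batches


def to_fasta(protein_id: str, windows: List[Window]) -> str:
    """Render windows as FASTA records named <protein_id>_<start>."""
    return "".join(f">{protein_id}_{start}\n{frag}\n" for start, frag in windows)


if __name__ == "__main__":
    # Hypothetical toy input; real runs would iterate over a whole proteome.
    fragments = list(sliding_windows("MKTAYIAKQRQISFVK", window=8, step=4))
    for job_number, batch in enumerate(partition(fragments, n_jobs=2)):
        print(f"# job {job_number}")
        print(to_fasta("toy_protein", batch), end="")
```

Because each batch is independent of the others, batches can run on whichever nodes are available, which is the property that makes this kind of analysis suitable for parallel execution.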

Publication types

Evaluation Study

MeSH terms

Algorithms
Computational Biology / statistics & numerical data*
Computer Simulation
Databases, Protein
Humans
Proteins / genetics
Proteomics / statistics & numerical data*
Sequence Alignment / statistics & numerical data

Substances

Proteins