Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format

Alhadi Bustamam; Kevin Burrage; Nicholas A Hamilton

doi:10.1109/TCBB.2011.68


IEEE/ACM Trans Comput Biol Bioinform. 2012 May-Jun;9(3):679-92. doi: 10.1109/TCBB.2011.68.

Authors

Alhadi Bustamam¹, Kevin Burrage, Nicholas A Hamilton

Affiliation

¹ Department of Mathematics, University of Indonesia, Depok 16424, Indonesia. alhadi@sci.ui.ac.id

PMID: 21483031

Abstract

Markov clustering (MCL) is becoming a key algorithm in bioinformatics for determining clusters in networks. However, with the vast and growing amount of data on biological networks, performance and scalability are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA to implement a massively parallel computing environment on the GPU card, is becoming a powerful, efficient, and low-cost way to achieve substantial performance gains over CPU approaches. Using on-chip memory on the GPU lowers latency, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) that performs parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We use the ELLPACK-R sparse format to enable effective, fine-grained, massively parallel processing that copes with the sparse nature of interaction network data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on a CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, previously possible only on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
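
For readers unfamiliar with the ELLPACK-R format mentioned above, the following is a minimal sketch in plain Python, not the authors' CUDA implementation. ELLPACK-R stores each row's nonzero values and their column indices in fixed-width, zero-padded arrays, plus one extra array holding the true number of nonzeros per row. The function names `to_ellpack_r` and `spmv_ellpack_r` are hypothetical, and the sketch shows a sparse matrix-vector product rather than the sparse matrix-matrix product the abstract describes, because it is the simplest operation that illustrates the row-length array.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays.

    Returns (values, col_idx, row_len). values and col_idx are padded to
    the widest row; row_len[i] is the true nonzero count of row i.
    """
    n_rows = len(dense)
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len, default=0)
    values = [[0.0] * width for _ in range(n_rows)]
    col_idx = [[0] * width for _ in range(n_rows)]
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if v != 0:
                values[i][k] = float(v)
                col_idx[i][k] = j
                k += 1
    return values, col_idx, row_len


def spmv_ellpack_r(values, col_idx, row_len, x):
    """Compute y = A @ x for A stored in ELLPACK-R form."""
    y = [0.0] * len(values)
    # Each outer iteration is independent; on a GPU it would typically
    # be assigned to one thread (assumption for illustration).
    for i in range(len(values)):
        acc = 0.0
        # row_len bounds the loop, so padding entries are never read.
        for k in range(row_len[i]):
            acc += values[i][k] * x[col_idx[i][k]]
        y[i] = acc
    return y


# Small example matrix with 2, 1, and 3 nonzeros per row.
A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 6]]
values, col_idx, row_len = to_ellpack_r(A)
y = spmv_ellpack_r(values, col_idx, row_len, [1.0, 1.0, 1.0])
```

Because every row is processed independently and stops at its own `row_len` entry, short rows do no work on padding, which is the property that makes the format a natural fit for one-thread-per-row execution.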

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Computational Biology / methods
Computer Graphics*
Computer Simulation
Markov Chains
Oligonucleotide Array Sequence Analysis