Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop

Comb Chem High Throughput Screen. 2018;21(2):84-92. doi: 10.2174/1386207321666180102120641.

Abstract

Aim and objective: In the past decade, the drug design technologies have been improved enormously. The computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, which makes the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds data and GDB-13 with more than 930 million small molecules, is a noticeable issue of time-consuming problem. Therefore, we propose a novel heterogeneous high performance computing method, named as Hadoop-MCC, integrating Hadoop and GPU, to copy with big chemical structure data efficiently.

Materials and methods: Hadoop-MCC gains the high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from GPU devices. Hadoop framework adopts mapper/reducer computation model. In the proposed method, mappers response for fetching SMILES data segments and perform LINGO method on GPU, then reducers collect all comparison results produced by mappers. Due to the high availability of Hadoop, all of LINGO computational jobs on mappers can be completed, even if some of the mappers encounter problems.

Results: A comparison of LINGO is performed on each the GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices can achieve better computational performance than the CUDA-MCC on a single GPU device.

Conclusion: Hadoop-MCC is able to achieve scalability, high availability, and fault tolerance granted by Hadoop, and high performance as well by integrating computational power of both of Hadoop and GPU. It has been shown that using the heterogeneous architecture as Hadoop-MCC effectively can enhance better computational performance than on a single GPU device.

Keywords: GPU; LINGO; big data; compound comparision; hadoop; high performance computing..

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Big Data
  • Computer-Aided Design*
  • Computing Methodologies*
  • Drug Design*
  • Molecular Structure
  • Quantitative Structure-Activity Relationship