Distributed Interactive Visualization Using GPU-Optimized Spark

IEEE Trans Vis Comput Graph. 2021 Sep;27(9):3670-3684. doi: 10.1109/TVCG.2020.2990894. Epub 2021 Jul 29.

Abstract

With the advent of advances in imaging and computing technologies, large-scale data acquisition and processing have become commonplace in many science and engineering disciplines. Conventional workflows for large-scale data processing usually rely on in-house or commercial software that are designed for domain-specific computing tasks. Recent advances in MapReduce, which was originally developed for batch processing textual data via a simplified programming model of the map and reduce functions, have expanded its applications to more general tasks in big-data processing, such as scientific computing, and biomedical image processing. However, as shown in previous work, volume rendering and visualization using MapReduce is still considered challenging and impractical owing to the disk-based, batch-processing nature of its computing model. In this article, contrary to this common belief, we show that the MapReduce computing model can be effectively used for interactive visualization. Our proposed system is a novel extension of Spark, one of the most popular open-source MapReduce frameworks, which offers GPU-accelerated MapReduce computing. To minimize CPU-GPU communication and overcome slow, disk-based shuffle performance, the proposed system supports GPU in-memory caching and MPI-based direct communication between compute nodes. To allow for GPU-accelerated in-situ visualization using raster graphics in Spark, we leveraged the CUDA-OpenGL interoperability, resulting in faster processing speeds by several orders of magnitude compared to conventional MapReduce systems. We demonstrate the performance of our system via several volume processing and visualization tasks, such as direct volume rendering, iso-surface extraction, and numerical simulations with in-situ visualization.