Scalable biclustering - the future of big data exploration?

Patryk Orzechowski; Krzysztof Boryczko; Jason H Moore

doi:10.1093/gigascience/giz078

Gigascience. 2019 Jul 1;8(7):giz078. doi: 10.1093/gigascience/giz078.

Authors

Patryk Orzechowski¹,², Krzysztof Boryczko³, Jason H Moore¹

Affiliations

¹ Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA.
² Department of Automatics and Robotics, AGH University of Science and Technology, al. A. Mickiewicza 30, Kraków 30-059, Poland.
³ Department of Computer Science, AGH University of Science and Technology, al. A. Mickiewicza 30, Kraków 30-059, Poland.

Abstract

Biclustering is a technique for discovering local similarities within data. For many years, the complexity of the methods and difficulties with parallelization limited its application to big data problems. With the development of novel scalable methods, biclustering has finally started to close this gap. In this paper, we discuss the caveats of biclustering, present its current challenges, and offer guidelines for practitioners. We also explain why biclustering may soon become one of the standards for big data analytics.

Keywords: biclustering; big data; biomarker detection; co-clustering; data mining; disease subtype identification; gene-drug interaction; parallel algorithms; precision medicine.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Big Data*
Cluster Analysis
Data Mining / methods
Genome, Human
Genomics / methods*
Genomics / standards
Humans
Sequence Alignment / methods
Sequence Alignment / standards
Sequence Analysis, DNA / methods*
Sequence Analysis, DNA / standards
Software

Grants and funding

R01 LM012601/LM/NLM NIH HHS/United States