It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data

Juan Xie; Anjun Ma; Anne Fennell; Qin Ma; Jing Zhao

doi:10.1093/bib/bby014

Brief Bioinform. 2019 Jul 19;20(4):1449-1464. doi: 10.1093/bib/bby014.

Abstract

Biclustering is a powerful data mining technique that clusters the rows and columns of a matrix-format data set simultaneously. It was first applied to gene expression data in 2000, with the aim of identifying genes that are co-expressed under a subset of the conditions/samples. During the past 17 years, dozens of biclustering algorithms and tools have been developed to help make sense of the large data sets generated by high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including, but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of a lack of knowledge about how to select appropriate biclustering tools and supporting computational techniques for specific studies. Here, we first give a brief introduction to the existing biclustering algorithms and tools in the public domain, and then systematically summarize the basic applications of biclustering to biological data and the more advanced applications of biclustering to biomedical data. This review will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights more efficiently.
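
To make the idea of a bicluster concrete, the following minimal sketch scores how coherently a chosen subset of rows (genes) behaves across a chosen subset of columns (conditions). It uses the mean squared residue, a coherence score introduced by Cheng and Church for gene expression data; the abstract does not name any particular score, so this choice is an assumption made for illustration. The function name, the toy matrix and the selected indices are likewise hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    A score near 0 means the selected rows follow the same additive
    pattern across the selected columns, i.e. a coherent bicluster.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row and column effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy expression matrix: 4 genes (rows) x 4 conditions (columns).
expression = [
    [1.0, 9.0, 2.0, 4.0],
    [2.0, 0.0, 3.0, 7.0],
    [4.0, 5.0, 5.0, 1.0],
    [8.0, 3.0, 0.0, 6.0],
]

# Genes 0-2 shift together under conditions 0 and 2 only.
bicluster_rows = [0, 1, 2]
bicluster_cols = [0, 2]

bicluster_score = mean_squared_residue(expression, bicluster_rows, bicluster_cols)
whole_score = mean_squared_residue(expression, range(4), range(4))

print("bicluster score:", bicluster_score)   # approximately 0 for this additive pattern
print("whole-matrix score:", whole_score)    # clearly larger than the bicluster score
```

The selected genes are coherent only under two of the four conditions, which is the kind of local pattern that clustering rows over all columns would tend to miss. Real biclustering tools search for such row and column subsets automatically rather than taking them as input.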

Keywords: biclustering; biomarker and gene signatures detection; disease subtype identification; functional annotation; gene–drug association; modularity analysis; network elucidation.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Review

MeSH terms

Algorithms
Big Data
Cluster Analysis*
Computational Biology / methods*
Data Mining / methods*
Databases, Genetic / statistics & numerical data
Disease / classification
Disease / genetics
Gene Expression / drug effects
Gene Expression Profiling / statistics & numerical data
Gene Regulatory Networks
Humans
Molecular Sequence Annotation / statistics & numerical data

Grants and funding

U01 HG007253/HG/NHGRI NIH HHS/United States