A New Binary Biclustering Algorithm Based on Weight Adjacency Difference Matrix for Analyzing Gene Expression Data

IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):2802-2809. doi: 10.1109/TCBB.2023.3283801. Epub 2023 Oct 9.

Abstract

Biclustering algorithms are essential for processing gene expression data. However, to process the dataset, most biclustering algorithms require preprocessing the data matrix into a binary matrix. Regrettably, this type of preprocessing may introduce noise or cause information loss in the binary matrix, which would reduce the biclustering algorithm's ability to effectively obtain the optimal biclusters. In this paper, we propose a new preprocessing method named Mean-Standard Deviation (MSD) to resolve the problem. Additionally, we introduce a new biclustering algorithm called Weight Adjacency Difference Matrix Binary Biclustering (W-AMBB) to effectively process datasets containing overlapping biclusters. The basic idea is to create a weighted adjacency difference matrix by applying weights to a binary matrix that is derived from the data matrix. This allows us to identify genes with significant associations in sample data by efficiently identifying similar genes that respond to specific conditions. Furthermore, the performance of the W-AMBB algorithm was tested on both synthetic and real datasets and compared with other classical biclustering methods. The experiment results demonstrate that the W-AMBB algorithm is significantly more robust than the compared biclustering methods on the synthetic dataset. Additionally, the results of the GO enrichment analysis show that the W-AMBB method possesses biological significance on real datasets.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Gene Expression
  • Gene Expression Profiling* / methods
  • Oligonucleotide Array Sequence Analysis / methods