A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer

BMC Bioinformatics. 2010 Sep 23:11:477. doi: 10.1186/1471-2105-11-477.

Abstract

Background: In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify the associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries.

Results: In this paper, we develop a robust and efficient method for exploratory analysis of microarray data, which produces a number of different orderings (rankings) of both genes and samples (reflecting correlation among those genes and samples). The core algorithm is closely related to biclustering, and so we first compare its performance with several existing biclustering algorithms on two real datasets - gastric cancer and lymphoma datasets. We then show on the gastric cancer data that the sample orderings generated by our method are highly statistically significant with respect to the histological classification of samples by using the Jonckheere trend test, while the gene modules are biologically significant with respect to biological processes (from the Gene Ontology). In particular, some of the gene modules associated with biclusters are closely linked to gastric cancer tumorigenesis reported in previous literature, while others are potentially novel discoveries.

Conclusion: In conclusion, we have developed an effective and efficient method, Bi-Ordering Analysis, to detect informative patterns in gene expression microarrays by ranking genes and samples. In addition, a number of evaluation metrics were applied to assess both the statistical and biological significance of the resulting bi-orderings. The methodology was validated on gastric cancer and lymphoma datasets.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Databases, Genetic
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks
  • Humans
  • Lymphoma / genetics
  • Molecular Sequence Annotation / methods*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Stomach Neoplasms / genetics*