Extracting gene expression profiles common to colon and pancreatic adenocarcinoma using simultaneous nonnegative matrix factorization

Pac Symp Biocomput. 2008:267-78.

Abstract

In this paper we introduce a clustering algorithm capable of simultaneously factorizing two distinct gene expression datasets, with the aim of uncovering gene regulatory programs that are common to the two phenotypes. The siNMF algorithm simultaneously searches for two factorizations that share the same gene expression profiles. The two key ingredients of this algorithm are the nonnegativity constraint and the offset variables, which together ensure the sparseness of the factorizations. While cancer is a very heterogeneous disease, there is overwhelming recent evidence that the differences between cancer subtypes implicate entire pathways and biological processes involving large numbers of genes, rather than changes in single genes. We applied our simultaneous factorization algorithm to look for gene expression profiles that are common to the more homogeneous pancreatic ductal adenocarcinoma (PDAC) and the more heterogeneous colon adenocarcinoma. The fact that the PDAC signature is active in a large fraction of colon adenocarcinoma samples suggests that the oncogenic mechanisms involved may be similar to those in PDAC, at least in this subset of colon samples. There are many approaches to uncovering common mechanisms involved in different phenotypes, but most are based on comparing gene lists. The approach presented in this paper additionally takes gene expression data into account and can thus be more sensitive.
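
The idea of two factorizations sharing one set of gene expression profiles can be illustrated with a minimal sketch. It is not the siNMF algorithm as published: the abstract does not give the update rules, so the sketch assumes a squared-error objective, standard multiplicative NMF updates, and one nonnegative per-gene offset vector for each dataset. The function name `simultaneous_nmf`, its parameters, and the synthetic input data are all hypothetical.

```python
import numpy as np


def simultaneous_nmf(X1, X2, k, n_iter=200, eps=1e-9, seed=0):
    """Illustrative joint factorization (assumed form, not the published siNMF):

        X1 ~ A1 @ S + 1 * o1      X2 ~ A2 @ S + 1 * o2

    X1, X2 : nonnegative (samples x genes) arrays over the same genes.
    S      : k shared gene expression profiles (k x genes).
    A1, A2 : per-dataset sample loadings.
    o1, o2 : per-dataset nonnegative gene offsets (an assumption).
    """
    rng = np.random.default_rng(seed)
    n1, g = X1.shape
    n2, g2 = X2.shape
    if g != g2:
        raise ValueError("both datasets must cover the same genes")

    # Stack the samples so that a single S is fitted to both datasets.
    X = np.vstack([X1, X2])

    # Fixed indicator columns: each one routes a dataset to its own offset row.
    ind = np.zeros((n1 + n2, 2))
    ind[:n1, 0] = 1.0
    ind[n1:, 1] = 1.0

    A = rng.random((n1 + n2, k)) + eps
    S = rng.random((k, g)) + eps
    O = rng.random((2, g)) + eps

    history = []
    for _ in range(n_iter):
        # Update profiles and offsets together (multiplicative update).
        A_aug = np.hstack([A, ind])
        S_aug = np.vstack([S, O])
        S_aug *= (A_aug.T @ X) / (A_aug.T @ A_aug @ S_aug + eps)
        S, O = S_aug[:k], S_aug[k:]

        # Update loadings only; the indicator columns stay fixed.
        recon = A @ S + ind @ O
        A *= (X @ S.T) / (recon @ S.T + eps)

        recon = A @ S + ind @ O
        history.append(float(np.sum((X - recon) ** 2)))

    return A[:n1], A[n1:], S, O[0], O[1], history


# Synthetic stand-in data (random numbers, not real expression measurements).
rng = np.random.default_rng(1)
X1 = rng.random((12, 30))   # e.g. 12 samples of one phenotype, 30 genes
X2 = rng.random((20, 30))   # e.g. 20 samples of the other phenotype
A1, A2, S, o1, o2, history = simultaneous_nmf(X1, X2, k=3)
```

In this sketch, a shared profile (a row of `S`) counts as active in a sample when the corresponding entry of `A1` or `A2` is large, which is the sense in which a profile found in one phenotype can be checked for activity in samples of the other.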

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma / genetics*
  • Algorithms
  • Carcinoma, Pancreatic Ductal / genetics*
  • Colonic Neoplasms / genetics*
  • Computational Biology
  • Data Interpretation, Statistical
  • Databases, Genetic
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Pancreatic Neoplasms / genetics*