Measures for the degree of overlap of gene signatures and applications to TCGA

Brief Bioinform. 2015 Sep;16(5):735-44. doi: 10.1093/bib/bbu049. Epub 2014 Dec 31.

Abstract

For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many existing studies simply count the number (or percentage) of genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrative example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted for generating gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tuning parameter values, and the second family is based on estimates across the whole solution paths. Within each family, multiple measures are introduced, each describing the overlap from a different perspective. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
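The contrast between a simple shared-gene count and a measure computed over whole solution paths can be made concrete with a small sketch. The abstract does not define the specific measures, so the following is illustrative only and is not the authors' method: the Jaccard index and the step-wise averaging over paths are assumptions, the function names are hypothetical, and the gene coefficients are made-up values standing in for Cox-Lasso estimates.

```python
# Illustrative sketch only: none of these measures is taken from the paper.
# A "signature" here is the set of genes with a nonzero estimated coefficient.

def signature(coefs, tol=1e-8):
    """Return the genes whose estimated coefficient is nonzero."""
    return {gene for gene, beta in coefs.items() if abs(beta) > tol}

def overlap_count(sig_a, sig_b):
    """Naive measure: number of genes shared by two signatures."""
    return len(sig_a & sig_b)

def jaccard(sig_a, sig_b):
    """Shared genes as a fraction of all genes in either signature."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0

def path_overlap(path_a, path_b):
    """Assumed path-based measure: mean Jaccard index over paired path steps."""
    pairs = list(zip(path_a, path_b))
    scores = [jaccard(signature(a), signature(b)) for a, b in pairs]
    return sum(scores) / len(scores)

# Made-up coefficient estimates at the selected tuning value in two studies.
study_a = {"TP53": 0.8, "BRCA1": -0.3, "EGFR": 0.0, "MYC": 0.5}
study_b = {"TP53": 0.4, "BRCA1": 0.0, "EGFR": 0.2, "MYC": -0.1}

# Made-up solution paths: one coefficient dict per tuning value,
# ordered from heavy to light penalization.
path_a = [{"TP53": 0.1}, {"TP53": 0.5, "MYC": 0.2}, study_a]
path_b = [{"MYC": 0.1}, {"TP53": 0.2, "MYC": 0.1}, study_b]

sig_a, sig_b = signature(study_a), signature(study_b)
shared = overlap_count(sig_a, sig_b)
jac = jaccard(sig_a, sig_b)
path_score = path_overlap(path_a, path_b)
print(shared, jac, path_score)
```

In this toy example the two signatures agree well at the selected tuning values but disagree at the start of the paths, which a single count at one tuning value cannot show. Pairing path steps by position is a simplification; in practice the two paths would need to be aligned, for example by the number of selected genes.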

Keywords: TCGA; cancer prognosis; degree of overlap; gene signature.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Gene Expression Profiling*
  • Humans
  • Neoplasms / genetics*
  • Prognosis
  • Proportional Hazards Models