Principles for the organization of gene-sets

Comput Biol Chem. 2015 Dec:59 Pt B:139-49. doi: 10.1016/j.compbiolchem.2015.04.005. Epub 2015 Jun 10.

Abstract

A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e., proteins) that have some features in common. Gene-sets can be constructed in many different ways, but these ways have not been systematically organized. In current public-domain databases, gene-sets are mainly organized ad hoc, with group header names often determined by practical considerations (such as the type of technology used to obtain the gene-sets, or keeping a balanced number of gene-sets under each header). Here we aim to provide an organizing principle for gene-sets based on the level at which genes are connected: the homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connection between genes: actual connection versus sharing of a label. An actual connection denotes a direct biological interaction, whereas a shared-label connection denotes shared membership in a group. We also address some extensions of the framework, such as overlap between gene-sets, modules, and the incorporation of non-protein-coding entities such as microRNAs.
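The two axes described in the abstract (connection level and connection type) can be pictured as a small data structure. The following Python sketch is only an illustration and is not taken from the paper: the names `Level`, `ConnectionType`, `GeneSet`, and `jaccard_overlap` are hypothetical, the gene identifiers are placeholders, the level and type assigned to each example set are illustrative assumptions, and the Jaccard index is one assumed way to quantify overlap between gene-sets.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The five connection levels named in the abstract."""
    HOMOLOGY = "homology"
    PHYSICAL_MAP_PROXIMITY = "physical map proximity"
    CHEMICAL_INTERACTION = "chemical interaction"
    BIOLOGICAL = "biological"
    PHENOTYPIC_MEDICAL = "phenotypic-medical"


class ConnectionType(Enum):
    """The two connection types named in the abstract."""
    ACTUAL = "actual connection"
    SHARED_LABEL = "shared label"


@dataclass(frozen=True)
class GeneSet:
    """Hypothetical record: a named gene-set tagged on both axes."""
    name: str
    level: Level
    connection: ConnectionType
    genes: frozenset


def jaccard_overlap(a: GeneSet, b: GeneSet) -> float:
    """Assumed overlap measure: |A and B| / |A or B| (not from the paper)."""
    union = a.genes | b.genes
    if not union:
        return 0.0
    return len(a.genes & b.genes) / len(union)


# Placeholder gene-sets; names, members, and classifications are illustrative.
example_family = GeneSet(
    name="example gene family",
    level=Level.HOMOLOGY,
    connection=ConnectionType.SHARED_LABEL,
    genes=frozenset({"geneA", "geneB", "geneC"}),
)
example_complex = GeneSet(
    name="example protein complex",
    level=Level.CHEMICAL_INTERACTION,
    connection=ConnectionType.ACTUAL,
    genes=frozenset({"geneB", "geneC", "geneD"}),
)

overlap = jaccard_overlap(example_family, example_complex)
print(f"{example_family.name} vs {example_complex.name}: {overlap:.2f}")
```

Tagging each gene-set with one level and one connection type keeps the two axes independent, so sets can be grouped by either axis while overlap is computed on membership alone.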

Keywords: Biological pathways; Co-differential expression; Co-expression; Co-localization; Disease genes; Essential genes; Gene Ontology (GO); Gene families; Gene-sets; Housekeeping genes; Modules; Operon; Protein complex; Protein domains; Protein–protein interaction; Tissue-specific genes; Transcription factor target.

Publication types

  • Review

MeSH terms

  • Animals
  • Databases, Genetic*
  • Gene Regulatory Networks*
  • Genotype
  • Humans
  • Neoplasms / diagnosis
  • Neoplasms / drug therapy
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Phenotype
  • Systems Biology*