A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (e.g. proteins) that share some feature. There are many ways to construct gene-sets, but a systematic way of organizing them is lacking. In current public-domain databases, gene-sets are mainly organized ad hoc, with category names often chosen for practical reasons (such as the technology used to obtain the gene-sets, or a balanced number of gene-sets per category). Here we propose a principle for organizing gene-sets according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. An actual connection denotes a direct biological interaction, whereas a shared-label connection denotes membership in a common group. We also address several extensions of the framework, such as overlapping gene-sets, modules, and the incorporation of non-protein-coding entities such as microRNAs.
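To make the two connection types concrete, the Python sketch below tags each gene-set with one of the five levels and one of the two connection types. All names (`Level`, `Connection`, `GeneSet`, `sets_from_labels`, `sets_from_connections`) are hypothetical, the gene names and labels are toy placeholders, and building actual-connection sets as connected components of a pairwise interaction graph is only one possible choice assumed here for illustration, not the method of this work.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The five levels at which genes can be connected."""
    HOMOLOGY = "homology"
    PHYSICAL_MAP = "physical map proximity"
    CHEMICAL = "chemical interaction"
    BIOLOGICAL = "biological"
    PHENOTYPIC_MEDICAL = "phenotypic-medical"


class Connection(Enum):
    ACTUAL = "actual connection"   # direct interaction between genes
    SHARED_LABEL = "shared label"  # membership in a common group


@dataclass(frozen=True)
class GeneSet:
    name: str
    genes: frozenset
    level: Level
    connection: Connection


def sets_from_labels(annotations, level):
    """Shared-label gene-sets: one set per label.

    `annotations` maps gene -> iterable of labels. A gene carrying
    several labels ends up in several sets, so the sets may overlap.
    """
    groups = defaultdict(set)
    for gene, labels in annotations.items():
        for label in labels:
            groups[label].add(gene)
    return [GeneSet(label, frozenset(genes), level, Connection.SHARED_LABEL)
            for label, genes in groups.items()]


def sets_from_connections(edges, level):
    """Actual-connection gene-sets (assumed construction): connected
    components of an undirected graph of (gene, gene) interaction pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, result = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(neighbors[gene] - component)
        seen |= component
        result.append(GeneSet(f"component_{len(result) + 1}",
                              frozenset(component), level, Connection.ACTUAL))
    return result


if __name__ == "__main__":
    # Hypothetical toy inputs; levels are assigned arbitrarily for illustration.
    annotations = {"g1": ["labelX"], "g2": ["labelX", "labelY"], "g3": ["labelY"]}
    edges = [("g1", "g2"), ("g3", "g4")]
    for gs in sets_from_labels(annotations, Level.BIOLOGICAL):
        print(gs.name, sorted(gs.genes), gs.connection.value)
    for gs in sets_from_connections(edges, Level.CHEMICAL):
        print(gs.name, sorted(gs.genes), gs.connection.value)
```

Because `g2` carries two labels, it falls into both label-based sets, which is one simple way that overlapping gene-sets arise; the interaction pairs, by contrast, partition the genes into disjoint components under this particular construction.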
Keywords: Biological pathways; Co-differential expression; Co-expression; Co-localization; Disease genes; Essential genes; Gene Ontology (GO); Gene families; Gene-sets; Housekeeping genes; Modules; Operon; Protein complex; Protein domains; Protein–protein interaction; Tissue-specific genes; Transcription factor target.
Copyright © 2015 Elsevier Ltd. All rights reserved.