RIP: the regulatory interaction predictor, a machine learning-based approach for predicting target genes of transcription factors

Bioinformatics. 2011 Aug 15;27(16):2239-47. doi: 10.1093/bioinformatics/btr366. Epub 2011 Jun 20.

Abstract

Motivation: Understanding transcriptional gene regulation is essential for studying cellular systems. Identifying the genome-wide targets of transcription factors (TFs) provides a basis for discovering the involvement of TFs and TF cooperativeness in cellular systems and pathogenesis.

Results: We present the regulatory interaction predictor (RIP), a machine learning approach that inferred 73 923 regulatory interactions (RIs) for 301 human TFs and 11 263 target genes with good quality and 4516 RIs with very high quality. The inference of RIs is independent of any specific condition. Our approach employs support vector machines (SVMs) trained on a set of experimentally proven RIs from a public repository (TRANSFAC). The features of RIs used for learning are based on a correlation meta-analysis of 4064 gene expression profiles from 76 studies, on in silico predictions of transcription factor binding sites (TFBSs) and on combinations of these that exploit knowledge about the co-regulation of genes by a common TF (TF-module). The trained SVMs were applied to infer new RIs for a large set of TFs and genes. In a case study, we used the inferred RIs to analyze an independent microarray dataset. We identified key TFs regulating the transcriptional response of monocytes to interferon alpha stimulation, most prominently interferon-stimulated gene factor 3 (ISGF3). Furthermore, the predicted TF-modules were strongly associated with their functionally related pathways.
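The abstract does not specify how the correlation meta-analysis combines evidence across studies. As an illustration only, the following minimal sketch assumes one common scheme: a Pearson correlation between a TF and a candidate target gene is computed within each study, and the per-study values are pooled with a sample-size-weighted Fisher z-transform. The function names `pearson` and `meta_correlation`, the weighting, and the input layout are assumptions made for this example and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a constant profile carries no correlation signal.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def meta_correlation(studies):
    """Pool per-study TF/gene correlations into one meta-correlation.

    `studies` is a list of (tf_expression, gene_expression) pairs, one
    pair per study (hypothetical layout). Each correlation is Fisher
    z-transformed and weighted by n - 3, the inverse variance of z.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for tf_expr, gene_expr in studies:
        n = len(tf_expr)
        if n < 4:
            continue  # n - 3 would be zero or negative
        r = pearson(tf_expr, gene_expr)
        # Clamp so that atanh stays finite for perfect correlations.
        r = max(min(r, 0.999999), -0.999999)
        weighted_sum += (n - 3) * math.atanh(r)
        total_weight += n - 3
    if total_weight == 0:
        raise ValueError("no study has at least 4 samples")
    # Back-transform the pooled z value to the correlation scale.
    return math.tanh(weighted_sum / total_weight)
```

In a pipeline of the kind described, a pooled value like this could serve as one expression-based feature of a TF-gene pair, alongside TFBS prediction scores, before the pair is passed to a classifier.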

Conclusion: Descriptors of gene expression, TFBS predictions, experimentally verified binding information and statistical combinations of these enabled the inference of RIs on a genome-wide scale for human genes with good precision, providing a sound basis for expression profiling studies.

Contact: r.koenig@dkfz.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Gene Expression Profiling*
  • Gene Expression Regulation*
  • Genomics
  • Humans
  • Support Vector Machine*
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors