A global clustering algorithm to identify long intergenic non-coding RNA--with applications in mouse macrophages

PLoS One. 2011;6(9):e24051. doi: 10.1371/journal.pone.0024051. Epub 2011 Sep 30.

Abstract

Identifying diffuse signals from chromatin immunoprecipitation followed by high-throughput massively parallel sequencing (ChIP-Seq) poses significant computational challenges, and few methods are currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone H3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably to the local clustering method SICER, which was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched for biological functions related to metabolic processes under resting conditions but for developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and, by prediction, the expected secondary structures. Last, they are enriched with motifs of transcription factors such as PU.1 and AP-1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, the global clustering approach (GCLS), applied to RNA polymerase II and H3K4Me3 ChIP-Seq data, can effectively detect putative lincRNAs that exhibit expected characteristics, as exemplified by macrophages in this study.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Animals
  • Chromatin Immunoprecipitation / methods
  • Cluster Analysis
  • Computational Biology / methods*
  • Genome
  • Histones / chemistry
  • Humans
  • Lipopolysaccharides / metabolism
  • Macrophages / metabolism
  • Mice
  • Oligonucleotide Array Sequence Analysis
  • RNA Polymerase II / metabolism
  • RNA, Untranslated / genetics*
  • Sequence Analysis, DNA

Substances

  • Histones
  • Lipopolysaccharides
  • RNA, Untranslated
  • RNA Polymerase II