Non-redundant compendium of human ncRNA genes in GeneCards

Bioinformatics. 2013 Jan 15;29(2):255-61. doi: 10.1093/bioinformatics/bts676. Epub 2012 Nov 19.

Abstract

Motivation: Non-coding RNA (ncRNA) genes are increasingly acknowledged for their importance in the human genome. However, there is no comprehensive non-redundant database for all such human genes.

Results: We leveraged the GeneCards platform, the human gene compendium, together with fRNAdb and additional primary sources, to unify all ncRNA gene entries obtainable from 15 different primary sources. Overlapping entries were clustered into unified locations by an algorithm based on genomic coordinates. This increased the number of ncRNA entries in GeneCards ∼5-fold, to ∼80,000 non-redundant human ncRNAs belonging to 14 classes. Such 'grand unification' within a regularly updated data structure will assist future ncRNA research.
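
The coordinate-based clustering described above might be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not specify the overlap criterion, so the sketch assumes that two entries belong to the same cluster when they share at least one base on the same chromosome and strand. The `Entry` fields, source labels and gene names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    source: str  # hypothetical label for the primary source
    name: str    # hypothetical entry identifier
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str


def cluster_entries(entries):
    """Group entries that overlap on the same chromosome and strand.

    Assumption: any shared base counts as an overlap, and overlaps are
    transitive (A overlaps B, B overlaps C => one cluster).
    """
    by_locus = defaultdict(list)
    for e in entries:
        by_locus[(e.chrom, e.strand)].append(e)

    clusters = []
    for key in sorted(by_locus):
        # Sorting by start lets a single left-to-right sweep find all overlaps.
        group = sorted(by_locus[key], key=lambda e: (e.start, e.end))
        current = [group[0]]
        current_end = group[0].end
        for e in group[1:]:
            if e.start <= current_end:
                # Entry overlaps the running cluster: extend it.
                current.append(e)
                current_end = max(current_end, e.end)
            else:
                # Gap found: close the cluster and start a new one.
                clusters.append(current)
                current = [e]
                current_end = e.end
        clusters.append(current)
    return clusters


entries = [
    Entry("sourceA", "ncRNA_1", "chr1", 100, 200, "+"),
    Entry("sourceB", "ncRNA_1b", "chr1", 150, 260, "+"),
    Entry("sourceC", "ncRNA_2", "chr1", 500, 600, "+"),
    Entry("sourceA", "ncRNA_3", "chr1", 150, 260, "-"),
]
clusters = cluster_entries(entries)
for c in clusters:
    span = (min(e.start for e in c), max(e.end for e in c))
    print(c[0].chrom, c[0].strand, span, [e.name for e in c])
```

In this toy input the first two entries merge into one unified location, while the third (no overlap) and the fourth (opposite strand) remain separate. A real pipeline would likely add class-aware rules and a minimum-overlap threshold, which this sketch omits.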

Availability and implementation: All of these non-coding RNAs are included among the ∼122,500 entries in GeneCards V3.09, along with pertinent annotation, automatically mined by its built-in pipeline from 100 data sources. This information is available at www.genecards.org.

Contact: Frida.Belinky@weizmann.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Databases, Genetic*
  • Genes
  • Genome, Human
  • Genomics
  • Humans
  • Internet
  • Molecular Sequence Annotation
  • RNA, Untranslated / genetics*

Substances

  • RNA, Untranslated