Navigating the dynamic landscape of long noncoding RNA and protein-coding gene annotations in GENCODE

Hum Genomics. 2016 Oct 28;10(1):35. doi: 10.1186/s40246-016-0090-2.

Abstract

Background: Our understanding of the transcriptional potential of the genome and its functional consequences has changed significantly in the last decade. This change is largely attributable to improvements in technology that have made it possible to annotate, and in many cases functionally characterize, a number of novel gene loci in the human genome. Keeping pace with advances in this dynamic environment while systematically annotating a compendium of genes and transcripts is a formidable task. Of the many databases that have attempted to systematically annotate the genome, GENCODE has emerged as one of the largest and most popular compendia of human genome annotations.

Results: Analysis of the various versions of GENCODE revealed that transcripts for both protein-coding genes and long noncoding RNAs (lncRNAs) were continually revised, leading to conflicting annotations. GENCODE version 24 annotates 4.18 % of the human genome as transcribed, an increase of 1.58 % over its first version. Of the 251,614 transcripts annotated across GENCODE versions, only 21.7 % were annotated consistently. We also found that the GENCODE consortium categorized transcripts into 70 biotypes, of which only 17 remained stable throughout.
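
The consistency and biotype-stability figures above can be read as set computations over per-release annotations. The following is a minimal sketch, not the authors' pipeline: it assumes each release is reduced to a mapping from transcript ID to biotype, and it assumes "consistent" means present in every release with an unchanged biotype, which may differ from the definition used in the study. The function names `consistent_fraction` and `stable_biotypes` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def consistent_fraction(versions: List[Dict[str, str]]) -> float:
    """Fraction of all transcripts seen in any release that appear in
    every release with the same biotype (an assumed definition)."""
    all_ids = set().union(*(v.keys() for v in versions))
    if not all_ids:
        return 0.0
    stable = {
        tid
        for tid in all_ids
        if all(tid in v for v in versions)
        and len({v[tid] for v in versions}) == 1
    }
    return len(stable) / len(all_ids)


def stable_biotypes(versions: List[Dict[str, str]]) -> Set[str]:
    """Biotype labels that are used in every release."""
    if not versions:
        return set()
    return set.intersection(*(set(v.values()) for v in versions))


# Toy data for illustration only, not real GENCODE content.
releases = [
    {"T1": "protein_coding", "T2": "lincRNA", "T3": "antisense"},
    {"T1": "protein_coding", "T2": "processed_transcript", "T4": "lincRNA"},
]

fraction = consistent_fraction(releases)  # 1 of 4 transcripts -> 0.25
shared = stable_biotypes(releases)
```

In this toy case only `T1` keeps the same biotype in both releases, so one of four transcripts counts as consistent, and only the `protein_coding` and `lincRNA` labels occur in both releases.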

Conclusions: In this report, we review how gene annotations, specifically long noncoding RNA (lncRNA) annotations, have changed in GENCODE over the years. Our analysis suggests significant dynamism in gene annotations, reflecting the evolution of, and emerging consensus on, gene nomenclature. While progressive changes in annotations and timely release of updates make the resource reliable for the community, the changes introduced with each release pose unique challenges to its users. Taking cues from other experiments in bio-curation, we propose potential avenues and methods to bridge the gap.

Keywords: Annotations; GENCODE; Long noncoding RNAs; Transcripts.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Genetic*
  • Genome, Human
  • Humans
  • Molecular Sequence Annotation
  • Open Reading Frames*
  • RNA, Long Noncoding / genetics*

Substances

  • RNA, Long Noncoding