Genome-wide identification and predictive modeling of lincRNAs polyadenylation in cancer genome

Comput Biol Chem. 2014 Oct:52:1-8. doi: 10.1016/j.compbiolchem.2014.07.001. Epub 2014 Jul 27.

Abstract

Long noncoding RNAs (lncRNAs) play essential regulatory roles in the human cancer genome. Many identified lncRNAs are transcribed by RNA polymerase II and are polyadenylated, and long intervening noncoding RNAs (lincRNAs) have been widely used in lncRNA research. To date, the mechanism of lincRNA polyadenylation in cancer is not fully understood. In this paper, we first report a comprehensive map of global lincRNA polyadenylation sites (PASs) in five human cancer genomes; second, we propose a grouping method based on gene expression patterns and the manner of alternative polyadenylation (APA); third, we investigate the distribution of motifs surrounding PASs. Our analysis reveals that about 70% of PASs are located on the sense strand of lincRNAs, and that more than 90% of PASs on the antisense strand of lincRNAs are located in intronic regions. In addition, around 40% of lincRNA genes with PASs have APA sites. Four prominent motifs, i.e., AATAAA, TTTTTTTT, CCAGSCTGG, and RGYRYRGTGG, were detected in the sequences surrounding PASs in normal and cancer tissues. Furthermore, we propose a novel algorithm based on a support vector machine (SVM) to recognize lincRNA PASs in tumor tissues. The algorithm achieves accuracies of up to 96.55% and 89.48% in distinguishing tumor lincRNA PASs from non-polyadenylation sites and from non-lincRNA PASs, respectively.
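Two of the four reported motifs use IUPAC degenerate nucleotide codes (S = C or G, R = A or G, Y = C or T), so a literal string search would miss valid occurrences. The following is a minimal sketch of how such motifs could be located in a sequence window around a PAS. It is only an illustration of matching already-known motifs, not the motif detection procedure used in the paper, which the abstract does not describe. The names `IUPAC`, `MOTIFS`, `motif_to_regex`, and `find_motif`, as well as the example sequence, are hypothetical.

```python
import re

# IUPAC codes needed for the four motifs in the abstract.
# S, R and Y are degenerate: each stands for two possible bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",  # purine
    "Y": "[CT]",  # pyrimidine
    "S": "[CG]",  # strong pairing
}

# The four motifs reported in the abstract.
MOTIFS = ["AATAAA", "TTTTTTTT", "CCAGSCTGG", "RGYRYRGTGG"]


def motif_to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences (e.g. in a long run of T) are all reported.
    """
    body = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif):
    """Return the 0-based start positions of every match of motif in seq."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence for illustration only, not real data.
    window = "CCAGCCTGGAATAAATTTTTTTTTAGCACAGTGG"
    for motif in MOTIFS:
        print(motif, find_motif(window, motif))
```

Positions are reported relative to the start of the supplied window; relating them to the PAS itself would require knowing where the PAS sits within that window, which is left to the caller.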

Keywords: Long intervening noncoding RNA; Polyadenylation sites; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast / metabolism
  • Colon / metabolism
  • Female
  • Genome, Human
  • Humans
  • Kidney / metabolism
  • Liver / metabolism
  • Lung / metabolism
  • Neoplasms / genetics*
  • Polyadenylation*
  • RNA, Long Noncoding*
  • Support Vector Machine

Substances

  • RNA, Long Noncoding