Improving N(6)-methyladenosine site prediction with heuristic selection of nucleotide physical-chemical properties

Anal Biochem. 2016 Sep 1;508:104-13. doi: 10.1016/j.ab.2016.06.001. Epub 2016 Jun 11.

Abstract

N(6)-methyladenosine (m(6)A) is one of the most common and abundant post-transcriptional RNA modifications found in viruses and most eukaryotes. m(6)A plays an essential role in regulating gene expression across many vital biological processes. Because m(6)A is widely distributed across the genome, identifying m(6)A sites from RNA sequences is important for understanding its regulatory mechanism. Although progress has been made in m(6)A site prediction, challenges remain. This article aims to further improve m(6)A site prediction by introducing a new heuristic nucleotide physical-chemical property selection (HPCS) algorithm. Given a prescribed feature representation for encoding an RNA sequence into a feature vector, the HPCS algorithm extracts an optimized subset of nucleotide physical-chemical properties. We demonstrate the efficacy of the HPCS algorithm under different feature representations, including pseudo dinucleotide composition (PseDNC), auto-covariance (AC), and cross-covariance (CC). Based on the HPCS algorithm, we implemented an m(6)A site predictor, called M6A-HPCS, which is freely available at http://csbio.njust.edu.cn/bioinf/M6A-HPCS. Rigorous jackknife tests on benchmark datasets show that M6A-HPCS achieves higher success rates than existing state-of-the-art sequence-based m(6)A site predictors.
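To make the auto-covariance (AC) and cross-covariance (CC) representations concrete, the following minimal sketch encodes an RNA sequence using the lagged-covariance definitions commonly used for dinucleotide property tracks. It is an illustration under stated assumptions, not the M6A-HPCS implementation: the two properties (`gc_count`, `purine_count`) are toy values computed from the letters themselves and stand in for real physical-chemical property tables, and the names `property_track`, `covariance`, and `encode` are hypothetical. The selection step itself is not shown; in this sketch, a selected property subset would simply be what is passed as `props`.

```python
from itertools import product

# All 16 RNA dinucleotides.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]

# ASSUMPTION: toy properties derived from the letters, used only as
# placeholders for published physical-chemical property values.
TOY_PROPERTIES = {
    "gc_count": {d: float(sum(b in "GC" for b in d)) for d in DINUCLEOTIDES},
    "purine_count": {d: float(sum(b in "AG" for b in d)) for d in DINUCLEOTIDES},
}


def property_track(seq, prop):
    """Map each overlapping dinucleotide of seq to its property value."""
    table = TOY_PROPERTIES[prop]
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def covariance(seq, prop_a, prop_b, lag):
    """Lagged covariance between two property tracks of the same sequence.

    With prop_a == prop_b this is the auto-covariance (AC);
    with different properties it is the cross-covariance (CC).
    """
    a = property_track(seq, prop_a)
    b = property_track(seq, prop_b)
    n = len(a) - lag  # number of dinucleotide pairs separated by `lag`
    if n <= 0:
        raise ValueError("sequence too short for the requested lag")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((a[i] - mean_a) * (b[i + lag] - mean_b) for i in range(n)) / n


def encode(seq, props, max_lag):
    """Concatenate AC and CC features for the given property subset."""
    lags = range(1, max_lag + 1)
    ac = [covariance(seq, p, p, lag) for p in props for lag in lags]
    cc = [covariance(seq, p, q, lag)
          for p in props for q in props if p != q for lag in lags]
    return ac + cc


if __name__ == "__main__":
    features = encode("GGACUGGACU", ["gc_count", "purine_count"], max_lag=3)
    print(len(features), features)
```

With N selected properties and a maximum lag of K, this sketch yields N·K auto-covariance features and N·(N−1)·K cross-covariance features, so the choice of property subset directly controls the dimension of the feature vector.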

Keywords: Feature representation; N(6)-methyladenosine; Physical–chemical property selection; RNA sequence; Support vector machine.

MeSH terms

  • Adenosine / analogs & derivatives*
  • Adenosine / chemistry
  • Algorithms*
  • Binding Sites
  • Heuristics
  • Nucleotides / chemistry*

Substances

  • Nucleotides
  • N-methyladenosine
  • Adenosine