iRNA-ac4C: A novel computational method for effectively detecting N4-acetylcytidine sites in human mRNA

Int J Biol Macromol. 2023 Feb 1:227:1174-1181. doi: 10.1016/j.ijbiomac.2022.11.299. Epub 2022 Dec 5.

Abstract

RNA N4-acetylcytidine (ac4C) is the acetylation of cytidine at the nitrogen-4 position. It is a highly conserved RNA modification that is involved in a variety of biological processes. Hence, accurate identification of genome-wide ac4C sites is vital for understanding the regulatory mechanisms of gene expression. In this work, a novel predictor, named iRNA-ac4C, was established to identify ac4C sites in human mRNA based on three feature extraction methods: nucleotide composition, nucleotide chemical property, and accumulated nucleotide frequency. Subsequently, minimum-redundancy-maximum-relevance (mRMR) combined with incremental feature selection (IFS) was used to select the optimal feature subset. Using the optimal feature subset, the best ac4C classification model was trained by gradient boosting decision tree (GBDT) with 10-fold cross-validation. Results on an independent testing set indicated that the proposed method has encouraging generalization capability. For the convenience of other researchers, we established a user-friendly web server, which is freely available at http://lin-group.cn/server/iRNA-ac4C/. We hope that the tool can provide guidance for wet-lab researchers.
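The three feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that nucleotide composition means single-nucleotide frequencies, and it uses the chemical-property triples and position-wise density commonly used for these encodings in the RNA-modification prediction literature. The function names (`nucleotide_composition`, `ncp_with_anf`, `encode`) are hypothetical, and the paper's exact definitions and window length may differ.

```python
from collections import Counter

# Assumed chemical-property triples: (ring structure, functional group,
# hydrogen bonding). Purine = 1, amino group = 1, weak hydrogen bond = 1.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def nucleotide_composition(seq):
    """Frequency of A, C, G, U over the whole sequence (assumed k = 1)."""
    counts = Counter(seq)
    return [counts[base] / len(seq) for base in "ACGU"]


def ncp_with_anf(seq):
    """Per position: 3 chemical-property bits plus the accumulated frequency,
    i.e. how often the current base has occurred up to and including
    position i, divided by i."""
    seen = Counter()
    features = []
    for i, base in enumerate(seq, start=1):
        seen[base] += 1
        features.extend(NCP[base])  # raises KeyError on bases other than A/C/G/U
        features.append(seen[base] / i)
    return features


def encode(seq):
    """Concatenate the three feature groups into one flat vector."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return nucleotide_composition(seq) + ncp_with_anf(seq)


if __name__ == "__main__":
    vector = encode("ACGUAC")
    print(len(vector), vector)
```

For a sequence of length n this sketch yields 4 + 4n values. Vectors of this kind would then be ranked and pruned by a feature selection step and passed to a classifier, as the abstract describes.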

Keywords: Feature selection; Gradient boosting decision tree; Machine learning; N4-acetylcytidine.

MeSH terms

  • Cytidine* / genetics
  • Cytidine* / metabolism
  • Humans
  • Nucleotides
  • RNA* / chemistry
  • RNA, Messenger / metabolism

Substances

  • RNA, Messenger
  • N-acetylcytidine
  • Cytidine
  • RNA
  • Nucleotides