DPred_3S: identifying dihydrouridine (D) modification on three species epitranscriptome based on multiple sequence-derived features

Front Genet. 2023 Dec 15:14:1334132. doi: 10.3389/fgene.2023.1334132. eCollection 2023.

Abstract

Introduction: Dihydrouridine (D) is a tRNA modification conserved across all three domains of life. D modification enhances the conformational flexibility of the modified nucleotide within the spatial structure of RNA, and it has been associated with disease and evolution. Recent studies have also suggested the presence of dihydrouridine in mRNA.

Methods: To identify D sites in the epitranscriptome, we developed DPred_3S, a machine learning-based prediction framework for the D epitranscriptomes of three species. It is the first such framework to use epitranscriptome sequencing data as training data.

Results: Optimal features were identified through F-score evaluation and the integration of different feature types. Our model achieved area under the receiver operating characteristic curve (AUROC) scores of 0.955, 0.946, and 0.905 for Saccharomyces cerevisiae, Escherichia coli, and Schizosaccharomyces pombe, respectively. We also compared the performance of different machine learning algorithms.

Discussion: The high performance of our model suggests that D sites can be distinguished by their surrounding sequence. The lower performance of cross-species prediction, however, may be limited by technique-specific preferences.
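The abstract names two quantitative steps, F-score feature evaluation and AUROC scoring, without giving their definitions. The sketch below illustrates both under stated assumptions. It assumes the F-score takes the standard form used for feature ranking in sequence-based predictors: the squared deviations of the positive-class and negative-class means from the overall mean, divided by the sum of the two within-class sample variances. It uses k-mer composition only as one example of a sequence-derived feature, because the abstract does not list DPred_3S's actual feature set. AUROC is computed as the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. The function names `kmer_frequencies`, `f_score`, `rank_features` and `auroc` and the toy sequences are hypothetical and do not come from the paper.

```python
from itertools import product
from statistics import mean


def kmer_frequencies(seq, k=1):
    """Normalized counts of every A/C/G/U k-mer in seq (hypothetical example
    of a sequence-derived feature; not the paper's confirmed feature set)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence shorter than k")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing N or other symbols
            counts[sub] += 1
    return [counts[km] / total for km in kmers]


def f_score(pos, neg):
    """Assumed standard F-score of one feature: separation of the class means
    over the sum of within-class sample variances (needs >= 2 values per class)."""
    mean_pos, mean_neg = mean(pos), mean(neg)
    mean_all = mean(list(pos) + list(neg))
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    denominator = var_pos + var_neg
    if denominator == 0:
        # Both classes constant: perfectly separated if the means differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(pos_vectors, neg_vectors):
    """Return (score, feature_index) pairs, highest F-score first."""
    scores = []
    for j in range(len(pos_vectors[0])):
        pos = [v[j] for v in pos_vectors]
        neg = [v[j] for v in neg_vectors]
        scores.append((f_score(pos, neg), j))
    return sorted(scores, reverse=True)


def auroc(scores, labels):
    """AUROC as P(positive score > negative score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical windows centred on a U; not data from the paper.
    positives = ["GAUUAGC", "CAUUGGA", "GGUUAAC"]
    negatives = ["CCUCCGC", "ACUCCAC", "GCUCAGC"]
    pos_vecs = [kmer_frequencies(s, k=1) for s in positives]
    neg_vecs = [kmer_frequencies(s, k=1) for s in negatives]
    print(rank_features(pos_vecs, neg_vecs))
```

In a full pipeline, one would keep the top-ranked features, train a classifier on them, and pass its held-out scores to `auroc`. Per-species values such as those reported above come from the authors' own data and models, not from this toy example.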

Keywords: Escherichia coli; Saccharomyces cerevisiae; Schizosaccharomyces pombe; dihydrouridine; machine learning.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Scientific Research Foundation for Advanced Talents of Fujian Medical University (XRCZX2020012).