Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods

Brief Bioinform. 2020 Jul 15;21(4):1425-1436. doi: 10.1093/bib/bbz080.

Abstract

Identification of disease-associated circular RNAs (circRNAs) is of critical importance, especially with the dramatic increase in the number of circRNAs. However, the availability of experimentally validated disease-associated circRNAs is limited, which restricts the development of effective computational methods. To our knowledge, systematic approaches for the prediction of disease-associated circRNAs are still lacking. In this study, we propose the use of deep forests combined with positive-unlabeled learning methods to predict potential disease-related circRNAs. In particular, a heterogeneous biological network involving 17 961 circRNAs, 469 miRNAs, and 248 diseases was constructed, and then 24 meta-path-based topological features were extracted. We applied 5-fold cross-validation on 15 disease data sets to benchmark the proposed approach against other competitive methods and used Recall@k and PRAUC@k to evaluate their performance. In general, our method performed better than the other methods. In addition, the performance of all methods improved with the accumulation of known positive labels. Our results provide a new framework to investigate the associations between circRNAs and diseases and might improve our understanding of circRNA functions.
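As an illustration of two ideas mentioned in the abstract, the following minimal sketch shows how a meta-path count feature and Recall@k could be computed. It is not the authors' implementation. The function names, the toy adjacency matrices, and the choice of a circRNA-miRNA-disease meta-path are assumptions made for this example, and Recall@k is taken here in its common sense: the fraction of known positives that appear among the k highest-scoring candidates. PRAUC@k is not sketched because the abstract does not define it.

```python
def meta_path_count(adj_cm, adj_md):
    """Count circRNA-miRNA-disease paths (hypothetical meta-path).

    adj_cm[c][m] is 1 if circRNA c is linked to miRNA m, else 0.
    adj_md[m][d] is 1 if miRNA m is linked to disease d, else 0.
    The result is the matrix product: entry [c][d] is the number of
    two-step paths from circRNA c to disease d through any miRNA.
    """
    n_mirna = len(adj_md)
    n_disease = len(adj_md[0])
    return [
        [sum(row[m] * adj_md[m][d] for m in range(n_mirna))
         for d in range(n_disease)]
        for row in adj_cm
    ]


def recall_at_k(scores, labels, k):
    """Fraction of known positives ranked within the top k by score.

    scores: predicted association scores, one per candidate circRNA.
    labels: 1 for a known disease-associated circRNA, 0 otherwise.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Indices of candidates sorted from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / total_positives


# Toy data (assumed, not from the paper): 2 circRNAs, 2 miRNAs, 2 diseases.
toy_cm = [[1, 1], [0, 1]]
toy_md = [[1, 0], [1, 1]]
toy_counts = meta_path_count(toy_cm, toy_md)

# Toy ranking: five candidates, three of which are known positives.
toy_scores = [0.9, 0.1, 0.8, 0.4, 0.7]
toy_labels = [1, 0, 0, 1, 1]
toy_recall_at_3 = recall_at_k(toy_scores, toy_labels, 3)
```

In a positive-unlabeled setting, candidates labeled 0 are unlabeled rather than confirmed negatives, which is one reason a recall-oriented, top-k metric is a natural fit: it only asks whether the known positives are ranked near the top.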

Keywords: biological networks; circular RNAs; deep forests; disease; positive-unlabeled learning; topological features.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Disease / genetics*
  • Humans
  • RNA, Circular / genetics*

Substances

  • RNA, Circular