COL, a pipeline for identifying putatively functional back-splicing

bioRxiv [Preprint]. 2023 Nov 13:2023.11.08.566217. doi: 10.1101/2023.11.08.566217.

Abstract

Circular RNAs (circRNAs) are a class of generally non-coding RNAs produced by back-splicing. Although the vast majority of circRNAs are likely to be products of splicing error and thereby confer no benefits to organisms, a small number of circRNAs have been found to be functional. Identifying other functional circRNAs from the sea of mostly non-functional circRNAs is an important but difficult task. Because available experimental methods for this purpose have low throughput or versatility and existing computational methods have limited reliability or applicability, new methods are needed. We hypothesize that back-splicing events that generate functional circRNAs (i) exhibit substantially higher back-splicing rates than expected from the total splicing amounts, (ii) have conserved splicing motifs, and (iii) show unusually high back-splicing levels. We confirm these features in back-splicing events shared among human, macaque, and mouse, a set that should be enriched for functional back-splicing. Integrating the three features, we design a computational pipeline named COL for identifying putatively functional back-splicing. Using experimentally verified functional back-splicing as a benchmark, we find that COL outperforms a commonly used computational method with a similar data requirement. We conclude that COL is an efficient and versatile method for rapid identification of putatively functional back-splicing and circRNAs that can be experimentally validated.
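
To illustrate how three features of this kind could be combined into a single filter, here is a minimal Python sketch. It is not the COL implementation: the abstract does not specify how each feature is computed or thresholded, so the rate definition (back-splice reads divided by back-splice plus linear-splice reads), the `expected_rate` input, the `rate_fold` and `min_back_reads` cutoffs, and all names and read counts below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackSplicingEvent:
    """Hypothetical per-event summary; field names are illustrative only."""
    name: str
    back_reads: int        # reads spanning the back-splice junction
    linear_reads: int      # reads spanning the competing linear-splice junctions
    expected_rate: float   # assumed input: rate expected for this total splicing amount
    motif_conserved: bool  # assumed input: splicing motifs conserved across species


def back_splicing_rate(ev: BackSplicingEvent) -> float:
    """Assumed definition: fraction of splicing reads that are back-spliced."""
    total = ev.back_reads + ev.linear_reads
    return ev.back_reads / total if total else 0.0


def passes_all_features(ev: BackSplicingEvent,
                        rate_fold: float = 2.0,
                        min_back_reads: int = 50) -> bool:
    """Require all three features; both thresholds are arbitrary placeholders."""
    high_rate = back_splicing_rate(ev) >= rate_fold * ev.expected_rate  # feature (i)
    high_level = ev.back_reads >= min_back_reads                        # feature (iii)
    return high_rate and ev.motif_conserved and high_level              # (ii) in the middle


# Made-up events, not data from the study.
events = [
    BackSplicingEvent("event_A", 200, 800, 0.05, True),
    BackSplicingEvent("event_B", 3, 997, 0.05, True),
    BackSplicingEvent("event_C", 200, 800, 0.05, False),
]

candidates = [ev.name for ev in events if passes_all_features(ev)]
print(candidates)
```

In this toy input, only the event that has a high rate relative to expectation, a conserved motif, and a high read count is retained; the other two each fail one feature.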

Publication types

  • Preprint