Distinguishing mirtrons from canonical miRNAs with data exploration and machine learning methods

Grzegorz Rorbach; Olgierd Unold; Bogumil M Konopka

doi:10.1038/s41598-018-25578-3

Sci Rep. 2018 May 15;8(1):7560. doi: 10.1038/s41598-018-25578-3.

Authors

Grzegorz Rorbach¹, Olgierd Unold¹, Bogumil M Konopka²

Affiliations

¹ Department of Computer Engineering, Faculty of Electronics, Wroclaw University of Science and Technology, Wroclaw, Poland.
² Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland. bogumil.konopka@pwr.edu.pl.

Abstract

Mirtrons are non-canonical microRNAs encoded in introns, whose biogenesis starts with splicing. They are not processed by Drosha and enter the canonical pathway at the Exportin-5 level. Mirtrons are much less evolutionarily conserved than canonical miRNAs. Because of these differences, canonical miRNA predictors are not applicable to mirtron prediction. Identifying the differences is important for designing mirtron prediction algorithms and may help to improve the understanding of how mirtrons function. So far, only simple, single-feature comparisons have been reported, and these are insensitive to complex relations between features. We quantified miRNAs with 25 features and showed that it is impossible to distinguish the two miRNA species using simple thresholds on any single feature. However, under Principal Component Analysis, mirtrons and canonical miRNAs are grouped separately. Moreover, several methodologically diverse machine learning classifiers delivered high classification performance. Using feature selection algorithms, we found that some features previously reported as divergent between the two classes (e.g. bulges in the stem region) did not contribute to improving classification accuracy, which suggests that they are not biologically meaningful. Finally, we proposed a combination of the most important features (including guanine content, hairpin free energy and hairpin length), which together convey a specific pattern crucial for identifying mirtrons.
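The contrast drawn above between single-feature thresholds and feature combinations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic toy points, not miRNA measurements; the two features and the helper `best_threshold_accuracy` are hypothetical; and the hand-picked difference of features merely stands in for a multivariate method and is not the authors' procedure.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy achievable by a single cut-off on one feature.

    Every observed value is tried as a cut-off, and both orientations
    (class 1 above the cut, or class 1 below it) are scored.
    """
    n = len(values)
    best = 0.0
    for cut in sorted(set(values)):
        # Hits when predicting class 1 for value >= cut.
        hits = sum((v >= cut) == bool(y) for v, y in zip(values, labels))
        # n - hits is the score of the opposite orientation.
        best = max(best, hits / n, (n - hits) / n)
    return best


# Synthetic toy data (NOT from the paper): two classes lying on
# parallel diagonal lines in a two-feature space.
xs = list(range(10))
class0 = [(x, x + 1) for x in xs]  # hypothetical class 0
class1 = [(x, x - 1) for x in xs]  # hypothetical class 1
points = class0 + class1
labels = [0] * len(class0) + [1] * len(class1)

feature_1 = [p[0] for p in points]
feature_2 = [p[1] for p in points]
# A simple combination of the two features (their difference).
combined = [b - a for a, b in points]

acc_f1 = best_threshold_accuracy(feature_1, labels)
acc_f2 = best_threshold_accuracy(feature_2, labels)
acc_combined = best_threshold_accuracy(combined, labels)

print("feature 1 alone:", acc_f1)
print("feature 2 alone:", acc_f2)
print("combined       :", acc_combined)
```

On this toy set, no cut-off on either feature alone classifies more than 60% of the points correctly, whereas the difference of the two features separates the classes perfectly. Methods such as Principal Component Analysis or the classifiers mentioned in the abstract can exploit relations of this kind across many features at once.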

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Base Composition
Computational Biology / methods*
Databases, Genetic
Humans
Introns
Machine Learning
Mice
MicroRNAs / chemistry*
MicroRNAs / genetics*
Models, Molecular
Nucleic Acid Conformation
Principal Component Analysis

Substances

MicroRNAs