ClassifyTE: a stacking-based prediction of hierarchical classification of transposable elements

Bioinformatics. 2021 Sep 9;37(17):2529-2536. doi: 10.1093/bioinformatics/btab146.

Abstract

Motivation: Transposable Elements (TEs) or jumping genes are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. The proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier, which can explain the role of TEs in germline and somatic evolution more accurately, is needed. In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust method, ClassifyTE, for the hierarchical classification of TEs with high accuracy, using a stacking-based ML method.

Results: We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved 4%, 10.68% and 10.13% average percentage improvement (using the hF measure) compared to several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs.

Availability and implementation: The source code and data are available at https://github.com/manisa/ClassifyTE.

Supplementary information: Supplementary data are available at Bioinformatics online.