ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences

Genes (Basel). 2023 Jun 24;14(7):1331. doi: 10.3390/genes14071331.

Abstract

Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To expand knowledge about lncRNAs in other species, computational pipelines that run standardised data-processing steps without user intervention until the final result is produced have been actively developed in recent years. These advances broaden the functionality available for lncRNA identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software to identify lncRNAs and allows recognition parameters to be adjusted using genomic data for which lncRNA annotation is available. The pipeline supports the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, lncRNA classification by genome location, comparison with sequences from external databases, and analysis of lncRNA structural features and expression. We used transcriptomic sequences from 15 maize libraries assembled by Trinity and Hisat2/StringTie to demonstrate the application of the ICAnnoLncRNA pipeline.
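The length criterion in the definition above (longer than 200 nucleotides) can be illustrated with a minimal Python sketch. This is not code from ICAnnoLncRNA, and it does not reproduce LncFinder's coding-potential prediction; the function names `parse_fasta` and `filter_by_length` and the constant `MIN_LNCRNA_LENGTH` are hypothetical names chosen for illustration. It only shows how assembled transcripts in FASTA format might be pre-filtered by length before any further analysis.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Assumption: threshold taken from the lncRNA definition in the abstract
# ("longer than 200 nucleotides"); the pipeline's actual setting may differ.
MIN_LNCRNA_LENGTH = 200


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token as the identifier.
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]],
    min_length: int = MIN_LNCRNA_LENGTH,
) -> Dict[str, str]:
    """Keep only transcripts strictly longer than min_length nucleotides."""
    return {name: seq for name, seq in records if len(seq) > min_length}


if __name__ == "__main__":
    # Toy input: one 240-nt transcript and one 40-nt transcript.
    example = [">tx1 demo", "ACGT" * 60, ">tx2", "ACGT" * 10]
    kept = filter_by_length(parse_fasta(example))
    print(sorted(kept))
```

A length filter of this kind only narrows the candidate set; distinguishing non-coding from coding transcripts is the job of a dedicated classifier such as LncFinder, as the abstract describes.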

Keywords: annotation; automatic pipeline; classification; long non-coding RNA; maize; transcriptome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling*
  • RNA, Long Noncoding / genetics
  • Software
  • Transcriptome
  • Zea mays / genetics

Substances

  • RNA, Long Noncoding

Grants and funding

The work was supported by the Budget Project #FWNR-2022-0006 of the Ministry of Science and Higher Education of the Russian Federation (transcriptome analysis) and by the Kurchatov Genomic Centre of the Institute of Cytology and Genetics, SB RAS (No. 075-15-2019-1662) (pipeline development).