Genome ARTIST_v2: An Autonomous Bioinformatics Tool for Annotation of Natural Transposons in Sequenced Genomes

Int J Mol Sci. 2022 Oct 21;23(20):12686. doi: 10.3390/ijms232012686.

Abstract

The annotation of transposable elements (transposons) is a very dynamic field of genomics, and various tools designed to support this bioinformatics endeavor have been developed and described. Genome ARTIST v1.19 (GA_v1.19) was conceived for mapping artificial transposons mobilized during insertional mutagenesis projects, but the new functions of GA_v2 qualify it as a tool for mapping and annotating natural transposons (NTs) in long reads, contigs and assembled genomes. The key assets of GA_v2 are the tabular export of mapping and annotation data for high-throughput analysis, the generation of a list of flanking sequences around the insertion coordinates or around the target site duplications, and the computation of a consensus sequence for those flanking sequences. Additionally, we developed a set of scripts that enable the user to annotate NTs, to harness the annotations offered by FlyBase for the Drosophila melanogaster genome, to convert sequence files from .fasta to .raw, and to extract the junction query sequences essential for NT mapping. Herein, we present the applicability of GA_v2 for a preliminary annotation of the P-element and hobo class II NTs and the copia retrotransposon in the genome of the D. melanogaster strain Horezu_LaPeri (Horezu), Romania, which was sequenced with Nanopore technology in our laboratory. We used contigs assembled with the Flye tool, with a Q10 quality filter applied to the reads. Our results suggest that GA_v2 is a reliable autonomous tool able to map and annotate NTs in genomes sequenced with long-read technology. GA_v2 is open-source software compatible with Linux, Mac OS and Windows and is available from a GitHub repository and a dedicated website.
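
The abstract mentions three operations: converting .fasta to .raw, listing the flanking sequences around an insertion coordinate, and computing a consensus of those flanks. A minimal Python sketch of them follows. It is not GA_v2's code and does not reproduce its algorithms. The function names, the 0-based coordinate convention, the tie-breaking rule, and the reading of .raw as a header-free sequence string are all assumptions made for illustration.

```python
from collections import Counter


def fasta_to_raw(fasta_text):
    """Drop FASTA header lines and join the remaining sequence lines.

    Assumption: a .raw file is taken here to mean the bare sequence
    with no header line; this is an illustrative reading only.
    """
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ).upper()


def flanking_window(sequence, insertion_index, flank=10):
    """Return the (left, right) flanks around a 0-based insertion index.

    The insertion is taken to lie between sequence[insertion_index - 1]
    and sequence[insertion_index]; flanks are truncated at sequence ends.
    """
    left = sequence[max(0, insertion_index - flank):insertion_index]
    right = sequence[insertion_index:insertion_index + flank]
    return left, right


def majority_consensus(sequences):
    """Column-wise majority base over equal-length sequences.

    Hypothetical rule: a tie for the most frequent base yields 'N'.
    """
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    consensus = []
    for column in zip(*sequences):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)
```

For example, `flanking_window(fasta_to_raw(">toy\nACGTACGT\nTTGACCAG\n"), 8, flank=4)` returns `("ACGT", "TTGA")`, and `majority_consensus(["ACGT", "ACGA", "TCGT"])` returns `"ACGT"`. A real pipeline would also need to handle reverse-strand insertions and target site duplications, which this sketch ignores.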

Keywords: Drosophila melanogaster; Genome ARTIST; bioinformatics; genome sequencing; insertion mapping; natural transposons.

MeSH terms

  • Animals
  • Computational Biology*
  • DNA Transposable Elements / genetics
  • Drosophila Proteins* / genetics
  • Drosophila melanogaster / genetics
  • Molecular Sequence Annotation
  • Peptide Hydrolases / genetics
  • Retroelements
  • Sequence Analysis, DNA / methods
  • Software

Substances

  • Retroelements
  • DNA Transposable Elements
  • Peptide Hydrolases
  • Drosophila Proteins

Grants and funding

This research received no external funding. The publication fees were supported by the projects C1.2.PFE-CDI.2021-587/ Contract no. 41PFE/30.12.2021, CNFIS-FDI-2022-0675 and UEFISCDI-PN-III-P4-PCE2021-1797.