TRASH: Tandem Repeat Annotation and Structural Hierarchy

Piotr Wlodzimierz; Michael Hong; Ian R Henderson

doi:10.1093/bioinformatics/btad308

Bioinformatics. 2023 May 4;39(5):btad308. doi: 10.1093/bioinformatics/btad308.

Authors

Piotr Wlodzimierz¹, Michael Hong², Ian R Henderson¹

Affiliations

¹ Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom.
² Department of Genetics, University of Cambridge, Cambridge CB2 3EA, United Kingdom.

Abstract

Motivation: The advent of long-read DNA sequencing is allowing complete assembly of highly repetitive genomic regions for the first time, including the megabase-scale satellite repeat arrays found in many eukaryotic centromeres. The assembly of such repetitive regions creates a need for their de novo annotation, including patterns of higher order repetition. To annotate tandem repeats, methods are required that can be widely applied to diverse genome sequences, without prior knowledge of monomer sequences.

Results: Tandem Repeat Annotation and Structural Hierarchy (TRASH) is a tool that identifies and maps tandem repeats in nucleotide sequences, without prior knowledge of repeat composition. TRASH analyses a FASTA assembly file, identifies regions occupied by repeats, and then precisely maps the repeats and their higher order structures. To demonstrate the applicability and scalability of TRASH for centromere research, we apply our method to the recently published Col-CEN genome of Arabidopsis thaliana and the complete human CHM13 genome.
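
As a rough illustration of what detecting tandem repeats "without prior knowledge of monomer sequences" can mean in practice, the sketch below estimates the period of a repeat array from the spacing between recurring k-mers. It is a toy example and not TRASH's algorithm: the function name `dominant_period`, the choice of k = 4, and the synthetic test sequence are all assumptions made for illustration.

```python
from collections import Counter


def dominant_period(seq, k=4):
    """Return the most common distance between consecutive occurrences
    of the same k-mer, or None if no k-mer occurs twice.

    Hypothetical helper for illustration only; not part of TRASH.
    """
    last_seen = {}          # k-mer -> start position of its latest occurrence
    distances = Counter()   # spacing -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not distances:
        return None
    return distances.most_common(1)[0][0]


# Synthetic input: five copies of an 8 bp monomer between short unique flanks.
monomer = "ACGTTGCA"
toy = "GGGCCC" + monomer * 5 + "TTTAAA"
print(dominant_period(toy))  # 8
```

In a perfectly periodic array every k-mer recurs exactly one monomer length later, so the most frequent spacing recovers the monomer length without knowing the monomer itself. Real satellite arrays are diverged and megabases long, so a practical tool needs windowing, tolerance to mismatches, and a separate step to resolve higher order repeats.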

Availability and implementation: TRASH is freely available at https://github.com/vlothec/TRASH and is supported on Linux.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Base Sequence
Centromere / genetics
Genomics / methods
Humans
Repetitive Sequences, Nucleic Acid*
Sequence Analysis, DNA / methods
Tandem Repeat Sequences*

Associated data

figshare/10.6084/m9.figshare.22250326
figshare/10.6084/m9.figshare.22250209
figshare/10.6084/m9.figshare.22250185
figshare/10.6084/m9.figshare.22250191
