Revising transcriptome assemblies with phylogenetic information

PLoS One. 2021 Jan 12;16(1):e0244202. doi: 10.1371/journal.pone.0244202. eCollection 2021.

Abstract

A common transcriptome assembly error is to mistake different transcripts of the same gene for transcripts from multiple closely related genes. This error is difficult to identify during assembly, but in a phylogenetic analysis it can be diagnosed from gene phylogenies, where such errors appear as clades of tips from the same species with improbably short branch lengths. treeinform is a method that uses phylogenetic information across species to refine transcriptome assemblies within species. It identifies transcripts of the same gene that were incorrectly assigned to multiple genes and reassigns them as transcripts of the same gene. The treeinform method is implemented in Agalma, available at https://bitbucket.org/caseywdunn/agalma, and the general approach is relevant in a variety of other contexts.
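The diagnostic described above (clades whose tips all come from one species and sit on improbably short branches) can be illustrated with a small sketch. This is not the treeinform or Agalma code. It assumes a hypothetical minimal tree structure (`Node`), a hypothetical tip-label convention `species@transcript`, and a fixed, hypothetical `threshold` on a clade's height (maximum distance from the clade's root to its tips). In practice the cutoff would have to be chosen with care, for example by looking at the branch-length distribution across gene trees. Each qualifying clade is merged under one gene identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: Optional[str] = None  # tip label, e.g. "A@t1" (hypothetical format)
    length: float = 0.0         # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def species_of(tip_name: str) -> str:
    # Hypothetical convention: species and transcript ID separated by "@".
    return tip_name.split("@", 1)[0]


def tip_names(node: Node) -> List[str]:
    # Collect all tip labels below (and including) this node.
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(tip_names(child))
    return names


def height(node: Node) -> float:
    # Maximum path length from this node down to any of its tips.
    if not node.children:
        return 0.0
    return max(child.length + height(child) for child in node.children)


def find_candidate_clades(node: Node, threshold: float) -> List[List[str]]:
    """Return maximal clades with more than one tip, all from one species,
    whose height is below `threshold` (a hypothetical fixed cutoff)."""
    names = tip_names(node)
    one_species = len({species_of(n) for n in names}) == 1
    if len(names) > 1 and one_species and height(node) < threshold:
        return [names]  # stop here: this clade is maximal on this path
    groups: List[List[str]] = []
    for child in node.children:
        groups.extend(find_candidate_clades(child, threshold))
    return groups


def reassign(groups: List[List[str]]) -> Dict[str, str]:
    # Map each transcript in a candidate clade to one merged gene ID
    # (here, arbitrarily, the first tip label in the clade).
    gene_of: Dict[str, str] = {}
    for group in groups:
        for transcript in group:
            gene_of[transcript] = group[0]
    return gene_of


# Example gene tree: ((A@t1:0.01, A@t2:0.02):0.3, (B@t1:0.2, A@t3:0.25):0.1)
example = Node(children=[
    Node(length=0.3, children=[Node("A@t1", 0.01), Node("A@t2", 0.02)]),
    Node(length=0.1, children=[Node("B@t1", 0.2), Node("A@t3", 0.25)]),
])

if __name__ == "__main__":
    groups = find_candidate_clades(example, threshold=0.05)
    print(groups)            # [['A@t1', 'A@t2']]
    print(reassign(groups))  # {'A@t1': 'A@t1', 'A@t2': 'A@t1'}
```

In this example, `A@t1` and `A@t2` form a short, single-species clade and are treated as transcripts of one gene. `A@t3` is left alone because its nearest relative is from species B. The recursion stops at the first qualifying clade on each path, so only maximal clades are reported.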

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Cluster Analysis
  • Cnidaria / classification
  • Cnidaria / genetics
  • Models, Theoretical
  • Phylogeny
  • Transcriptome*
  • User-Computer Interface*