Identification of annotation artifacts concerning the chalcone synthase (CHS)

BMC Res Notes. 2023 Jun 20;16(1):109. doi: 10.1186/s13104-023-06386-z.

Abstract

Objective: Chalcone synthase (CHS) catalyzes the initial step of flavonoid biosynthesis. The CHS-encoding gene is well studied in numerous plant species. Rapidly growing sequence databases contain hundreds of CHS entries that result from automatic annotation. In this study, we evaluated the apparent multiplication of CHS domains in CHS gene models of four plant species.

Main findings: CHS genes with an apparent triplication of the CHS domain-encoding region were discovered through database searches. Such genes were found in Macadamia integrifolia, Musa balbisiana, Musa troglodytarum, and Nymphaea colorata. Manual inspection of the CHS gene models in these four species using large RNA-seq datasets suggests that these gene models result from artificial fusions introduced during annotation. Although the databases contain hundreds of seemingly correct CHS records, it is not clear why these annotation artifacts appeared.
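
As an illustration only, and not the procedure used in this study, the idea of screening gene models for an apparent multiplication of a domain can be sketched as counting non-overlapping domain matches per predicted protein. The sketch assumes that domain match coordinates have already been obtained from some domain search tool; the function names, the model names, and all coordinates below are hypothetical.

```python
from typing import Dict, List, Tuple

# (start, end) of one domain match on a protein, 1-based inclusive.
Hit = Tuple[int, int]


def count_nonoverlapping(hits: List[Hit]) -> int:
    """Count domain matches that do not overlap each other.

    Greedy interval selection: sort by end position and keep a match
    only if it starts after the previously kept match has ended.
    """
    count = 0
    last_end = 0
    for start, end in sorted(hits, key=lambda h: h[1]):
        if start > last_end:
            count += 1
            last_end = end
    return count


def flag_multiplied(models: Dict[str, List[Hit]], min_copies: int = 2) -> List[str]:
    """Return names of gene models with at least `min_copies` separate domain matches."""
    return sorted(
        name for name, hits in models.items()
        if count_nonoverlapping(hits) >= min_copies
    )


# Made-up example data (assumption): three hypothetical gene models.
example = {
    "model_A": [(10, 380)],                           # single domain
    "model_B": [(12, 385), (410, 790), (815, 1190)],  # three separate matches
    "model_C": [(5, 200), (150, 390)],                # overlapping partial matches
}

if __name__ == "__main__":
    print(flag_multiplied(example))
```

A model flagged this way is only a candidate for closer inspection. As the findings above indicate, deciding whether a multi-domain model reflects a real gene or an artificial fusion requires manual inspection against RNA-seq evidence.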

Keywords: Annotation error; Bioinformatics; Chalcone synthase; Domain composition; Flavonoid biosynthesis; RNA-seq mapping.

MeSH terms

  • Acyltransferases* / genetics
  • Artifacts*
  • Plants

Substances

  • flavanone synthetase
  • Acyltransferases