Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes

BMC Genomics. 2021 Oct 9;22(1):733. doi: 10.1186/s12864-021-08029-8.

Abstract

Background: Functional genome annotation is the process of labelling functional genomic regions with descriptive information. Manual curation can produce higher-quality genome annotations than fully automated methods. However, manual annotation is time-consuming and complex, and software can help reduce these drawbacks.

Results: We created Manual Annotation Studio (MAS) to make manual functional annotation of prokaryotic and viral genomes more efficient. MAS allows users to upload unannotated genomes, provides an interface to edit and upload annotations, tracks annotation history and progress, and saves data to a relational database. Through a simple point-and-click interface, users can run multiple homology search tools (blastp, rpsblast, and HHsearch) against multiple databases (Swiss-Prot, nr, CDD, PDB, and an internally generated database) and visualize the results. MAS was designed to accept connections over the local area network (LAN) of a lab or organization so that multiple users can access it simultaneously. MAS can take advantage of high-performance computing (HPC) clusters by interfacing with SGE or SLURM, and data can be exported in a variety of formats (FASTA, GenBank, GFF, and Excel).
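As an illustration of what dispatching a homology search to an HPC scheduler can involve, the following is a minimal sketch and not MAS's actual code. It builds a blastp command line and wraps it in a SLURM batch script. The function names, the file names (proteins.faa, hits.tsv), the database name (swissprot), and the job settings are all hypothetical; only the blastp options and #SBATCH directives shown are standard.

```python
import shlex


def blastp_command(query_fasta, database, out_path, evalue=1e-3, threads=4):
    """Build a blastp argument list that writes tabular output (outfmt 6)."""
    return [
        "blastp",
        "-query", query_fasta,    # protein sequences to search with
        "-db", database,          # name of a preformatted BLAST database
        "-out", out_path,         # where the hit table is written
        "-outfmt", "6",           # tab-separated hit table
        "-evalue", str(evalue),   # E-value cutoff for reported hits
        "-num_threads", str(threads),
    ]


def slurm_script(command, job_name="homology_search", cpus=4):
    """Wrap a command in a minimal SLURM batch script (hypothetical settings)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        shlex.join(command),      # shell-quoted command line
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inputs; a real deployment would supply its own paths.
    cmd = blastp_command("proteins.faa", "swissprot", "hits.tsv")
    print(slurm_script(cmd))
```

The resulting text could be saved to a file and submitted with sbatch. An SGE deployment would need different scheduler directives, which this sketch does not cover.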

Conclusions: MAS streamlines and provides structure to manual functional annotation projects. MAS enhances the ability of users to generate, interpret, and compare results from multiple tools. The structure that MAS provides can improve project organization and reduce annotation errors. MAS is ideal for team-based annotation projects because it facilitates collaboration.

Keywords: Bioinformatics; Functional annotation; Gene annotation; Genome annotation; High-performance computing; Manual annotation; Microbial genomics; Phage; Phage annotation.

MeSH terms

  • Databases, Genetic*
  • Databases, Protein
  • Genome, Microbial*
  • Genome, Viral
  • Software