Shetti, a simple tool to parse, manipulate and search large datasets of sequences

Microb Genom. 2015 Nov 6;1(5):e000035. doi: 10.1099/mgen.0.000035. eCollection 2015 Nov.

Abstract

Parsing and manipulating long and/or multiple protein or gene sequences can be challenging for experimental biologists and microbiologists who lack prior knowledge of bioinformatics and programming. Here we present Shetti, a simple, user-friendly and versatile tool to parse, manipulate and search within large datasets of long protein or gene sequences. Shetti can be used to search for a sequence, species, protein/gene or pattern/motif. It can also be used to construct a universal consensus or molecular signatures for proteins based on their physical characteristics. Shetti is fast and handles large sets of long sequences efficiently. It parses UniProt Knowledgebase and NCBI GenBank flat files and displays their contents as a table.
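To make the pattern/motif search concrete, the following is a minimal sketch in Python of the general idea, not Shetti's own implementation. It assumes FASTA-formatted input rather than UniProt or GenBank flat files, expresses the motif as a regular expression, and returns hits as table-like rows. The function names `parse_fasta` and `find_motif` and the example sequences are hypothetical, introduced only for illustration.

```python
import re


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text (assumed input format)."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def find_motif(records, pattern):
    """Return one row per non-overlapping hit: (header, start, end, match).

    Positions are 1-based and inclusive, as is conventional for sequences.
    """
    regex = re.compile(pattern)
    rows = []
    for header, seq in records.items():
        for m in regex.finditer(seq):
            rows.append((header, m.start() + 1, m.end(), m.group()))
    return rows


# Made-up sequences, for illustration only.
EXAMPLE = """>seq1 hypothetical protein A
MKTNGSAAC
NVTLL
>seq2 hypothetical protein B
MPPLKK
"""

# N-glycosylation-like motif N-{P}-[ST]-{P}, written as a regular expression.
rows = find_motif(parse_fasta(EXAMPLE), r"N[^P][ST][^P]")
for row in rows:
    print("\t".join(str(field) for field in row))
```

Each printed row gives the sequence header, the 1-based start and end of the hit, and the matched residues, which mirrors in a simplified way the tabular presentation described above. Because `finditer` reports non-overlapping matches only, overlapping occurrences of a motif would need a lookahead-based pattern.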

Keywords: comparative genomics; consensus pattern; functional motif/domain; protein/gene sequences; sequence manipulation.