BDBM 1.0: A Desktop Application for Efficient Retrieval and Processing of High-Quality Sequence Data and Application to the Identification of the Putative Coffea S-Locus

Interdiscip Sci. 2019 Mar;11(1):57-67. doi: 10.1007/s12539-019-00320-3. Epub 2019 Feb 2.

Abstract

Bioinformatics is now one of the most important areas of modern biology, and creating high-quality scientific software to support it is a core activity for many researchers. In this context, high-quality sequence datasets are needed to perform inferences on the evolution of species, genes, and gene families, or to get evidence for adaptive amino acid evolution, among others. Nevertheless, sequence data are very often spread over several databases, many useful genomes and transcriptomes are non-annotated, or the available annotation is not for the desired coding sequence isoform and/or is unlikely to be accurate. Moreover, although the FASTA text-based format is quite simple and usable by most software applications, there are a number of issues that may be critical depending on the software used to analyse such files. As a result, researchers without training in informatics often use only a fraction of all available data. These issues can be addressed using already available software applications, but there is no easy-to-use single piece of software that allows all these tasks to be performed within the same graphical interface, such as the one presented here, named BDBM (Blast DataBase Manager). BDBM can be used to efficiently get gene sequences from annotated and non-annotated genomes and transcriptomes. Moreover, it can be used to look for alternatives to existing annotations and to easily create reliable custom databases. Such databases are essential to prepare high-quality datasets. The analyses that we performed on the Coffea canephora genome using BDBM, aimed at identifying the S-locus region (which harbours the genes involved in gametophytic self-incompatibility), led to the conclusion that there are two likely regions, one on chromosome 2 (around region 6600000-6650000) and another on chromosome 5 (around region 15830000-15930000). These findings are discussed in the context of the evolution of gametophytic self-incompatibility in the Rubiaceae.

Keywords: BDBM; Coffea canephora; Docker containers; Gametophytic self-incompatibility; Graphical user interface; Nucleotide; Protein; S-Locus.

MeSH terms

  • Coffea / genetics*
  • Computational Biology*
  • Databases, Genetic*
  • Sequence Analysis, DNA
  • Software*