On the way to plant data commons - a genotyping use case

J Integr Bioinform. 2022 Sep 5;19(4):20220033. doi: 10.1515/jib-2022-0033. eCollection 2022 Dec 1.

Abstract

In recent years, progress in data collection in the life sciences has created increasing demand and opportunities for advanced bioinformatics. This includes data management as well as individual data analyses, and often covers the entire data life cycle. A variety of tools have been developed to store, share, and reuse the data produced in different domains such as genotyping. Imputation in particular, as a subfield of genotyping, requires sound research data management (RDM) strategies to enable the use and reuse of genotypic data. Sustainable software requires tools and surrounding ecosystems that are reusable and maintainable. For streamlined tools, reusability can be achieved, for example, by standardizing the input and output of the different tools and adopting open, broadly used file formats. Using such established file formats also allows the tools to be connected with others, improving the overall interoperability of the software. Finally, it is important to build strong communities that maintain the tools by developing and contributing new features and maintenance updates. This article presents concepts for achieving these goals for an imputation service.

Keywords: biodiversity; cloud computing; imputation; plants; research data commons.

MeSH terms

  • Computational Biology*
  • Ecosystem*
  • Genotype
  • Software