Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences

Front Genet. 2023 Oct 9:14:1251382. doi: 10.3389/fgene.2023.1251382. eCollection 2023.

Abstract

The rapid growth of sequencing technology and its increasing popularity in biology-related research have made whole genome re-sequencing (WGRS) data widely available. Large amounts of WGRS data can help bridge the knowledge gap between genomics and phenomics by revealing the genomic variations that lead to phenotypic changes. These genomic variations usually comprise allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation (CNV) data, SNP and Indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we developed strategies to better utilize and mine the WGRS data using efficient data processing scripts, libraries, tools, and frameworks, resulting in the interactive and visualization-enhanced GenVarX toolset, which encompasses both promoter region and copy number variation analysis components. The main capabilities of the GenVarX toolset are easy-to-use interfaces through which users can perform queries, visualize data, and interact with the data. Users fill in the input fields on the user interface and submit them as a query. The data returned on the results page are usually displayed in tabular form. Interactive figures are also included to facilitate the visualization of statistical results or tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis.
Researchers can access the soybean GenVarX toolset from SoyKB at https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets from the KBCommons web portal at https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.

Keywords: Indels; SNPs; copy number variation; genomic variations; phenotypes; promoter; transcription factor; whole genome re-sequencing data.

Grants and funding

The research was supported by Missouri soybean farmers’ checkoff dollars provided by the United Soybean Board (USB). The principal investigators, project titles, and grant IDs are as follows: 1. Dr. Trupti Joshi and Dr. Kristin Bilyeu; Applied Genomics to Improve Soybean Seed Protein; #1920-152-0131-C. 2. Dr. Trupti Joshi and Dr. Kristin Bilyeu; Enhancing Soybean Applied Genomics Tools for Improving Soybean; #2220-152-0202. 3. Dr. Trupti Joshi and Dr. Kristin Bilyeu; Leveraging Genomics to Enhance The US Soybean Quality Reputation; #2332-201-0101.