The Allele Catalog Tool: a web-based interactive tool for allele discovery and analysis

BMC Genomics. 2023 Mar 10;24(1):107. doi: 10.1186/s12864-023-09161-3.

Abstract

Background: Advances in sequencing technologies have made a large volume of whole-genome re-sequenced (WGRS) data publicly available. However, research that uses the WGRS data without further configuration is nearly impossible. To solve this problem, our research group developed the interactive Allele Catalog Tool, which lets researchers explore the coding-region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize.

Results: The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files. The Allele Catalog pipeline takes the VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files) from WGRS datasets whose accessions were collected from various sources; the panels currently represent over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool are data query, visualization of results, categorical filtering, and download functions. Queries are run from user input, and results are presented in tabular form: summary results grouped by categorical description, and genotype results for the alleles of each gene. The categorical information is specific to each species; where available, detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, the reference or alternate genotypes, the functional effect classes, and the amino-acid changes for each accession. The results can also be downloaded for other research purposes.
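To make the allele-assembly idea concrete, here is a minimal Python sketch under stated assumptions. It is not the AlleleCatalog pipeline's actual implementation: the record layout, the function name `assemble_alleles`, and the sample gene and accession names are all hypothetical, and imputation and functional effect prediction are left out. It only illustrates how accessions that share the same combination of genotypes across a gene's variant positions can be grouped into one allele.

```python
from collections import defaultdict

# Hypothetical, hand-made genotype calls: (gene, position, accession, genotype).
# In practice these would come from parsed VCF files, not a literal list.
records = [
    ("GeneA", 101, "Acc1", "A"),
    ("GeneA", 250, "Acc1", "G"),
    ("GeneA", 101, "Acc2", "T"),
    ("GeneA", 250, "Acc2", "G"),
    ("GeneA", 101, "Acc3", "A"),
    ("GeneA", 250, "Acc3", "G"),
]


def assemble_alleles(records):
    """Group accessions by their genotype combination within each gene."""
    # gene -> accession -> {position: genotype}
    per_gene = defaultdict(lambda: defaultdict(dict))
    for gene, position, accession, genotype in records:
        per_gene[gene][accession][position] = genotype

    catalog = {}
    for gene, by_accession in per_gene.items():
        # All variant positions observed for this gene, in genomic order.
        positions = sorted({p for calls in by_accession.values() for p in calls})
        alleles = defaultdict(list)
        for accession, calls in by_accession.items():
            # An allele is the tuple of genotypes at every position;
            # "-" marks a missing call (no imputation in this sketch).
            allele = tuple(calls.get(p, "-") for p in positions)
            alleles[allele].append(accession)
        catalog[gene] = {
            "positions": positions,
            "alleles": {a: sorted(accs) for a, accs in alleles.items()},
        }
    return catalog


catalog = assemble_alleles(records)
for allele, accessions in catalog["GeneA"]["alleles"].items():
    print(allele, len(accessions), accessions)
```

Each distinct genotype tuple corresponds to one row of a per-gene allele table, and the accession lists are what a summary by categorical description would then count.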

Conclusions: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website (https://soykb.org/SoybeanAlleleCatalogTool/), while the Allele Catalog Tool for maize and Arabidopsis is hosted on the KBCommons website (https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana, respectively). Researchers can use this tool to connect the variant alleles of genes with the meta-information of each species.

Keywords: Allele Catalog Pipeline; Allele Catalog Tool; Alleles in Gene; Data Visualization; Variant Calling Pipeline.

MeSH terms

  • Alleles*
  • Amino Acid Substitution
  • Arabidopsis* / genetics
  • Data Mining* / methods
  • Data Visualization
  • Datasets as Topic*
  • Gene Frequency
  • Genes, Plant / genetics
  • Genotype
  • Glycine max* / genetics
  • Internet*
  • Metadata
  • Mutation
  • Pigmentation / genetics
  • Plant Dormancy / genetics
  • Software*
  • Zea mays* / genetics

Substances

  • DOG1 protein, Arabidopsis