Natural and artificial selection of multiple alleles revealed through genomic analyses

Front Genet. 2024 Jan 8:14:1320652. doi: 10.3389/fgene.2023.1320652. eCollection 2023.

Abstract

Genome-to-phenome research in agriculture aims to improve crops through in silico predictions. Genome-wide association studies (GWAS) are powerful for identifying genomic loci that underlie important traits. Because GWAS is a statistical method, increasing the sample size, data quality, or diversity of the dataset increases its power. For more precise breeding, concrete candidate genes with exact functional variants must be discovered. Many post-GWAS methods have been developed to narrow down the associated genomic regions and, ideally, to predict candidate genes and causative mutations (CMs). Historical natural selection and breeding-related artificial selection both change the frequencies of the alleles of genes that control phenotypes. With more diverse and more extensive GWAS datasets, there is a greater chance that a single causal gene carries multiple alleles with independent CMs. This can happen when a dataset includes samples from geographically isolated regions in which different alleles arose during natural or artificial selection. This simple fact complicates GWAS-driven discoveries. Currently, no existing association method addresses this issue, namely the need to identify multiple alleles and, more specifically, the actual CMs. Therefore, we developed a tool that computes a score for each combination of variant positions in a single candidate gene and, based on the highest score, identifies the best number and combination of CMs. The tool is publicly available as a Python package on GitHub, and we further created a web-based Multiple Alleles discovery (MADis) tool that supports soybean and is hosted in SoyKB (https://soykb.org/SoybeanMADisTool/). We tested and validated the algorithm and demonstrated the use of MADis in a case study of the pod pigmentation gene L1, which carries multiple CMs arising from natural or artificial selection.
Finally, we identified a candidate gene for the pod color L2 locus and predicted the existence of multiple alleles that potentially cause loss of pod pigmentation. In this work, we show how genomic analysis can be used to explore the natural and artificial selection of multiple alleles and, thus, to improve and accelerate crop breeding.
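The abstract does not give the MADis scoring function, so the following is only a minimal sketch of the general idea of scoring combinations of variant positions within one candidate gene. It assumes (1) each CM independently causes the same binary phenotype, so an accession is predicted mutant if it carries the alternative allele at any selected position, and (2) a hypothetical score equal to the fraction of accessions explained minus a small per-position penalty, so that the "best number" of CMs is not simply the largest combination. The function names `predict`, `score`, and `best_combination`, the `penalty` parameter, and the toy genotype data are illustrative assumptions, not the published tool's API or data.

```python
from itertools import combinations

# Toy data, invented for illustration only (not from the study).
# Rows are accessions; columns are variant positions within one candidate gene.
# 1 = carries the alternative allele, 0 = carries the reference allele.
GENOTYPES = [
    [1, 1, 0, 0],  # mutant: alternative allele at position 0 (plus linked variant at 1)
    [1, 1, 0, 1],  # mutant: alternative allele at position 0
    [0, 0, 1, 0],  # mutant: alternative allele at position 2
    [0, 0, 1, 1],  # mutant: alternative allele at position 2
    [0, 1, 0, 0],  # wild type
    [0, 0, 0, 1],  # wild type
    [0, 0, 0, 0],  # wild type
    [0, 1, 0, 0],  # wild type
]
# 1 = mutant phenotype (e.g., loss of pod pigmentation), 0 = wild type.
PHENOTYPES = [1, 1, 1, 1, 0, 0, 0, 0]


def predict(genotypes, positions):
    """Predict the mutant phenotype (1) if an accession carries the
    alternative allele at ANY selected position (independent CMs)."""
    return [int(any(row[p] for p in positions)) for row in genotypes]


def score(genotypes, phenotypes, positions, penalty=0.01):
    """Hypothetical score: fraction of accessions whose phenotype is explained
    by the selected positions, minus a per-position penalty that favors
    fewer causative mutations."""
    predicted = predict(genotypes, positions)
    matches = sum(p == y for p, y in zip(predicted, phenotypes))
    return matches / len(phenotypes) - penalty * len(positions)


def best_combination(genotypes, phenotypes, max_k=3, penalty=0.01):
    """Score every combination of 1..max_k positions exhaustively and return
    (best_score, positions) for the highest-scoring combination."""
    n_positions = len(genotypes[0])
    best = (float("-inf"), ())
    for k in range(1, max_k + 1):
        for combo in combinations(range(n_positions), k):
            s = score(genotypes, phenotypes, combo, penalty)
            # Strict ">" keeps the earliest-enumerated (fewest positions) combo on ties.
            if s > best[0]:
                best = (s, combo)
    return best


best_score, positions = best_combination(GENOTYPES, PHENOTYPES)
print(positions, round(best_score, 3))  # (0, 2) 0.98
```

In this toy example, position 0 or position 2 alone explains only six of eight accessions (score 0.74 each), whereas the pair explains all eight, mirroring the case of two independent alleles of one gene underlying the same phenotype. Exhaustive enumeration grows combinatorially with the number of variant positions, so an approach like this is only practical for the modest number of variants within a single gene.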

Keywords: GWAS; alleles; breeding; causal gene; causative mutation; genetic variation; soybean.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The project was funded by two entities: the United Soybean Board (USB) and the Palacký University Internal Grant Agency (IGA). The principal investigators, project titles, and project numbers are as follows: 1) TJ and KB: Applied Genomics to Improve Soybean Seed protein, #1920-152-0131-C; 2) TJ and KB: Enhancing Soybean Applied Genomics Tools for Improving Soybean, #2220-152-0202; 3) TJ, KB, and MŠ: Leveraging Genomics to Enhance the US Soybean Quality Reputation, #2332-201-0101; 4) TJ and KB: USA Soybean Quality Reputation Enhanced with Genomics Integration, #24-201-S-B-1-A/2432-201-0101; 5) MŠ: Palacký University Internal Grant Agency, #IGA_2020_013, #IGA_PrF_2021_015, #IGA_PrF_2022_025, #IGA_PrF_2023_022. The funding bodies played no role in the design of the study; in the collection, analysis, or interpretation of data; or in writing the manuscript.