Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize

Genome Biol. 2022 Sep 1;23(1):183. doi: 10.1186/s13059-022-02747-2.

Abstract

Background: Crop improvement through cross-population genomic prediction and genome editing requires identifying causal variants at high resolution, to within fewer than hundreds of base pairs. Most genetic mapping studies lack such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for the fitness effect of mutations.

Results: Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield in elite maize panels, through stringent prioritization of fewer than 1% of single-site variants.
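
The prioritization step mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `prioritize_variants`, the `fraction` parameter, and the randomly generated scores are all hypothetical, and the sketch simply assumes that each variant carries one predicted conservation score, with higher values meaning stronger predicted constraint.

```python
import numpy as np

def prioritize_variants(conservation_scores, fraction=0.01):
    """Return indices of the highest-scoring variants (hypothetical helper).

    Keeps at most `fraction` of all variants, ranked by predicted
    conservation score in descending order. Indices are returned sorted.
    """
    scores = np.asarray(conservation_scores, dtype=float)
    n_keep = int(len(scores) * fraction)          # rounds down, so never more than `fraction`
    order = np.argsort(-scores, kind="stable")    # highest score first
    return np.sort(order[:n_keep])

# Illustrative input only: random numbers standing in for predicted scores.
rng = np.random.default_rng(0)
scores = rng.random(10_000)
selected = prioritize_variants(scores)            # indices of the top 1% of variants
```

In a genomic prediction setting, the selected indices would then be used to subset the marker matrix before fitting the prediction model; how the retained variants are weighted or combined with the remaining markers is a separate modeling choice that this sketch does not cover.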

Conclusions: Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (https://doi.org/10.25739/hybz-2957).

Keywords: Comparative genomics; Genomic prediction; Machine learning; Quantitative genetics; Zea mays.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Genome
  • Genomics* / methods
  • Nucleotides
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Zea mays* / genetics

Substances

  • Nucleotides