Predicting gene expression responses to environment in Arabidopsis thaliana using natural variation in DNA sequence

bioRxiv [Preprint]. 2024 Apr 28:2024.04.25.591174. doi: 10.1101/2024.04.25.591174.

Abstract

The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, let alone for how these factors respond to the environment. We asked whether flexible machine learning models could learn part of the underlying cis-regulatory genotype-to-phenotype map, using cold-responsive transcriptome profiles from 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the upstream and downstream regions, respectively, of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences, to predict expression response to environment. CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted from variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
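
To make concrete the idea of a CNN filter acting as a motif detector, the following is a minimal sketch in NumPy, not the authors' model. It one-hot encodes a DNA sequence and slides a single filter along it, which is what the first convolutional layer of a sequence CNN does for each of its filters. The function names (`one_hot`, `scan_filter`), the example sequence, and the hand-set 5-mer filter are all hypothetical illustrations; in a trained CNN the filter weights would be learned from data rather than fixed.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array.

    Bases outside A/C/G/T (e.g. N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def scan_filter(encoded, kernel):
    """Slide a (width, 4) filter along the sequence.

    Returns one score per window (a 'valid' 1D convolution without
    kernel flipping, as used in deep learning frameworks).
    """
    width = kernel.shape[0]
    n_windows = encoded.shape[0] - width + 1
    return np.array(
        [np.sum(encoded[i:i + width] * kernel) for i in range(n_windows)]
    )

# Hypothetical filter: hand-set to match one 5-mer exactly (an assumption
# for illustration; a trained network would learn these weights).
motif_filter = one_hot("CCGAC")

# Hypothetical regulatory sequence containing that 5-mer.
sequence = "ATGGCCGACTTA"

scores = scan_filter(one_hot(sequence), motif_filter)
best = int(np.argmax(scores))   # start position of the best-matching window
pooled = float(scores.max())    # max pooling over positions
```

Max pooling over the per-window scores gives a single value per filter that says how well the sequence matches that filter anywhere along its length; later layers combine these values across many filters to produce a prediction.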

Keywords: evolution; gene expression prediction; genotype to phenotype map; low frequency variants; machine learning; regulatory elements.

Publication types

  • Preprint