Leveraging existing resources from well-studied species to predict gene functions has the potential to rapidly expand understanding of annotated genes in other, less well-studied species with assembled genomes. However, orthology is not a reliable predictor of the transcriptional responses of genes to stress. Machine learning methods can quantitatively estimate expression patterns and gene functions from known annotations and collections of features describing each gene. In this chapter, we describe a supervised machine learning framework that predicts stress-responsive genes across species using only features derived from nucleotide sequences, illustrated with the example of cold stress-responsive genes in different Panicoid grass species.
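As a rough illustration of what "features derived from nucleotide sequences" can look like, the minimal sketch below computes dinucleotide frequencies (one of the keywords listed for this chapter) for each gene sequence. The function name, the toy sequences, and the choice to skip windows containing ambiguous bases are assumptions made for illustration; the abstract does not specify the chapter's actual feature set or preprocessing.

```python
from collections import Counter
from itertools import product

# All 16 dinucleotides in a fixed order, so every gene gets the same feature layout.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
VALID = set(DINUCLEOTIDES)


def dinucleotide_frequencies(seq):
    """Return the relative frequency of each of the 16 dinucleotides in seq.

    Hypothetical helper: overlapping 2-base windows are counted, and windows
    containing anything other than A, C, G, or T (e.g. N) are skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(w for w in windows if w in VALID)
    total = sum(counts.values())
    # An empty or fully ambiguous sequence yields an all-zero vector.
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


# Toy sequences invented for illustration only; they are not real genes.
toy_genes = {"geneA": "ATGCGCGCTA", "geneB": "ATATATNNAT"}
features = {name: dinucleotide_frequencies(s) for name, s in toy_genes.items()}
```

Fixed-length vectors of this kind could then serve as input to a supervised classifier, such as the random forest named in the keywords, trained on labeled genes from one species and applied to another.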
Keywords: Dinucleotide frequency; Gene annotation; Grasses; Machine learning; Random forest; Transfer learning.
© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.