Interspecific comparison of gene expression profiles using machine learning

PLoS Comput Biol. 2023 Jan 10;19(1):e1010743. doi: 10.1371/journal.pcbi.1010743. eCollection 2023 Jan.

Abstract

Interspecific gene comparisons are the keystones of many areas of biological research and are especially important for translating knowledge from model organisms to economically important species. Currently, they are hampered by the low resolution of methods based on sequence analysis and by the complex evolutionary history of eukaryotic genes. This is especially critical for plants, whose genomes have been shaped by multiple whole-genome duplications and subsequent gene loss. These limitations call for new methods for comparing the functions of genes in different species. Here, we report ISEEML (Interspecific Similarity of Expression Evaluated using Machine Learning), a novel machine learning-based algorithm for interspecific gene classification. In contrast to previous studies focused on sequence similarity, our algorithm focuses on functional similarity inferred from the comparison of gene expression profiles. We propose a novel metric of expression pattern similarity, the expression score (ES), that is suitable for species with differing morphologies. As a proof of concept, we compare detailed transcriptome maps of the model species Arabidopsis thaliana, Zea mays (maize), and Fagopyrum esculentum (common buckwheat), which represent distant clades within flowering plants. The classifier achieved an AUC of 0.91; at an ES threshold of 0.5, the specificity was 94% and the sensitivity was 72%.
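The reported performance figures (AUC, and specificity and sensitivity at a fixed ES threshold) can be illustrated with a minimal sketch. This is not the ISEEML implementation. It assumes that ES lies between 0 and 1, that higher values indicate more similar expression patterns, and that a gene pair is called positive when its ES is at or above the threshold. The function names and the toy scores and labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) for a score threshold.

    Assumption: a pair is predicted positive when score >= threshold.
    labels: 1 = true functional match, 0 = non-match.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up scores and labels (illustrative only, not from the paper).
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"AUC={auc(toy_scores, toy_labels):.3f}")
```

Raising the threshold generally trades sensitivity for specificity, whereas the AUC summarizes performance across all thresholds, which is why the abstract reports it separately from the values at ES = 0.5.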

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis* / genetics
  • Biological Evolution
  • Gene Expression Regulation, Plant / genetics
  • Transcriptome* / genetics
  • Zea mays / genetics

Grants and funding

This work was supported by the Russian Science Foundation, project #17-14-01315 – conceptualization (to ASK, AVK, MDL, AAP); by the Institute for Information Transmission Problems (Laboratory of Plant Genomics), project #FFNU-2022-0037 – development of the ISEEML tool (to ASK, AVK, AVM, AAP); and by the Ministry of Science and Higher Education, project #075-15-2021-1064 – data analysis (to ASK, AVK, MDL, AAP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.