Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing

Nucleic Acids Res. 2020 Mar 18;48(5):2209-2219. doi: 10.1093/nar/gkz1241.

Abstract

Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm the current analytical capabilities of evolutionary genomics. In contrast to population genomics, evolutionary genomics has no standardized methods for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this respect and point out that many genomic datasets are underutilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data are already available, widening the gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that sequencing more and more genomes will not mitigate unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole-genome datasets requires integrating phylogenetic comparative methods, a rich but underutilized toolbox for looking into the past, into genomics.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cellulase / classification
  • Cellulase / genetics
  • Cellulase / metabolism
  • Computational Biology / methods*
  • Cytochrome P-450 Enzyme System / classification
  • Cytochrome P-450 Enzyme System / genetics
  • Cytochrome P-450 Enzyme System / metabolism
  • Databases, Genetic
  • Datasets as Topic
  • Dictyostelium / enzymology
  • Dictyostelium / genetics
  • Epistasis, Genetic*
  • Fungi / classification
  • Fungi / enzymology
  • Fungi / genetics
  • Gene Dosage
  • Genetic Loci
  • Genome*
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Multigene Family*
  • Phascolarctidae / genetics
  • Phascolarctidae / metabolism
  • Phylogeny*
  • Plants / classification
  • Plants / genetics
  • Plants / metabolism

Substances

  • Cytochrome P-450 Enzyme System
  • Cellulase