Lexical Landscapes as large in silico data for examining advanced properties of fitness landscapes

PLoS One. 2019 Aug 12;14(8):e0220891. doi: 10.1371/journal.pone.0220891. eCollection 2019.

Abstract

In silico approaches have served a central role in the development of evolutionary theory for generations. This especially applies to the concept of the fitness landscape, one of the most important abstractions in evolutionary genetics, and one which has benefited from the presence of large empirical data sets only in the last decade or so. In this study, we propose a method that allows us to generate enormous data sets that walk the line between in silico and empirical: word usage frequencies as catalogued by the Google ngram corpora. These data can be codified or analogized in terms of a multidimensional empirical fitness landscape towards the examination of advanced concepts: adaptive landscape by environment interactions, clonal competition, higher-order epistasis and countless others. We argue that the greater Lexical Landscapes approach can serve as a platform that offers an astronomical number of fitness landscapes for exploration (at least) or theoretical formalism (potentially) in evolutionary biology.
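
The analogy can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that each word is a genotype, each letter position is a locus, and relative usage frequency stands in for fitness. The frequency values, the `freq` table, and the helper names (`hamming`, `neighbors`, `local_peaks`, `pairwise_epistasis`) are hypothetical and chosen only for illustration; real values would be drawn from an n-gram corpus.

```python
# Toy lexical landscape: words as genotypes, usage frequency as fitness.
# All frequencies below are made-up illustrative numbers, not corpus data.
freq = {
    "word": 0.90,
    "ward": 0.40,
    "cord": 0.30,
    "card": 0.60,
    "worm": 0.20,
    "warm": 0.50,
}

def hamming(a, b):
    """Number of letter positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def neighbors(word, landscape):
    """Words in the landscape that are one letter substitution away."""
    return [v for v in landscape
            if len(v) == len(word) and hamming(v, word) == 1]

def local_peaks(landscape):
    """Words whose frequency exceeds that of every one-step neighbor."""
    return sorted(
        w for w, f in landscape.items()
        if all(f > landscape[v] for v in neighbors(w, landscape))
    )

def pairwise_epistasis(f_ab, f_Ab, f_aB, f_AB):
    """Deviation of the double mutant from additivity of the two single mutants."""
    return f_AB - f_Ab - f_aB + f_ab

peaks = local_peaks(freq)

# Two loci: position 2 (o -> a) and position 4 (d -> m), starting from "word".
eps = pairwise_epistasis(freq["word"], freq["ward"], freq["worm"], freq["warm"])
print(peaks, round(eps, 3))
```

A nonzero `eps` in this toy table means the two letter substitutions do not combine additively, which is the pairwise case of the epistasis the abstract mentions; higher-order terms would extend the same alternating sum over three or more positions.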

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biological Evolution*
  • Computer Simulation
  • Datasets as Topic
  • Genetic Association Studies
  • Genetic Fitness*
  • Genetics, Population*
  • Linguistics
  • Models, Genetic

Grants and funding

CBO was funded by NSF RII Track-2 FEC (Award Number: 1736253).