Lexical Landscapes as large in silico data for examining advanced properties of fitness landscapes

Victor A Meszaros; Miles D Miller-Dickson; C Brandon Ogbunugafor

doi:10.1371/journal.pone.0220891

PLoS One. 2019 Aug 12;14(8):e0220891. doi: 10.1371/journal.pone.0220891. eCollection 2019.

Authors

Victor A Meszaros¹, Miles D Miller-Dickson¹, C Brandon Ogbunugafor¹

Affiliation

¹ Department of Ecology and Evolutionary Biology - Brown University, Providence, Rhode Island, United States of America.

Abstract

In silico approaches have served a central role in the development of evolutionary theory for generations. This especially applies to the concept of the fitness landscape, one of the most important abstractions in evolutionary genetics, and one which has benefited from the presence of large empirical data sets only in the last decade or so. In this study, we propose a method that allows us to generate enormous data sets that walk the line between in silico and empirical: word usage frequencies as catalogued by the Google ngram corpora. These data can be codified or analogized in terms of a multidimensional empirical fitness landscape towards the examination of advanced concepts: adaptive landscape by environment interactions, clonal competition, higher-order epistasis, and countless others. We argue that the greater Lexical Landscapes approach can serve as a platform that offers an astronomical number of fitness landscapes for exploration (at least) or theoretical formalism (potentially) in evolutionary biology.
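
As a rough illustration of the analogy described in the abstract, the sketch below treats each letter position of a word as a locus, each letter as an allele, and a word's usage frequency as its fitness. This is not the authors' implementation: the function names (`build_landscape`, `pairwise_epistasis`, `is_local_peak`) are hypothetical, the additive epistasis measure is one common convention chosen here for simplicity, and the frequencies are made-up placeholders rather than values from the Google ngram corpora.

```python
from itertools import product

# Made-up placeholder frequencies; NOT real Google ngram values.
TOY_FREQ = {"cat": 40.0, "cot": 8.0, "bat": 20.0, "bot": 10.0}

# Letters allowed at each position (locus); the third position is held fixed.
ALLELES = [("b", "c"), ("a", "o"), ("t",)]


def build_landscape(alleles_per_site, freq):
    """Assign a fitness to every letter combination (0.0 if the word is unlisted)."""
    words = ("".join(letters) for letters in product(*alleles_per_site))
    return {word: freq.get(word, 0.0) for word in words}


def pairwise_epistasis(land, ref, single_a, single_b, double):
    """Additive pairwise epistasis: f(double) - f(single_a) - f(single_b) + f(ref)."""
    return land[double] - land[single_a] - land[single_b] + land[ref]


def is_local_peak(land, word, alleles_per_site):
    """True if no single-letter substitution yields a higher-fitness word."""
    for i, alleles in enumerate(alleles_per_site):
        for letter in alleles:
            if letter == word[i]:
                continue
            neighbor = word[:i] + letter + word[i + 1:]
            if land[neighbor] > land[word]:
                return False
    return True


landscape = build_landscape(ALLELES, TOY_FREQ)
# Epistasis between position 1 (b -> c) and position 2 (o -> a), starting from "bot".
eps = pairwise_epistasis(landscape, ref="bot", single_a="cot", single_b="bat", double="cat")
peaks = [w for w in landscape if is_local_peak(landscape, w, ALLELES)]
```

Under these assumptions, swapping `TOY_FREQ` for frequencies from a different year or corpus would give a different landscape over the same set of words, which is one way to read the "landscape by environment" framing.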

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Biological Evolution*
Computer Simulation
Datasets as Topic
Genetic Association Studies
Genetic Fitness*
Genetics, Population*
Linguistics
Models, Genetic

Grants and funding

CBO was funded by NSF RII Track-2 FEC (Award Number: 1736253).