Text analysis tools for identification of emerging topics and research gaps in conservation science

Conserv Biol. 2015 Dec;29(6):1606-14. doi: 10.1111/cobi.12605. Epub 2015 Nov 2.

Abstract

Keeping track of conceptual and methodological developments is a critical skill for research scientists, but this task is increasingly difficult due to the high rate of academic publication. As a crisis discipline, conservation science is particularly in need of tools that facilitate rapid yet insightful synthesis. We show how a common text-mining method (latent Dirichlet allocation, or topic modeling) and statistical methods familiar to ecologists (cluster analysis, regression, and network analysis) can be used to investigate trends and identify potential research gaps in the scientific literature. We tested these methods on the literature on ecological surrogates and indicators. Analysis of topic popularity within this corpus showed a strong emphasis on monitoring and management of fragmented ecosystems, while analysis of research gaps suggested a greater role for genetic surrogates and indicators. Our results show that automated text analysis methods need to be used with care, but can provide information that is complementary to that given by systematic reviews and meta-analyses, increasing scientists' capacity for research synthesis.
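The abstract does not specify how topic popularity was regressed against time. The following is therefore only a minimal sketch, assuming per-document topic proportions have already been estimated by an LDA model. It averages each topic's proportion by publication year and fits an ordinary least-squares slope per topic. A positive slope suggests a rising ("hot") topic and a negative slope a declining one. The function names `yearly_topic_means`, `ols_slope`, and `topic_trends` are hypothetical, as is the toy data, and this is not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean


def yearly_topic_means(doc_topics, years):
    """Average each topic's proportion over all documents published in a year.

    doc_topics: per-document topic-proportion lists (assumed LDA output).
    years: publication year of each document, in the same order.
    Returns {year: [mean proportion of topic 0, topic 1, ...]}, sorted by year.
    """
    by_year = defaultdict(list)
    for props, year in zip(doc_topics, years):
        by_year[year].append(props)
    return {
        year: [mean(col) for col in zip(*rows)]
        for year, rows in sorted(by_year.items())
    }


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct years to fit a trend")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(doc_topics, years):
    """Return one slope per topic (change in mean proportion per year)."""
    means = yearly_topic_means(doc_topics, years)
    yrs = list(means)
    n_topics = len(means[yrs[0]])
    return [ols_slope(yrs, [means[y][k] for y in yrs]) for k in range(n_topics)]


if __name__ == "__main__":
    # Hypothetical toy corpus: five documents, two topics.
    doc_topics = [[0.8, 0.2], [0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.2, 0.8]]
    years = [2010, 2010, 2011, 2012, 2012]
    print(topic_trends(doc_topics, years))
```

In this toy corpus, topic 0 receives a negative slope (declining) and topic 1 a positive slope (rising). A simple linear fit on yearly means is only one possible choice. Weighting years by document count or using a model suited to proportions may be more appropriate for real corpora.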

Keywords: hot topics; indicators; latent Dirichlet allocation; surrogates; synthesis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Conservation of Natural Resources / methods*
  • Data Mining*
  • Ecosystem
  • Invertebrates / genetics
  • Plants / genetics
  • Statistics as Topic*
  • Vertebrates / genetics