Unveiling the species-rank abundance distribution by generalizing the Good-Turing sample coverage theory

Ecology. 2015 May;96(5):1189-201. doi: 10.1890/14-0550.1.

Abstract

Based on a sample of individuals, we focus on inferring the vector of species relative abundance of an entire assemblage and propose a novel estimator of the complete species-rank abundance distribution (RAD). Nearly all previous estimators of the RAD use the conventional "plug-in" estimator P_i (sample relative abundance) of the true relative abundance p_i of species i. Because most biodiversity samples are incomplete, the plug-in estimators are applied only to the subset of species that are detected in the sample. Using the concept of sample coverage and its generalization, we propose a new statistical framework to estimate the complete RAD by separately adjusting the sample relative abundances for the set of species detected in the sample and estimating the relative abundances for the set of species undetected in the sample but inferred to be present in the assemblage. We first show that P_i is a positively biased estimator of p_i for species detected in the sample, and that the degree of bias increases with increasing relative rarity of each species. We next derive a method to adjust the sample relative abundance to reduce the positive bias inherent in P_i. The adjustment method provides a nonparametric resolution to the longstanding challenge of characterizing the relationship between the true relative abundance in the entire assemblage and the observed relative abundance in a sample. Finally, we propose a method to estimate the true relative abundances of the undetected species based on a lower bound of the number of undetected species. We then combine the adjusted RAD for the detected species and the estimated RAD for the undetected species to obtain the complete RAD estimator. Simulation results show that the proposed RAD curve can unveil the true RAD and is more accurate than the empirical RAD. We also extend our method to incidence data. Our formulas and estimators are illustrated using empirical data sets from surveys of forest spiders (for abundance data) and soil ciliates (for incidence data). The proposed RAD estimator is also applicable to estimating various diversity measures and should be widely useful to analyses of biodiversity and community structure.
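
The abstract does not state its formulas, so the following is only a minimal sketch of the two classical quantities it builds on, not the paper's generalized estimator. It assumes the basic Good-Turing coverage estimate 1 - f1/n and the classic Chao1 lower bound on species richness, where f1 and f2 are the numbers of species seen exactly once and twice and n is the sample size. The function names and the example counts are hypothetical.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> number of species observed exactly k times (k >= 1)."""
    return Counter(x for x in abundances if x > 0)


def turing_coverage(abundances):
    """Classic Good-Turing sample coverage estimate, 1 - f1/n.

    Coverage is the total true relative abundance of the detected species,
    so 1 - coverage estimates the share belonging to undetected species.
    """
    n = sum(abundances)
    f1 = frequency_counts(abundances)[1]
    return 1.0 - f1 / n


def chao1_lower_bound(abundances):
    """Classic Chao1 lower bound on the total number of species."""
    s_obs = sum(1 for x in abundances if x > 0)
    f = frequency_counts(abundances)
    f1, f2 = f[1], f[2]
    if f2 > 0:
        f0 = f1 * f1 / (2.0 * f2)      # estimated number of undetected species
    else:
        f0 = f1 * (f1 - 1) / 2.0       # bias-corrected form used when f2 = 0
    return s_obs + f0


# Hypothetical abundance sample: 9 detected species, 27 individuals.
counts = [10, 6, 3, 2, 2, 1, 1, 1, 1]
n = sum(counts)
plug_in = [x / n for x in counts]       # sample relative abundances P_i

coverage = turing_coverage(counts)      # 1 - 4/27, about 0.852
richness = chao1_lower_bound(counts)    # 9 + 4**2 / (2 * 2) = 13.0

print(f"plug-in total      : {sum(plug_in):.3f}")
print(f"estimated coverage : {coverage:.3f}")
print(f"Chao1 lower bound  : {richness:.1f}")
```

The plug-in values always sum to 1, while the detected species are estimated to hold only about 85% of the assemblage here. That gap is the collective positive bias of P_i that the abstract describes; how the paper distributes the correction across individual species is not specified in this record.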

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Computer Simulation*
  • Models, Biological*
  • Models, Statistical
  • Population Density
  • Selection Bias