Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples

PLoS One. 2015 Jun 30;10(6):e0131765. doi: 10.1371/journal.pone.0131765. eCollection 2015.

Abstract

In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, such as in stratified sampling. Design-based statistical analysis tools are appropriate for seamless integration of sample design into the statistical analysis. However, it is also common and necessary, after a sampling design has been implemented, to use datasets to address questions that, in many cases, were not considered during the sampling design phase. Questions may arise requiring the use of model-based statistical tools such as multiple regression, quantile regression, or regression tree analysis. However, such model-based tools may require data from simple random samples to ensure unbiased estimation, which can be problematic when analyzing data from unequal probability designs. Despite numerous method-specific tools available to properly account for sampling design, too often in the analysis of ecological data, sample design is ignored and the consequences are not properly considered. We demonstrate here that violating the assumption of simple random sampling can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). Inverse probability bootstrapping is an easily implemented method for obtaining equal probability re-samples from a probability sample, from which unbiased model-based estimates can be made. We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and boosted regression tree analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
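
The abstract does not spell out the resampling procedure, so the following Python sketch shows only one plausible reading of IPB. It assumes that each sampled unit is redrawn with replacement with probability proportional to the inverse of its inclusion probability, that a model is fitted to each re-sample, and that the fitted estimates are averaged. The function names (`ipb_resample_indices`, `ipb_estimate`, `ols_fit`), the default of 200 re-samples, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def ipb_resample_indices(inclusion_prob, size, rng):
    """Draw indices with replacement, weighted by 1 / inclusion probability.

    Assumption: this weighting is what turns an unequal probability sample
    into an approximately equal probability re-sample.
    """
    weights = 1.0 / np.asarray(inclusion_prob, dtype=float)
    return rng.choice(len(weights), size=size, replace=True, p=weights / weights.sum())


def ipb_estimate(x, y, inclusion_prob, fit, n_boot=200, size=None, seed=0):
    """Average a model-based estimate over n_boot inverse probability re-samples.

    `fit` is any function (x, y) -> estimate; the re-sample size defaults to
    the original sample size (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    size = len(y) if size is None else size
    estimates = []
    for _ in range(n_boot):
        idx = ipb_resample_indices(inclusion_prob, size, rng)
        estimates.append(fit(x[idx], y[idx]))
    return np.mean(estimates, axis=0)


def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    return np.array([intercept, slope])


if __name__ == "__main__":
    # Synthetic population (hypothetical): y = 1 + 2x + noise.
    rng = np.random.default_rng(42)
    x_pop = rng.normal(size=20_000)
    y_pop = 1.0 + 2.0 * x_pop + rng.normal(size=x_pop.size)

    # Unequal probability design in which inclusion depends on the response,
    # so ignoring the design distorts the fitted regression.
    pi_pop = np.where(y_pop > 1.0, 0.10, 0.02)
    sampled = rng.random(x_pop.size) < pi_pop
    x, y, pi = x_pop[sampled], y_pop[sampled], pi_pop[sampled]

    print("naive OLS (intercept, slope):", ols_fit(x, y))
    print("IPB OLS   (intercept, slope):", ipb_estimate(x, y, pi, ols_fit))
```

Because `fit` is passed in as a function, the same loop could wrap a quantile regression or boosted regression tree fit in place of `ols_fit`; how the paper itself combines estimates across re-samples is not stated in the abstract.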

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Ecology
  • Models, Statistical
  • Regression Analysis
  • Research Design / statistics & numerical data*
  • Sampling Studies*
  • Selection Bias*

Grants and funding

This research is supported by the Northwest Fisheries Science Center-National Oceanic and Atmospheric Administration (NOAA) (http://www.noaa.gov/) and the Bonneville Power Administration (BPA, Projects 2003-017-00 and 2011-006-00) (http://www.bpa.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. South Fork Research, Inc. provided support in the form of salaries for authors (MN, CV), but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.