Inverse probability weighting is an effective method to address selection bias during the analysis of high dimensional data

Genet Epidemiol. 2021 Sep;45(6):593-603. doi: 10.1002/gepi.22418. Epub 2021 Jun 15.

Abstract

Omics studies frequently use samples collected during cohort studies. Conditioning on sample availability can cause selection bias if sample availability is nonrandom. Inverse probability weighting (IPW) is purported to reduce this bias. We evaluated IPW in an epigenome-wide analysis testing the association between DNA methylation (261,435 probes) and age in healthy adolescent subjects (n = 114). We simulated age and sex to be correlated with sample selection and then evaluated four conditions: complete population/no selection bias (all subjects), naïve selection bias (no adjustment), and IPW selection bias (selection bias with IPW adjustment). Assuming the complete population condition represented the "truth," we compared each condition to the complete population condition. Bias or difference in associations between age and methylation was reduced in the IPW condition versus the naïve condition. However, genomic inflation and type 1 error were higher in the IPW condition relative to the naïve condition. Postadjustment using bacon, type 1 error and inflation were similar across all conditions. Power was higher under the IPW condition compared with the naïve condition before and after inflation adjustment. IPW methods can reduce bias in genome-wide analyses. Genomic inflation is a potential concern that can be minimized using methods that adjust for inflation.
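The weighting idea described in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline, which the abstract does not detail. The cohort size, effect sizes, selection model, and the helper names fit_logistic and wls_slope are all assumptions made for illustration. The sketch assumes that the age effect on a single hypothetical probe differs by sex, so that selecting on age and sex distorts the marginal age slope. It then fits a logistic selection model, weights each selected subject by the inverse of the estimated selection probability, and compares the naive and weighted slopes with the full-population slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated cohort (NOT the study data; all values are assumptions).
n = 20000
age = rng.uniform(10, 18, n)
sex = rng.integers(0, 2, n).astype(float)
age_c = age - 14.0  # centered age, used for numerical stability

# One hypothetical probe whose age slope differs by sex (0.02 vs 0.06 per year).
meth = 0.5 + 0.02 * age_c + 0.04 * age_c * sex + rng.normal(0, 0.05, n)

# Assumed selection mechanism: sample availability depends on age and sex.
p_true = 1.0 / (1.0 + np.exp(-(-1.5 + 2.0 * sex + 0.1 * age_c)))
selected = rng.random(n) < p_true


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def wls_slope(x, y, w):
    """Slope of y on x (with intercept) from weighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[1]


# Selection model: P(selected | age, sex), estimated on the complete population.
X_sel = np.column_stack([np.ones(n), age_c, sex])
beta_hat = fit_logistic(X_sel, selected.astype(float))
p_hat = 1.0 / (1.0 + np.exp(-(X_sel @ beta_hat)))

# Inverse probability weights for the selected subjects only.
weights = 1.0 / p_hat[selected]

# Complete population, naive (unweighted), and IPW-adjusted age slopes.
slope_full = wls_slope(age, meth, np.ones(n))
slope_naive = wls_slope(age[selected], meth[selected], np.ones(selected.sum()))
slope_ipw = wls_slope(age[selected], meth[selected], weights)

print("full:", slope_full, "naive:", slope_naive, "IPW:", slope_ipw)
```

In this setup the weighted slope should land closer to the full-population slope than the naive slope does. A real epigenome-wide analysis would repeat the weighted regression for every probe and would normally use robust standard errors, which the sketch omits.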

Keywords: DAISY; DNA methylation; inverse probability weighting; selection bias.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Bias
  • Cohort Studies
  • Genome-Wide Association Study*
  • Humans
  • Probability
  • Selection Bias