Penalized logistic regression with low prevalence exposures beyond high dimensional settings

PLoS One. 2019 May 20;14(5):e0217057. doi: 10.1371/journal.pone.0217057. eCollection 2019.

Abstract

Estimating and selecting risk factors with extremely low prevalences of exposure for a binary outcome is challenging because classical techniques, notably logistic regression, often fail to provide meaningful results in such settings. While penalized regression methods are widely used in high-dimensional settings, we show that they are useful in low-dimensional settings as well. Specifically, we demonstrate that Firth correction, ridge, the lasso and boosting all improve the estimation for low-prevalence risk factors. Although the methods themselves are well established, comparison studies are needed to assess their potential benefits in this context. We carry out such a comparison using the dataset of a large unmatched case-control study from France (2005-2008) on the relationship between prescription medicines and road traffic accidents, together with an accompanying simulation study. Results show that the estimation of risk factors with prevalences below 0.1% can be drastically improved, in particular by using Firth correction and boosting, and especially for ultra-low prevalences. When a moderate number of low-prevalence exposures is available, we recommend the use of penalized techniques.
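
To illustrate why unpenalized logistic regression can fail for a very rare exposure and how Firth correction helps, the following is a minimal sketch, not the authors' code. It assumes a single binary exposure, so the data reduce to a 2x2 table; the counts are invented for illustration (4 exposed subjects out of 10,004, a prevalence of about 0.04%) and the function names are hypothetical. The Firth estimate is obtained by iterating on the modified score U*(b) = sum over groups of [y - n p + h (0.5 - p)] x, where h is the hat-matrix diagonal.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def ml_log_odds_ratio(a, b, c, d):
    """Unpenalized ML log odds ratio from a 2x2 table.

    a, b: exposed cases / exposed controls
    c, d: unexposed cases / unexposed controls
    A zero cell makes the estimate infinite (or undefined).
    """
    num, den = a * d, b * c
    if num > 0 and den > 0:
        return math.log(num / den)
    if num > 0:
        return math.inf
    if den > 0:
        return -math.inf
    return math.nan


def firth_logistic_2x2(a, b, c, d, max_iter=100, tol=1e-10):
    """Firth-corrected fit of logit P(case) = b0 + b1 * exposed.

    Uses grouped binomial data (one group per exposure level) and
    Fisher-scoring steps on the Firth-modified score.
    Returns (b0, b1); b1 is the penalized log odds ratio.
    """
    # Each group: (exposure x, group size n, number of cases y)
    groups = [(0.0, c + d, c), (1.0, a + b, a)]
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Fisher information X'WX for the 2-parameter model
        i00 = i01 = i11 = 0.0
        stats = []
        for x, n, y in groups:
            p = expit(b0 + b1 * x)
            w = n * p * (1.0 - p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
            stats.append((x, n, y, p, w))
        det = i00 * i11 - i01 * i01
        inv00, inv01, inv11 = i11 / det, -i01 / det, i00 / det

        # Firth-modified score: residual plus h * (0.5 - p)
        u0 = u1 = 0.0
        for x, n, y, p, w in stats:
            h = w * (inv00 + 2.0 * inv01 * x + inv11 * x * x)
            r = y - n * p + h * (0.5 - p)
            u0 += r
            u1 += r * x

        # Scoring step: beta <- beta + I^{-1} U*
        d0 = inv00 * u0 + inv01 * u1
        d1 = inv01 * u0 + inv11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical counts: all 4 exposed subjects are cases.
table = (4, 0, 3000, 7000)
print("ML log OR:   ", ml_log_odds_ratio(*table))
print("Firth log OR:", firth_logistic_2x2(*table)[1])
```

In this saturated one-exposure model the Firth estimate coincides with adding 0.5 to every cell of the table, which gives a quick check on the iteration. With several exposures and covariates, as in the study described above, no such closed form exists and a general implementation would be needed.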

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Accidents, Traffic*
  • Adolescent
  • Adult
  • Case-Control Studies
  • Computer Simulation
  • Data Interpretation, Statistical
  • Female
  • France
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Male
  • Middle Aged
  • Pharmaceutical Preparations*
  • Prevalence
  • Regression Analysis
  • Risk Factors
  • Young Adult

Substances

  • Pharmaceutical Preparations

Grants and funding

The CESIR-A project was funded by the Afssaps, the French National Research Agency (ANR, DAA n° 0766CO204), the French Medical Research Foundation (Equipe FRM), the French National Medical Research Institute (Equipe INSERM Avenir) and the French Direction Générale de la Santé (DGS) to EL. The article processing charge was funded by the German Research Foundation (DFG) and the University of Freiburg in the funding program Open Access Publishing.