Robust mislabel logistic regression without modeling mislabel probabilities

Biometrics. 2018 Mar;74(1):145-154. doi: 10.1111/biom.12726. Epub 2017 May 10.

Abstract

Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression.

Keywords: Classification; Logistic regression; Minimum divergence estimation; Mislabeled response; Robust M-estimation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bias*
  • Classification
  • Computer Simulation
  • Humans
  • Logistic Models*
  • Probability*