Semi-supervised inference for nonparametric logistic regression

Stat Med. 2023 Jul 10;42(15):2573-2589. doi: 10.1002/sim.9737. Epub 2023 May 10.

Abstract

We consider the problem of estimating the regression function in nonparametric logistic regression under a semi-supervised framework, in which a relatively small labeled data set collected by case-control sampling and a relatively large unlabeled data set containing only observations of the predictors are available. This problem arises in various applications where the outcome variable is expensive or difficult to observe directly. A two-stage nonparametric semi-supervised estimator based on the spline method is proposed to estimate the target regression function by maximizing the likelihood function of the labeled case-control data. The unlabeled data are used in the first stage to estimate the density function involved in the likelihood function. The consistency and functional asymptotic normality of the semi-supervised two-stage estimator are established under mild conditions. By making use of the unlabeled data, the proposed method produces a more efficient estimate of the target function than its traditional supervised counterpart. The performance of the proposed method is evaluated through extensive simulation studies. An application is illustrated with an analysis of a skin segmentation data set.
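
To see why a density of the predictors enters the likelihood of case-control data, a sketch of the standard retrospective likelihood may help. The notation below (g, f, p_g, \pi_g, B_k, \beta_k) is assumed for illustration and is not taken from the abstract; it follows from Bayes' rule under the logistic model, and the paper's exact formulation may differ.

```latex
% Assumed notation (not from the abstract): g is the target function,
% f the marginal density of the predictors X, and
% p_g(x) = P(Y = 1 | X = x) under the logistic model.
\begin{align*}
p_g(x) &= \frac{\exp\{g(x)\}}{1 + \exp\{g(x)\}}, \qquad
\pi_g = P(Y = 1) = \int p_g(u)\, f(u)\, du, \\
f(x \mid Y = 1) &= \frac{p_g(x)\, f(x)}{\pi_g}, \qquad
f(x \mid Y = 0) = \frac{\{1 - p_g(x)\}\, f(x)}{1 - \pi_g}, \\
\ell(g; f) &= \sum_{i=1}^{n} \Big[ y_i \log \frac{p_g(x_i)}{\pi_g}
  + (1 - y_i) \log \frac{1 - p_g(x_i)}{1 - \pi_g} + \log f(x_i) \Big].
\end{align*}
% Stage 1 (sketch): replace f by an estimate \hat f built from the unlabeled predictors.
% Stage 2 (sketch): write g(x) \approx \sum_k \beta_k B_k(x) with spline basis
% functions B_k and maximize \ell(g; \hat f) over the coefficients \beta_k.
```

In this sketch the labels are fixed by the case-control design, so each labeled observation contributes the conditional density of its predictors given its label. The normalizing constant \pi_g depends on both g and f, which is one way to see how a better estimate of f from a large unlabeled sample could improve the estimate of g.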

Keywords: case-control studies; nonparametric logistic regression; semi-supervised inference.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Humans
  • Likelihood Functions
  • Logistic Models