Estimating the area under the ROC curve when transporting a prediction model to a target population

Biometrics. 2023 Sep;79(3):2382-2393. doi: 10.1111/biom.13796. Epub 2022 Nov 25.

Abstract

We propose methods for estimating the area under the receiver operating characteristic (ROC) curve (AUC) of a prediction model in a target population that differs from the source population that provided the data used for original model development. If covariates that are associated with model performance, as measured by the AUC, have a different distribution in the source and target populations, then AUC estimators that only use data from the source population will not reflect model performance in the target population. Here, we provide identification results for the AUC in the target population when outcome and covariate data are available from the sample of the source population, but only covariate data are available from the sample of the target population. In this setting, we propose three estimators for the AUC in the target population and show that they are consistent and asymptotically normal. We evaluate the finite-sample performance of the estimators using simulations and use them to estimate the AUC in a nationally representative target population from the National Health and Nutrition Examination Survey for a lung cancer risk prediction model developed using source population data from the National Lung Screening Trial.

Keywords: U-processes; covariate shift; domain adaptation; importance weighting; model performance; prediction models; transportability.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Area Under Curve
  • Models, Statistical*
  • Nutrition Surveys
  • ROC Curve