Transporting a Prediction Model for Use in a New Target Population

Am J Epidemiol. 2023 Feb 1;192(2):296-304. doi: 10.1093/aje/kwac128.

Abstract

We consider methods for transporting a prediction model for use in a new target population when outcome and covariate data for model development are available from a source population whose covariate distribution differs from that of the target population, and covariate data (but not outcome data) are available from the target population. We discuss how to tailor the prediction model to account for differences in the data distribution between the source and target populations. We also discuss how to assess the model's performance in the target population (e.g., by estimating the mean squared prediction error). We provide identifiability results for measures of model performance in the target population for a potentially misspecified prediction model, under a sampling design in which the source and target population samples are obtained separately. We introduce the concept of prediction error modifiers, which can be used to reason about tailoring measures of model performance to the target population. We illustrate the methods using simulated data and apply them to transport a prediction model for lung cancer diagnosis from the National Lung Screening Trial to the nationally representative target population of trial-eligible individuals in the National Health and Nutrition Examination Survey.
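To make the idea of assessing performance in the target population concrete, the following is a minimal sketch of one common approach under covariate shift: reweighting source-population prediction errors by the estimated inverse odds of source membership. It is an illustration under stated assumptions, not necessarily the estimator used in the article. The function names (fit_logistic, inverse_odds_weights, target_mse), the simulated data, and the choice of a logistic membership model are all hypothetical. The simulated target outcomes are generated only to provide a benchmark; in the setting described above they would not be observed.

```python
import numpy as np

def fit_logistic(X, s, n_iter=25):
    """Fit P(S = 1 | X) by Newton-Raphson; returns [intercept, slopes...]."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (s - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def inverse_odds_weights(X_source, X_target):
    """Estimate P(S = 0 | X) / P(S = 1 | X) for each source observation."""
    X = np.vstack([X_source, X_target])
    s = np.concatenate([np.ones(len(X_source)), np.zeros(len(X_target))])
    beta = fit_logistic(X, s)
    Z_source = np.column_stack([np.ones(len(X_source)), X_source])
    # With a logistic model, the inverse odds of S = 1 equal exp(-linear predictor).
    return np.exp(-Z_source @ beta)

def target_mse(y, pred, weights):
    """Normalized weighted mean of squared prediction errors."""
    loss = (y - pred) ** 2
    return np.sum(weights * loss) / np.sum(weights)

# Hypothetical simulation: the covariate mean shifts from 0 (source) to 0.5 (target).
rng = np.random.default_rng(0)
n = 20000

def simulate_outcome(x):
    return x[:, 0] ** 2 + rng.normal(0.0, 0.5, size=len(x))

x_train = rng.normal(0.0, 1.0, size=(n, 1))   # source data for model fitting
x_test = rng.normal(0.0, 1.0, size=(n, 1))    # held-out source data
x_target = rng.normal(0.5, 1.0, size=(n, 1))  # target covariates only
y_train = simulate_outcome(x_train)
y_test = simulate_outcome(x_test)
y_target = simulate_outcome(x_target)         # benchmark only; unobserved in practice

# Deliberately misspecified linear prediction model, fit on source data.
design = lambda x: np.column_stack([np.ones(len(x)), x])
coef, *_ = np.linalg.lstsq(design(x_train), y_train, rcond=None)
pred_test = design(x_test) @ coef
pred_target = design(x_target) @ coef

w = inverse_odds_weights(x_test, x_target)
naive_mse = np.mean((y_test - pred_test) ** 2)       # source-population MSE
weighted_mse = target_mse(y_test, pred_test, w)      # estimate for the target
oracle_mse = np.mean((y_target - pred_target) ** 2)  # benchmark

print(f"naive: {naive_mse:.3f}  weighted: {weighted_mse:.3f}  oracle: {oracle_mse:.3f}")
```

In this simulation the squared error of the misspecified model varies with the covariate, so the covariate acts as a prediction error modifier and the unweighted source estimate does not apply to the target population. The same weights could in principle be passed to a weighted model fit to tailor the model itself, though that step is not shown here.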

Keywords: covariate shift; domain adaptation; generalizability; model performance; prediction error modifier; transportability.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Lung Neoplasms / diagnosis
  • Models, Theoretical*
  • Nutrition Surveys*