Purpose: Treatment decisions about localized prostate cancer depend on accurate estimation of the patient's life expectancy. Current cancer and noncancer survival models use a limited number of predefined variables, which could restrict their predictive capability. We explored a technique to create more comprehensive survival prediction models using insurance claims data from a large administrative data set. These data contain substantial information about medical diagnoses and procedures, and thus may provide a broader reflection of each patient's health.
Methods: We identified 57,011 Medicare beneficiaries with localized prostate cancer diagnosed between 2004 and 2009. We constructed separate cancer survival and noncancer survival prediction models using a training data set and assessed performance on a test data set. Potential model inputs included clinical and demographic covariates, and 8,971 distinct insurance claim codes describing comorbid diseases, procedures, surgeries, and diagnostic tests. We used a least absolute shrinkage and selection operator (LASSO) technique to select the predictive variables retained in the final survival models. Each model's predictive capacity was compared with that of existing survival models using a metric of explained randomness (ρ²) ranging from 0 to 1, with 1 indicating an ideal prediction.
Results: Our noncancer survival model included 143 covariates and predicted survival better (ρ² = 0.60) than the Charlson comorbidity index (ρ² = 0.26) and the Elixhauser comorbidity index (ρ² = 0.26). Our cancer-specific survival model included nine covariates and predicted survival about as well (ρ² = 0.71) as the Memorial Sloan Kettering prediction model (ρ² = 0.68).
Conclusion: Survival prediction models using high-dimensional variable selection techniques applied to claims data show promise, particularly for noncancer survival prediction. After further validation, these analyses could inform clinical decisions for men with prostate cancer.
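
Illustrative sketch: The variable selection step described in Methods can be illustrated with a small, self-contained example. The code below is not the authors' implementation. It assumes a Cox proportional hazards model fitted with an L1 (LASSO) penalty by proximal gradient descent, synthetic data with two informative and four noise covariates, and one common definition of explained randomness, ρ² = 1 − exp(−2(ℓ(β) − ℓ(0))/k), where ℓ is the partial log-likelihood and k is the number of observed events. The abstract does not state which model or ρ² formula was used, and all function names, the penalty value, and the step size are hypothetical choices made for illustration.

```python
import math
import random

def cox_loglik_grad(beta, X, times, events):
    """Cox partial log-likelihood and its gradient (assumes no tied times)."""
    p = len(beta)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik, grad = 0.0, [0.0] * p
    s0, s1 = 0.0, [0.0] * p  # running sums over the current risk set
    # Visit subjects from latest to earliest time so the risk set only grows.
    for i in sorted(range(len(X)), key=lambda k: -times[k]):
        w = math.exp(eta[i])
        s0 += w
        for j in range(p):
            s1[j] += w * X[i][j]
        if events[i]:
            loglik += eta[i] - math.log(s0)
            for j in range(p):
                grad[j] += X[i][j] - s1[j] / s0
    return loglik, grad

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_cox(X, times, events, lam, step=0.5, iters=300):
    """Minimize -loglik/n + lam * ||beta||_1 by proximal gradient descent."""
    n, beta = len(X), [0.0] * len(X[0])
    for _ in range(iters):
        _, grad = cox_loglik_grad(beta, X, times, events)
        # Gradient ascent step on loglik/n, then shrinkage from the penalty.
        beta = [soft_threshold(b + step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return beta

def explained_randomness(beta, X, times, events):
    """Assumed form: rho^2 = 1 - exp(-2 * (loglik(beta) - loglik(0)) / n_events)."""
    ll_model, _ = cox_loglik_grad(beta, X, times, events)
    ll_null, _ = cox_loglik_grad([0.0] * len(beta), X, times, events)
    return 1.0 - math.exp(-2.0 * (ll_model - ll_null) / sum(events))

# Synthetic data (not from the study): 2 informative and 4 noise covariates.
random.seed(0)
true_beta = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
X, times, events = [], [], []
for _ in range(300):
    row = [random.gauss(0.0, 1.0) for _ in true_beta]
    hazard = math.exp(sum(b * x for b, x in zip(true_beta, row)))
    t_event = random.expovariate(hazard)   # time to the event of interest
    t_censor = random.expovariate(0.3)     # independent censoring time
    X.append(row)
    times.append(min(t_event, t_censor))
    events.append(1 if t_event <= t_censor else 0)

beta_hat = lasso_cox(X, times, events, lam=0.15)
rho2 = explained_randomness(beta_hat, X, times, events)
print("coefficients:", [round(b, 3) for b in beta_hat])
print("explained randomness:", round(rho2, 3))
```

In this toy setting the penalty shrinks every coefficient toward zero and tends to set weakly predictive ones exactly to zero, which is how a small set of covariates could be retained from thousands of candidate claim codes. In practice the penalty strength would be tuned, for example by cross-validation on the training data, and ρ² would be evaluated on held-out test data as the Methods describe.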