Spatially varying effects of predictors for the survival prediction of nonmetastatic colorectal cancer

BMC Cancer. 2018 Nov 8;18(1):1084. doi: 10.1186/s12885-018-4985-2.

Abstract

Background: An increasing number of studies have identified spatial differences in colorectal cancer survival. However, little is known about the spatially varying effects of predictors in survival prediction modeling studies of colorectal cancer that have focused on estimating the absolute survival risk for patients from a wide range of populations. This study aimed to demonstrate the spatially varying effects of predictors of survival for nonmetastatic colorectal cancer patients.

Methods: Patients diagnosed with nonmetastatic colorectal cancer from 2004 to 2013 and followed up through the end of 2013 were extracted from the Surveillance, Epidemiology, and End Results (SEER) registry (n = 128,061). The log-rank test and the restricted mean survival time were used to evaluate differences in survival outcomes among spatial clusters for a widely used clinical predictor: stage as determined by the AJCC 7th edition staging system. A heterogeneity test of the kind used in meta-analyses was applied to detect spatially varying effects of single predictors. Next, these predictors were entered into a standard survival prediction model fitted to the spatially clustered data; the spatially varying coefficients of these models indicated that some covariate effects may not be constant across the geographic regions of the study. Finally, two types of survival prediction models (a statistical model and a machine learning model) were built; these models incorporated the predictors and enabled survival prediction for patients from a wide range of geographic regions.
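As an illustration of the heterogeneity test mentioned in the Methods, the following is a minimal sketch, not the authors' implementation. The abstract does not say which statistic was used; the sketch assumes Cochran's Q with inverse-variance weights and the derived I² statistic, applied to per-region effect estimates (for example, log hazard ratios) and their standard errors. The function name `cochran_q` and the example numbers are hypothetical.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q and I^2 for per-region effect estimates.

    estimates  : per-region effects (e.g. log hazard ratios); hypothetical inputs
    std_errors : standard error of each estimate
    Returns (pooled_estimate, Q, degrees_of_freedom, I2).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Inverse-variance weights: more precise regions count for more.
    weights = [1.0 / se ** 2 for se in std_errors]

    # Fixed-effect pooled estimate across regions.
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)

    # Q sums the weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to between-region heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2


# Made-up numbers for two regions, for illustration only.
pooled, q, df, i2 = cochran_q([0.0, 2.0], [0.5, 0.5])
```

Under the null hypothesis of a common effect, Q is approximately chi-square distributed with `df` degrees of freedom, so a large Q relative to `df` suggests that a predictor's effect differs across regions.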

Results: In univariate and multivariate analyses, several prognostic factors, such as "TNM stage", "tumor size" and "age at diagnosis", had significant spatially varying effects across regions. When these spatially varying effects were taken into account, the machine learning models were subject to fewer assumptions (such as the proportional hazards assumption) and showed better predictive performance than the statistical models. In terms of the concordance index, the machine learning model was more accurate (0.898 [0.895, 0.902]) than the statistical model (0.732 [0.726, 0.738]).
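For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C for right-censored data. It is an illustrative assumption about how such an index can be computed, not the code used in the study; the function name `concordance_index` and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       : observed follow-up time per patient
    events      : 1 if death was observed, 0 if censored
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event, higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored at time 3.
c = concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the two models' reported values should be read.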

Conclusions: We recommend that the spatially varying effects of predictors be considered when building survival prediction models from large-scale, multicenter research data. Machine learning models, which are not bound by the assumptions required of statistical models, are a promising alternative.

Keywords: Colorectal cancer; SEER; Spatially varying effects; Survival prediction model; TNM staging system.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Colorectal Neoplasms / epidemiology
  • Colorectal Neoplasms / mortality*
  • Colorectal Neoplasms / pathology
  • Female
  • Humans
  • Kaplan-Meier Estimate
  • Machine Learning
  • Male
  • Middle Aged
  • Multivariate Analysis
  • Neoplasm Staging
  • Prognosis
  • Proportional Hazards Models
  • SEER Program
  • Spatial Analysis
  • United States / epidemiology