Double-blind evaluation and benchmarking of survival models in a multi-centre study

Comput Biol Med. 2007 Aug;37(8):1108-20. doi: 10.1016/j.compbiomed.2006.10.001. Epub 2006 Dec 20.

Abstract

Accurate modelling of time-to-event data is of particular importance for both exploratory and predictive analysis in cancer, and can have a direct impact on clinical care. This study presents a detailed double-blind evaluation of the accuracy of out-of-sample mortality prediction from two generic non-linear models based on artificial neural networks, benchmarked against partial logistic spline, log-normal and Cox regression models. A data set containing 2880 samples was shared over the Internet using a purpose-built secure environment called GEOCONDA (www.geoconda.com). The evaluation was carried out in three parts. The first was a comparison between the predicted survival estimates for each of the four survival groups defined by the TNM staging system and the empirical estimates derived by the Kaplan-Meier method. The second focused on the accurate prediction of survival over time, quantified with the time-dependent C index (C(td)). Finally, calibration plots were obtained over the range of follow-up and tested using a generalization of the Hosmer-Lemeshow test. All models showed satisfactory performance, with values of C(td) of about 0.7. None of the models showed a systematic tendency towards over- or underestimation of the observed survival at tau=3 and 5 years. At tau=10 years, all models underestimated the observed survival, except for Cox regression, which returned an overestimate. The study presents a robust and unbiased benchmarking methodology using a bespoke web facility. It was concluded that powerful, recent flexible modelling algorithms show predictive performance comparable to that of more established methods from the medical and biological literature, for the reference data set.
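
As an illustration of the empirical reference used in the first part of the evaluation, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator. It is not the study's code: the function names `kaplan_meier` and `survival_at` and the toy follow-up data are assumptions made purely for illustration, and no values from the study's data set are reproduced here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  -- follow-up duration for each subject (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Deaths and censored subjects at t are removed after the update.
        at_risk -= exits[t]
    return curve


def survival_at(curve, tau):
    """Read the step function: S(tau) is the last estimate at or before tau."""
    s = 1.0
    for t, value in curve:
        if t > tau:
            break
        s = value
    return s


# Hypothetical toy data (years of follow-up), not taken from the study.
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_s3 = survival_at(toy_curve, 3)
```

In a comparison of the kind described, an estimate such as `survival_at(curve, tau)` would be computed separately within each staging group and set against the mean model-predicted survival at the same tau; how the study itself aggregated predictions is not stated in the abstract.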

Publication types

  • Comparative Study
  • Evaluation Study
  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Computer Simulation*
  • Databases, Factual
  • Double-Blind Method
  • Female
  • Humans
  • Kaplan-Meier Estimate
  • Linear Models
  • Male
  • Melanoma / mortality
  • Middle Aged
  • Neural Networks, Computer
  • Nonlinear Dynamics
  • Proportional Hazards Models
  • Survival Analysis*
  • United Kingdom / epidemiology
  • Uveal Neoplasms / mortality