Double-blind evaluation and benchmarking of survival models in a multi-centre study

A Taktak; L Antolini; M Aung; P Boracchi; I Campbell; B Damato; E Ifeachor; N Lama; P Lisboa; C Setzkorn; V Stalbovskaya; E Biganzoli

doi:10.1016/j.compbiomed.2006.10.001

Comput Biol Med. 2007 Aug;37(8):1108-20. doi: 10.1016/j.compbiomed.2006.10.001. Epub 2006 Dec 20.

Authors

A Taktak¹, L Antolini, M Aung, P Boracchi, I Campbell, B Damato, E Ifeachor, N Lama, P Lisboa, C Setzkorn, V Stalbovskaya, E Biganzoli

Affiliation

¹ Department of Clinical Engineering, Royal Liverpool University Hospital, Liverpool, UK. afgt@liv.ac.uk

PMID: 17184760
DOI: 10.1016/j.compbiomed.2006.10.001

Abstract

Accurate modelling of time-to-event data is of particular importance for both exploratory and predictive analysis in cancer, and can have a direct impact on clinical care. This study presents a detailed double-blind evaluation of the out-of-sample accuracy in predicting mortality of two generic non-linear models based on artificial neural networks, benchmarked against partial logistic spline, log-normal and Cox regression models. A data set containing 2880 samples was shared over the Internet using a purpose-built secure environment called GEOCONDA (www.geoconda.com). The evaluation was carried out in three parts. The first compared the predicted survival estimates for each of the four survival groups defined by the TNM staging system against the empirical estimates derived by the Kaplan-Meier method. The second focused on the accurate prediction of survival over time, quantified with the time-dependent C index (C(td)). The third obtained calibration plots over the range of follow-up and tested them using a generalization of the Hosmer-Lemeshow test. All models showed satisfactory performance, with values of C(td) of about 0.7. None of the models showed a systematic tendency towards over- or underestimation of the observed survival at tau = 3 and 5 years. At tau = 10 years, all models underestimated the observed survival, except for Cox regression, which returned an overestimate. The study presents a robust and unbiased benchmarking methodology using a bespoke web facility. It was concluded that powerful, recent flexible modelling algorithms show predictive performance comparable to that of more established methods from the medical and biological literature, for the reference data set.
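
As a rough illustration of two of the quantities named in the abstract, the sketch below computes a Kaplan-Meier survival curve and a concordance index in plain Python. It is not the authors' code. The function names (`kaplan_meier`, `harrell_c`) and the toy data are assumptions made for illustration, and the concordance shown is the simpler Harrell's C for a single risk score, used here as a stand-in for the time-dependent C(td) that the study actually reports.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time (product-limit estimate).

    events[i] is 1 if the event (death) was observed at times[i], 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Observed events exactly at time t.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for a single risk score per subject.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under observation at a strictly later time. The pair
    is concordant when the subject who failed earlier has the higher risk.
    Ties in risk count as half. This is NOT the time-dependent C(td).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event flag, model risk score.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.7, 0.6, 0.4, 0.1]

km_curve = kaplan_meier(toy_times, toy_events)
c_index = harrell_c(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the reported C(td) values of about 0.7. In a setting like the study's first comparison, a curve such as `km_curve` would be computed separately within each staging group and set against the model-predicted survival for that group.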

Publication types

Comparative Study
Evaluation Study
Multicenter Study
Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking
Computer Simulation*
Databases, Factual
Double-Blind Method
Female
Humans
Kaplan-Meier Estimate
Linear Models
Male
Melanoma / mortality
Middle Aged
Neural Networks, Computer
Nonlinear Dynamics
Proportional Hazards Models
Survival Analysis*
United Kingdom / epidemiology
Uveal Neoplasms / mortality