DWCox: A density-weighted Cox model for outlier-robust prediction of prostate cancer survival

F1000Res. 2016 Dec 1:5:2806. doi: 10.12688/f1000research.9434.1. eCollection 2016.

Abstract

Reliable predictions of the risk and survival time of prostate cancer patients, based on their clinical records, can help guide treatment and provide hints about the disease mechanism. Cox regression is currently a commonly accepted approach for such tasks in clinical applications. More complex methods, such as ensemble approaches, can potentially reach better prediction accuracy, but at the cost of harder training and less interpretable results. Better performance on a specific data set may also be obtained by extensive manual exploration of the data space, but models developed this way are prone to overfitting and are usually not directly applicable to a different data set. We propose DWCox, a density-weighted Cox model that is more robust against outliers and can thus provide more accurate predictions of prostate cancer survival. DWCox assigns weights to the training samples according to their local kernel density in the feature space and incorporates those weights into the partial likelihood function. A linear regression is then used to predict the actual survival times from the predicted risks. In the 2015 Prostate Cancer DREAM Challenge, DWCox obtained the best average ranking in prediction accuracy for risk and survival time. This result is notable given that DWCox is one of the smallest and most interpretable models submitted to the challenge. In simulations, DWCox performed consistently better than a standard Cox model when the training data contained many sparsely distributed outliers. Although developed for prostate cancer patients, DWCox can easily be re-trained and applied to other survival analysis problems. DWCox is implemented in R and can be downloaded from https://github.com/JinfengXiao/DWCox.
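The abstract describes the DWCox pipeline only at a high level: density-based sample weights, a weighted partial likelihood, and a linear map from predicted risk to survival time. The Python sketch below shows one way these three steps could fit together; it is not the code of the DWCox R package. The Gaussian kernel, the bandwidth, rescaling the weights to mean 1, the Breslow-style weighted risk sets (without tie correction), plain gradient ascent, and fitting the time regression on uncensored patients only are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def kernel_density_weights(X, bandwidth=1.0):
    """Hypothetical weighting: leave-one-out Gaussian kernel density of each
    sample in feature space, rescaled so the weights average to 1.
    Samples in sparse regions (likely outliers) get small weights. Needs n >= 2."""
    n = len(X)
    dens = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                s += math.exp(-d2 / (2 * bandwidth ** 2))
        dens.append(max(s / (n - 1), 1e-12))  # floor avoids all-zero weights
    mean = sum(dens) / n
    return [d / mean for d in dens]


def linear_predictor(beta, X):
    """Risk score eta_i = x_i . beta for every sample."""
    return [sum(b * x for b, x in zip(beta, xi)) for xi in X]


def weighted_log_partial_likelihood(beta, X, time, event, w):
    """Assumed form: sum over events i of
    w_i * (eta_i - log sum_{j: t_j >= t_i} w_j * exp(eta_j))."""
    eta = linear_predictor(beta, X)
    ll = 0.0
    for i in range(len(X)):
        if event[i]:
            risk = sum(w[j] * math.exp(eta[j])
                       for j in range(len(X)) if time[j] >= time[i])
            ll += w[i] * (eta[i] - math.log(risk))
    return ll


def fit_weighted_cox(X, time, event, w, lr=0.1, iters=500):
    """Maximize the weighted log partial likelihood by plain gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        eta = linear_predictor(beta, X)
        grad = [0.0] * p
        for i in range(n):
            if not event[i]:
                continue
            num, den = [0.0] * p, 0.0
            for j in range(n):
                if time[j] >= time[i]:          # j is in the risk set of i
                    r = w[j] * math.exp(eta[j])
                    den += r
                    for k in range(p):
                        num[k] += r * X[j][k]
            for k in range(p):
                grad[k] += w[i] * (X[i][k] - num[k] / den)
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def fit_time_from_risk(risk, time, event):
    """Ordinary least squares of survival time on risk score, using only
    uncensored patients (an assumption). Returns (slope, intercept)."""
    pts = [(r, t) for r, t, e in zip(risk, time, event) if e]
    m = len(pts)
    mr = sum(r for r, _ in pts) / m
    mt = sum(t for _, t in pts) / m
    sxx = sum((r - mr) ** 2 for r, _ in pts)
    sxy = sum((r - mr) * (t - mt) for r, t in pts)
    slope = sxy / sxx
    return slope, mt - slope * mr
```

In use, one would compute `w = kernel_density_weights(X)`, fit `beta = fit_weighted_cox(X, time, event, w)`, and then map `linear_predictor(beta, X)` to survival times with the coefficients from `fit_time_from_risk`. Setting all weights to 1 recovers an ordinary (Breslow) Cox partial likelihood, which makes it easy to compare the weighted and unweighted fits on the same data.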

Keywords: Cox model; DREAM; Prostate cancer.

Grants and funding

Research was sponsored in part by grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov), National Science Foundation IIS-1320617, IIS-1354329 and IIS 16-18481, and the U.S. Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA). The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the opinions of the funding agencies.