A downsampling strategy to assess the predictive value of radiomic features

Sci Rep. 2019 Nov 28;9(1):17869. doi: 10.1038/s41598-019-54190-2.

Abstract

Many studies are devoted to the design of radiomic models for a prediction task. When no effective model is found, it is often difficult to know whether this is because the radiomic features do not include information relevant to the task or because the data are insufficient. We propose a downsampling method to answer that question for a two-group classification task. Using two large patient cohorts, several experimental configurations involving different numbers of patients were created. Univariate or multivariate radiomic models were designed from each configuration. Their performance, as reflected by the Youden index (YI) and the Area Under the receiver operating characteristic Curve (AUC), was compared to the stable performance obtained with the highest number of patients. A downsampling method is described to predict the YI and AUC achievable with a large number of patients. For the multivariate models involving machine learning, YI and AUC increased with the number of patients, whereas they decreased for univariate models. The downsampling method estimated the YI and AUC obtained with the largest number of patients better than did the YI and AUC computed from the number of available patients, and it identified the lack of information relevant to the classification task when no such information existed.
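As an illustration of the two performance measures named above, the following is a minimal sketch of how YI and AUC can be computed for a single feature, and how they can be tracked across repeated random subsamples of decreasing size. It is not the authors' downsampling method, whose details are not given in the abstract. The function names (`auc_and_youden`, `downsample_curve`), the choice of drawing the same number of patients from each group, the number of repeats, and the assumption that higher feature values indicate the positive group are all hypothetical.

```python
import numpy as np


def auc_and_youden(scores, labels):
    """Return (AUC, Youden index) for one feature and binary labels (0/1).

    Assumption: higher scores point to the positive group (label 1).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]

    # AUC as the probability that a positive case scores above a negative
    # case, with ties counted as one half (Mann-Whitney formulation).
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

    # Youden index: maximum over cut-offs of sensitivity + specificity - 1.
    best = 0.0
    for t in np.unique(scores):
        sensitivity = np.mean(pos >= t)
        specificity = np.mean(neg < t)
        best = max(best, sensitivity + specificity - 1.0)
    return float(auc), float(best)


def downsample_curve(scores, labels, sizes, n_repeats=50, seed=0):
    """Mean (AUC, YI) over random subsamples with n patients per group.

    Hypothetical illustration: n patients are drawn without replacement
    from each group, n_repeats times, for every n in sizes.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)

    curve = {}
    for n in sizes:
        results = []
        for _ in range(n_repeats):
            idx = np.concatenate([
                rng.choice(pos_idx, size=n, replace=False),
                rng.choice(neg_idx, size=n, replace=False),
            ])
            results.append(auc_and_youden(scores[idx], labels[idx]))
        mean_auc, mean_yi = np.mean(results, axis=0)
        curve[n] = (float(mean_auc), float(mean_yi))
    return curve


if __name__ == "__main__":
    # Simulated feature with a modest group difference (not real patient data).
    rng = np.random.default_rng(1)
    labels = np.repeat([0, 1], 200)
    scores = rng.normal(loc=labels * 0.5, scale=1.0)
    for n, (auc, yi) in downsample_curve(scores, labels, [10, 50, 200]).items():
        print(f"n per group = {n:3d}  mean AUC = {auc:.3f}  mean YI = {yi:.3f}")
```

Because the YI is maximized over cut-offs on the same subsample used to evaluate it, small subsamples tend to give optimistic values in a sketch like this one, which is consistent with the reported decrease of univariate performance as the number of patients grows.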