Robust RNA-seq data analysis using an integrated method of ROC curve and Kolmogorov-Smirnov test

Commun Stat Simul Comput. 2022;51(12):7444-7457. doi: 10.1080/03610918.2020.1837165. Epub 2020 Oct 25.

Abstract

It is common to dichotomize a continuous biomarker in clinical settings for convenience of application. Analytically, results based on a dichotomized biomarker are often more reliable and more resistant to outliers and to bimodal or otherwise unknown distributions. Two methods are commonly used to select the best cut-off value for dichotomizing a continuous biomarker: the maximally selected chi-square statistic and the ROC curve, specifically the Youden Index. In this paper, we explain that in many situations it is inappropriate to use the former. Using the Maximum Absolute Youden Index (MAYI), we demonstrate that integrating the MAYI with the Kolmogorov-Smirnov test is not only a robust non-parametric method but also provides a more meaningful p-value for selecting the cut-off value than the Mann-Whitney test. In addition, our method can be applied directly in clinical settings.
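The link between the MAYI and the Kolmogorov-Smirnov test can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract does not give their procedure. It assumes the classification rule "value > c means positive", under which the Youden Index at cut-off c is sensitivity + specificity - 1, the difference between the empirical distribution functions of the two groups at c. The maximum of its absolute value over all observed cut-offs therefore coincides with the two-sample Kolmogorov-Smirnov statistic. The function name `max_abs_youden` and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp


def max_abs_youden(values, labels):
    """Return (cutoff, J): the observed cut-off maximizing
    |sensitivity + specificity - 1| for the rule "value > cutoff"."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = values[labels], values[~labels]
    best_cut, best_j = None, -1.0
    for c in np.unique(values):          # candidate cut-offs: observed values
        sens = np.mean(pos > c)          # positives correctly above the cut-off
        spec = np.mean(neg <= c)         # negatives correctly at or below it
        j = abs(sens + spec - 1.0)
        if j > best_j:                   # strict ">" keeps the first maximizer
            best_cut, best_j = c, j
    return best_cut, best_j


# Toy data (hypothetical): biomarker values and binary outcome labels.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, j = max_abs_youden(values, labels)

# Two-sample KS test on the same groups; its statistic equals the MAYI.
v = np.asarray(values, dtype=float)
y = np.asarray(labels, dtype=bool)
ks = ks_2samp(v[y], v[~y])

print(f"cut-off = {cutoff}, MAYI = {j:.3f}")
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.4f}")
```

In this sketch the cut-off comes from the Youden search and the p-value from `scipy.stats.ks_2samp`; when several cut-offs tie for the maximum, the smallest one is returned, which is an arbitrary choice made here.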

Keywords: Kolmogorov-Smirnov test; RNA-seq; Youden Index; the best cut-off.