Feature selection translates drug response predictors from cell lines to patients

Front Genet. 2023 Jul 14;14:1217414. doi: 10.3389/fgene.2023.1217414. eCollection 2023.

Abstract

Targeted therapies and chemotherapies are prevalent in cancer treatment. Identifying predictive markers to stratify cancer patients who will respond to these therapies remains challenging because patient drug response data are limited. Because large amounts of drug response data have been generated from cell lines, methods that efficiently translate cell-line-trained predictors to human tumors will be useful in clinical practice. Here, we propose versatile feature selection procedures that can be combined with any classifier. For demonstration, we combined the feature selection procedures with a (linear) logit model and a (non-linear) K-nearest neighbor classifier and trained these on cell lines, yielding LogitDA and KNNDA, respectively. We show that LogitDA and KNNDA significantly outperform existing methods, e.g., a logistic model and a deep learning method trained on thousands of genes, in prediction AUC (0.70-1.00 for seven of the ten drugs tested) and are interpretable. This advantage may arise because sample sizes are often limited in drug response prediction. We further derive a novel adjustment of the prediction cutoff for LogitDA that yields a prediction accuracy of 0.70-0.93 for seven drugs, including erlotinib and cetuximab, for which pathways relevant to anti-cancer therapies are also uncovered. These results indicate that our methods can efficiently translate cell-line-trained predictors to tumors.
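The abstract reports both AUC and accuracy, and notes that accuracy depends on an adjusted prediction cutoff. The sketch below illustrates, on made-up numbers, why the two metrics can diverge when a cell-line-trained model is applied to tumors: AUC depends only on how samples are ranked, whereas accuracy depends on where the cutoff falls. The scores, labels, and the median-based cutoff are all hypothetical and chosen for illustration; this is not the adjustment derived in the paper, which the abstract does not specify.

```python
from statistics import median


def auc(scores, labels):
    """Fraction of (responder, non-responder) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, cutoff):
    """Share of samples classified correctly when score >= cutoff means 'responder'."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical predicted response probabilities for eight tumors (illustrative only).
# The ranking is perfect, but every score sits below the default cutoff of 0.5.
tumor_scores = [0.15, 0.20, 0.22, 0.30, 0.35, 0.40, 0.42, 0.48]
tumor_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Illustrative label-free cutoff: the median predicted score in the tumor cohort.
# Assumption: roughly half of the cohort responds. Not the paper's method.
adjusted_cutoff = median(tumor_scores)

print(auc(tumor_scores, tumor_labels))                        # 1.0
print(accuracy(tumor_scores, tumor_labels, 0.5))              # 0.5
print(accuracy(tumor_scores, tumor_labels, adjusted_cutoff))  # 1.0
```

In this toy cohort the default cutoff labels every tumor a non-responder, so accuracy is 0.5 despite a perfect AUC, and moving the cutoff restores agreement between the two metrics.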

Keywords: cancer; domain adaptation; drug response; feature selection; machine learning; prediction.

Grants and funding

This research was supported by Academia Sinica, Taiwan (Tukey Grant to GSS and SY, Mathematics in Biology, AS-104-TP-A07, AS-SUMMIT) and the National Science and Technology Council, Taiwan, Republic of China (NSTC 109-2118-M-001-001-MY2 and 111-2118-M-001-009-MY2 to GSS and NSTC 106-2314-B-001-005 to SY).