A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening

Priyanka Vasanthakumari; Yitan Zhu; Thomas Brettin; Alexander Partin; Maulik Shukla; Fangfang Xia; Oleksandr Narykov; Michael Ryan Weil; Rick L Stevens

doi:10.3390/cancers16030530


Cancers (Basel). 2024 Jan 26;16(3):530. doi: 10.3390/cancers16030530.

Authors

Priyanka Vasanthakumari¹, Yitan Zhu¹, Thomas Brettin², Alexander Partin¹, Maulik Shukla¹, Fangfang Xia¹, Oleksandr Narykov¹, Michael Ryan Weil³, Rick L Stevens²,⁴

Affiliations

¹ Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA.
² Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA.
³ Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA.
⁴ Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA.

Abstract

It is well known that cancers of the same histological type can respond differently to the same treatment. Computational drug response prediction is therefore of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data must first be generated through screening experiments and then used to train the models. In this study, we investigate various active learning strategies for selecting experiments to generate response data, with two goals: (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, i.e., selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data from the selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits with active learning approaches compared with the random and greedy sampling methods. Active learning approaches also improve response prediction performance over the greedy sampling method for some of the drugs and analysis runs.
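The basic selection approaches named in the abstract, and the hit-count criterion, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a surrogate model already supplies, for each unscreened cell line, a predicted response (`mean`, where a higher value is assumed to mean more responsive) and an uncertainty estimate (`std`). The strategy names, the `beta` weight in the combined greedy-plus-uncertainty score, the farthest-first rule used for diversity, and the hit `threshold` are all hypothetical illustrative choices. The sampling-based and iteration-based hybrids are omitted because the abstract does not specify how they mix strategies.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _top_k(scores: Dict[str, float], k: int) -> List[str]:
    # Sort candidate IDs by score, highest first; ties broken by ID for determinism.
    return [c for c, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def select_batch(
    strategy: str,
    mean: Dict[str, float],       # predicted response per candidate (higher = more responsive; assumed)
    std: Dict[str, float],        # predictive uncertainty per candidate (assumed, e.g. ensemble spread)
    features: Dict[str, Vector],  # feature vector per candidate cell line
    labeled: List[Vector],        # feature vectors of cell lines already screened
    k: int,
    beta: float = 1.0,            # hypothetical weight on uncertainty in the combined score
    seed: int = 0,
) -> List[str]:
    """Pick k cell lines to screen next (illustrative sketch, hypothetical API)."""
    candidates = sorted(mean)
    if strategy == "random":
        return random.Random(seed).sample(candidates, k)
    if strategy == "greedy":
        # Exploit: screen the cell lines predicted to respond best.
        return _top_k(mean, k)
    if strategy == "uncertainty":
        # Explore: screen the cell lines the model is least sure about.
        return _top_k(std, k)
    if strategy == "greedy_uncertainty":
        # Hypothetical combination: predicted response plus weighted uncertainty.
        return _top_k({c: mean[c] + beta * std[c] for c in candidates}, k)
    if strategy == "diversity":
        # Farthest-first traversal: repeatedly pick the candidate whose nearest
        # already-screened or already-selected cell line is farthest away.
        chosen: List[str] = []
        anchors = list(labeled)
        pool = list(candidates)
        while pool and len(chosen) < k:
            if anchors:
                best = max(pool, key=lambda c: min(math.dist(features[c], a) for a in anchors))
            else:
                best = pool[0]
            chosen.append(best)
            anchors.append(features[best])
            pool.remove(best)
        return chosen
    raise ValueError(f"unknown strategy: {strategy}")


def count_hits(selected: List[str], observed: Dict[str, float], threshold: float) -> int:
    # A hit is a selected cell line whose measured response passes the (hypothetical) threshold.
    return sum(1 for c in selected if observed[c] >= threshold)


if __name__ == "__main__":
    # Toy data for four hypothetical cell lines A-D.
    mean = {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.4}
    std = {"A": 0.05, "B": 0.6, "C": 0.1, "D": 0.3}
    features = {"A": (0.0, 0.0), "B": (5.0, 5.0), "C": (0.1, 0.2), "D": (3.0, 0.0)}
    labeled = [(0.0, 0.1)]
    observed = {"A": 0.8, "B": 0.1, "C": 0.7, "D": 0.3}
    for s in ["random", "greedy", "uncertainty", "greedy_uncertainty", "diversity"]:
        picked = select_batch(s, mean, std, features, labeled, k=2)
        print(s, picked, count_hits(picked, observed, threshold=0.5))
```

In this toy run, greedy selection picks `A` and `C` (two hits), while the uncertainty and diversity rules both pick `B` and `D` (no hits) because they favor unfamiliar cell lines over high predicted responses. That trade-off between exploiting predictions and exploring uncertain or dissimilar cell lines is what the combined and hybrid strategies are meant to balance.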

Keywords: active learning; cancer; drug discovery; drug response prediction; machine learning.

Grants and funding

Cancer Moonshot Task Order No. 75N91019F00134/CA/NCI NIH HHS/United States