Predicting drug sensitivity of cancer cells based on DNA methylation levels

Sofia P Miranda; Fernanda A Baião; Julia L Fleck; Stephen R Piccolo

doi:10.1371/journal.pone.0238757

PLoS One. 2021 Sep 10;16(9):e0238757. doi: 10.1371/journal.pone.0238757. eCollection 2021.

Authors

Sofia P Miranda¹, Fernanda A Baião¹, Julia L Fleck², Stephen R Piccolo³

Affiliations

¹ Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil.
² Mines Saint-Etienne, Univ Clermont Auvergne, CNRS, UMR 6158 LIMOS, Centre CIS, Saint-Etienne, France.
³ Department of Biology, Brigham Young University, Provo, Utah, United States of America.

Abstract

Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we applied machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
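
The subsampling idea and the reported metric can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `fraction` parameter, the toy numbers, and the convention that a lower response value means a more sensitive cell line are all assumptions made for illustration. The sketch keeps only the cell lines in the two tails of the drug-response distribution, labels them as two classes, and computes the area under the ROC curve from a set of classifier scores.

```python
from typing import List, Sequence, Tuple


def select_extremes(
    responses: Sequence[float], fraction: float
) -> Tuple[List[int], List[int]]:
    """Keep the `fraction` lowest and `fraction` highest responses.

    Returns (indices, labels). Label 1 marks the low-response tail
    (assumed here to mean "sensitive"); label 0 marks the high tail.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(responses) < 2:
        raise ValueError("need at least two cell lines")
    # Sort cell-line indices by response value, lowest first.
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    k = max(1, int(len(order) * fraction))
    low, high = order[:k], order[-k:]
    return low + high, [1] * len(low) + [0] * len(high)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy drug-response values for eight hypothetical cell lines.
responses = [0.1, 0.4, 0.5, 0.6, 0.9, 1.2, 1.5, 2.0]
# Hypothetical classifier scores (higher = predicted more sensitive).
scores = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]

# Keep the lowest 25% and highest 25% of responses.
idx, labels = select_extremes(responses, fraction=0.25)
extreme_scores = [scores[i] for i in idx]
print(idx, labels, auroc(labels, extreme_scores))
```

Lowering `fraction` keeps only more extreme cell lines, which corresponds to the heavier subsampling described in the abstract; setting it to 0.5 splits the full set at the median.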

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Antineoplastic Agents
DNA Methylation*
Humans
Machine Learning*

Substances

Antineoplastic Agents

Grants and funding

This work was supported in part by the Coordination for the Improvement of Higher Education Personnel (CAPES) - Finance Code 001. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.