Investigation of model stacking for drug sensitivity prediction

Kevin Matlock; Carlos De Niz; Raziur Rahman; Souparno Ghosh; Ranadip Pal

doi:10.1186/s12859-018-2060-2

BMC Bioinformatics. 2018 Mar 21;19(Suppl 3):71. doi: 10.1186/s12859-018-2060-2.

Authors

Kevin Matlock¹, Carlos De Niz¹, Raziur Rahman¹, Souparno Ghosh², Ranadip Pal³

Affiliations

¹ Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409, TX, USA.
² Department of Mathematics and Statistics, Texas Tech University, 1108 Memorial Circle, Lubbock, 79409, TX, USA.
³ Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409, TX, USA. ranadip.pal@ttu.edu.

Abstract

Background: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as random forests have shown promising performance when predicting from individual genomic feature types such as gene expression. However, the availability of other data types, including information on multiple tested drugs, calls for predictive models that incorporate these various data types.

Results: We explore the predictive performance of model stacking and the effect of stacking on predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing the squared error and the inherent bias of random forests in the prediction of outliers. The framework is tested on a setup that includes gene expression, drug target, physical property, and drug response information for a set of drugs and cell lines.

Conclusion: The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. Stacking also noticeably reduces the bias of our predictors when the dominant eigenvalue, associated with the principal axis of variation in the residuals, is significantly higher than the remaining eigenvalues.

Keywords: Bias; Drug sensitivity prediction; Stacking.
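
Illustrative sketch (not taken from the paper): the abstract does not state how the stacked predictor is formed, so the following assumes one common variant, a least-squares linear combination of two base-model predictions. The toy values, the function names (fit_stacking_weights, mse, residual_eigenvalues) and the choice to fit the weights on the same samples used for evaluation are all assumptions made for illustration; in practice the weights would usually be fitted on held-out or out-of-fold predictions. The second function computes the eigenvalues of the 2x2 covariance matrix of the base-model residuals, which is one possible reading of the "principal axis of variation in the residuals".

```python
import math


def fit_stacking_weights(p1, p2, y):
    """Least-squares weights (w1, w2) such that w1*p1 + w2*p2 approximates y.

    Solves the 2x2 normal equations by Cramer's rule (no intercept term).
    """
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("base predictions are (nearly) collinear")
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


def mse(pred, y):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def residual_eigenvalues(p1, p2, y):
    """Eigenvalues (largest first) of the 2x2 sample covariance of residuals."""
    r1 = [t - p for p, t in zip(p1, y)]
    r2 = [t - p for p, t in zip(p2, y)]
    n = len(y)
    m1, m2 = sum(r1) / n, sum(r2) / n
    c11 = sum((a - m1) ** 2 for a in r1) / (n - 1)
    c22 = sum((b - m2) ** 2 for b in r2) / (n - 1)
    c12 = sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / (n - 1)
    trace = c11 + c22
    disc = math.sqrt((c11 - c22) ** 2 + 4.0 * c12 ** 2)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


# Hypothetical toy data: normalized drug responses and two base predictors.
# The first predictor is shrunk toward the mean, mimicking the tendency of
# tree ensembles to under-predict extreme (outlier) responses.
y = [0.1, 0.3, 0.5, 0.7, 0.9]
p1 = [0.3, 0.4, 0.5, 0.6, 0.7]
p2 = [0.2, 0.45, 0.4, 0.65, 0.8]

w1, w2 = fit_stacking_weights(p1, p2, y)
stacked = [w1 * a + w2 * b for a, b in zip(p1, p2)]

mse1, mse2, mse_stacked = mse(p1, y), mse(p2, y), mse(stacked, y)
lam1, lam2 = residual_eigenvalues(p1, p2, y)

print("weights:", w1, w2)
print("MSE base 1, base 2, stacked:", mse1, mse2, mse_stacked)
print("residual covariance eigenvalues:", lam1, lam2)
```

On the samples used to fit the weights, the stacked error cannot exceed that of either base model, because the weight pairs (1, 0) and (0, 1) are themselves candidate solutions of the least-squares problem. This guarantee does not carry over to unseen data.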

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Area Under Curve
Bias
Cell Line, Tumor
Deep Learning
Drug Screening Assays, Antitumor*
Humans
Models, Biological*
Neoplasms / drug therapy
Precision Medicine

Grants and funding

R01 GM122084/GM/NIGMS NIH HHS/United States