Evaluation of statistical approaches for association testing in noisy drug screening data

Petr Smirnov; Ian Smith; Zhaleh Safikhani; Wail Ba-Alawi; Farnoosh Khodakarami; Eva Lin; Yihong Yu; Scott Martin; Janosch Ortmann; Tero Aittokallio; Marc Hafner; Benjamin Haibe-Kains

doi:10.1186/s12859-022-04693-z

BMC Bioinformatics. 2022 May 18;23(1):188. doi: 10.1186/s12859-022-04693-z.

Authors

Petr Smirnov^#^{1,2}, Ian Smith^#^{1,2}, Zhaleh Safikhani², Wail Ba-Alawi², Farnoosh Khodakarami², Eva Lin³, Yihong Yu³, Scott Martin³, Janosch Ortmann⁴, Tero Aittokallio^{5,6,7,8}, Marc Hafner⁹, Benjamin Haibe-Kains^{10,11,12}

Affiliations

¹ Department of Medical Biophysics, University of Toronto, Toronto, Canada.
² Princess Margaret Cancer Center, University Health Network, Toronto, Canada.
³ Department of Discovery Oncology, Genentech Inc., South San Francisco, USA.
⁴ Département d'analytique, opérations et technologies de l'information, École des sciences de la gestion, Université du Québec à Montréal, Montréal, Canada.
⁵ Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland.
⁶ Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.
⁷ Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway.
⁸ iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
⁹ Department of Oncology Bioinformatics, Genentech Inc., South San Francisco, USA.
¹⁰ Department of Medical Biophysics, University of Toronto, Toronto, Canada. bhaibeka@uhnresearch.ca.
¹¹ Princess Margaret Cancer Center, University Health Network, Toronto, Canada. bhaibeka@uhnresearch.ca.
¹² Vector Institute, Toronto, Canada. bhaibeka@uhnresearch.ca.

^# Contributed equally.

Abstract

Background: Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data have proven particularly difficult to mine for associations that could inform patient treatment.

Results: To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index (rCI) and the kernelized concordance index (kCI), both of which incorporate measurements of the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and we introduce efficient implementations that compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare them with the Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics on the task of matching drugs across pharmacogenomic datasets.

Conclusions: We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics.
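For readers unfamiliar with the statistics named in the Results, the following is a minimal Python sketch of a pairwise concordance index with a permutation-based p-value. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `delta` threshold (a stand-in for the idea of ignoring pairs whose difference is within the noise level), and the simple early-stopping rule are all hypothetical and may differ from the rCI, kCI, and adaptive permutation scheme defined in the paper.

```python
import math
import random


def concordance_index(x, y, delta=0.0):
    """Fraction of usable pairs that are ordered the same way in x and y.

    `delta` is a hypothetical noise threshold (an assumption of this sketch):
    pairs whose difference in x or in y is at most `delta` are skipped.
    With delta=0.0 this is a plain concordance index that skips tied pairs.
    """
    concordant = 0
    usable = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if abs(dx) <= delta or abs(dy) <= delta:
                continue  # difference is within the assumed noise level
            usable += 1
            if dx * dy > 0:
                concordant += 1
    return concordant / usable if usable else float("nan")


def permutation_p_value(x, y, delta=0.0, max_perms=2000, stop_after=50, seed=0):
    """Two-sided permutation p-value for the concordance index.

    Uses a simple early-stopping rule (an assumption of this sketch): stop as
    soon as `stop_after` permuted statistics are at least as extreme as the
    observed one, since the p-value is then clearly not small.
    """
    rng = random.Random(seed)
    observed = abs(concordance_index(x, y, delta) - 0.5)
    if math.isnan(observed):
        return float("nan")  # no usable pairs, so nothing to test
    y_perm = list(y)
    exceed = 0
    done = 0
    while done < max_perms and exceed < stop_after:
        rng.shuffle(y_perm)  # break any association between x and y
        done += 1
        stat = abs(concordance_index(x, y_perm, delta) - 0.5)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (done + 1)
```

Calling `permutation_p_value(x, y, delta=0.1)` on two equal-length lists of measurements returns a p-value between 0 and 1. The quadratic pair loop is kept for readability; the efficient implementations mentioned in the abstract would presumably avoid it.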

Keywords: Association testing; Biomarker; Drug sensitivity; Non-parametric statistics; Pharmacogenomics; Power analysis; Statistics.

MeSH terms

Computer Simulation
Drug Evaluation, Preclinical
Humans
Models, Statistical*