Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

Biol Direct. 2020 Jan 15;15(1):1. doi: 10.1186/s13062-019-0257-6.

Abstract

Background: Drug-induced liver injury (DILI) is a serious concern during drug development and the treatment of human disease. The ability to accurately predict DILI risk could yield significant improvements in drug attrition rates during drug development, in drug withdrawal rates, and in treatment outcomes. In this paper, we outline our approach to predicting DILI risk using gene-expression data from Build 02 of the Connectivity Map (CMap) as part of the 2018 Critical Assessment of Massive Data Analysis CMap Drug Safety Challenge.

Results: First, we used seven classification algorithms independently to predict DILI based on gene-expression values for two cell lines. Similar to what other challenge participants observed, none of these algorithms consistently predicted liver injury with high accuracy. In an attempt to improve accuracy, we aggregated predictions for six of the algorithms (excluding one that had performed exceptionally poorly) using a soft-voting method. This approach also failed to generalize well to the test set. We investigated alternative approaches, including a multi-sample normalization method, dimensionality-reduction techniques, a class-weighting scheme, and an expanded set of hyperparameter combinations used as inputs to the soft-voting method. Each of these solutions met with limited success.
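
For readers unfamiliar with soft voting, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes each classifier outputs a probability that a drug causes DILI, averages those probabilities with equal weight, and applies a 0.5 threshold. The function names, the equal weighting, the threshold, and the example probabilities are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def average_probabilities(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities from several classifiers.

    prob_sets[k][i] is classifier k's predicted probability that sample i
    belongs to the positive class (here, hypothetically, "causes DILI").
    """
    if not prob_sets:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(prob_sets[0])
    if any(len(p) != n_samples for p in prob_sets):
        raise ValueError("all classifiers must score the same samples")
    n_models = len(prob_sets)
    return [sum(p[i] for p in prob_sets) / n_models for i in range(n_samples)]


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Label a sample 1 when the averaged probability reaches the threshold."""
    return [1 if m >= threshold else 0 for m in average_probabilities(prob_sets)]


# Made-up probabilities from three hypothetical classifiers for three drugs.
clf_a = [0.9, 0.4, 0.2]
clf_b = [0.6, 0.5, 0.1]
clf_c = [0.3, 0.3, 0.6]

labels = soft_vote([clf_a, clf_b, clf_c])
print(labels)
```

Unlike majority ("hard") voting, this scheme lets a confident classifier outweigh several uncertain ones, since it combines probabilities instead of discrete votes.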

Conclusions: We conclude that alternative methods and/or datasets will be necessary to effectively predict DILI in patients based on RNA expression levels in cell lines.

Reviewers: This article was reviewed by Paweł P Labaj and Aleksandra Gruca (both nominated by David P Kreil).

Keywords: Cell lines; Classification; Drug development; Machine learning; Precision medicine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chemical and Drug Induced Liver Injury / genetics*
  • Gene Expression Profiling / methods*
  • Humans
  • Models, Biological
  • Risk Assessment
  • Transcriptome*