A machine learning-based treatment prediction model using whole genome variants of hepatitis C virus

PLoS One. 2020 Nov 5;15(11):e0242028. doi: 10.1371/journal.pone.0242028. eCollection 2020.

Abstract

In recent years, the development of diagnostics using artificial intelligence (AI) has been remarkable. AI algorithms can go beyond human reasoning and build diagnostic models from a number of complex combinations. Using next-generation sequencing technology, we identified hepatitis C virus (HCV) variants resistant to direct-acting antivirals (DAAs) by whole-genome sequencing of full-length HCV genomes, and applied these variants to various machine-learning algorithms to evaluate a preliminary predictive model. HCV genomic RNA was extracted from the serum of 173 patients (109 with subsequent sustained virological response [SVR] and 64 without) before DAA treatment. HCV genomes from the 109 SVR and 64 non-SVR patients were randomly divided into a training data set (57 SVR and 29 non-SVR) and a validation data set (52 SVR and 35 non-SVR). The training data set was subjected to nine machine-learning algorithms selected to identify the optimized combination of functional variants in relation to SVR status following DAA therapy. Subsequently, the prediction model was tested on the validation data set. The most accurate learning method was the support vector machine (SVM) algorithm (validation accuracy, 0.95; kappa statistic, 0.90; F-value, 0.94). The second-most accurate learning algorithm was the multi-layer perceptron. The Decision Tree and Naive Bayes algorithms did not fit our data set adequately, yielding low accuracy (< 0.8). In conclusion, with an accuracy of 95.4% in the generalization performance evaluation, SVM was identified as the best algorithm. Analytical methods based on genomic analysis and the construction of a predictive model by machine learning may be applicable to the selection of the optimal treatment for other viral infections and cancer.
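The three validation metrics named in the abstract (accuracy, kappa statistic, F-value) can all be derived from a single 2x2 confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `binary_metrics` and the counts passed to it are hypothetical: the counts were chosen only to match the stated validation set sizes (52 SVR, 35 non-SVR) and are not the study's confusion matrix. Treating non-SVR as the positive class is likewise an assumption, since the abstract does not say which class the F-value refers to.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, Cohen's kappa and F1 for a 2x2 confusion matrix.

    tp/fn: positive-class cases predicted positive/negative.
    fp/tn: negative-class cases predicted positive/negative.
    """
    n = tp + fp + fn + tn

    # Observed agreement: fraction of cases classified correctly.
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    return {"accuracy": accuracy, "kappa": kappa, "f1": f1}


# Hypothetical counts (assumption, not data from the study):
# positive class = non-SVR (35 cases), negative class = SVR (52 cases).
example = binary_metrics(tp=33, fp=2, fn=2, tn=50)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.3f}")
```

Kappa corrects accuracy for the agreement that the class proportions alone would produce by chance, which is why it is reported alongside accuracy when the two classes are of unequal size, as they are here.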

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Algorithms
  • Antiviral Agents / therapeutic use
  • Artificial Intelligence
  • Bayes Theorem
  • Drug Therapy, Combination / methods
  • Female
  • Genetic Variation / genetics*
  • Genome, Viral / genetics*
  • Hepacivirus / drug effects
  • Hepacivirus / genetics*
  • Hepatitis C / drug therapy
  • Hepatitis C / virology
  • Humans
  • Machine Learning
  • Male
  • Neural Networks, Computer
  • RNA, Viral / genetics
  • Support Vector Machine
  • Sustained Virologic Response

Substances

  • Antiviral Agents
  • RNA, Viral

Grants and funding

This research was supported by the Japan Agency for Medical Research and Development (AMED) under Grant Numbers JP20fk0210072s0401, JP20fk0210058s2102, and JP20fk0210047s0702.