Are machine learning based methods suited to address complex biological problems? Lessons from CAGI-5 challenges

Hum Mutat. 2019 Sep;40(9):1455-1462. doi: 10.1002/humu.23784. Epub 2019 Jun 18.

Abstract

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The Critical Assessment of Genome Interpretation (CAGI) has established a common framework for assessing available predictors of variant effects on specific problems, and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and the lessons learned from the most recent edition of the experiment (CAGI-5). In particular, we analyze the prediction performance of our tools on five selected CAGI-5 challenges grouped into three categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.

Keywords: CAGI; genetic variants; machine learning; prediction of protein stability change upon variations; prediction of variant effects; variant pathogenicity prediction.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Genetic
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Humans
  • Machine Learning
  • Phenotype
  • Protein Stability
  • Proteins / chemistry*
  • Proteins / genetics*

Substances

  • Proteins