Machine learning based disease prediction from genotype data

Biol Chem. 2021 Jul 5;402(8):871-885. doi: 10.1515/hsz-2021-0109. Print 2021 Jul 27.

Abstract

Using results from genome-wide association studies to understand complex traits remains a challenge. Here we review how genotype data can be used with different machine learning (ML) methods to predict phenotype occurrence and severity. We discuss common feature encoding schemes and how studies handle the often small number of samples compared to the huge number of variants. We compare which ML methods are being applied, including recent results using deep neural networks. Further, we review the application of methods for feature explanation and interpretation.
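
As an illustration of what a feature encoding scheme for genotype data can look like, the sketch below shows two widely used options: additive encoding (counting copies of an alternate allele per variant) and a one-hot encoding of that count. It is a minimal sketch and not taken from the review; the function names, the `A/G` genotype string format, and the handling of missing calls are all assumptions made for the example.

```python
# Hypothetical helpers for illustration only; not from the reviewed studies.

def additive_encode(genotype, alt_allele):
    """Count copies of alt_allele in a diploid genotype string such as 'A/G'.

    Returns None for missing or malformed calls (assumed to be written './.').
    """
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(1 for allele in alleles if allele == alt_allele)


def one_hot_encode(dosage):
    """Map a dosage of 0, 1, or 2 to a 3-element indicator vector.

    A missing dosage (None) is encoded as all zeros, which is one
    possible convention among several.
    """
    vector = [0, 0, 0]
    if dosage is not None:
        vector[dosage] = 1
    return vector


# Four samples at a single variant with reference allele A and alternate allele G.
samples = ["A/A", "A/G", "G/G", "./."]
dosages = [additive_encode(g, "G") for g in samples]
one_hot = [one_hot_encode(d) for d in dosages]
```

Additive encoding yields one feature per variant, whereas one-hot encoding triples the feature count but does not impose a linear relationship between allele count and effect.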

Keywords: deep neural networks; disease prediction; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Genome-Wide Association Study*
  • Genotype*
  • Humans
  • Machine Learning