Machine learning based disease prediction from genotype data

Nikoletta Katsaouni; Araek Tashkandi; Lena Wiese; Marcel H Schulz

doi:10.1515/hsz-2021-0109

Biol Chem. 2021 Jul 5;402(8):871-885. doi: 10.1515/hsz-2021-0109. Print 2021 Jul 27.

Authors

Nikoletta Katsaouni¹, Araek Tashkandi², Lena Wiese³, Marcel H Schulz¹,⁴,⁵

Affiliations

¹ Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany.
² Institute of Computer Sciences and Engineering, University of Jeddah, 21959 Jeddah, Saudi Arabia.
³ Institute of Computer Science, Goethe University, 60629 Frankfurt am Main, Germany.
⁴ German Center for Cardiovascular Research (DZHK), Partner Site RheinMain, 60590 Frankfurt am Main, Germany.
⁵ Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany.

PMID: 34218544
DOI: 10.1515/hsz-2021-0109

Abstract

Using results from genome-wide association studies to understand complex traits is a current challenge. Here we review how genotype data can be used with different machine learning (ML) methods to predict phenotype occurrence and severity. We discuss common feature encoding schemes and how studies handle the often small number of samples compared to the huge number of variants. We compare which ML methods are being applied, including recent results using deep neural networks. Further, we review the application of methods for feature explanation and interpretation.

Keywords: deep neural networks; disease prediction; machine learning.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Genome-Wide Association Study*
Genotype*
Humans
Machine Learning