Machine Learning to Advance Human Genome-Wide Association Studies

Genes (Basel). 2023 Dec 25;15(1):34. doi: 10.3390/genes15010034.

Abstract

Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives where data are available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems more efficiently. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.

Keywords: genome-wide association; human genetics; machine learning.

Publication types

  • Review

MeSH terms

  • Artificial Intelligence*
  • Genetic Loci
  • Genetic Research
  • Genome-Wide Association Study*
  • Humans
  • Machine Learning

Grants and funding

Diabetes UK (20/0006307), LONGITOOLS (H2020-SC1-2019-874739), WCRF UK/Intl (2017/1641).