The construction of cross-population polygenic risk scores using transfer learning

Am J Hum Genet. 2022 Nov 3;109(11):1998-2008. doi: 10.1016/j.ajhg.2022.09.010. Epub 2022 Oct 13.

Abstract

As most existing genome-wide association studies (GWASs) were conducted in European-ancestry cohorts, and as the existing polygenic risk score (PRS) models have limited transferability across ancestry groups, PRS research on non-European-ancestry groups needs to make efficient use of available data until we attain large sample sizes across all ancestry groups. Here we propose a PRS method using transfer learning techniques. Our approach, TL-PRS, uses gradient descent to fine-tune the baseline PRS model from an ancestry group with large-sample GWASs to the dataset of the target ancestry. In our application of constructing PRSs for seven quantitative and two dichotomous traits for 10,285 individuals of South Asian ancestry and 8,168 individuals of African ancestry in UK Biobank, TL-PRS using PRS-CS as a baseline method obtained a 25% average relative improvement for South Asian samples and 29% for African samples compared to the standard PRS-CS method in terms of predicted R². Our approach increases the transferability of PRSs across ancestries and thereby helps reduce existing inequities in genetics research.
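
The fine-tuning idea described above can be illustrated with a minimal sketch. This is not the TL-PRS implementation: it assumes individual-level genotypes and a quantitative trait, uses a plain squared-error loss, and runs on simulated data. The function names (fine_tune_prs, predictive_r2, relative_improvement) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fine_tune_prs(X, y, beta_base, lr=0.1, n_steps=20):
    """Fine-tune baseline PRS weights on target-ancestry data.

    Gradient descent on mean squared error, started from the baseline
    weights instead of zeros (the transfer-learning step).
    Assumes X is a standardized (n_samples, n_variants) genotype matrix.
    """
    n = X.shape[0]
    beta = np.asarray(beta_base, dtype=float).copy()
    for _ in range(n_steps):
        residual = y - X @ beta
        grad = -(X.T @ residual) / n  # gradient of 0.5 * mean squared error
        beta -= lr * grad
    return beta


def predictive_r2(y, score):
    """Squared Pearson correlation between phenotype and PRS."""
    return np.corrcoef(y, score)[0, 1] ** 2


def relative_improvement(r2_base, r2_new):
    """Relative gain in R2 of the new score over the baseline score."""
    return (r2_new - r2_base) / r2_base


# Simulated data (assumption): baseline effects are a noisy copy of the
# target-ancestry effects, mimicking partially shared genetic architecture.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.standard_normal((n, p))
beta_target = 0.1 * rng.standard_normal(p)
beta_base = beta_target + 0.1 * rng.standard_normal(p)
y = X @ beta_target + rng.standard_normal(n)

beta_tuned = fine_tune_prs(X, y, beta_base)
```

In practice the learning rate and the number of steps would need to be chosen on held-out target-ancestry data, since running gradient descent for too long on a small target sample overfits and discards the information carried by the baseline weights.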

Keywords: BioBank Japan; PRS; UK Biobank; cross population; summary statistics; transfer learning.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Humans
  • Machine Learning
  • Multifactorial Inheritance* / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Risk Factors