Haplotype function score improves biological interpretation and cross-ancestry polygenic prediction of human complex traits

Elife. 2024 Apr 19:12:RP92574. doi: 10.7554/eLife.92574.

Abstract

We propose a new framework for human genetic association studies: at each locus, a deep learning model (in this study, Sei) is used to calculate the functional genomic activity score for two haplotypes per individual. This score, defined as the Haplotype Function Score (HFS), replaces the original genotype in association studies. Applying the HFS framework to 14 complex traits in the UK Biobank, we identified 3619 independent HFS-trait associations with a significance of p < 5 × 10-8. Fine-mapping revealed 2699 causal associations, corresponding to a median increase of 63 causal findings per trait compared with single-nucleotide polymorphism (SNP)-based analysis. HFS-based enrichment analysis uncovered 727 pathway-trait associations and 153 tissue-trait associations with strong biological interpretability, including 'circadian pathway-chronotype' and 'arachidonic acid-intelligence'. Lastly, we applied least absolute shrinkage and selection operator (LASSO) regression to integrate HFS prediction score with SNP-based polygenic risk scores, which showed an improvement of 16.1-39.8% in cross-ancestry polygenic prediction. We concluded that HFS is a promising strategy for understanding the genetic basis of human complex traits.

Keywords: computational biology; deep learning; fine-mapping; functional genomics; genetics; genome-wide association study; genomics; human; polygenic prediction; systems biology.

Plain language summary

Scattered throughout the human genome are variations in the genetic code that make individuals more or less likely to develop certain traits. To identify these variants, scientists carry out Genome-wide association studies (GWAS) which compare the DNA variants of large groups of people with and without the trait of interest. This method has been able to find the underlying genes for many human diseases, but it has limitations. For instance, some variations are linked together due to where they are positioned within DNA, which can result in GWAS falsely reporting associations between genetic variants and traits. This phenomenon, known as linkage equilibrium, can be avoided by analyzing functional genomics which looks at the multiple ways a gene’s activity can be influenced by a variation. For instance, how the gene is copied and decoded in to proteins and RNA molecules, and the rate at which these products are generated. Researchers can now use an artificial intelligence technique called deep learning to generate functional genomic data from a particular DNA sequence. Here, Song et al. used one of these deep learning models to calculate the functional genomics of haplotypes, groups of genetic variants inherited from one parent. The approach was applied to DNA samples from over 350 thousand individuals included in the UK BioBank. An activity score, defined as the haplotype function score (or HFS for short), was calculated for at least two haplotypes per individual, and then compared to various complex traits like height or bone density. Song et al. found that the HFS framework was better at finding links between genes and specific traits than existing methods. It also provided more information on the biology that may be underpinning these outcomes. Although more work is needed to reduce the computer processing times required to calculate the HFS, Song et al. believe that their new method has the potential to improve the way researchers identify links between genes and human traits.

MeSH terms

  • Genome-Wide Association Study
  • Haplotypes
  • Humans
  • Multifactorial Inheritance* / genetics
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci*