Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics

Brief Bioinform. 2021 Sep 2;22(5):bbaa442. doi: 10.1093/bib/bbaa442.

Abstract

Genetic correlation is the correlation between the effects of genetic variants across the genome on two phenotypes. It is an informative metric for quantifying the overall genetic similarity between complex traits and provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlation from data collected in genome-wide association studies (GWAS). Owing to the easy accessibility of GWAS summary statistics and the computational efficiency of such methods, approaches that require only GWAS summary statistics as input have become more popular than those that use individual-level genotype data. Here, we present a benchmark study of summary-statistics-based genetic correlation estimation methods using simulations and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between studies. To assess how the methods perform in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and degrees of sample overlap. We then applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications because LD obtained from reference panels is imprecise. Our findings offer guidance on choosing appropriate methods for genetic correlation estimation in post-GWAS analysis.
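The definition above, and why sample overlap is a challenge, can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions that are not taken from the benchmarked methods. It assumes independent SNPs (no LD), known true per-SNP effects, and hypothetical "z-scores" formed as `signal * beta + noise`. Sample overlap is mimicked by correlating the estimation noise of the two studies through a free parameter `noise_corr`. In cross-trait LD score regression this term is commonly modelled as roughly N_s·ρ_p/√(N1·N2), with N_s shared samples and phenotypic correlation ρ_p. All function and parameter names are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pair(rng, rho):
    """One draw from a standard bivariate normal with correlation rho."""
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return u, rho * u + math.sqrt(1.0 - rho ** 2) * v


def simulate(m, rho_g, signal, noise_corr, seed=0):
    """Hypothetical toy model with m independent SNPs (LD is ignored).

    beta1, beta2: true per-SNP effects on the two traits, correlation rho_g.
    z1, z2: toy summary statistics = signal * beta + estimation noise, where
    the noise of the two studies has correlation noise_corr (sample overlap).
    """
    rng = random.Random(seed)
    b1, b2, z1, z2 = [], [], [], []
    for _ in range(m):
        beta1, beta2 = correlated_pair(rng, rho_g)
        e1, e2 = correlated_pair(rng, noise_corr)
        b1.append(beta1)
        b2.append(beta2)
        z1.append(signal * beta1 + e1)
        z2.append(signal * beta2 + e2)
    return b1, b2, z1, z2


# No true genetic correlation, but the two studies share samples.
b1, b2, z1, z2 = simulate(m=20000, rho_g=0.0, signal=1.0, noise_corr=0.3)
# By definition, genetic correlation is the correlation of the true effects.
print(round(pearson(b1, b2), 2))  # close to 0
# The naive correlation of z-scores picks up the shared noise; its
# expectation is (signal**2 * rho_g + noise_corr) / (signal**2 + 1) = 0.15.
print(round(pearson(z1, z2), 2))
```

In this toy setting, the naive z-score correlation is both attenuated by estimation noise and inflated by overlap-induced noise correlation. Real summary-statistics methods must additionally account for LD between markers, which this sketch deliberately leaves out.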

Keywords: GWAS summary statistics; benchmarking; complex traits; genetic correlation.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Benchmarking / methods
  • Cohort Studies
  • Computer Simulation
  • Data Accuracy
  • Gene Frequency
  • Genome, Human
  • Genome-Wide Association Study / statistics & numerical data*
  • Genotype
  • Humans
  • Linkage Disequilibrium*
  • Multifactorial Inheritance*
  • Phenotype*
  • Polymorphism, Single Nucleotide*