Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups

Gigascience. 2024 Jan 2:13:giae014. doi: 10.1093/gigascience/giae014.

Abstract

Background: Phenome-wide association studies (PheWASs) have been conducted on Asian populations, including Koreans, but many were based on chip or exome genotyping data. Such data cover only part of the genome, which limits genome-wide association analysis. Genome-to-phenome association information built from the largest possible set of whole genomes with matched phenome data is therefore crucial for further population genomics studies and for developing health care services based on population genomics.

Results: Here, we present 4,157 whole genome sequences (Korea4K) coupled with 107 health check-up parameters as the largest genomic resource of the Korean Genome Project. It encompasses most of the variants with allele frequency >0.001 in Koreans, indicating that it sufficiently covers most of the common and rare genetic variants, together with commonly measured phenotypes, for Koreans. Korea4K provides 45,537,252 variants, half of which were not present in Korea1K (1,094 samples). We also identified 1,356 new genotype-phenotype associations that were not found with the Korea1K dataset. Phenomics analyses further revealed 24 significant genetic correlations, 14 pleiotropic associations, and 127 causal relationships based on Mendelian randomization among 37 traits. In addition, the Korea4K imputation reference panel, the largest Korean variant reference to date, showed superior imputation performance compared with Korea1K across all allele frequency categories.
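
The link between sample size and coverage of variants at allele frequency >0.001 can be illustrated with a simple sampling calculation. The sketch below is not the authors' analysis. It assumes diploid samples, independently drawn chromosomes, and perfect variant detection, and the function name `prob_observed` is a hypothetical helper introduced only for this illustration.

```python
def prob_observed(allele_freq: float, n_samples: int) -> float:
    """Probability that an allele with the given population frequency
    appears at least once among 2 * n_samples sampled chromosomes.

    Assumes diploid individuals and independent draws (an idealization).
    """
    n_chromosomes = 2 * n_samples
    # Complement of "the allele is absent from every sampled chromosome".
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


if __name__ == "__main__":
    # Sample sizes are taken from the abstract; 0.001 is the stated threshold.
    for label, n in [("Korea1K", 1094), ("Korea4K", 4157)]:
        print(f"{label}: P(observed at AF=0.001) = {prob_observed(0.001, n):.4f}")
```

Under these idealized assumptions, an allele at frequency 0.001 is very likely to be seen at least once among 4,157 samples and noticeably less likely among 1,094, which is consistent with the gain in coverage reported for the larger cohort.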

Conclusions: Collectively, Korea4K provides not only the largest Korean genome dataset but also the corresponding health check-up parameters and novel genome-phenome associations. These large-scale pathological whole-genome omics data will be a powerful resource for genome-phenome association studies aimed at discovering causal markers for the prediction and diagnosis of health conditions.

Keywords: Korean Genome Project; genome; phenome; population genomics; variome.

MeSH terms

  • Gene Frequency
  • Genetic Association Studies
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Republic of Korea
