On the analysis of a repeated measure design in genome-wide association analysis

Int J Environ Res Public Health. 2014 Nov 28;11(12):12283-303. doi: 10.3390/ijerph111212283.

Abstract

Longitudinal data make it possible to detect the effects of aging or time, and a repeated-measures design is statistically more efficient than a cross-sectional design when the correlations between repeated measurements are not large. In particular, when genotyping is more expensive than phenotyping, collecting longitudinal data can be an efficient strategy for genetic association analysis. Despite these advantages, genome-wide association studies (GWAS) with longitudinal data have rarely been analyzed in a way that accounts for the repeated measurements. In this report, we calculate the sample size required to achieve 80% power at the genome-wide significance level for both longitudinal and cross-sectional data, and compare their statistical efficiency. Furthermore, we performed GWAS of eight phenotypes, each measured three times per individual, in the Korean Association Resource (KARE). A linear mixed model allowing for correlation among the observations on each individual was applied to the longitudinal data, and linear regression was applied to the first observation on each individual, treated as cross-sectional data. We found 12 novel genome-wide significant disease susceptibility loci, which were then confirmed in the Health Examination cohort, as well as some significant interactions between age/sex and SNPs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aging
  • Asian People
  • Epigenesis, Genetic
  • Female
  • Gene Expression Regulation
  • Genome-Wide Association Study / economics
  • Genome-Wide Association Study / methods*
  • Genotype*
  • Humans
  • Linkage Disequilibrium
  • Longitudinal Studies
  • Male
  • Middle Aged
  • Polymorphism, Single Nucleotide*
  • Republic of Korea
  • Research Design
  • Sex Factors