Optimization of sampling designs for pedigrees and association studies

Biometrics. 2022 Sep;78(3):1056-1066. doi: 10.1111/biom.13476. Epub 2021 May 3.

Abstract

In many studies, related individuals are phenotyped to infer how genotype contributes to phenotype, through the estimation of parameters such as breeding values or locus effects. When not all individuals can be phenotyped, the population must be sampled carefully to improve the precision of the statistical analysis. This article studies how to optimize such sampling designs for pedigrees and association studies. Two sampling methods are developed: stratified sampling and D-optimality. Accounting for mutation proves important when sampling pedigrees with many generations: as the size of mutation effects increases, optimized designs sample more individuals in late generations. Optimized designs for association studies tend to improve the joint estimation of breeding values and locus effects, particularly when the sample size is small and the genetic architecture of the trait is simple. When the trait is determined by few loci, the optimized designs are reminiscent of classical experimental designs for regression models and tend to select homozygous individuals. When the trait is determined by many loci, locus effects may be difficult to estimate, even if an optimized design is used.
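
The link between D-optimality and the preference for homozygous individuals can be illustrated with a toy single-locus case. The sketch below is not the article's method: it assumes a plain fixed-effects regression y = mu + beta * g + error with genotypes coded 0, 1, 2, ignores relatedness and breeding values, and uses exhaustive search instead of the genetic algorithm named in the keywords. The names `d_criterion` and `best_design` and the candidate genotypes are hypothetical.

```python
from itertools import combinations


def d_criterion(genotypes):
    """Determinant of X'X for y = mu + beta * g + error (intercept plus one locus)."""
    n = len(genotypes)
    s1 = sum(genotypes)
    s2 = sum(g * g for g in genotypes)
    # X'X = [[n, s1], [s1, s2]], so det = n * s2 - s1^2.
    return n * s2 - s1 * s1


def best_design(candidates, sample_size):
    """Exhaustive search (assumption: small candidate pool) for the D-optimal subset."""
    best = max(
        combinations(range(len(candidates)), sample_size),
        key=lambda idx: d_criterion([candidates[i] for i in idx]),
    )
    return list(best)


# Hypothetical pool: two of each homozygote (0 and 2), four heterozygotes (1).
candidates = [0, 0, 1, 1, 1, 1, 2, 2]
chosen = best_design(candidates, 4)
chosen_genotypes = sorted(candidates[i] for i in chosen)
print(chosen_genotypes)
```

In this toy setting the selected subset consists only of the two homozygous classes, as in classical regression designs that place observations at the extremes of the predictor range. With many loci or correlated individuals the criterion would involve a larger information matrix, and exhaustive search would no longer be practical.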

Keywords: Bayesian statistics; genetic algorithm; high-dimensional statistics; optimal designs; quantitative genetics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genotype
  • Models, Genetic*
  • Pedigree
  • Phenotype
  • Quantitative Trait Loci*