Preparation and Curation of Multiyear, Multilocation, Multitrait Datasets

Methods Mol Biol. 2022:2481:83-104. doi: 10.1007/978-1-0716-2237-7_6.

Abstract

Genome-wide association studies (GWAS) are a powerful approach for dissecting genotype-phenotype associations and identifying causative regions. However, this power depends heavily on the accuracy of the phenotypic data. To obtain accurate phenotypic values, phenotyping should be carried out in multienvironment trials (METs). To avoid technical errors, sufficient time must be spent exploring, understanding, curating, and adjusting the phenotypic data in each trial before combining them using an appropriate linear mixed model (LMM). The LMM is chosen to minimize any effect that could lead to misestimation of the phenotypic values. The purpose of this chapter is to explain a series of important steps for exploring and analyzing data from METs used to characterize an association panel. Two datasets are used to illustrate two different scenarios.
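
As an illustration of the kind of LMM used to combine phenotypes across trials, one common textbook formulation is sketched below. It is an assumption made for illustration, not necessarily the model used in the chapter; the symbols and the choice of which terms are treated as random are hypothetical.

```latex
% Illustrative MET model (assumed form, not taken from the chapter):
% y_{ijk}         phenotype of genotype i in block k of environment j
% \mu             overall mean
% E_j             effect of environment (trial) j
% B_{k(j)}        effect of block k nested within environment j
% G_i             effect of genotype i
% (GE)_{ij}       genotype-by-environment interaction
% \varepsilon     residual error
\[
  y_{ijk} = \mu + E_j + B_{k(j)} + G_i + (GE)_{ij} + \varepsilon_{ijk},
  \qquad
  \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2_{\varepsilon})
\]
% One possible choice of random terms:
\[
  B_{k(j)} \sim \mathcal{N}(0, \sigma^2_{B}), \qquad
  (GE)_{ij} \sim \mathcal{N}(0, \sigma^2_{GE})
\]
```

In a model of this form, treating the genotype term as fixed yields best linear unbiased estimates (BLUEs) of the genotype values, while treating it as random yields best linear unbiased predictions (BLUPs); which of the two is appropriate depends on the trial design and on how the combined phenotype is used downstream.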

Keywords: Adjusted phenotype per trial; Analysis of residuals; Combined phenotype across trials; Descriptive statistics; Design diagnostics; Experimental design; Genotype × environment; Genotype–phenotype association; Linear mixed model; Multienvironment trials; Outliers; Raw phenotype per trial.

MeSH terms

  • Genetic Association Studies
  • Genome-Wide Association Study*
  • Linear Models