Preparation and Curation of Phenotypic Datasets

Methods Mol Biol. 2022;2481:13-27. doi: 10.1007/978-1-0716-2237-7_2.

Abstract

Based on case studies, this chapter discusses the extent to which the number and identity of quantitative trait loci (QTL) identified by genome-wide association studies (GWAS) are affected by the curation and analysis of phenotypic data. Through examples, the chapter demonstrates the impact of (1) cleaning outliers and (2) the choice of statistical method for estimating the genotypic mean values used as phenotypic inputs to GWAS. Leaving outliers uncleaned resulted in the highest number of dubious QTL, especially at loci with highly unbalanced allele frequencies. A trade-off was identified between the risk of false positives and the risk of missing interesting yet rare alleles. The choice of statistical method for estimating genotypic mean values also affected the GWAS output, with limited overlap between the QTL detected by different methods. Using mixed models that capture spatial trends, among other features, increased the narrow-sense heritability of traits, the number of identified QTL, and the overall power of the GWAS analysis. Outlier cleaning and robust statistical models for estimating genotypic mean values should therefore be included in GWAS pipelines to reduce both false-positive and false-negative rates of QTL detection.
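To make the point about rare alleles concrete, here is a minimal Python sketch, assuming plot-level phenotype records and a single biallelic SNP. The genotypes, values, the median/MAD outlier rule, and the threshold `k = 3.5` are all hypothetical illustrations, not the chapter's procedure. Genotypic means are estimated here as simple arithmetic means, the most basic option; the sketch does not fit the spatial mixed models discussed above.

```python
from statistics import mean, median

# Hypothetical plot-level records: (genotype id, phenotype value), 3 plots per genotype.
RECORDS = [
    ("G1", 10.0), ("G1", 10.4), ("G1", 9.6),
    ("G2", 11.0), ("G2", 11.2), ("G2", 10.8),
    ("G3", 9.0), ("G3", 9.5), ("G3", 8.5),
    ("G4", 10.5), ("G4", 10.0), ("G4", 11.0),
    ("G5", 9.8), ("G5", 10.2), ("G5", 10.0),
    ("G6", 10.1), ("G6", 9.9), ("G6", 40.0),  # 40.0 mimics e.g. a data-entry error
]

# Hypothetical biallelic SNP (0/1 coding); only G6 carries the rare allele 1.
SNP = {"G1": 0, "G2": 0, "G3": 0, "G4": 0, "G5": 0, "G6": 1}


def flag_outliers(records, k=3.5):
    """Return one bool per record: True if its residual from the genotype median
    exceeds k robust SDs, where robust SD = 1.4826 * MAD of all residuals.
    This is an assumed cleaning rule used for illustration only."""
    values = {}
    for geno, y in records:
        values.setdefault(geno, []).append(y)
    centre = {geno: median(ys) for geno, ys in values.items()}
    resid = [y - centre[geno] for geno, y in records]
    scale = 1.4826 * median(abs(r) for r in resid)
    if scale == 0:  # degenerate case with no spread: flag nothing
        return [False] * len(records)
    return [abs(r) > k * scale for r in resid]


def genotype_means(records, flags=None):
    """Arithmetic mean per genotype, skipping records flagged as outliers."""
    if flags is None:
        flags = [False] * len(records)
    values = {}
    for (geno, y), is_outlier in zip(records, flags):
        if not is_outlier:
            values.setdefault(geno, []).append(y)
    return {geno: mean(ys) for geno, ys in values.items()}


def allele_effect(means, snp):
    """Mean genotypic value of allele-1 carriers minus that of allele-0 carriers."""
    carriers = [m for geno, m in means.items() if snp[geno] == 1]
    others = [m for geno, m in means.items() if snp[geno] == 0]
    return mean(carriers) - mean(others)


flags = flag_outliers(RECORDS)
raw = allele_effect(genotype_means(RECORDS), SNP)
clean = allele_effect(genotype_means(RECORDS, flags), SNP)
print(f"flagged: {sum(flags)}  raw effect: {raw:.2f}  cleaned effect: {clean:.2f}")
# prints: flagged: 1  raw effect: 9.90  cleaned effect: -0.10
```

With a single aberrant plot in the only rare-allele carrier, the uncleaned data suggest an allele effect of about 9.9 units, while after removing the flagged plot the effect essentially vanishes. Lowering `k` flags more plots, which is where the trade-off described above appears: an aggressive threshold would also discard genuinely extreme phenotypes of rare-allele carriers.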

Keywords: False QTL; Outliers; Statistical models; Statistical power.

MeSH terms

  • Alleles
  • Gene Frequency
  • Genome-Wide Association Study* / methods
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci