Data-driven trait heritability-based extraction of human facial phenotypes

bioRxiv [Preprint]. 2023 Aug 14:2023.08.13.553129. doi: 10.1101/2023.08.13.553129.

Abstract

A genome-wide association study (GWAS) of a complex, multi-dimensional morphological trait, such as the human face, typically relies on predefined, simplified phenotypic measurements, such as inter-landmark distances and angles. These measures are predominantly designed by human experts based on perceived biological or clinical knowledge. To avoid using handcrafted phenotypes (i.e., phenotypes identified a priori by experts), automatically extracted phenotypic descriptors, such as features derived from dimension reduction techniques (e.g., principal component analysis), are employed instead. While the features generated by such computational algorithms capture the geometric variation of the biological shape, they are not necessarily genetically relevant. Genetically informed, data-driven phenotyping is therefore desirable. Here, we propose an approach in which phenotyping is performed through data-driven optimization of trait heritability, defined as the proportion of phenotypic variation in a population that is due to genetic variation. The resulting phenotyping process consists of two steps: 1) constructing a feature space that models shape variation using dimension reduction techniques, and 2) searching for directions in this feature space that exhibit high trait heritability using a genetic algorithm (i.e., a heuristic search inspired by natural selection). We show that the phenotypes produced by the proposed trait heritability-optimized training differ from principal components in the following aspects: 1) higher trait heritability, 2) higher SNP heritability, and 3) identification of the same number of independent genetic loci with a smaller number of effective traits. Our results demonstrate that data-driven, trait heritability-based optimization enables the automatic extraction of genetically relevant phenotypes, as shown by their increased power in genome-wide association scans.
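The two-step procedure can be illustrated with a minimal sketch on synthetic data. The abstract does not state which heritability estimator, family structure, or genetic-algorithm operators are used, so everything below is an assumption for illustration: sibling-pair data, twice the sibling intraclass correlation as a stand-in for trait heritability (which presumes a purely additive model with no shared environment), and truncation selection, blend crossover, and Gaussian mutation as the search operators. The names pca_scores, sib_heritability, and ga_search and all parameter values are hypothetical and do not represent the authors' implementation.

```python
import numpy as np


def pca_scores(X, k):
    """Step 1 (feature space): project centered rows of X onto the top-k PCs."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def sib_heritability(ya, yb):
    """Hypothetical heritability proxy: 2 x intraclass correlation of sibling pairs.

    Uses the double-entry method (each pair entered in both orders) and assumes a
    purely additive model with no shared environment, so cov(sibs) = 0.5 * V_A.
    """
    r = np.corrcoef(np.concatenate([ya, yb]), np.concatenate([yb, ya]))[0, 1]
    return 2.0 * r


def ga_search(sa, sb, pop_size=40, generations=60, sigma=0.1, seed=0):
    """Step 2 (hypothetical operators): GA over unit directions w; phenotype = scores @ w."""
    rng = np.random.default_rng(seed)

    def normalize(W):
        return W / np.linalg.norm(W, axis=1, keepdims=True)

    def fitness(w):
        return sib_heritability(sa @ w, sb @ w)

    pop = normalize(rng.standard_normal((pop_size, sa.shape[1])))
    n_keep = pop_size // 2
    for _ in range(generations):
        fit = np.array([fitness(w) for w in pop])
        # Truncation selection: the fittest half survives unchanged (elitism).
        parents = pop[np.argsort(fit)[::-1][:n_keep]]
        n_child = pop_size - n_keep
        i = rng.integers(0, n_keep, size=n_child)
        j = rng.integers(0, n_keep, size=n_child)
        alpha = rng.random((n_child, 1))
        children = alpha * parents[i] + (1 - alpha) * parents[j]  # blend crossover
        children += sigma * rng.standard_normal(children.shape)  # Gaussian mutation
        pop = normalize(np.vstack([parents, children]))
    fit = np.array([fitness(w) for w in pop])
    best = int(np.argmax(fit))
    return pop[best], fit[best]


# Synthetic toy data (not from the paper): m sibling pairs, p shape features.
# Features 0 and 1 have large variance but no family signal; feature 5 carries a
# family-shared factor g, so PC1 is not the most heritable direction.
rng = np.random.default_rng(1)
m, p = 500, 6
scale = np.array([3.0, 2.0, 1.0, 1.0, 1.0, 1.0])
load = np.zeros(p)
load[5] = 1.0
g = rng.standard_normal((m, 1))
Xa = g * load + rng.standard_normal((m, p)) * scale
Xb = g * load + rng.standard_normal((m, p)) * scale

S = pca_scores(np.vstack([Xa, Xb]), k=p)
sa, sb = S[:m], S[m:]
h2_pc1 = sib_heritability(sa[:, 0], sb[:, 0])
w_best, h2_best = ga_search(sa, sb)
print(f"PC1 h2 proxy: {h2_pc1:.2f}, GA direction h2 proxy: {h2_best:.2f}")
```

In this toy setup the heritable signal sits on a low-variance feature, so the first principal component scores near zero on the proxy while the searched direction scores much higher. This loosely mirrors the abstract's contrast between principal components and heritability-optimized traits. The sign of w is irrelevant to the proxy, which is why directions are kept on the unit sphere and compared only by fitness.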
