A population-based approach for gene prioritization in understanding complex traits

Massimo Mezzavilla; Massimiliano Cocca; Francesca Guidolin; Paolo Gasparini

doi:10.1007/s00439-020-02152-4

Hum Genet. 2020 May;139(5):647-655. doi: 10.1007/s00439-020-02152-4. Epub 2020 Mar 30.

Authors

Massimo Mezzavilla¹, Massimiliano Cocca², Francesca Guidolin³, Paolo Gasparini²,⁴

Affiliations

¹ Institute for Maternal and Child Health IRCCS Burlo Garofolo, Via dell'Istria 65/1, 34137, Trieste, Italy. massimo.mezzavilla@burlo.trieste.it.
² Institute for Maternal and Child Health IRCCS Burlo Garofolo, Via dell'Istria 65/1, 34137, Trieste, Italy.
³ University of Padova, Padua, Italy.
⁴ Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy.

PMID: 32232557
DOI: 10.1007/s00439-020-02152-4

Abstract

Gene prioritization is the process of determining which variants and genes identified in genetic analyses are likely to cause a disease or a variation in a phenotype. For many genes, neither in vitro nor in vivo testing is available, so assessing their pathogenic role can be challenging and can lead to false-positive or false-negative results. In this paper, we propose an innovative gene prioritization score based on the population of interest. We introduce the concept of the singleton-cohort variant (SC variant): a variant that has an allele count equal to one in the cohort under study. The difference between the normalized count of SC variants in the coding region and the normalized count of SC variants in the non-coding region should indicate the level of constraint on that gene in a specific population. This score is negative when constraints allow SC variants only in the non-coding region; conversely, it is positive when there are no constraints. A complementary score is the sum of the normalized counts of SC variants in the coding and non-coding regions, which could be used as a proxy for positive or strong purifying selection in a specific population. Our methodology showed a high level of constraint for genes such as USP34 in all subpopulations tested (1000 Genomes dataset). In contrast, some genes showed a strongly negative score only in specific populations, e.g., MYT1L in Europeans, UBR5 in East Asians, and FBXO11 in Africans.
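
The two scores described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the SC variant counts are normalized, so this sketch assumes normalization by region length in base pairs; that choice, along with the names `GeneCounts`, `difference_score`, and `sum_score`, is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GeneCounts:
    """Per-gene tallies for one cohort (all field names are illustrative)."""
    sc_coding: int      # SC variants observed in the coding region
    sc_noncoding: int   # SC variants observed in the non-coding region
    coding_len: int     # coding region length in bp (assumed normalizer)
    noncoding_len: int  # non-coding region length in bp (assumed normalizer)


def is_singleton_cohort(allele_count: int) -> bool:
    """An SC variant has an allele count of exactly one in the cohort."""
    return allele_count == 1


def count_sc(allele_counts: Iterable[int]) -> int:
    """Count the SC variants among the allele counts of a region."""
    return sum(1 for ac in allele_counts if is_singleton_cohort(ac))


def _normalized(count: int, length: int) -> float:
    # ASSUMPTION: normalize by region length; the paper may use another scheme.
    if length <= 0:
        raise ValueError("region length must be positive")
    return count / length


def difference_score(g: GeneCounts) -> float:
    """Coding minus non-coding normalized SC count (negative suggests constraint)."""
    return _normalized(g.sc_coding, g.coding_len) - _normalized(
        g.sc_noncoding, g.noncoding_len
    )


def sum_score(g: GeneCounts) -> float:
    """Coding plus non-coding normalized SC count (the complementary score)."""
    return _normalized(g.sc_coding, g.coding_len) + _normalized(
        g.sc_noncoding, g.noncoding_len
    )


if __name__ == "__main__":
    # Made-up numbers for illustration only, not data from the study.
    gene = GeneCounts(sc_coding=2, sc_noncoding=100,
                      coding_len=2000, noncoding_len=50000)
    print(difference_score(gene), sum_score(gene))
```

Under this assumed normalization, a gene with relatively fewer SC variants per base in its coding region than in its non-coding region receives a negative difference score, matching the sign convention stated in the abstract.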

MeSH terms

Ethnicity / genetics*
Genetic Markers*
Genetic Variation*
Genetics, Population*
Genome-Wide Association Study
Haplotypes
Humans
Models, Theoretical*
Multifactorial Inheritance / genetics*
Phenotype
Signal Transduction

Substances

Genetic Markers