A multi-threaded approach to genotype pattern mining for detecting digenic disease genes

Front Genet. 2023 Aug 24;14:1222517. doi: 10.3389/fgene.2023.1222517. eCollection 2023.

Abstract

To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study, testing one variant at a time for genotype frequency differences between individuals affected and unaffected by the disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while each variant on its own may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have such large memory requirements that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors, together with suitable programming to distribute the workload evenly over all processors in a machine, makes a new and relatively straightforward approach possible: evaluating all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.
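As a rough illustration of the exhaustive pairwise idea described above, the following Python sketch scores every observed genotype pair of every variant pair in a toy case-control dataset and spreads the variant pairs over a thread pool. It is not the Vpairs or Gpairs implementation: the function names, the 0/1/2 genotype coding, the toy data, and the use of an uncorrected Pearson chi-square statistic are all assumptions made for illustration. Note also that in CPython, threads do not run this pure-Python arithmetic in parallel, so the pool here only shows how the work could be divided.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def score_variant_pair(pair, genotypes, is_case):
    """Score each genotype pair observed at one pair of variants (hypothetical helper)."""
    i, j = pair
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    case_counts, control_counts = Counter(), Counter()
    for row, affected in zip(genotypes, is_case):
        pattern = (row[i], row[j])
        (case_counts if affected else control_counts)[pattern] += 1
    results = []
    for pattern in set(case_counts) | set(control_counts):
        a = case_counts[pattern]      # cases carrying the pattern
        c = control_counts[pattern]   # controls carrying the pattern
        stat = chi_square_2x2(a, n_cases - a, c, n_controls - c)
        results.append((stat, i, j, pattern, a, c))
    return results


def scan_all_pairs(genotypes, is_case, n_workers=4):
    """Evaluate all variant pairs, one task per pair, and rank the genotype pairs."""
    n_variants = len(genotypes[0])
    pairs = list(combinations(range(n_variants), 2))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_pair = pool.map(lambda p: score_variant_pair(p, genotypes, is_case), pairs)
        results = [r for chunk in per_pair for r in chunk]
    return sorted(results, reverse=True)  # strongest association first


# Toy data (assumed): rows are individuals, columns are variants coded 0/1/2.
genotypes = [
    [1, 0, 1], [1, 1, 1], [1, 2, 1], [0, 0, 1],   # cases
    [1, 0, 0], [0, 1, 1], [0, 2, 0], [1, 1, 0],   # controls
]
is_case = [True, True, True, True, False, False, False, False]

ranked = scan_all_pairs(genotypes, is_case)
top = ranked[0]
print("best genotype pair:", top)
```

Each tuple in `ranked` holds the statistic, the two variant indices, the genotype pair, and its counts in cases and controls. A real analysis would need a multiple-testing correction and a parallel backend suited to the number of pairs, neither of which is attempted here.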

Keywords: digenic trait; genetic association; genetic variant; genotype pair; single-nucleotide polymorphism.

Grants and funding

We gratefully acknowledge grant support from the Natural Sciences and Engineering Research Council of Canada (NSERC) through Discovery grant RGPIN-2018-05147 (QZ). The work of TP was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2022R1A2C1092497).