The fractal dimension as a measure for characterizing genetic variation of the human genome

Comput Biol Chem. 2020 Jun 6:87:107278. doi: 10.1016/j.compbiolchem.2020.107278. Online ahead of print.

Abstract

Motivated by the highly clustered distribution of single nucleotide polymorphisms (SNPs) across the human genome, we propose a set of chromosome-wise fractal dimensions as a measure for characterizing human polymorphism at the level of the individual. The fractal dimension quantifies the degree to which SNPs are clustered and gives a parsimonious representation of the genetic variation in a chromosome. In this sense, the proposed scheme projects the SNP genotype data into a new space that is simpler and lower in dimension. As an illustrative example, we estimate the chromosome-wise fractal dimensions of SNPs extracted from the HapMap Phase III data set. To assess the validity of the proposed measure, we apply principal component analysis (PCA) to the set of estimated fractal dimensions and show that it describes, to a reasonable approximation, the population structure of 11 global populations. We also use multidimensional scaling to relate the PCA-based genetic distances to the geographical distances between global populations. This shows that, like the SNP genotype data, the fractal dimensions also play a role in the genetic distances that underlie population structure. In addition, we use the proposed measure as a signature for classifying global populations by developing a support vector machine model. The selected feature model predicts the global population with a balanced accuracy of about 77%. These results support the view that the fractal dimension is an efficient way to describe the genetic variation of global populations.
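The abstract does not say which estimator is used for the fractal dimension. As a minimal sketch, assuming a standard box-counting estimator applied to SNP positions along one chromosome, the idea can be illustrated as follows. The function names `box_counting_dimension` and `cantor_positions`, the choice of box sizes, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import numpy as np


def box_counting_dimension(positions, box_sizes):
    """Estimate the box-counting dimension of integer positions on a line.

    Assumption: this is a generic box-counting estimator, not necessarily
    the estimator used in the paper.
    """
    positions = np.asarray(positions, dtype=np.int64)
    box_sizes = np.asarray(box_sizes, dtype=np.int64)
    # N(eps): number of boxes of width eps that contain at least one position.
    counts = np.array([np.unique(positions // s).size for s in box_sizes])
    # The dimension is the slope of log N(eps) against log(1 / eps).
    slope, _intercept = np.polyfit(np.log(1.0 / box_sizes), np.log(counts), 1)
    return float(slope)


def cantor_positions(level):
    """Integer left endpoints of the level-`level` Cantor set (toy clustered data)."""
    pts = np.array([0], dtype=np.int64)
    for _ in range(level):
        # Append a base-3 digit of 0 or 2 to every existing point.
        pts = np.concatenate([3 * pts, 3 * pts + 2])
    return pts


# Toy inputs (hypothetical, standing in for SNP positions on a chromosome).
evenly_spread = np.arange(2 ** 12)   # every position occupied
clustered = cantor_positions(8)      # strongly clustered positions

d_even = box_counting_dimension(evenly_spread, 2 ** np.arange(1, 9))
d_clustered = box_counting_dimension(clustered, 3 ** np.arange(1, 6))
```

In this toy setting the evenly spread positions give a dimension of about 1, while the Cantor-like positions give a value close to log 2 / log 3 (about 0.631), which shows how stronger clustering lowers the estimate. Computing one such value per chromosome would give an individual a short feature vector of the kind the abstract feeds into PCA and the support vector machine.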

Keywords: Fractal dimension; Single nucleotide polymorphism; Genetic variation; Human genome; Population structure.