Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data

PLoS Genet. 2017 Sep 29;13(9):e1007021. doi: 10.1371/journal.pgen.1007021. eCollection 2017 Sep.

Abstract

Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in targeted sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, when they do not account for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing (WES) dataset of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficients and classify genetic relatedness using off-target sequencing data downsampled to ~0.15X depth. In application to the full WES dataset without downsampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.
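
To illustrate why ignoring genotype uncertainty can bias kinship downward, the following is a minimal Python sketch. It is not the SEEKIN estimator, whose details are not given in this abstract. It uses a generic covariance-based kinship estimate for a homogeneous sample with known allele frequencies, and it mimics uncertain genotypes by shrinking true genotypes toward their population mean. The function names (kinship_grm, shrink_toward_mean), the shrinkage model, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def kinship_grm(dosages, freqs):
    """Generic covariance-based kinship estimate (illustrative, not SEEKIN).

    dosages: (n_samples, n_markers) allele counts or expected dosages in [0, 2]
    freqs:   (n_markers,) allele frequencies, assumed known and strictly in (0, 1)
    Returns an (n_samples, n_samples) matrix of kinship estimates.
    """
    dosages = np.asarray(dosages, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    centered = dosages - 2.0 * freqs                 # subtract expected dosage 2p
    weights = 1.0 / (2.0 * freqs * (1.0 - freqs))    # inverse genotype variance under HWE
    n_markers = dosages.shape[1]
    grm = (centered * weights) @ centered.T / n_markers
    return grm / 2.0                                 # kinship is half the GRM entry


def shrink_toward_mean(genotypes, freqs, factor):
    """Hypothetical stand-in for uncertain genotypes: pull values toward 2p."""
    return 2.0 * freqs + factor * (genotypes - 2.0 * freqs)


# Simulated parent-offspring trio at unlinked markers (assumed toy data).
rng = np.random.default_rng(0)
n_markers = 2000
freqs = rng.uniform(0.1, 0.9, size=n_markers)
parent_haps = rng.binomial(1, freqs, size=(4, n_markers))  # two haplotypes per parent
pick_a = rng.integers(0, 2, size=n_markers)                # allele transmitted by parent A
pick_b = rng.integers(0, 2, size=n_markers)                # allele transmitted by parent B
child = (np.where(pick_a == 0, parent_haps[0], parent_haps[1])
         + np.where(pick_b == 0, parent_haps[2], parent_haps[3]))
genotypes = np.vstack([parent_haps[0] + parent_haps[1],
                       parent_haps[2] + parent_haps[3],
                       child])

full = kinship_grm(genotypes, freqs)
shrunk = kinship_grm(shrink_toward_mean(genotypes, freqs, 0.5), freqs)
print("parent-child kinship, true genotypes:   ", full[0, 2])
print("parent-child kinship, shrunken dosages: ", shrunk[0, 2])
```

In this toy model, shrinking every dosage by a factor s multiplies each kinship estimate by s squared, so related pairs look less related than they are. A method that models genotype uncertainty needs to correct for this loss of variance in some way.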

MeSH terms

  • Asian People / genetics
  • Computational Biology
  • Databases, Genetic*
  • Exome
  • Genetic Association Studies
  • Genetics, Population / methods*
  • Genome, Human*
  • Genotype
  • Genotyping Techniques
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Sequence Analysis, DNA*
  • Software

Grants and funding

This project is funded by the Agency for Science, Technology and Research, Singapore (https://www.a-star.edu.sg/), and by Merck Sharp & Dohme Corp., Whitehouse Station, NJ, USA (http://www.merck.com). The MEC study is funded by the Biomedical Research Council (BMRC 03/1/27/18/216), National Medical Research Council (0838/2004), National Research Foundation (through BMRC 05/1/21/19/425 and 11/1/21/19/678) and the Ministry of Health, Singapore. The SH2012 study is funded by the Ministry of Health, Singapore. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.