Estimation of the relatedness coefficients from biallelic markers, application in plant mating designs

Biometrics. 2017 Sep;73(3):885-894. doi: 10.1111/biom.12634. Epub 2017 Jan 12.

Abstract

The problem of inferring the relatedness distribution between two individuals from biallelic marker data is considered. This problem can be cast as an estimation task in a mixture model: at each marker the latent variable is the relatedness state, and the observed variable is the genotype of the two individuals. In this model, only the prior proportions are unknown, and can be obtained via ML estimation using the EM algorithm. When the markers are biallelic and the data unphased, the identifiability of the model is known not to be guaranteed. In this article, model identifiability is investigated in the case of phased data generated from a crossing design, a classical situation in plant genetics. It is shown that identifiability can be guaranteed under some conditions on the crossing design. The adapted ML estimator is implemented in an R package called Relatedness. The performance of the ML estimator is evaluated and compared to that of the benchmark moment estimator, both on simulated and real data. Compared to its competitor, the ML estimator is shown to be more robust and to provide more realistic estimates.
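The estimation step described in the abstract, a mixture model in which only the prior proportions are unknown, can be illustrated with a minimal EM sketch. This is not the code of the Relatedness package. The number of states, the emission table `emit`, and the function name `em_mixture_proportions` are hypothetical choices made for illustration, and the toy emission probabilities do not correspond to real genotype probabilities under any crossing design.

```python
import numpy as np


def em_mixture_proportions(lik, n_iter=500, tol=1e-10):
    """Estimate mixture proportions by EM when component likelihoods are known.

    lik[m, k] is P(observation at marker m | latent state k); only the prior
    proportions pi are unknown. (Hypothetical helper, not the package API.)
    """
    lik = np.asarray(lik, dtype=float)
    n_states = lik.shape[1]
    pi = np.full(n_states, 1.0 / n_states)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: posterior probability of each latent state at each marker.
        weighted = lik * pi
        post = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the updated proportions are the average posteriors.
        new_pi = post.mean(axis=0)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy simulation (assumed values): 3 latent states, 4 observable categories.
rng = np.random.default_rng(0)
true_pi = np.array([0.5, 0.3, 0.2])
emit = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
])  # emit[k, c] = P(category c | state k); rows are linearly independent

n_markers = 20000
states = rng.choice(3, size=n_markers, p=true_pi)
obs = np.empty(n_markers, dtype=int)
for k in range(3):
    mask = states == k
    obs[mask] = rng.choice(4, size=mask.sum(), p=emit[k])

# Likelihood of each observation under each state, shape (n_markers, 3).
lik = emit[:, obs].T
pi_hat = em_mixture_proportions(lik)
print(pi_hat)
```

In this toy setting the rows of `emit` are linearly independent, so the proportions are identifiable. If two states produced the same distribution over observations, the likelihood would be flat along some directions and EM would return only one of many equally good solutions, which is the identifiability issue the abstract raises for unphased biallelic data.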

Keywords: Mixture models; Model identifiability; Relatedness inference.

MeSH terms

  • Algorithms
  • Genotype
  • Plants*