Variances and covariances of linear summary statistics of segregating sites

Theor Popul Biol. 2022 Jun:145:95-108. doi: 10.1016/j.tpb.2022.03.005. Epub 2022 Apr 4.

Abstract

Each mutation in a population sample of DNA sequences can be classified by the number of sequences that inherit the mutant nucleotide; the resulting frequencies are known as mutations of different sizes or the site frequency spectrum. Many summary statistics can be defined as linear functions of these frequencies. This paper analytically explores a flexible class of such linear summary statistics, which includes several well-known quantities, such as the number of segregating sites and the mean number of nucleotide differences between two sequences. Some asymptotic variances and covariances are obtained, and analytical formulas for the variances and covariances of nine such linear summary statistics are derived, most of which were unknown to date. This study not only provides some theoretical foundations for exploring linear summary statistics, but also provides some new linear summary statistics that may be utilized for analyzing sample polymorphism. Furthermore, it is shown that a newly developed linear summary statistic has a smaller variance almost uniformly than Watterson's estimator, and that a class of linear summary statistics that gives too heavy weights to mutations of smaller sizes results in asymptotically non-zero variance.
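
As an illustration of what "linear function of these frequencies" means, the sketch below computes the site frequency spectrum from a small 0/1 haplotype matrix and evaluates three standard statistics as weighted sums of it: the number of segregating sites, Watterson's estimator, and the mean number of pairwise differences. It is not taken from the paper. The function names, the 0/1 encoding (1 = mutant allele, ancestral state assumed known), and the toy data are assumptions made for this example; only the weight formulas are the textbook definitions.

```python
from math import comb


def site_frequency_spectrum(haplotypes):
    """Return [xi_1, ..., xi_{n-1}], where xi_i counts sites whose mutant
    allele is carried by exactly i of the n sampled sequences.

    Assumption: each haplotype is a sequence of 0/1 with 1 = mutant allele.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    xi = [0] * (n - 1)
    for j in range(num_sites):
        size = sum(h[j] for h in haplotypes)  # mutation size at site j
        if 0 < size < n:  # skip sites that are not segregating
            xi[size - 1] += 1
    return xi


def linear_statistic(xi, weights):
    """Generic linear summary statistic: sum over i of w_i * xi_i."""
    return sum(w * x for w, x in zip(weights, xi))


def segregating_sites_weights(n):
    # S = sum_i xi_i, so every weight is 1.
    return [1.0] * (n - 1)


def watterson_weights(n):
    # theta_W = S / a_n with a_n = sum_{i=1}^{n-1} 1/i.
    a_n = sum(1.0 / i for i in range(1, n))
    return [1.0 / a_n] * (n - 1)


def pairwise_difference_weights(n):
    # pi = sum_i i * (n - i) * xi_i / C(n, 2).
    pairs = comb(n, 2)
    return [i * (n - i) / pairs for i in range(1, n)]


if __name__ == "__main__":
    # Hypothetical toy sample: 4 sequences, 5 sites.
    sample = [
        [1, 0, 0, 1, 0],
        [1, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 1, 0, 0],
    ]
    n = len(sample)
    xi = site_frequency_spectrum(sample)
    print("SFS:", xi)
    print("S:", linear_statistic(xi, segregating_sites_weights(n)))
    print("theta_W:", linear_statistic(xi, watterson_weights(n)))
    print("pi:", linear_statistic(xi, pairwise_difference_weights(n)))
```

For the toy sample, the spectrum is [2, 1, 1], which gives S = 4, Watterson's estimator 24/11, and a mean pairwise difference of 13/6. Changing only the weight vector yields a different member of the same linear class.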

Keywords: -statistics; Coalescent; Linear summary statistics; Mutation size; Segregating sites; Variance and covariance.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Models, Genetic
  • Mutation
  • Nucleotides*
  • Polymorphism, Genetic*

Substances

  • Nucleotides