A novel test for independence derived from an exact distribution of ith nearest neighbours

PLoS One. 2014 Oct 2;9(10):e107955. doi: 10.1371/journal.pone.0107955. eCollection 2014.

Abstract

Dependence measures and tests for independence have recently attracted a lot of attention, because they are the cornerstone of algorithms for network inference in probabilistic graphical models. Pearson's product moment correlation coefficient is still by far the most widely used statistic, yet it is largely constrained to detecting linear relationships. In this work we provide an exact formula for the ith nearest neighbor distance distribution of rank-transformed data. Based on that, we propose two novel tests for independence. An implementation of these tests, together with a general benchmark framework for independence testing, is freely available as a CRAN software package (http://cran.r-project.org/web/packages/knnIndep). In this paper we have benchmarked Pearson's correlation, Hoeffding's D, dcor, Kraskov's estimator for mutual information, the maximal information coefficient and our two tests. We conclude that no particular method is generally superior to all other methods. However, dcor and Hoeffding's D are the most powerful tests for many different types of dependence.
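
The contrast drawn above between Pearson's correlation and dcor can be illustrated with a minimal sketch. It is not the nearest-neighbor test proposed in the paper and does not use the knnIndep package. It assumes NumPy, a hypothetical helper `distance_correlation` written from the standard sample definition of distance correlation, and synthetic quadratic data chosen purely for illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix (subtract row and column means, add grand mean).
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Synthetic example (an assumption for illustration): y depends on x only through x**2.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = x**2 + rng.normal(0.0, 0.05, 500)

pearson_r = float(np.corrcoef(x, y)[0, 1])
dcor_xy = distance_correlation(x, y)

print(f"Pearson r = {pearson_r:.3f}")
print(f"dcor      = {dcor_xy:.3f}")
```

On data of this kind Pearson's coefficient stays close to zero because the relationship has no linear component, while distance correlation is clearly positive. Turning either statistic into a test would additionally require a null distribution, for example from permutations, which this sketch omits.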

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Models, Statistical*
  • World Health Organization

Grants and funding

AT was supported by the Bundesministerium für Bildung und Forschung (BMBF) e:Bio Syscore grant and by a Jeff Schell professorship from the Max Planck Institute for Plant Breeding Research and the University of Cologne. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.