Classifiers as a model-free group comparison test

Behav Res Methods. 2018 Feb;50(1):416-426. doi: 10.3758/s13428-017-0880-z.

Abstract

Conventional statistical methods for detecting group differences assume a correctly specified model, including the origin of the difference. Researchers must therefore be able to identify the source of a group difference and choose a corresponding method. In this paper, we propose a new approach to group comparison that requires no model specification and uses classification algorithms from machine learning. In this approach, the classification accuracy is evaluated against a binomial distribution using Independent Validation. As an application example, we examined the false-positive error rates and statistical power of support vector machines (SVMs) for detecting group differences, in comparison to conventional statistical tests such as the t test, Levene's test, the K-S test, Fisher's z-transformation, and MANOVA. The SVMs detected group differences regardless of their origin (mean, variance, distribution shape, or covariance) and showed comparably consistent power across conditions. When a group difference originated from a single source, the statistical power of the SVMs was lower than that of the most appropriate conventional test for the study condition; however, the power of the SVMs increased when differences originated from multiple sources. Moreover, the SVMs performed substantially better with more variables than with fewer variables. Most importantly, the SVMs were applicable to any type of data without sophisticated model specification. This study demonstrates a new application of classification algorithms as an alternative or complement to conventional group comparison tests. With the proposed approach, researchers can test two-sample data even when they are not certain which statistical test to use or when the data violate the statistical assumptions of conventional methods.
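The core idea, testing classification accuracy against a binomial distribution, can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: the abstract does not spell out the Independent Validation procedure, so the sketch uses one plausible reading in which each case is classified by a model trained only on previously seen cases and is then added to the training set; a simple k-nearest-neighbour classifier stands in for the SVM to keep the example dependency-free; and the chance level of 0.5 presumes two groups of equal size. The names `binomial_tail`, `knn_predict`, and `classifier_group_test` are hypothetical.

```python
import math
import random


def binomial_tail(k, n, p=0.5):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote (labels 0/1) among the k nearest training points.

    Stand-in for the SVM used in the paper; ties go to label 0.
    """
    nearest = sorted(zip(train_X, train_y), key=lambda t: math.dist(t[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def classifier_group_test(X, y, n_start=20, predict=knn_predict, seed=0):
    """Sequential validation sketch (assumed reading of Independent Validation).

    Each case is predicted from a training set that does not contain it,
    then moved into the training set. The count of correct predictions is
    compared with Binomial(tested, 0.5), i.e. chance for balanced groups.
    Returns (correct, tested, p_value).
    """
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    train = order[:n_start]          # initial training cases, never tested
    correct = 0
    tested = 0
    for idx in order[n_start:]:
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += int(predict(train_X, train_y, X[idx]) == y[idx])
        tested += 1
        train.append(idx)            # tested case now joins the training set
    return correct, tested, binomial_tail(correct, tested, 0.5)


if __name__ == "__main__":
    # Illustrative data: equal means, different variances (one variable).
    rng = random.Random(1)
    group0 = [(rng.gauss(0, 1),) for _ in range(100)]
    group1 = [(rng.gauss(0, 3),) for _ in range(100)]
    X = group0 + group1
    y = [0] * 100 + [1] * 100
    correct, tested, p = classifier_group_test(X, y)
    print(f"accuracy = {correct}/{tested}, one-sided p = {p:.4g}")
```

A small p-value indicates that the classifier separates the groups better than chance, whatever the source of the difference. Because the classifier is passed in as a function, an SVM from a machine learning library could be substituted for `knn_predict` without changing the test itself.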

Keywords: Classifiers; Group comparison; Independent validation; K-fold cross validation; Support vector machine.

MeSH terms

  • Algorithms
  • Data Interpretation, Statistical*
  • Group Structure*
  • Humans
  • Multivariate Analysis
  • Support Vector Machine