A new goodness-of-fit measure for probit models: Surrogate R2

Br J Math Stat Psychol. 2023 Feb;76(1):192-210. doi: 10.1111/bmsp.12289. Epub 2022 Oct 17.

Abstract

Probit models are used extensively for inferential purposes in the social sciences, as discrete data are prevalent in a vast body of social studies. Among the many accompanying model inference problems, a critical question remains unsettled: how to develop a goodness-of-fit measure that resembles the ordinary least squares (OLS) R2 used for linear models. Such a measure has long been sought to achieve 'comparability' of different empirical models across multiple samples addressing similar social questions. To this end, we propose a novel R2 measure for probit models using the notion of surrogacy, that is, simulating a continuous variable S as a surrogate for the original discrete response (Liu & Zhang, Journal of the American Statistical Association, 113, 845, 2018). The proposed R2 is the proportion of the variance of the surrogate response explained by the explanatory variables through a linear model, and we call it a surrogate R2. This paper shows both theoretically and numerically that the surrogate R2 approximates the OLS R2 based on the latent continuous variable, preserves the interpretation of explained variation, and maintains monotonicity between nested models. As no other pseudo R2, McKelvey and Zavoina's and McFadden's included, can meet all three criteria simultaneously, our measure fills this crucial void in probit model inference.
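The surrogate idea described in the abstract can be illustrated with a short simulation. The following is a minimal sketch for the binary case only, not the authors' implementation: it assumes a latent model Z = Xβ + ε with standard normal ε, fits the probit by Fisher scoring, draws the surrogate S from a normal distribution truncated to the side of zero that matches the observed response, and then reports the ordinary R2 from regressing S on the covariates. All function names (`fit_probit`, `draw_surrogate`, `ols_r2`), the simulated design, and the coefficient values are hypothetical choices made for illustration.

```python
from statistics import NormalDist
import numpy as np

_std = NormalDist()                      # standard normal from the stdlib
cdf = np.vectorize(_std.cdf)
pdf = np.vectorize(_std.pdf)
ppf = np.vectorize(_std.inv_cdf)

def fit_probit(X, y, n_iter=100, tol=1e-8):
    """Binary probit MLE by Fisher scoring; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = np.clip(cdf(eta), 1e-10, 1 - 1e-10)
        phi = pdf(eta)
        score = X.T @ (phi * (y - p) / (p * (1 - p)))
        info = X.T @ (X * (phi**2 / (p * (1 - p)))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def draw_surrogate(eta, y, rng):
    """Draw S ~ N(eta, 1) truncated to (0, inf) when y == 1 and (-inf, 0] when y == 0."""
    cut = cdf(-eta)                      # P(eps <= -eta), i.e. P(S <= 0)
    lo = np.where(y == 1, cut, 0.0)
    hi = np.where(y == 1, 1.0, cut)
    u = np.clip(rng.uniform(lo, hi), 1e-12, 1 - 1e-12)
    return eta + ppf(u)                  # inverse-CDF sampling of the truncated error

def ols_r2(X, s):
    """Ordinary R2 from a least-squares fit of s on X."""
    coef, *_ = np.linalg.lstsq(X, s, rcond=None)
    resid = s - X @ coef
    return 1.0 - (resid @ resid) / np.sum((s - s.mean()) ** 2)

# Simulated data (assumed design): two standard normal covariates, unit error variance.
rng = np.random.default_rng(0)
n = 3000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([0.5, 1.0, -1.0])   # hypothetical coefficients
z = X @ beta_true + rng.standard_normal(n)   # latent continuous response
y = (z > 0).astype(float)                # observed binary response

beta_hat = fit_probit(X, y)
s = draw_surrogate(X @ beta_hat, y, rng)

r2_latent = ols_r2(X, z)                 # benchmark, unavailable in real data
r2_surrogate = ols_r2(X, s)              # computable from (X, y) alone
print(f"latent OLS R2: {r2_latent:.3f}  surrogate R2: {r2_surrogate:.3f}")
```

Under this simulated design the population R2 of the latent regression is 2/3 (explained variance 2, error variance 1), so the two printed values would be expected to land close to each other and near that figure. Because S is simulated, the surrogate R2 varies from draw to draw; averaging over repeated draws is one way to stabilize it.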

Keywords: OLS R2; categorical data; model comparison; probit analysis; pseudo R2; surrogate residual.

MeSH terms

  • Linear Models
  • Models, Statistical*