A new goodness-of-fit measure for probit models: Surrogate R²

Dungang Liu; Xiaorui Zhu; Brandon Greenwell; Zewei Lin

doi:10.1111/bmsp.12289

Br J Math Stat Psychol. 2023 Feb;76(1):192-210. doi: 10.1111/bmsp.12289. Epub 2022 Oct 17.

Authors

Dungang Liu¹, Xiaorui Zhu¹,², Brandon Greenwell¹, Zewei Lin¹

Affiliations

¹ Department of Operations, Business Analytics and Information Systems, University of Cincinnati Carl H. Lindner College of Business, Cincinnati, Ohio, USA.
² Department of Business Analytics and Technology Management, College of Business and Economics, Towson University, Towson, Maryland, USA.

Abstract

Probit models are used extensively for inferential purposes in the social sciences, as discrete data are prevalent in a vast body of social studies. Among the many accompanying model inference problems, a critical question remains unsettled: how to develop a goodness-of-fit measure that resembles the ordinary least squares (OLS) R² used for linear models. Such a measure has long been sought to achieve 'comparability' of different empirical models across multiple samples addressing similar social questions. To this end, we propose a novel R² measure for probit models using the notion of surrogacy: simulating a continuous variable $S$ as a surrogate of the original discrete response (Liu & Zhang, 2018, Journal of the American Statistical Association, 113, 845). The proposed R² is the proportion of the variance of the surrogate response explained by explanatory variables through a linear model, and we call it a surrogate R². This paper shows both theoretically and numerically that the surrogate R² approximates the OLS R² based on the latent continuous variable, preserves the interpretation of explained variation, and maintains monotonicity between nested models. As no other pseudo R², McKelvey and Zavoina's and McFadden's included, can meet all three criteria simultaneously, our measure fills this crucial void in probit model inference.
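
The construction described in the abstract can be illustrated with a minimal sketch for the simplest case: a binary response and a single covariate. This is not the authors' implementation. The simulated data, the threshold at zero, and the helper names (`fit_probit`, `draw_surrogate`, `ols_r2`) are all assumptions made for illustration. The sketch fits a probit model, draws the surrogate $S$ from a normal distribution centred on the fitted linear predictor and truncated to the side of zero implied by the observed response, and then reports the OLS R² from regressing $S$ on the covariate.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf, inv_cdf


def clip(p, eps=1e-12):
    """Keep a probability strictly inside (0, 1) so inv_cdf is defined."""
    return min(max(p, eps), 1.0 - eps)


def fit_probit(x, y, iterations=25):
    """Fit P(y = 1) = Phi(b0 + b1 * x) by Fisher scoring (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = clip(STD_NORMAL.cdf(eta))
            d = STD_NORMAL.pdf(eta)
            score = (yi - p) * d / (p * (1.0 - p))  # score contribution
            w = d * d / (p * (1.0 - p))             # Fisher information weight
            g0 += score
            g1 += score * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score vector.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def draw_surrogate(eta, yi, rng):
    """Draw S ~ N(eta, 1), truncated to S > 0 if y = 1 and to S <= 0 if y = 0."""
    cut = STD_NORMAL.cdf(-eta)  # P(eta + e <= 0) for e ~ N(0, 1)
    u = rng.random()
    # Inverse-CDF sampling restricted to the admissible quantile range.
    q = cut + u * (1.0 - cut) if yi == 1 else u * cut
    return eta + STD_NORMAL.inv_cdf(clip(q))


def ols_r2(x, s):
    """R-squared of the simple linear regression of s on x."""
    xbar, sbar = fmean(x), fmean(s)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sss = sum((si - sbar) ** 2 for si in s)
    sxs = sum((xi - xbar) * (si - sbar) for xi, si in zip(x, s))
    return sxs * sxs / (sxx * sss)


# Simulated data (assumption): latent Z = -0.5 + 1.0 * x + e, observed y = 1{Z > 0}.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
latent = [-0.5 + 1.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [1 if z > 0 else 0 for z in latent]

b0, b1 = fit_probit(x, y)
s = [draw_surrogate(b0 + b1 * xi, yi, rng) for xi, yi in zip(x, y)]

surrogate_r2 = ols_r2(x, s)     # R² computed from the simulated surrogate
latent_r2 = ols_r2(x, latent)   # benchmark; unobservable in real data
print(f"surrogate R2 = {surrogate_r2:.3f}, latent OLS R2 = {latent_r2:.3f}")
```

Because the latent variable is known in a simulation, the two printed values can be compared directly. With real data only the surrogate version is available. The surrogate is random, so repeated draws give slightly different values.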

Keywords: OLS R²; categorical data; model comparison; probit analysis; pseudo R²; surrogate residual.

MeSH terms

Linear Models
Models, Statistical*