Field- and time-normalization of data with many zeros: an empirical analysis using citation and Twitter data

Robin Haunschild; Lutz Bornmann

doi:10.1007/s11192-018-2771-1


Scientometrics. 2018;116(2):997-1012. doi: 10.1007/s11192-018-2771-1. Epub 2018 May 19.

Authors

Robin Haunschild¹, Lutz Bornmann²

Affiliations

¹ Max Planck Institute for Solid State Research, Heisenbergstr. 1, 70569 Stuttgart, Germany.
² Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany.

Abstract

Thelwall (J Informetr 11(1):128-151, 2017a. 10.1016/j.joi.2016.12.002; Web indicators for research evaluation: a practical guide. Morgan and Claypool, London, 2017b) proposed a new family of field- and time-normalized indicators intended for sparse data. These indicators operate at the level of units of analysis (e.g., institutions) rather than at the paper level. They compare the proportion of a unit's papers that are mentioned (e.g., on Twitter) with the proportion of mentioned papers in the corresponding fields and publication years. We propose a new indicator for this family, the Mantel-Haenszel quotient (MHq). The MHq is rooted in Mantel-Haenszel (MH) analysis, an established method for pooling the data from several 2 × 2 cross tables based on different subgroups. Using citations and peer assessments, we investigate whether the indicator family can distinguish between quality levels defined by the peers' assessments; in this way, we test the indicators' convergent validity. We find that the MHq is able to distinguish between quality levels in most cases, whereas other indicators of the family are not. Since our study supports the convergent validity of the MHq, we apply it to four different Twitter groups as defined by the company Altmetric. Our results show only a weak relationship between the Twitter counts of all four Twitter groups and scientific quality, much weaker than the relationship between citations and scientific quality. Therefore, our results discourage the use of Twitter counts in research evaluation.

Keywords: Altmetrics; Citation counts; Data with many zeros; Equalized mean-based normalized proportion cited (EMNPC); Mantel–Haenszel quotient (MHq); Mean-based normalized proportion cited (MNPC); Twitter.
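The MH pooling that underlies the MHq can be sketched as follows. This is a minimal Python sketch, assuming that each field/publication-year subgroup is summarized as one 2 × 2 table (the unit's papers versus the reference set's papers, cross-classified as mentioned or not mentioned) and that the MHq is computed as the standard Mantel-Haenszel pooled odds ratio over these tables. The abstract does not spell out the exact cell definitions, so the cell layout, the `Stratum` class, and the function `mantel_haenszel_quotient` are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One field/publication-year subgroup as a 2x2 table (hypothetical layout)."""
    unit_mentioned: int      # a: papers of the unit mentioned at least once
    unit_unmentioned: int    # b: papers of the unit never mentioned
    world_mentioned: int     # c: papers of the reference set mentioned at least once
    world_unmentioned: int   # d: papers of the reference set never mentioned


def mantel_haenszel_quotient(strata: list[Stratum]) -> float:
    """Standard Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n).

    Assumption: the MHq is taken to be this estimator over the strata above.
    """
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = (s.unit_mentioned + s.unit_unmentioned
             + s.world_mentioned + s.world_unmentioned)
        if n == 0:
            continue  # an empty subgroup carries no information
        # Each stratum is weighted by 1/n, so strata with zero cells still
        # contribute without producing an undefined per-stratum ratio.
        numerator += s.unit_mentioned * s.world_unmentioned / n
        denominator += s.unit_unmentioned * s.world_mentioned / n
    if denominator == 0:
        raise ValueError("pooled odds ratio undefined: denominator is zero")
    return numerator / denominator


if __name__ == "__main__":
    # Two invented subgroups for illustration only (not data from the study).
    example = [Stratum(3, 7, 20, 80), Stratum(1, 9, 5, 95)]
    print(round(mantel_haenszel_quotient(example), 3))  # prints 1.811
```

Under these assumptions, a value above 1 would mean that, pooled across subgroups, the odds of a paper being mentioned are higher for the unit than for the reference set, and a value of 1 means no difference. Because the sums are taken before dividing, individual subgroups with many zeros do not make the pooled estimate undefined as long as the overall denominator is positive.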