Evidence of questionable research practices in clinical prediction models

Nicole White; Rex Parsons; Gary Collins; Adrian Barnett

doi:10.1186/s12916-023-03048-6

BMC Med. 2023 Sep 4;21(1):339. doi: 10.1186/s12916-023-03048-6.

Authors

Nicole White¹, Rex Parsons¹, Gary Collins², Adrian Barnett³

Affiliations

¹ Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia.
² Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK.
³ Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia. a.barnett@qut.edu.au.

Abstract

Background: Clinical prediction models are widely used in health and medical research. The area under the receiver operating characteristic curve (AUC) is a frequently used estimate to describe the discriminatory ability of a clinical prediction model. The AUC is often interpreted relative to thresholds, with "good" or "excellent" models defined at 0.7, 0.8 or 0.9. These thresholds may create targets that result in "hacking", where researchers are motivated to re-analyse their data until they achieve a "good" result.

Methods: We extracted AUC values from PubMed abstracts to look for evidence of hacking. We plotted histograms of the AUC values using bins of width 0.01 and compared the observed distribution with a smooth distribution fitted by a spline.
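The binning-and-smoothing comparison described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names (`bin_counts`, `moving_average`), the synthetic data, and the window size are hypothetical, and a centred moving average is swapped in for the spline so that the example needs only the Python standard library. The idea is the same: bin the values at a width of 0.01, fit a smooth curve, and read positive residuals as excesses and negative residuals as shortfalls.

```python
import math

def bin_counts(auc_values, lo=0.50, hi=1.00):
    """Count AUC values in bins of width 0.01 over [lo, hi)."""
    lo_idx, hi_idx = round(lo * 100), round(hi * 100)
    counts = [0] * (hi_idx - lo_idx)
    for v in auc_values:
        # Small epsilon guards against floating-point error (e.g. 0.57 * 100).
        idx = math.floor(v * 100 + 1e-9) - lo_idx
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts

def moving_average(counts, half_window=5):
    """Centred moving average, truncated at the edges.

    Assumption: a simple stand-in for the spline smoother named in the Methods.
    """
    smooth = []
    for i in range(len(counts)):
        window = counts[max(0, i - half_window): i + half_window + 1]
        smooth.append(sum(window) / len(window))
    return smooth

# Synthetic data (not from the study): 100 values per bin, except that
# 30 values are moved from the 0.79 bin to the 0.80 bin.
values = []
for b in range(50, 100):
    n = 100
    if b == 79:
        n = 70
    elif b == 80:
        n = 130
    values += [b / 100 + 0.005] * n

counts = bin_counts(values)
smooth = moving_average(counts)
# Positive residual = excess over the smooth curve; negative = shortfall.
residuals = [obs - fit for obs, fit in zip(counts, smooth)]

for auc_bin in (0.79, 0.80):
    i = round(auc_bin * 100) - 50
    print(f"bin {auc_bin:.2f}: observed={counts[i]}, "
          f"smooth={smooth[i]:.1f}, residual={residuals[i]:+.1f}")
```

In this toy example the bin just below 0.80 shows a shortfall and the bin at 0.80 shows an excess, which is the pattern the histograms were inspected for. A real analysis would replace the moving average with a spline fit and the synthetic values with the extracted AUCs.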

Results: The distribution of 306,888 AUC values showed clear excesses above the thresholds of 0.7, 0.8 and 0.9 and shortfalls below the thresholds.

Conclusions: The AUCs for some models are over-inflated, which risks exposing patients to sub-optimal clinical decision-making. Greater modelling transparency is needed, including published protocols, and data and code sharing.

Keywords: Area under curve; Diagnosis; Hacking; Prediction model; Prognosis; Receiver operating characteristic; Statistics.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biomedical Research*
Humans
Models, Statistical*
Prognosis
ROC Curve
