Creation of standardized tools to evaluate reporting in health research: Population Reporting Of Gender, Race, Ethnicity & Sex (PROGRES)

PLOS Glob Public Health. 2023 Sep 7;3(9):e0002227. doi: 10.1371/journal.pgph.0002227. eCollection 2023.

Abstract

Despite increasing diversity in research recruitment, whether findings are reported by gender, race, ethnicity, and sex has remained at the discretion of authors. This study developed and piloted tools to standardize the inclusive reporting of gender, race, ethnicity, and sex in health research. The tools were developed using a modified Delphi approach, in which experts in health research, social epidemiology, sociology, and medical anthropology from 11 different universities participated. The tools were pilot tested on 85 health research manuscripts published in top health research journals to determine their inter-rater reliability. Each tool, one for sex and gender and one for race and ethnicity, spanned five dimensions: Author inclusiveness, Participant inclusiveness, Nomenclature reporting, Descriptive reporting, and Outcomes reporting for each subpopulation. The sex and gender tool had a median score of 6 and a range of 1-15 out of 16 possible points. The percent agreement between reviewers piloting the sex and gender tool was 82%, and the inter-rater reliability (average Cohen's kappa) was 0.54 with a standard deviation of 0.33, indicating moderate agreement. The race and ethnicity tool had a median score of 1 and a range of 0-15 out of 16 possible points. Race and ethnicity were both reported in only 25.8% of the studies evaluated. Most studies that reported race reported only the largest subgroups: White, Black, and Latinx. The percent agreement between reviewers piloting the race and ethnicity tool was 84%, and the average Cohen's kappa was 0.61 with a standard deviation of 0.38, indicating substantial agreement. While the overall dimension scores were low (indicating low inclusivity), the inter-rater reliability measures indicated moderate to substantial agreement for the respective tools.
Recruitment efforts alone will not produce a more inclusive literature unless reporting also improves.
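The abstract reports reliability as percent agreement and as an average Cohen's kappa with a standard deviation, but it does not spell out how these were computed. The following is a minimal sketch, assuming two reviewers each assign a categorical score to every tool item, that kappa is computed per item set and then averaged, and that the "moderate" and "substantial" labels follow the Landis and Koch bands (an assumed interpretation scale, not stated in the source). Function names and the example ratings are hypothetical.

```python
import statistics
from collections import Counter


def percent_agreement(r1, r2):
    """Fraction of items on which two reviewers gave the same rating."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(r1, r2)
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: for each category, multiply the two raters'
    # marginal proportions, then sum over all categories either rater used.
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1:
        # Both raters used one identical category for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def summarize_kappas(kappas):
    """Mean and sample standard deviation of several kappa values (assumed aggregation)."""
    return statistics.mean(kappas), statistics.stdev(kappas)


def landis_koch(kappa):
    """Assumed verbal interpretation using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical item-level ratings (1 = criterion met, 0 = not met) from two reviewers.
    reviewer_a = [1, 0, 1, 1, 0, 1, 0, 0]
    reviewer_b = [1, 0, 1, 0, 0, 1, 0, 1]
    k = cohens_kappa(reviewer_a, reviewer_b)
    print(percent_agreement(reviewer_a, reviewer_b))  # 0.75
    print(k)  # 0.5
    print(landis_koch(k))  # moderate
```

Under these assumptions, the reported averages of 0.54 and 0.61 fall in the "moderate" and "substantial" bands respectively, which is consistent with the wording in the abstract. Kappa is used alongside raw percent agreement because it discounts the agreement expected by chance, so 84% agreement can still correspond to a kappa well below 0.84.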

Grants and funding

The authors received no specific funding for this work.