Adoption of Internal Medicine Milestone Ratings and Changes in Bias Against Black, Latino, and Asian Internal Medicine Residents

Ann Intern Med. 2024 Jan;177(1):70-82. doi: 10.7326/M23-1588. Epub 2023 Dec 26.

Abstract

Background: The 2014 adoption of the Milestone ratings system may have affected evaluation bias against minoritized groups.

Objective: To assess bias in internal medicine (IM) residency knowledge ratings against Black or Latino residents, who are underrepresented in medicine (URiM), and Asian residents before versus after Milestone adoption in 2014.

Design: Cross-sectional and interrupted time-series comparisons.

Setting: U.S. IM residencies.

Participants: 59 835 IM residents completing residencies during 2008 to 2013 and 2015 to 2020.

Intervention: Adoption of the Milestone ratings system.

Measurements: Pre-Milestone (2008 to 2013) and post-Milestone (2015 to 2020) bias was estimated as the difference in standardized knowledge ratings between each U.S.-born or non-U.S.-born minoritized group and non-Latino U.S.-born White (NLW) residents, with adjustment for performance on the American Board of Internal Medicine IM certification examination and other physician characteristics. Interrupted time-series analysis measured deviations from pre-Milestone linear bias trends.
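
The interrupted time-series step can be illustrated with a minimal sketch: fit a straight line to the yearly bias estimates from the pre-Milestone period, project it into the post-Milestone years, and take observed minus projected. The function names and the yearly values below are hypothetical and made up for illustration; they are not data from the study, and the sketch omits the adjustment for examination performance and physician characteristics that the study applies when estimating each year's bias.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def deviations_from_pre_trend(pre, post):
    """Return observed minus projected bias for each post-period year.

    pre, post: dicts mapping year -> estimated rating gap in SDs.
    The projection extrapolates the linear trend fitted on `pre` only.
    """
    years = sorted(pre)
    intercept, slope = fit_line(years, [pre[y] for y in years])
    return {y: post[y] - (intercept + slope * y) for y in sorted(post)}


# Hypothetical yearly gaps (SDs), NOT taken from the study.
pre_gap = {2008: -0.30, 2009: -0.31, 2010: -0.32,
           2011: -0.33, 2012: -0.34, 2013: -0.35}
post_gap = {year: -0.10 for year in range(2015, 2021)}

deviation = deviations_from_pre_trend(pre_gap, post_gap)
for year, value in deviation.items():
    print(year, round(value, 3))
```

A positive deviation here means the observed gap is closer to zero than the pre-period trend would have projected, which is the direction of change the abstract reports.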

Results: During the pre-Milestone period, ratings biases against minoritized groups were large (-0.40 SDs [95% CI, -0.48 to -0.31 SDs; P < 0.001] for URiM residents, -0.24 SDs [CI, -0.30 to -0.18 SDs; P < 0.001] for U.S.-born Asian residents, and -0.36 SDs [CI, -0.45 to -0.27 SDs; P < 0.001] for non-U.S.-born Asian residents). These estimates decreased in magnitude to less than 0.15 SDs after adoption of Milestone ratings for all groups except U.S.-born Black residents, among whom substantial (though lower) bias persisted (-0.26 SDs [CI, -0.36 to -0.17 SDs; P < 0.001]). Substantial deviations from pre-Milestone linear bias trends coincident with adoption of Milestone ratings were also observed.

Limitations: Unobserved variables correlated with ratings bias and Milestone ratings adoption, changes in identification of race/ethnicity, and generalizability to Milestones 2.0.

Conclusion: Knowledge ratings bias against URiM and Asian residents was ameliorated with the adoption of the Milestone ratings system. However, substantial ratings bias against U.S.-born Black residents persisted.

Primary funding source: None.

MeSH terms

  • Asian
  • Bias*
  • Black or African American
  • Certification
  • Clinical Competence*
  • Cross-Sectional Studies
  • Hispanic or Latino
  • Humans
  • Internship and Residency*
  • United States