Accuracy of rating scale interval values used in multiple mini-interviews: a mixed methods study

Philippe Bégin; Robert Gagnon; Jean-Michel Leduc; Béatrice Paradis; Jean-Sébastien Renaud; Jacinthe Beauchamp; Richard Rioux; Marie-Pier Carrier; Claire Hudon; Marc Vautour; Annie Ouellet; Martine Bourget; Christian Bourdy

doi:10.1007/s10459-020-09970-1

Adv Health Sci Educ Theory Pract. 2021 Mar;26(1):37-51. doi: 10.1007/s10459-020-09970-1. Epub 2020 May 6.

Authors

Philippe Bégin¹ ², Robert Gagnon³, Jean-Michel Leduc³, Béatrice Paradis³, Jean-Sébastien Renaud⁴, Jacinthe Beauchamp⁵ ⁶, Richard Rioux⁷, Marie-Pier Carrier⁸, Claire Hudon⁴, Marc Vautour⁵ ⁶, Annie Ouellet⁵, Martine Bourget⁴, Christian Bourdy³

Affiliations

¹ Faculty of Medicine, Université de Montréal, Montreal, Canada. philippe.begin@umontreal.ca.
² CHU Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montreal, QC, H3T 1C5, Canada. philippe.begin@umontreal.ca.
³ Faculty of Medicine, Université de Montréal, Montreal, Canada.
⁴ Faculty of Medicine, Université Laval, Quebec City, Canada.
⁵ Faculty of Medicine, Université Sherbrooke, Sherbrooke, Canada.
⁶ Centre de Formation Médicale du Nouveau-Brunswick, Moncton, Canada.
⁷ Faculty of Social Science, Université du Québec à Montréal, Montreal, Canada.
⁸ Faculty of Education Sciences, Université Laval, Quebec City, Canada.

PMID: 32378151
DOI: 10.1007/s10459-020-09970-1

Abstract

When determining the score given to candidates in multiple mini-interview (MMI) stations, raters have to translate a narrative judgment to an ordinal rating scale. When individual scores are added to calculate the final ranking, it is generally presumed that the values of possible scores on the evaluation grid are separated by constant intervals, following a linear function, although this assumption is seldom validated with raters themselves. Inaccurate interval values could lead to systemic bias that could potentially distort candidates' final cumulative scores. The aim of this study was to establish rating scale values based on raters' intent, to validate these with an independent quantitative method, to explore their impact on final score, and to appraise their meaning according to experienced MMI interviewers. A 4-round consensus-group exercise was independently conducted with 42 MMI interviewers who were asked to determine relative values for the 6-point rating scale (from A to F) used in the Canadian integrated French MMI (IFMMI). In parallel, relative values were also calculated for each option of the scale by comparing the average scores concurrently given to the same individual in other stations every time that option was selected during three consecutive IFMMI years. Data from the same three cohorts were used to simulate the impact of using new score values on final rankings. Comments from the consensus group exercise were reviewed independently by two authors to explore raters' rationale for choosing specific values. Relative to the maximum (A = 100%) and minimum (F = 0%), experienced raters arrived at values of 86.7% (95% CI 86.3-87.1), 69.5% (68.9-70.1), 51.2% (50.6-51.8), and 29.3% (28.1-30.5), for scores of B, C, D and E respectively. The concurrent score approach was based on 43,412 IFMMI stations performed by 4345 medical school applicants. It provided quasi-identical values of 87.1% (82.4-91.5), 70.4% (66.1-74.7), 51.2% (47.1-55.3) and 31.8% (27.9-35.7), respectively. Qualitative analysis indicated that while high scores are usually based on minor details of relatively low importance, low scores are usually given for more serious offenses and were assumed by the raters to carry more weight in the final score. Individual drops or increases in final MMI ranking with the use of new scale values ranged from -21 to +5 percentiles, with the average candidate changing by ± 1.4 percentiles. Consulting with experienced interviewers is a simple and effective approach to establish rating scale values that truly reflect raters' intent in MMI, thus improving the accuracy of the instrument and contributing to the general fairness of the process.
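
As an illustration of why interval values matter, the following minimal Python sketch compares an equal-interval scale with the consensus values reported above. The two candidates, the function names, and the use of a simple mean across stations are assumptions made for this example; they are not study data and not necessarily how the IFMMI computes final scores.

```python
# Consensus-group values from the abstract, as percentages of the A-to-F range.
CONSENSUS = {"A": 100.0, "B": 86.7, "C": 69.5, "D": 51.2, "E": 29.3, "F": 0.0}

# Equal-interval (linear) assumption for a 6-point scale: steps of 20 points.
LINEAR = {"A": 100.0, "B": 80.0, "C": 60.0, "D": 40.0, "E": 20.0, "F": 0.0}


def mean_score(grades, values):
    """Average the numeric values of one candidate's station grades."""
    return sum(values[g] for g in grades) / len(grades)


def rank(candidates, values):
    """Return candidate names ordered from highest to lowest mean score."""
    return sorted(
        candidates,
        key=lambda name: mean_score(candidates[name], values),
        reverse=True,
    )


# Hypothetical candidates, invented for illustration only.
candidates = {
    "steady": ["B", "B", "C", "C"],  # consistently good, never excellent
    "uneven": ["A", "A", "A", "F"],  # excellent in three stations, one failure
}

for label, values in (("linear", LINEAR), ("consensus", CONSENSUS)):
    order = rank(candidates, values)
    scores = {name: round(mean_score(candidates[name], values), 1) for name in order}
    print(label, order, scores)
```

In this toy case the linear scale places the candidate with one F ahead (75.0 versus 70.0), whereas the consensus values place the consistent candidate ahead (78.1 versus 75.0). This matches the raters' stated view that low scores should carry more weight than an equal-interval scale gives them.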

Keywords: Admission; Bias; Evaluation criteria; Grading; Interval; Interview; Likert scale; MMI; Medical school; Rating; Rubrics.

MeSH terms

Canada
Humans
Interviews as Topic / standards*
Male
Observer Variation
Reproducibility of Results
School Admission Criteria*
Schools, Medical / organization & administration*
Schools, Medical / standards